  1. Jun 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Why was V1 separated from the rest of the visual cortex, and why the rest of the areas were simply lumped into an EVC ROI? It would be helpful to understand the separation into ROIs.

      We thank the reviewer for raising the concerns regarding the definition of ROIs. Our decision to analyze V1 separately was based on two key considerations. First, previous studies consistently identify V1 as the main locus of sensory-like templates during feature-specific preparatory attention (Kok et al., 2014; Aitken et al., 2020). Second, V1 shows the strongest orientation selectivity within the visual hierarchy (Priebe, 2016). In contrast, the extrastriate visual cortex (EVC; comprising V2, V3, V3AB and V4) demonstrates broader selectivity, including for complex features such as contour and texture (Grill-Spector & Malach, 2004). Thus, we think it is particularly informative to analyze V1 data separately, as our experiment examines orientation-based attention. We should also note that we conducted MVPA separately for each visual ROI (V2, V3, V3AB and V4). After observing similar patterns of results across these regions, we averaged the decoding accuracies into a single value and labeled it as EVC. This approach allowed us to simplify data presentation while preserving the overall pattern of decoding performance. We have now added the related explanations on the ROI definition in the revised texts (Page 26; Line 576-581).

      (2) It would have been helpful to have a behavioral measure of the "attended" orientation to show that participants in fact attended to a particular orientation and were faster in the cued condition. The cue here was 100% valid, so no such behavioral measure of attention is available here.

      We thank the reviewer for the comments. We agree that including valid and neutral cue trials would have provided valuable behavioral measures of attention; yet, our current design was aimed at maximizing the number of trials for decoding analysis given fMRI time constraints. Thus, we could not fit additional conditions to measure the behavioral effects of attention. However, we note that in our previous studies using a similar feature cueing paradigm, we observed benefits of attentional cueing on behavioral performance when comparing valid and neutral conditions (Liu et al., 2007; Jigo et al., 2018). Furthermore, our neural data indeed demonstrated attention-related modulation (as indicated by MVPA results, Fig. 2 in the main texts), so we are confident that on average participants followed the instruction and deployed their attention accordingly. We have now added the related explanations on this point in the revised texts (Page 23; Line 492-498).

      (3) As I was reading the manuscript I kept thinking that the word attention in this manuscript can be easily replaced with visual working memory. Have the authors considered what it is about their task or cognitive demand that makes this investigation about attention or working memory?

      We thank the reviewer for this comment. We added the following extensive discussion on this point in the revised texts (Page 18; Line 363-381).

      “It could be argued that preparatory attention relies on the same mechanisms as working memory maintenance. While these functions are intuitively similar and likely overlap, there is also evidence indicating that they can be dissociated (Battistoni et al., 2017). In particular, we note that in our task, attention is guided by symbolic cues (color-orientation associations), while working memory tasks typically present the actual visual stimulus as the memorandum. A central finding in working memory studies is that neural signals during WM maintenance are sensory in nature, as demonstrated by generalizable neural activity patterns from stimulus encoding to maintenance in visual cortex (Harrison & Tong, 2009; Serences et al., 2009; Rademaker et al., 2019). However, in our task, neural signals during preparation were nonsensory, as demonstrated by a lack of such generalization in the No-Ping session (see also Gong et al., 2022). We believe that the differences in cue format and task demand in these studies may account for such differences. In addition to the difference in the sensory nature of the preparatory versus delay-period activity, our ping-related results also exhibited divergence from working memory studies (Wolff et al., 2017; 2020). While these studies used the visual impulse to differentiate active and latent representations of different items (e.g., attended vs. unattended memory item), our study demonstrated the active and latent representations of a single item in different formats (i.e., non-sensory vs. sensory-like). Moreover, unlike our study, the impulse did not evoke sensory-like neural patterns during memory retention (Wolff et al., 2017). These observations suggest that the cognitive and neural processes underlying preparatory attention and working memory maintenance could very well diverge. Future studies are necessary to delineate the relationship between these functions both at the behavioral and neural level.”

      (4) If I understand correctly, the only ROI that showed a significant difference for the cross-task generalization is V1. Was it predicted that only V1 would have two functional states? It should also be made clear that the only difference where the two states differ is V1.

      We thank the reviewer for this comment. We would like to clarify that our analyses revealed similar patterns of preparatory attentional representations in V1 and EVC. During the Ping session, the cross-task generalization analyses revealed decodable information in both V1 and EVC (ps < 0.001), significantly higher than that in the No-Ping session for V1 (independent t-test: t(38) = 3.145, p = 0.003, Cohen’s d = 0.995) and EVC (independent t-test: t(38) = 2.153, p = 0.038, Cohen’s d = 0.681) (Page 10; Line 194-196). While both areas maintained similar representations, additional measures (Mahalanobis distance, neural-behavior relationship and connectivity changes) showed more robust ping-evoked changes in V1 compared to EVC. This differential pattern likely reflects the primary role of V1 in orientation processing, with EVC showing a similar but weaker response profile. We have revised the text to clarify this point (Page 16; Line 327-329).
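
      As a reader-side consistency check on the effect sizes above: for an independent-samples t-test, Cohen's d can be recovered from the t statistic and the group sizes as d = t * sqrt(1/n1 + 1/n2). The sketch below assumes two groups of 20 participants each. That split is an assumption inferred from the reported degrees of freedom (38) and is not stated in this response; the function name is hypothetical.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Cohen's d implied by an independent-samples t statistic.

    Uses the pooled-SD identity d = t * sqrt(1/n1 + 1/n2).
    """
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


# Assumed group sizes: 20 per session (inferred from df = 38, not stated).
N_PER_SESSION = 20

d_v1 = cohens_d_from_t(3.145, N_PER_SESSION, N_PER_SESSION)
d_evc = cohens_d_from_t(2.153, N_PER_SESSION, N_PER_SESSION)

print(f"V1:  d = {d_v1:.3f}")
print(f"EVC: d = {d_evc:.3f}")
```

      Under that assumption, the recovered values agree with the reported d = 0.995 (V1) and d = 0.681 (EVC) to three decimals.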

      (5) My primary concern about the interpretation of the finding is that the result, differences in cross-task decoding within V1 between the ping and no-ping condition might simply be explained by the fact that the ping condition refocuses attention during the long delay thus "resharpening" the template. In the no-ping condition during the 5.5 to 7.5 seconds long delay, attention for orientation might start getting less "crisp." In the ping condition, however, the ping itself might simply serve to refocus attention. So, the result is not showing the difference between the latent and non-latent stages, rather it is the difference between a decaying template representation and a representation during the refocused attentional state. It is important to address this point. Would a simple tone during the delay do the same? If so, the interpretation of the results will be different.

      We thank the reviewer for this comment. The reviewer proposed an alternative account suggesting that visual pings may function to refocus attention, rather than reactivate latent information during the preparatory period. If this account holds (i.e., attention became weaker in the no-ping condition and it was strengthened by the ping due to re-focusing), we would expect to observe a general enhancement of attentional decoding during the preparatory period. However, our data reveal no significant differences in overall attention decoding between two conditions during this period (ps > 0.519; BF<sub>excl</sub> > 3.247), arguing against such a possibility.

      The reviewer also raised an interesting question about whether an auditory tone during preparation could produce effects similar to those observed with visual pings. Although our study did not directly test this possibility, existing literature provides some relevant evidence. In particular, prior studies have shown that latent visual working memory contents are selectively reactivated by visual impulses, but not by auditory stimuli (Wolff et al., 2020). This finding supports modality specificity for visually encoded contents, suggesting that sensory impulses must match the representational domain to effectively access latent visual information, which also argues against the refocusing hypothesis above. However, we do think that this is an important question that merits direct investigation in future studies. We have now added the related discussion on this point in the revised texts (Page 10, Line 202-203; Page 19, Line 392-395).

      (6) The neural pattern distances measured using Mahalanobis values are really great! Have the authors tried to use all of the data, rather than the high AMI and low AMI to possibly show a linear relationship between response times and AMI?

      We thank the reviewer for this comment. We took the reviewer’s suggestion to explore the relationship between attentional modulation index (AMI) and RTs across participants for each session (see Figure 3). In the No-Ping session, we observed no significant correlation between AMI and RT (r = -0.366, p = 0.113). By contrast, the same analysis in the Ping session revealed a significant negative correlation (r = -0.518, p = 0.019). These results indicate that the attentional modulations evoked by the visual impulse were associated with faster RTs, supporting the functional relevance of activating sensory-like representations during preparation. We have now included these inter-subject correlations in the main texts (Page 13, Line 258-264; Fig 3D and 3E) along with within-subject correlations in the Supplementary Information (Page 6, Line 85-98; S3 Fig).
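
      To make the link between the reported r and p values explicit: a Pearson correlation over n participants can be tested via t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below is a standard-library illustration, not the authors' analysis code. It assumes n = 20 participants per session (inferred from the degrees of freedom reported elsewhere in this response, not stated for this analysis), and it integrates the t density numerically instead of calling a statistics package.

```python
import math


def t_from_r(r, n):
    """t statistic for a Pearson correlation r over n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3.0  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area


def p_from_r(r, n):
    """Two-tailed p-value for a Pearson correlation r over n observations."""
    return two_tailed_p(t_from_r(r, n), n - 2)


# Assumed sample size: 20 participants per session (an inference, see above).
for label, r in [("No-Ping", -0.366), ("Ping", -0.518)]:
    print(f"{label}: r = {r}, t(18) = {t_from_r(r, 20):.2f}, p = {p_from_r(r, 20):.3f}")
```

      Under the n = 20 assumption, the computed p-values are in line with the reported p = 0.113 and p = 0.019.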

      (7) After reading the whole manuscript I still don't understand what the authors think the ping is actually doing, mechanistically. I would have liked a more thorough discussion, rather than referencing previous papers (all by the co-author).

      We thank the reviewer for this comment regarding the mechanistic basis of visual pings. We agree that this warrants deeper discussion. One possibility, as informed by theoretical studies of working memory, is that the sensory-like template could be maintained via an “activity-silent” mechanism through short-term changes in synaptic weights (Mongillo et al., 2008). In this framework, a visual impulse may function as a nonspecific input that momentarily converts latent traces into detectable activity patterns (Rademaker & Serences, 2017). Related to our findings, it is unlikely that the orientation-specific templates observed during the Ping session emerged from purely non-sensory representations and were entirely induced by an exogenous ping, which was devoid of any orientation signal. Instead, the more parsimonious explanation is that the visual impulse reactivated pre-existing latent sensory signals. To our knowledge, the detailed circuit-level mechanism of such reactivation is still unclear; existing evidence only suggests a relationship between ping-evoked inputs and the neural output (Wolff et al., 2017; Fan et al., 2021; Duncan et al., 2023). We have now included the discussion on this point in the main texts (Page 19, Line 383-401).

      Reviewer #2 (Public review):

      (1) The origin of the latent sensory-like representation. By 'pinging' the neural activity with a high-contrast, task-irrelevant visual stimulus during the preparation period, the authors identified the representation of the attentional feature target that contains the same information as perceptual representations. The authors interpreted this finding as a 'sensory-like' template is inherently hosted in a latent form in the visual system, which is revealed by the pinging impulse. However, I am not sure whether such a sensory-like template is essentially created, rather than revealed, by the pinging impulses. First, unlike the classical employment of the pinging technique in working memory studies, the (latent) representation of the memoranda during the maintenance period is undisputed because participants could not have performed well in the subsequent memory test otherwise. However, this appears not to be the case in the present study. As shown in Figure 1C, there was no significant difference in behavioral performance between the ping and the no-ping sessions (see also lines 110-125, pg. 5-6). In other words, it seems to me that the subsequent attentional task performance does not necessarily rely on the generation of such sensory-like representations in the preparatory period and that the emergence of such sensory-like representations does not facilitate subsequent attentional performance either. In such a case, one might wonder whether such sensory-like templates are really created, hosted, and eventually utilized during the attentional process. Second, because the reference orientations (i.e. 45 degrees and 135 degrees) have remained unchanged throughout the experiment, it is highly possible that participants implicitly memorized these two orientations as they completed more and more trials. 
In such a case, one might wonder whether the 'sensory-like' templates are essentially latent working memory representations activated by the pinging as was reported in Wolff et al. (2017), rather than a functional signature of the attentional process.

      We thank the reviewer for this comment. We agree that the question of whether the sensory-like template is created or merely revealed by visual pinging is crucial for understanding our findings. First, we acknowledge that our task may not be optimized for detecting changes in accuracy, as the task difficulty was controlled using individually adjusted thresholds (i.e., angular difference). Nevertheless, we observed some evidence supporting the neural-behavioral relationships. In particular, the impulse-driven sensory-like template in V1 was associated with faster RTs during stimulus selection (Page 12, Fig. 3D and 3E in the main texts; also see our response to R1, Point 6).

      Second, the reviewer raised an important concern about whether the attended feature might be stored in the memory system due to the trial-by-trial repetition of attention conditions (attend 45º or attend 135º). Although this is plausible, we don’t think it is likely. We note that neuroimaging evidence shows that attended working memory contents maintain sensory-like representations in visual cortex (Harrison & Tong, 2009; Serences et al., 2009; Rademaker et al., 2019), with generalizable neural activity patterns from perception to working memory delay-period, whereas unattended items in multi-item working memory tasks are stored in a latent state for prospective use (Wolff et al., 2017). Importantly, our task only required maintaining a single attentional template at a time. Thus, there was no need to store it via latent representations, if participants simply used a working memory mechanism for preparatory attention. Had they done so, we should expect to find evidence for a sensory template, i.e., generalizable neural pattern between perception and preparation in the No-Ping condition, which was not what we found. We have mentioned this point in the main texts (Page 18, Line 367-372).

      (2) The coexistence of the two types of attentional templates. The authors interpreted their findings as the outcome of a dual-format mechanism in which 'a non-sensory template' and a latent 'sensory-like' template coexist (e.g. lines 103-106, pg. 5). While I find this interpretation interesting and conceptually elegant, I am not sure whether it is appropriate to term it 'coexistence'. First, it is theoretically possible that there is only one representation in either session (i.e. a non-sensory template in the no-ping session and a sensory-like template in the ping session) in any of the brain regions considered. Second, it seems that there is no direct evidence concerning the temporal relationship between these two types of templates, provided that they commonly emerge in both sessions. Besides, due to the sluggish nature of fMRI data, it is difficult to tell whether the two types of templates temporally overlap.

      We thank the reviewer for the comment regarding our interpretation of the ‘coexistence’ of non-sensory and sensory-like attentional templates. While we acknowledge the limitations of fMRI in resolving temporal relationships between these two types of templates, several aspects of our data support a dual-format interpretation.

      First, our key findings remained consistent for the subset of participants (N=14) who completed both No-Ping and Ping sessions in counterbalanced order. It thus seems improbable that participants systematically switched cognitive strategies (e.g., using non-sensory templates in the No-Ping session versus sensory-like templates in the Ping session) in response to the task-irrelevant, uninformative visual impulse. Second, while we agree with the reviewer that the temporal dynamics between these two templates remain unclear, it is difficult to imagine that orientation-specific templates observed during the Ping session emerged de novo from a purely non-sensory template and an exogenous ping. In other words, if there is no orientation information at all to begin with, how does it come into being from an orientation-less external ping? It seems to us that the more parsimonious explanation is that there was already some orientation signal in a latent format, and it was activated by the ping, in line with the models of “activity-silent” working memory. To address these concerns, we have added the related discussion of these alternative interpretations in the main texts (Page 19, Line 387-391).

      (3) The representational distance. The authors used Mahalanobis distance to quantify the similarity of neural representation between different conditions. According to the authors' hypothesis, one would expect greater pattern similarity between 'attend leftward' and 'perceived leftward' in the ping session in comparison to the no-ping session. However, this appears not to be the case. As shown in Figures 3B and C, there was no major difference in Mahalanobis distance between the two sessions in either ROI and the authors did not report a significant main effect of the session in any of the ANOVAs. Besides, in all the ANOVAs, the authors reported only the statistic term corresponding to the interaction effect without showing the descriptive statistics related to the interaction effect. It is strongly advised that these descriptive statistics related to the interaction effect should be included to facilitate a more effective and intuitive understanding of their data.

      We thank the reviewer for this comment. We expected greater pattern similarity between 'attend leftward' and 'perceived leftward' in the Ping session in comparison to the No-Ping session. This prediction was supported by a significant three-way interaction between session × attended orientation × perceived orientation (F(1,38) = 5.00, p = 0.031, η<sub>p</sub><sup>2</sup> = 0.116). In particular, there was a significant interaction between attended orientation × perceived orientation (F(1,19) = 9.335, p = 0.007, η<sub>p</sub><sup>2</sup> = 0.329) in the Ping session, but not in the No-Ping session (F(1,19) = 0.017, p = 0.898, η<sub>p</sub><sup>2</sup> = 0.001). These statistical results were reported in the original texts. In addition, this three-way mixed ANOVA (session × attended orientation × perceived orientation) on Mahalanobis distance in V1 revealed no significant main effects (session: F(1,38) = 0.009, p = 0.923, η<sub>p</sub><sup>2</sup> < 0.001; attended orientation: F(1,38) = 0.116, p = 0.735, η<sub>p</sub><sup>2</sup> = 0.003; perceived orientation: F(1,38) = 1.106, p = 0.300, η<sub>p</sub><sup>2</sup> = 0.028). We agree with the reviewer that a complete reporting of analyses enhances understanding of the data. Therefore, we have now included the main effects in the main texts (Page 11, Line 233).
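
      For readers who want to relate the F statistics above to the reported effect sizes: partial eta squared can be computed directly from F and its degrees of freedom as η<sub>p</sub><sup>2</sup> = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this standard identity to the values quoted in this response; the function name is hypothetical and this is not the authors' analysis code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) as quoted in the response above.
reported = [
    ("three-way interaction", 5.00, 1, 38),
    ("Ping: attended x perceived", 9.335, 1, 19),
    ("No-Ping: attended x perceived", 0.017, 1, 19),
    ("main effect: session", 0.009, 1, 38),
    ("main effect: attended orientation", 0.116, 1, 38),
    ("main effect: perceived orientation", 1.106, 1, 38),
]

for label, f_value, df1, df2 in reported:
    print(f"{label}: eta_p^2 = {partial_eta_squared(f_value, df1, df2):.3f}")
```

      The recomputed values agree with the reported effect sizes to three decimals, with the session main effect falling below 0.001 as stated.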

      We thank the reviewer for the suggestion regarding the inclusion of descriptive statistics for interaction effects. However, since the data were already visualized in Fig. 3B and 3C in the main texts, to maintain conciseness and consistency with the reporting style of other analyses in the texts, we have opted to include these statistics in the Supplementary Information (Page 5, Table 1).

      Reviewer #3 (Public review):

      (1) The title is "Dual-format Attentional Template," yet the supporting evidence for the non-sensory format and its guiding function is quite weak. The author could consider conducting further generalization analysis from stimulus selection to preparation stages to explore whether additional information emerges.

      We thank the reviewer for this comment. Our approach to investigating whether preparatory attention is encoded in a sensory or non-sensory format, namely training classifiers on separate runs of a perception task, closely followed methods from previous studies (Stokes et al., 2009; Peelen & Kastner, 2011; Kok et al., 2017). Following the reviewer’s suggestion, we performed generalization analyses by training classifiers on activity during the stimulus selection period and testing them on preparatory activity. However, we observed no significant generalization effects in either the No-Ping or the Ping session (ps > 0.780). This null result may stem from a key difference in the neural representations: classifiers trained on neural activity from the stimulus selection period necessarily encode both target and distractor information, thus relying on somewhat different information than classifiers trained exclusively on isolated target information in the perception task.

      (2) In Figure 2, the author did not find any decodable sensory-like coding in IPS and PFC, even during the impulse-driven session, indicating that these regions do not represent sensory-like information. However, in the final section, the author claimed that the impulse-driven sensory-like template strengthens informational connectivity between sensory and frontoparietal areas. This raises a question: how can we reconcile the lack of decodable coding in these frontoparietal regions with the reported enhancement in network communication? It would be helpful if the author provided a clearer explanation or additional evidence to bridge this gap.

      We thank the reviewer for this comment. We would like to clarify that although we did not observe sensory-like coding during preparation in frontoparietal areas, we did observe attentional signals in these regions, as evidenced by the above-chance within-task attention decoding performance (Fig. 2 in the main texts). This could reflect different neural codes in different areas, and suggests that inter-regional communication does not necessarily require identical representational formats. It seems plausible that the representation of a non-sensory attentional template in frontoparietal areas supports top-down attentional control, consistent with theories suggesting increasing abstraction as the cortical hierarchy ascends (Badre, 2008; Brincat et al., 2018), and that its interaction with the sensory representation in the visual areas is enhanced by the visual impulse.

      (3) Given that the impulse-driven sensory-like template facilitated behavior, the author proposed that it might also enhance network communication. Indeed, they observed changes in informational connectivity. However, it remains unclear whether these changes in network communication have a direct and robust relationship with behavioral improvements.

      We thank the reviewer for the suggestion. To examine how network communication relates to behavior, we performed a correlation analysis between informational connectivity (IC) and RTs across participants (see Figure S5). We observed a trend toward a correlation between V1-PFC connectivity and RTs in the Ping session (r = -0.394, p = 0.086), but not in the No-Ping session (r = -0.046, p = 0.846). No significant correlations were found between V1-IPS connectivity and RTs (ps > 0.400) or between ICs and accuracy (ps > 0.399). These results suggest that ping-enhanced connectivity might have contributed to faster responses. Although we may not have sufficient statistical power to warrant a strong conclusion, we think this result is still suggestive, so we have now added the texts in the Supplementary Information (Page 8, Line 116-121; S5 Fig) and mentioned this result in the main texts (Page 14, Line 292-293).

      (4) I'm uncertain about the definition of the sensory-like template in this paper. Is it referring to the Ping impulse-driven condition or the decodable performance in the early visual cortex? If it is the former, even in working memory, whether pinging identifies an activity-silent mechanism is currently debated. If it's the latter, the authors should consider whether a causal relationship - such as "activating the sensory-like template strengthens the informational connectivity between sensory and frontoparietal areas" - is reasonable.

      We apologize for the confusion. The sensory-like template by itself does not directly refer to representations in the Ping session or to the attentional decoding in early visual cortex. Instead, it pertains to the representational format of attentional signals during preparation. Specifically, its existence is inferred from cross-task generalization, where neural patterns from a perception task (perceive 45º or perceive 135º) generalize to an attention task (attend 45º or attend 135º). We think this is a reasonable and accepted operational definition of the representational format. Our findings suggest that the sensory-like template likely existed in a latent state and was reactivated by visual pings, aligning more closely with the first account raised by the reviewer.
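
      The logic of this operational definition can be illustrated with a toy example. The sketch below trains a nearest-centroid classifier on "perception" patterns and tests it on "attention" patterns. Everything in it is an assumption made for illustration: the three-voxel patterns are hand-made toy values (not data from the study), the nearest-centroid rule stands in for whatever classifier the study actually used, and all names are hypothetical.

```python
def centroid(patterns):
    """Mean pattern (voxel-wise average) of a list of equal-length vectors."""
    return [sum(voxel) / len(patterns) for voxel in zip(*patterns)]


def sq_dist(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(patterns_by_label):
    """Fit a nearest-centroid model: one mean pattern per orientation label."""
    return {label: centroid(p) for label, p in patterns_by_label.items()}


def predict(model, pattern):
    """Return the label whose centroid is closest to the pattern."""
    return min(model, key=lambda label: sq_dist(model[label], pattern))


def cross_task_accuracy(train_set, test_set):
    """Train on one task, test on another; return the proportion correct."""
    model = train(train_set)
    trials = [(label, p) for label, ps in test_set.items() for p in ps]
    hits = sum(predict(model, p) == label for label, p in trials)
    return hits / len(trials)


# Toy perception patterns: voxels 1 and 2 carry the orientation code.
perception = {
    "45": [[1.0, 0.0, 0.2], [0.9, 0.1, 0.1]],
    "135": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]],
}

# Toy "sensory-like" attention patterns: a weaker copy of the perception code.
attention_sensory_like = {
    "45": [[0.7, 0.3, 0.5], [0.6, 0.2, 0.4]],
    "135": [[0.3, 0.7, 0.5], [0.2, 0.6, 0.4]],
}

# Toy "non-sensory" attention patterns: the attended orientation is coded
# in voxel 3 only, which the perception task does not use.
attention_non_sensory = {
    "45": [[0.6, 0.4, 0.9], [0.4, 0.6, 0.9]],
    "135": [[0.6, 0.4, 0.1], [0.4, 0.6, 0.1]],
}

print(cross_task_accuracy(perception, attention_sensory_like))
print(cross_task_accuracy(perception, attention_non_sensory))
```

      In this toy setting, generalization succeeds only when the attention patterns share the perception code. The non-sensory patterns remain separable within the attention task itself (via voxel 3), yet the perception-trained classifier performs at chance on them, which mirrors the distinction between within-task decoding and cross-task generalization drawn above.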

      We agree with the reviewer that whether pinging identifies an activity-silent mechanism is currently debated (Schneegans & Bays, 2017; Barbosa et al., 2021). It is possible that the visual impulse amplified a subtle but active representation of the sensory template during attentional preparation, resulting in decodable signals in visual cortex. Distinguishing between these two accounts likely requires neurophysiological measurements, which are beyond the scope of the current study. We have explicitly addressed this limitation in our Discussion (Page 19, Line 395-399).

      Nevertheless, the latent sensory-like template account remains plausible for the following reasons. First, our interpretation aligns with theoretical frameworks proposing that the brain maintains more veridical, detailed target templates than those typically utilized for guiding attention (Wolfe, 2021; Yu et al., 2023). Second, this explanation is consistent with the proposed utility of latent working memory for prospective use, as maintaining a latent sensory-like template during preparation would be useful for subsequent stimulus selection. The latter point bears on the reviewer’s question of whether the claim that “activating the sensory-like template strengthens the informational connectivity between sensory and frontoparietal areas” is reasonable. Our additional analyses (also refer to our response to Reviewer 3, Point 3) suggested that impulse-enhanced V1-PFC connectivity was associated with a trend toward faster behavioral responses (r = -0.394, p = 0.086; see Supplementary Information, Page 8, Line 116-121; S5 Fig). Considering these findings in totality, we think it is reasonable to suggest that the visual impulse may strengthen information flow among areas to enhance attentional control.

      Recommendation for the Authors:

      Reviewer #1 (Recommendation for the authors):

      I hate to suggest another fMRI experiment, but in order to make strong claims about two states, I would want to see the methodological and interpretation confounds addressed. Ping condition - would a tone lead to the same result of sharpening the template? If so, then why? Can a ping be manipulated in its effectiveness? That would be an excellent manipulation condition.

      We thank the reviewer for the comments. Please refer to our reply to Reviewer 1, Point 5 for detailed explanation.

      Reviewer #2 (Recommendation for the authors):

      It is strongly advised that these descriptive statistics related to the interaction effect should be included to facilitate a more effective understanding of their data.

      We thank the reviewer for the comments. We have now included the relevant descriptive statistics in the Supplementary Information, Table 1.

      Reviewer #3 (Recommendation for the authors):

      In addition to p-values, I see many instances of 'ps'. Does this indicate the plural form of p?

      We used ‘ps’ to denote the minimal p-value across multiple statistical analyses, such as when applying identical tests to different region groups.

      References

      Aitken, F., Menelaou, G., Warrington, O., Koolschijn, R. S., Corbin, N., Callaghan, M. F., & Kok, P. (2020). Prior expectations evoke stimulus-specific activity in the deep layers of the primary visual cortex. PLoS Biology, 18(12), e3001023.

      Badre, D. (2008). Cognitive control, hierarchy, and the rostro–caudal organization of the frontal lobes. Trends in Cognitive Sciences, 12(5), 193-200.

      Barbosa, J., Lozano-Soldevilla, D., & Compte, A. (2021). Pinging the brain with visual impulses reveals electrically active, not activity-silent, working memories. PLoS Biology, 19(10), e3001436.

      Battistoni, E., Stein, T., & Peelen, M. V. (2017). Preparatory attention in visual cortex. Annals of the New York Academy of Sciences, 1396(1), 92-107.

      Brincat, S. L., Siegel, M., von Nicolai, C., & Miller, E. K. (2018). Gradual progression from sensory to task-related processing in cerebral cortex. Proceedings of the National Academy of Sciences, 115(30), E7202-E7211.

      Duncan, D. H., van Moorselaar, D., & Theeuwes, J. (2023). Pinging the brain to reveal the hidden attentional priority map using encephalography. Nature Communications, 14(1), 4749.

      Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuroscience, 27(1), 649-677.

      Gong, M., Chen, Y., & Liu, T. (2022). Preparatory attention to visual features primarily relies on nonsensory representation. Scientific Reports, 12(1), 21726.

      Fan, Y., Han, Q., Guo, S., & Luo, H. (2021). Distinct Neural Representations of Content and Ordinal Structure in Auditory Sequence Memory. Journal of Neuroscience, 41(29), 6290–6303.

      Harrison, S. A., & Tong, F. (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature, 458(7238), 632-635.

      Jigo, M., Gong, M., & Liu, T. (2018). Neural determinants of task performance during feature-based attention in human cortex. eNeuro, 5(1).

      Kok, P., Failing, M. F., & de Lange, F. P. (2014). Prior expectations evoke stimulus templates in the primary visual cortex. Journal of Cognitive Neuroscience, 26(7), 1546-1554.

      Kok, P., Mostert, P., & De Lange, F. P. (2017). Prior expectations induce prestimulus sensory templates. Proceedings of the National Academy of Sciences, 114(39), 10473-10478.

      Liu, T., Stevens, S. T., & Carrasco, M. (2007). Comparing the time course and efficacy of spatial and feature-based attention. Vision Research, 47(1), 108-113.

      Mongillo, G., Barak, O., & Tsodyks, M. (2008). Synaptic theory of working memory. Science, 319(5869), 1543-1546.

      Peelen, M. V., & Kastner, S. (2011). A neural basis for real-world visual search in human occipitotemporal cortex. Proceedings of the National Academy of Sciences, 108(29), 12125-12130.

      Priebe, N. J. (2016). Mechanisms of orientation selectivity in the primary visual cortex. Annual Review of Vision Science, 2(1), 85-107.

      Rademaker, R. L., & Serences, J. T. (2017). Pinging the brain to reveal hidden memories. Nature Neuroscience, 20(6), 767-769.

      Rademaker, R. L., Chunharas, C., & Serences, J. T. (2019). Coexisting representations of sensory and mnemonic information in human visual cortex. Nature Neuroscience, 22(8), 1336-1344.

      Serences, J. T., Ester, E. F., Vogel, E. K., & Awh, E. (2009). Stimulus-specific delay activity in human primary visual cortex. Psychological Science, 20(2), 207-214.

      Schneegans, S., & Bays, P. M. (2017). Restoration of fMRI decodability does not imply latent working memory states. Journal of Cognitive Neuroscience, 29(12), 1977-1994.

      Stokes, M., Thompson, R., Nobre, A. C., & Duncan, J. (2009). Shape-specific preparatory activity mediates attention to targets in human visual cortex. Proceedings of the National Academy of Sciences, 106(46), 19569-19574.

      Wolfe, J. M. (2021). Guided Search 6.0: An updated model of visual search. Psychonomic Bulletin & Review, 28(4), 1060-1092.

      Wolff, M. J., Jochim, J., Akyürek, E. G., & Stokes, M. G. (2017). Dynamic hidden states underlying working-memory-guided behavior. Nature Neuroscience, 20(6), 864-871.

      Wolff, M. J., Kandemir, G., Stokes, M. G., & Akyürek, E. G. (2020). Unimodal and bimodal access to sensory working memories by auditory and visual impulses. Journal of Neuroscience, 40(3), 671-681.

      Yu, X., Zhou, Z., Becker, S. I., Boettcher, S. E., & Geng, J. J. (2023). Good-enough attentional guidance. Trends in Cognitive Sciences, 27(4), 391-403.

    1. Reviewer #1 (Public review):

      This study is part of an ongoing effort to clarify the effects of cochlear neural degeneration (CND) on auditory processing in listeners with normal audiograms. This effort is important because ~10% of people who seek help for hearing difficulties have normal audiograms and current hearing healthcare has nothing to offer them.

      The authors identify two shortcomings in previous work that they intend to fix. The first is a lack of cross-species studies that make direct comparisons between animal models in which CND can be confirmed and humans for which CND must be inferred indirectly. The second is the low sensitivity of purely perceptual measures to subtle changes in auditory processing. To fix these shortcomings, the authors measure envelope following responses (EFRs) in gerbils and humans using the same sounds, while also performing histological analysis of the gerbil cochleae, and testing speech perception while measuring pupil size in the humans.

      The study begins with a comprehensive assessment of the hearing status of the human listeners. The only differences found between the young adult (YA) and middle aged (MA) groups are in thresholds at frequencies > 10 kHz and DPOAE amplitudes at frequencies > 5 kHz. The authors then present the EFR results, first for the humans and then for the gerbils, showing that amplitudes decrease more rapidly with increasing envelope frequency for MA than for YA in both species. The histological analysis of the gerbil cochleae shows that there were, on average, 20% fewer IHC-AN synapses at the 3 kHz place in MA relative to YA, and the number of synapses per IHC was correlated with the EFR amplitude at 1024 Hz.

      The study then returns to the humans to report the results of the speech perception tests and pupillometry. The correct understanding of keywords decreased more rapidly with decreasing SNR in MA than in YA, with a noticeable difference at 0 dB, while pupillary slope (a proxy for listening effort) increased more rapidly with decreasing SNR for MA than for YA, with the largest differences at SNRs between 5 and 15 dB. Finally, the authors report that a linear combination of audiometric threshold, EFR amplitude at 1024 Hz, and a few measures of pupillary slope is predictive of speech perception at 0 dB SNR.

      I only have two questions/concerns about the specific methodologies used:

      (1) Synapse counts were made only at the 3 kHz place on the cochlea. But the EFR sounds were presented at 85 dB SPL, which means that a rather large section of the cochlea will actually be excited. Do we know how much of the EFR actually reflects AN fibers coming from the 3 kHz place? And are we sure that this is the same for gerbils and humans given the differences in cochlear geometry, head size, etc.?

      [Note added after revision: the authors have added new data, references, and discussion that have answered my initial questions].

      (2) Unless I misunderstood, the predictive power of the final model was not tested on held-out data. The standard way to fit and test such a model would be to split the data into two segments, one for training and hyperparameter optimization, and one for testing. But it seems that the only split was for training and hyperparameter optimization.

      [Note added after revision: the authors now make it clear in their response that the modeling tells us how much of the current data can be explained but not necessarily about generalization to other datasets.]

      While I find the study to be generally well executed, I am left wondering what to make of it all. The purpose of the study with respect to fixing previous methodological shortcomings was clear, but exactly how fixing these shortcomings has allowed us to advance is not. I think we can be more confident than before that EFR amplitude is sensitive to CND, and we now know that measures of listening effort may also be sensitive to CND. But where is this leading us?

      I think what this line of work is eventually aiming for is to develop a clinical tool that can be used to infer someone's CND profile. That seems like a worthwhile goal but getting there will require going beyond exploratory association studies. I think we're ready to start being explicit about what properties a CND inference tool would need to be practically useful. I have no idea whether the associations reported in this study are encouraging or not because I have no idea what level of inferential power is ultimately required.

      [Note added after revision: the authors have added to the Discussion to put their work into a broader perspective.]

      That brings me to my final comment: there is an inappropriate emphasis on statistical significance. The sample size was chosen arbitrarily. What if the sample had been half the size? Then few, if any, of the observed effects would have been significant. What if the sample had been twice the size? Then many more of the observed effects would have been significant (particularly for the pupillometry). I hope that future studies will follow a more principled approach in which relevant effect sizes are pre-specified (ideally as the strength of association that would be practically useful) and sample sizes are determined accordingly.

      [Note added after revision: my intention with this comment was not to make a philosophical or nitty-gritty point about statistics. It was more of a follow on to the previous point. Because I don't know what sort of effect size is big enough to matter (for whatever purpose), I don't find the statistical significance (or lack thereof) of the effect size observed to be informative. But I don't think there is anything more that the authors can or should do in this regard.]

      So, in summary, I think this study is a valuable but limited advance. The results increase my confidence that non-invasive measures can be used to infer underlying CND, but I am unsure how much closer we are to anything that is practically useful.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Wang et al. investigated how sexual failure influences sweet taste perception in male Drosophila. The study revealed that courtship failure leads to decreased sweet sensitivity and feeding behavior via dopaminergic signaling. Specifically, the authors identified a group of dopaminergic neurons projecting to the suboesophageal zone that interacts with sweet-sensing Gr5a+ neurons. These dopaminergic neurons positively regulate the sweet sensitivity of Gr5a+ neurons via DopR1 and Dop2R receptors. Sexual failure diminishes the activity of these dopaminergic neurons, leading to reduced sweet-taste sensitivity and sugar-feeding behavior in male flies. These findings highlight the role of dopaminergic neurons in integrating reproductive experiences to modulate appetitive sensory responses.

      Previous studies have explored the dopaminergic-to-Gr5a+ neuronal pathways in regulating sugar feeding under hunger conditions. Starvation has been shown to increase dopamine release from a subset of TH-GAL4 labeled neurons, known as TH-VUM, in the suboesophageal zone. This enhanced dopamine release activates dopamine receptors in Gr5a+ neurons, heightening their sensitivity to sugar and promoting sucrose acceptance in flies. Since the function of the dopaminergic-to-Gr5a+ circuit motif has been well established, the primary contribution of Wang et al. is to show that mating failure in male flies can also engage this circuit to modulate sugar-feeding behavior. This contribution is valuable because it highlights the role of dopaminergic neurons in integrating diverse internal state signals to inform behavioral decisions.

      An intriguing discrepancy between Wang et al. and earlier studies lies in the involvement of dopamine receptors in Gr5a+ neurons. Prior research has shown that Dop2R and DopEcR, but not DopR1, mediate starvation-induced enhancement of sugar sensitivity in Gr5a+ neurons. In contrast, Wang et al. found that DopR1 and Dop2R, but not DopEcR, are involved in the sexual failure-induced decrease in sugar sensitivity in these neurons. I wish the authors had further explored or discussed this discrepancy, as it is unclear how dopamine release selectively engages different receptors to modulate neuronal sensitivity in a context-dependent manner.

      Our immunostaining experiments showed that three dopamine receptors, Dop1R1, Dop2R, and DopEcR, were expressed in Gr5a<sup>+</sup> neurons in the proboscis, consistent with previous RT-PCR findings (Inagaki et al 2012). As the reviewer pointed out, we found that Dop1R1 and Dop2R were required for courtship failure-induced suppression of sugar sensitivity, whereas Marella et al 2012 and Inagaki et al 2012 found that Dop2R and DopEcR were required for starvation-induced enhancement of sugar sensitivity. These results may suggest that different internal states (courtship failure vs. starvation) modulate the peripheral sensory system via different signaling pathways (e.g. different subsets of dopaminergic neurons; different dopamine release mechanisms; and different dopamine receptors). We have discussed these possibilities in the revised manuscript.

      The data presented by Wang et al. are solid and effectively support their conclusions. However, certain aspects of their experimental design, data analysis, and interpretation warrant further review, as outlined below.

      (1) The authors did not explicitly indicate the feeding status of the flies, but it appears they were not starved. However, the naive and satisfied flies in this study displayed high feeding and PER baselines, similar to those observed in starved flies in other studies. This raises the concern that sexually failed flies may have consumed additional food during the 4.5-hour conditioning period, potentially lowering their baseline hunger levels and subsequently reducing PER responses. This alternative explanation is worth considering, as an earlier study demonstrated that sexually deprived males consumed more alcohol, and both alcohol and food are known rewards for flies. To address this concern, the authors could remove food during the conditioning phase to rule out its influence on the results.

      This is an important consideration. To rule out a potential confound from food intake during courtship conditioning, we have now also conducted courtship conditioning in vials without food. In the absence of any feeding opportunity over the 4.5-hour courtship conditioning period, sexually rejected males still exhibited a robust decrease in sweet taste sensitivity compared with Naïve and Satisfied controls (Figure 1-supplement 1C). These data confirm that the suppression of PER is driven by courtship failure per se, rather than by differences in feeding during the conditioning phase.

      (2) Figure 1B reveals that approximately half of the males in the Failed group did not consume sucrose yet Figure 1-S1A suggests that the total volume consumed remained unchanged. Were the flies that did not consume sucrose omitted from the dataset presented in Figure 1-S1A? If so, does this imply that only half of the male flies experience sexual failure, or that sexual failure affects only half of males while the others remain unaffected? The authors should clarify this point.

      Our initial description of the experimental setup may have been confusing. Below is a brief clarification of our experimental design; we have also clarified the details in the revised manuscript, which should resolve the reviewer’s concerns:

      After the behavioral conditioning, male flies were divided for two assays. On the one hand, we quantified PER responses of individual flies. As shown in Figure 1C, Failed males exhibited decreased sweet sensitivity (as demonstrated by the right shift of the dose-response curve). On the other hand, we sought to quantify food consumption of individual flies by using the MAFE assay (Qi et al 2005).

      In the initial submission, we used 400 mM sucrose for the MAFE assay. When presented with 400 mM sucrose, approximately 100% of the flies in the Naïve and Satisfied groups, and 50% of the flies in the Failed group, extended their proboscis and started feeding, as a natural consequence of decreased sugar sensitivity (Figure 1B). We were able to quantify the actual volume of food consumed by the flies that showed PER responses towards 400 mM sucrose and observed no change (Figure 1-supplement 1A, left). To avoid potential confusion, we have now repeated the MAFE assay with 800 mM sucrose, which elicited feeding in ~100% of flies among all three groups, as shown in Figure 1C. Again, we observed no change in food intake (Figure 1-supplement 1A, right).

      These experiments in combination suggest that sexual failure suppresses sweet sensitivity of the Failed males. Meanwhile, as long as they still responded to a certain food stimulus and initiated feeding, the volume of food consumption remained unchanged. These results led us to focus on the modulatory effect of sexual failure on the sensory system, the main topic of this present study.

      (3) The evidence linking TH-GAL4 labeled dopaminergic neurons to reduced sugar sensitivity in Gr5a+ neurons in sexually failed males could be further strengthened. Ideally, the authors would have activated TH-GAL4 neurons and observed whether this restored GCaMP responses in Gr5a+ neurons in sexually failed males. Instead, the authors performed a less direct experiment, shown in Figures 3-S1C and D. The manuscript does not describe the condition of the flies used in this experiment, but it appears that they were not sexually conditioned. I have two concerns with this experiment. First, no statistical analysis was provided to support the enhancement of sucrose responses following activation of TH-GAL4 neurons. Second, without performing this experiment in sexually failed males, the authors lack direct evidence to confirm that the dampened response of Gr5a+ neurons to sucrose results from decreased activity in TH-GAL4 neurons.

      We have now quantified the effect of TH<sup>+</sup> neuron activation on Gr5a<sup>+</sup> neuron calcium responses. In Naïve males, dTRPA1-mediated activation of TH<sup>+</sup> cells significantly enhanced sucrose-induced calcium responses (Figure 3-supplement 1C). In Failed males, although the baseline activity of Gr5a<sup>+</sup> neurons was lower (Figure 3C), the same activation also produced a significant (even slightly larger) effect on the calcium responses of Gr5a<sup>+</sup> neurons (Figure 3-supplement 1D).

      Taken together, we would argue that these experiments using both Naïve and Failed males were adequate to show a functional link between TH<sup>+</sup> neurons and Gr5a<sup>+</sup> neurons. Combined with the results that these neurons form active synapses (Figure 3-supplement 1B) and that the activity of TH<sup>+</sup> neurons was dampened in sexually failed males (Figure 3G-I), our data support the notion that sexual failure suppresses sweet sensitivity via TH-Gr5a circuitry.

      (4) The statistical methods used in this study are poorly described, making it unclear which method was used for each experiment. I suggest that the authors include a clear description of the statistical methods used for each experiment in the figure legends. Furthermore, as I have pointed out, there is a lack of statistical comparisons in Figures 3-S1C and D, a similar problem exists for Figures 6E and F.

      We have added detailed information on the statistical analysis to each figure legend.

      (5) The experiments in Figure 5 lack specificity. The target neurons in this study are Gr5a+ neurons, which are directly involved in sugar sensing. However, the authors used the less specific Dop1R1- and Dop2R-GAL4 lines for their manipulations. Using Gr5a-GAL4 to specifically target Gr5a+ neurons would provide greater precision and ensure that the observed effects are directly attributable to the modulation of Gr5a+ neurons, rather than being influenced by potential off-target effects from other neuronal populations expressing these dopamine receptors.

      We agree with the reviewer that manipulating Dop1R1 and Dop2R genes (Figure 4) and the neurons expressing them (Figure 5) might have broader impacts. For specificity, we have also tested the role of Dop1R1 and Dop2R in Gr5a<sup>+</sup> neurons by RNAi experiments (Figure 6). As shown by both behavioral and calcium imaging experiments, knocking down Dop1R1 and Dop2R in Gr5a<sup>+</sup> neurons both eliminated the effect of sexual failure to dampen sweet sensitivity, further confirming the role of these two receptors in Gr5a<sup>+</sup> neurons.

      (6) I found the results presented in Fig. 6F puzzling. The knockdown of Dop2R in Gr5a+ neurons would be expected to decrease sucrose responses in naive and satisfied flies, given the role of Dop2R in enhancing sweet sensitivity. However, the figure shows an apparent increase in responses across all three groups, which contradicts this expectation. The authors may want to provide an explanation for this unexpected result.

      We agree that there might be some potential discrepancies. We have now addressed the issue by repeating these calcium imaging experiments with a head-to-head comparison with the controls (Gr5a-GCaMP, +/- Dop1R1 and Dop2R RNAi).

      In these new experiments, Dop1R1 or Dop2R knockdown completely prevented the suppression of Gr5a<sup>+</sup> neuron responsiveness by courtship failure (Figure 6E), whereas the activities of Gr5a<sup>+</sup> neurons in Naïve/Satisfied groups were not altered. These results demonstrate that Dop1R1 and Dop2R are specifically required to mediate the decrease in sweet sensitivity following courtship failure.

      (7) In several instances in the manuscript, the authors described the effects of silencing dopamine signaling pathways or knocking down dopamine receptors in Gr5a neurons with phrases such as 'no longer exhibited reduced sweet sensitivity' (e.g., L269 and L288), 'prevent the reduction of sweet sensitivity' (e.g., L292), or 'this suppression was reversed' (e.g. L299). I found these descriptions misleading, as they suggest that sweet sensitivity in naive and satisfied groups remains normal while the reduction in failed flies is specifically prevented or reversed. However, this is not the case. The data indicate that these manipulations result in an overall decrease in sweet sensitivity across all groups, such that a further reduction in failed flies is not observed. I recommend revising these descriptions to accurately reflect the observed phenotypes and avoid any confusion regarding the effects of these manipulations.

      We have changed the wording in the revised manuscript. In brief, we think that these manipulations have two consequences: suppressing the overall sweet sensitivity, and eliminating the effect of sexual failure on sweet sensitivity.

      Reviewer #2 (Public review):

      Summary:

      The authors exposed naïve male flies to different groups of females, either mated or virgin. Male flies can successfully copulate with virgin females; however, they are rejected by mated females. This rejection reduces sugar preference and sensitivity in males. Investigating the underlying neural circuits, the authors show that dopamine signaling onto GR5a sensory neurons is required for reduced sugar preference. GR5a sensory neurons respond less to sugar exposure when they lack dopamine receptors.

      Strengths:

      The findings add another strong phenotype to the existing dataset about brain-wide neuromodulatory effects of mating. The authors use several state-of-the-art methods, such as activity-dependent GRASP to decipher the underlying neural circuitry. They further perform rigorous behavioral tests and provide convincing evidence for the local labellar circuit.

      Weaknesses:

      The authors focus on the circuit connection between dopamine and gustatory sensory neurons in the male SEZ. Therefore, it is still unknown how mating modulates dopamine signaling and what possible implications on other behaviors might result from a reduced sugar preference.

      We agree with the reviewer that in the current study, we did not examine the exact mechanism of how mating experience suppressed the activity of dopaminergic neurons in the SEZ. The current study mainly focused on the behavioral characterization (sexual failure suppresses sweet sensitivity) and the downstream mechanism (TH-Gr5a pathway). We think that examining the upstream modulatory mechanism may be more suitable for a separate future study.

      We believe that a sustained reduction in sweet sensitivity (not limited to sucrose but extending to other sweet compounds; Figure 1-supplement 1D-E) upon courtship failure suggests a generalized and sustained consequence on reward-related behaviors. Sexual failure may thus resemble a state of “primitive emotion” in fruit flies. We have further discussed this possibility in the revised manuscript.

      Reviewer #3 (Public review):

      Summary

      In this work, the authors asked how mating experience impacts reward perception and processing. For this, they employ fruit flies as a model, with a combination of behavioral, immunostaining, and live calcium imaging approaches.

      Their study allowed them to demonstrate that courtship failure decreases the fraction of flies motivated to eat sweet compounds, revealing a link between reproductive stress and reward-related behaviors. This effect is mediated by a small group of dopaminergic neurons projecting to the SEZ. After courtship failure, these dopaminergic neurons exhibit reduced activity, leading to decreased Gr5a+ neuron activity via Dop1R1 and Dop2R signaling, and leading to reduced sweet sensitivity. The authors therefore showed how mating failure influences broader behavioral outputs through suppression of the dopamine-mediated reward system and underscores the interactions between reproductive and reward pathways.

      Concern

      My main concern regarding this study lies in the way the authors chose to present their results. If I understood correctly, they provided evidence that mating failure induces a decrease in the fraction of flies exhibiting PER. However, they also showed that food consumption was not affected (Fig. 1, supplement), suggesting that individuals who did eat consumed more. This raises questions about the analysis and interpretation of the results. Should we consider the group as a whole, with a reduced sensitivity to sweetness, or should we focus on individuals, with each one eating more? I am also concerned about how this could influence the results obtained using live imaging approaches, as the flies being imaged might or might not have been motivated to eat during the feeding assays. I would like the authors to clarify their choice of analysis and discuss this critical point, as the interpretation of the results could potentially be the opposite of what is presented in the manuscript.

      Please refer to our responses to the Public Review (Reviewer 1, Point 2) for details.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The label for the y-axis in Figure 1B should be "fraction", not "percentage".

      We have revised the figure as suggested.

      (2) I suggest that the authors indicate the ROIs they used to quantify the signal intensity in Figure 3E and G.

      We have revised the figures as suggested.

      (3) There is a typo in Figure 4A: it should be "Wild type", not "Wide type".

      We have revised the figure as suggested.

      (4) The elav-GAL4/+ data in Figure 4-S1B, C, and D appears to be reused across these panels. However, the number of asterisks indicating significance in the MAT plots differs between them (three in panels B and C, and four in panel D). Is this a typo?

      It is indeed a typo, and we have revised the figure accordingly.

      Reviewer #2 (Recommendations for the authors):

      Additional comments:

      The authors should add this missing literature about dopamine and neuromodulation in courtship:

      Boehm et al., 2022 (eLife) - this study shows that mating affects olfactory behavior in females.

      Cazalé-Debat et al., 2024 (Nature) - Mating proximity blinds threat perception.

      Gautham et al., 2024 (Nature) - A dopamine-gated learning circuit underpins reproductive state-dependent odor preference in Drosophila females.

      We have added these references in the introduction section.

      Has the mating behavior been quantified? How often did males copulate with mated and virgin females?

      We tried to examine the copulation behavior based on our video recordings. In the “Failed” group (males paired with mated females), we observed virtually no successful copulation events at all, confirming that nearly 100% of those males experienced sexual failure. In contrast, males in the “Satisfied” group (paired with virgin females) mated on average 2-3 times during the 4.5-hour conditioning period. We have added some explanations in the manuscript.

      Do the rejected males live shorter? Is the effect also visible when they are fed with normal fly food, or is it only working with sugar?

      We did not directly measure the lifespan of these males. However, we conducted a related assay (starvation resistance), in which “Failed” males died significantly faster than both Naïve and Satisfied controls, indicating a clear reduction in their ability to endure food deprivation (Figure 1-supplement 1B). Since sweet taste is a primary cue for food detection in Drosophila, and sugar makes up a large portion of their standard diet, the drop in sugar sensitivity we observed in Failed males could likewise impair their perception and consumption of regular fly food, and hence their resistance to starvation.

      Also, the authors mention that the reward pathway is affected; this is probably the case, as sugar sensation is impaired. One interesting experiment would be (and maybe has been done?) to test rejected males in normal odor-fructose conditioning. The data would suggest that they would do worse.

      We have already measured how courtship failure affected fructose sensitivity (Figure 1-supplement 1D), and we found that the reduction in fructose perception was even more profound than that for sucrose. We have not yet tested whether Failed males show deficits in odor-fructose associative conditioning. That is indeed a very interesting direction to explore. However, olfactory reward learning relies on molecular and circuit mechanisms distinct from those governing taste. We therefore argue that such experiments would be more suitable for a separate, follow-up study.

      The authors could have added another group where males are exposed to other males. It would be interesting if this is also a "stressful" context and if it would also reduce sugar preference - probably beyond the scope of this paper.

      In our experiments, all flies, including those in the Naïve, Failed, and Satisfied groups, were housed in groups of 25 males per vial before the conditioning period (and the Naïve group remained in the same group housing until PER testing). This means every cohort experienced the same level of “social stress” from male-male interactions. While it would indeed be interesting to compare that to solitary housing or other male-only exposures, isolation itself imposes a different kind of stress, and disentangling these effects on sugar preference would require a separate, dedicated study beyond the scope of the present work.

      Would the behavior effect also show up with experienced males? Maybe this has been tested before. Does mating rejection in formerly successful males have the same impact?

      As suggested by the reviewer, we performed an additional experiment in which males that had previously mated successfully were subsequently subjected to courtship rejection. As shown in Figure 1-supplement 1F, prior successful mating did not prevent the decline in sweet sensitivity induced by subsequent mating failure, indicating that even experienced males exhibit the reduction in sugar sensitivity after rejection.

      Is the same circuit present and functioning in females? Does manipulating dopamine receptors in GR5a neurons in females lead to the same phenotype? This would suggest that different internal states in males and females could lead to the same phenotype and circuit modulations.

      This is indeed a very interesting suggestion. In male flies, Gr5a-specific knockdown of dopamine receptors did not alter baseline sweet sensitivity, but it selectively prevented the reduction in sugar perception that followed mating failure (Figure 6C-D), indicating that this dopaminergic pathway is engaged only in the context of courtship rejection. By extension, knocking down the same receptors in female GR5a neurons would likewise be expected to leave their basal sugar sensitivity unchanged. Moreover, because there is currently no established paradigm for inducing mating failure in female flies, we cannot yet test whether sexual rejection similarly modulates sweet taste in females, or whether it operates via the same circuit.

      Reviewer #3 (Recommendations for the authors):

      Suggestions to the authors:

      Introduction, line 61. I suggest the authors add references in fruit flies concerning the rewarding nature of mating. For example, the paper from Zhang et al, 2016 "Dopaminergic Circuitry Underlying Mating Drive" demonstrates the role of the dopamine rewarding system in mating drive. There is a large body of literature showing the link between dopamine and mating.

      We have added this literature in the introduction section.

      Figure 1B and Figure Supplement 1: If I understood correctly, Figure Supplement 1A shows that the total food consumption across all tested flies remains unchanged. However, fewer flies that failed to mate consumed sucrose. I would be curious to see the results for sucrose consumption per individual fly that did eat. According to their results, individual flies that failed to mate should consume more sucrose. This would change the conclusion. The authors currently show that a group of flies that failed to mate consumed less sucrose overall, but since fewer males actually ate, those that failed to mate and did eat consumed more sucrose. The authors should distinguish between failed and satisfied flies in two groups: those that ate and those that did not.

      Please see our responses to the Public Review for details (Reviewer 1, Point 2).

      Figure 1C, right: For a better understanding of all the "MAT" figures, I suggest the authors start the Y axis with the unit 25 and increase it to 400. This would match better the text (line 114) saying that it was significantly elevated in the failed group. As it is, we have the impression of a decrease in the graph.

      We have revised the figures accordingly.

      Line 103: When suggesting a reduced likelihood of meal initiation of these males, do these males take longer to eat when they did it? In other words, is the latency to eat increased in failed males? That would be a good measure of motivational state.

      We tried to analyze feeding latency in the MAFE assay by measuring the time from sucrose presentation to the first proboscis extension, but the latency was too short to be measured accurately. Nevertheless, while conducting the experiments, we did not observe any noticeable difference in feeding latency between Failed males and Naïve or Satisfied controls.

      Line 117. I don't understand which results the authors refer to when writing "an overall elevation in the threshold to initiate feeding upon appetitive cues". Please specify.

      This phrase refers to the fact that for every sweet tastant we tested, including sucrose (Figure 1C), fructose and glucose (Figure 1 supplement 1D-E), the concentration-response curve in Failed males shifted to the right, and the Mean Acceptance Threshold (MAT) was significantly higher. In other words, for these different appetitive cues, mating failure raised the concentration of sugar required to trigger a proboscis extension, indicating a general elevation in the threshold to initiate feeding upon an appetitive cue.

      Figure 1D. Please specify the time for the satisfied group.

      For clarity, the Naïve and Satisfied groups in Figure 1D each represent pooled data from 0 to 72 hours post-treatment, as their sweet sensitivity remained stable throughout this period. Only the Failed group was shown with time-resolved data, since it was the only group exhibiting a dynamic change in sugar sensitivity over time. We have now specified this in the figure legend.

      Figure 1F. The phenotype was not totally reversed in failed-re-copulated males. Could it be due to the timing between failure and re-copulation? I suggest the authors mention in the figure or in the text, the time interval between failure and re-copulation.

      We’d like to clarify that the interval between the initial treatment (“Failed”) and the opportunity for re-copulation was within 30 minutes. The incomplete reversal in the Failed-re-copulated group indeed raises interesting questions. One possible explanation is that mating failure reduces synaptic transmission between the SEZ dopaminergic neurons and Gr5a<sup>+</sup> sweet sensory neurons (Figure 3), and that restoring this transmission takes longer. We have added this information to the figure legend and the Methods section.

      Line 227-228 and Figure 3E. The authors showed that the synaptic connections between dopaminergic neurons and Gr5a+ GRNs were significantly weakened. I am wondering about the delay between mating failure and the GFP observation. It would be informative to know this timing to interpret this decrease in synaptic connections. If the timing is relatively long, it is possible that we can observe a neuronal plasticity. However, if this timing is very short, I would not expect such synaptic plasticity.

      The interval between the behavioral treatment and the GRASP-GFP experiment was approximately 20 hours. We chose this time window because it was sufficient for both GFP expression and accumulation. Therefore, the observed reduction in synaptic connections between dopaminergic neurons and Gr5a<sup>+</sup> GRNs likely reflects a genuine, experience-induced structural and functional change rather than an immediate, transient effect. For clarity, we have added this information to the Methods section of the revised manuscript.

      Line 240-243: The authors demonstrated that there is a reduction of CaLexA-mediated GFP signals in dopaminergic neurons in the SEZ after mating failure, but not a reduction in Gr5a+ GRNs. I suggest replacing "indicate" with "suggest" in line 240.

      We have made the change accordingly. Meanwhile, we would like to clarify that while we observed a reduction of NFAT signal in SEZ dopaminergic neurons (Figure 3G), we did not directly test NFAT signal in Gr5a<sup>+</sup> neurons. Notably, the results that the synaptic transmissions from SEZ dopaminergic neurons to Gr5a<sup>+</sup> neurons were weakened (Figure 3E-F), and the reduction of NFAT signal in SEZ dopaminergic neurons (Figure 3G-I), were in line with a reduction in sweet sensitivity of Gr5a<sup>+</sup> neurons upon courtship failure (Figure 3B-D).

      Line 243: replace "consecutive" with "constitutive".

      We have revised it accordingly.

      Figure 5: I have trouble understanding the results obtained in Figure 5. Both constitutive activation and inhibition of Dop1R1 and Dop2R neurons lead to the same results, knowing that males who failed mating no longer exhibit decreased sweet sensitivity. I would have expected contrary results for both experimental conditions. I suggest the authors discuss their results.

      Both activation and inhibition of Dop1R1 and Dop2R neurons eliminated the effect of courtship failure on sweet sensitivity (Figure 5). These results are in line with our hypothesis that courtship failure leads to changes in dopamine signaling and hence sweet sensitivity. If dopamine signaling via Dop1R1 and Dop2R was locked, either to a silenced or a constitutively activated state, the effect of courtship failure on sweet sensitivity was eliminated.

      Nevertheless, as the reviewer pointed out, constitutive activation and inhibition should in principle have opposite effects on Naïve flies. In fact, when Dop1R1<sup>+</sup>/Dop2R<sup>+</sup> neurons were silenced in Naïve flies, PER to sucrose was significantly reduced (Figure 5C-D), confirming that these neurons normally facilitate sweet sensation. Neuronal activation by NaChBac, in turn, did show a trend towards enhanced PER compared to the GAL4/+ controls, but it did not differ from the +>UAS-NaChBac controls, which showed a high PER level, likely due to a ceiling effect. We have added this discussion to the manuscript.

      Figure 7: I suggest the authors modify their figure a bit. It is not clear why in failed mating, the red arrow in "behavioral modulation" goes to the fly. The authors should find another way to show that mating failure decreased the percentage of flies that are motivated to eat sugar.

      We have modified the figure as suggested.

      Overall, I would suggest the authors be cautious with their conclusions. For example, line 337: "sexual failure suppressed feeding behavior". This is not what is shown by this study. Here, the study shows that mating failure decreases the fraction of flies that eat sucrose. Unless the authors demonstrate that this decrease is generalizable to other metabolites, I suggest the authors modify their conclusion.

      While we primarily used sucrose as the stimulant in our experiments, we also tested responses to two other sugars: fructose and glucose (Figure 1 supplement 1D-E). In all three cases, mating failure led to a significant reduction in sweet perception, suggesting that the effect of courtship failure is not limited to a single metabolite but rather reflects a general decrease in sweet sensitivity. Meanwhile, reduced sweet sensitivity indeed led to a reduction of feeding initiation (Figure 1).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      In the future, could you please include the exact changes made to the manuscript in the relevant section of the rebuttal, so it's clear which changes addressed the comment? That would make it easier to see what you refer to exactly - currently I have to guess which manuscript changes implement e.g. "We have tried to make these points more evident".

      Yes, we apologize for the inconvenience.

      On possible navigation solutions:

      I'm not sure if I follow this argument. If the network uses a shifted allocentric representation centred on its initial state, it couldn't consistently decode the position from different starting positions within the same environment (I don't think egocentric is the right term here - egocentric generally refers to representations relative to the animal's own direction like "to the left" rather than "to the west" but these would not work in the allocentric decoding scheme here). In other words: If I path integrate my location relative to my starting location s1 in environment 1 and learn how to decode that representation to an environment location, I cannot use the same representation when I start from s2 in environment 1, because everything will have shifted. I still believe using boundaries is the only solution to infer the absolute location for the agent here (because that's the only information that it gets), and that's the reason for finding boundary representations (and not grid cells). Imagine doing this task on a perfect torus where there are no boundaries: it would be impossible to ever find out at what 'absolute' location you are in the environment. I have therefore not updated this part of my review, but do let me know if I misunderstood.

      Thank you for addressing this point, which is a somewhat unusual feature of our network: We believe the point you raise applies if the decoding were fixed. However, in our case, the decoding is dynamic and depends on the firing pattern, as place unit centers are decoded on a per-trajectory basis. Thus, a new place-like basis may be formed for each trajectory (and in each environment). Hence, the model is not constrained to reuse its representation across trajectories or environments, as place centers are inferred based on unit firing. However, we do observe that the network learns to use a fixed place field placement in each geometry, which likely reflects some optimal solution to the decoding problem. This might also help to explain the hexagonal arrangement of learned field centers. Finally, we agree that egocentric may not be entirely accurate, but we found it to be the best word to distinguish from the allocentric-type navigation adopted by the network.

      Regarding noise injection:

      Beyond that noise level, the network might return to high correlations, but that must be due to the boundary interactions - very much like what happens at the very beginning of entering an environment: the network has learned to use the boundary to figure out where it is from an uninformative initial hidden state. But I don't think this is currently reflected well in the main text. That still reads "Thus, even though the network was trained without noise, it appears robust even to large perturbations. This suggests that the learned solutions form an approximate attractor." I think your new (very useful!) velocity ablations show that only small noise is compensated for by attractor dynamics, and larger noise injections are error corrected through boundary interactions. I've added this to the new review.

      Thank you for your kind feedback. We have changed the phrasing in the text to say “robust even to moderate perturbations.” We hold that, while numerically small, the amount of injected noise is rather large compared to the magnitude of activities in the network (see Fig. A5d): the largest maximal rate is around 0.1, which is similar to the noise level at which output representations fail to re-converge. However, we agree that some moderation is appropriate.

      On contexts being attractive:

      In the new bit of text, I'm not sure why "each environment appears to correspond to distinct attractive states (as evidenced by the global-type remapping behavior)", i.e. why global-type remapping is evidence for attractive states. Again, to me global-type remapping is evidence that contexts occupy different parts of activity space, but not that they are attractive. I like the new analysis in Appendix F, as it demonstrates that the context signal determines which region of activity space is selected (as opposed to the boundary information!). If I'm not mistaken, we know three things: 1. Different contexts exist in different parts of representation space, 2. Representations are attractive for small amounts of noise, 3. The context signal determines which point in representation space is selected (thanks to the new analysis in Appendix F). That seems to be in line with what the paper claims (I think "contexts are attractive" has been removed?) so I've updated the review.

      It seems to us that we are in agreement on this point. Our aim is simply to point out that a particular context signal appears to correspond to a particular (discrete) attractor state (i.e., occupying a distinct part of representation space, as you state). It seems we merely use slightly different language, but to avoid confusion, we changed this to say that “representations are attractive”.

      Thanks again for engaging with us, this discussion has been very helpful in improving the paper.

      Reviewer #2:

      However, I still struggle to understand the entire picture of the boundary-to-place-to-grid model. After all, what is the role of grid cells in the proposed view? Are they just redundant representations of the space? I encourage the authors to clarify these points in the last two paragraphs on pages 17-18 of the discussion.

      Thank you for your feedback. While we have discussed the possible role of a grid code to some extent, we agree that this point requires clarification. We have therefore added to the discussion on the role of grid cells, which now reads “While the lack of grid cells in this model is interesting, it does not disqualify grid cells from serving as a neural substrate for path integration. Rather, it suggests that path integration may also be performed by other, non-grid spatial cells, and/or that grid cells may serve additional computational purposes. If grid cells are involved during path integration, our findings indicate that additional tasks and constraints are necessary for learning such representations. This possibility has been explored in recent normative models, in which several constraints have been proposed for learning grid-like solutions. Examples include constraints concerning population vector magnitude, conformal isometry \cite{xu_conformal_2022, schaeffer_self-supervised_2023, schoyen_hexagons_2024}, capacity, spatial separation and path invariance \cite{schaeffer_self-supervised_2023}. Another possibility is that grid cells are geared more towards other cognitive tasks, such as providing a neural metric for space \cite{ginosar_are_2023, pettersen_self-supervised_2024}, or supporting memory and inference-making \cite{whittington_tolman-eichenbaum_2020}. That our model performs path integration without grid cells, and that a myriad of independent constraints are sufficient for grid-like units to emerge in other models, presents strong computational evidence that grid cells are not solely defined by path integration, and that path integration is not only reserved for grid cells.”

      Thank you again for your time and input.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their comprehensive analysis Diallo et al. deorphanise the first olfactory receptor of a nonhymenopteran eusocial insect - a termite and identified the well-established trail pheromone neocembrene as the receptor's best ligand. By using a large set of odorants the authors convincingly show that, as expected for a pheromone receptor, PsimOR14 is very narrowly tuned. While the authors first make use of an ectopic expression system, the empty neuron of Drosophila melanogaster, to characterise the receptor's responses, they next perform single sensillum recordings with different sensilla types on the termite antenna. By that, they are able to identify a sensillum that houses three neurons, of which the B neuron exhibits the narrow responses described for PsimOR14. Hence the authors do not only identify the first pheromone receptor in a termite but can even localize its expression on the antenna. The authors in addition perform a structural analysis to explain the binding properties of the receptor and its major and minor ligands (as this is beyond my expertise, I cannot judge this part of the manuscript). Finally, they compare expression patterns of ORs in different castes and find that PsimOR14 is more strongly expressed in workers than in soldier termites, which corresponds well with stronger antennal responses in the worker caste.

      Strengths:

      The manuscript is well-written and a pleasure to read. The figures are beautiful and clear. I actually had a hard time coming up with suggestions.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Whenever it comes to the deorphanization of a receptor and its potential role in behaviour (in the case of the manuscript it would be trail-following of the termite) one thinks immediately of knocking out the receptor to check whether it is necessary for the behaviour. However, I definitely do not want to ask for this (especially as the establishment of CRISPR Cas-9 in eusocial insects usually turns out to be a nightmare). I also do not know either, whether knockdowns via RNAi have been established in termites, but maybe the authors could consider some speculation on this in the discussion.

      We agree that a functional proof of the PsimOR14 function using reverse genetics would be a valuable addition to the study to firmly establish its role in trail pheromone sensing. Nevertheless, such a functional proof is difficult to obtain. Due to the very slow ontogenetic development inherent to termites (several months from an egg to the worker stage), CRISPR Cas-9 is not a useful technique for this taxon. By contrast, termites are quite responsive to RNAi-mediated silencing, and RNAi has previously been used to silence the ORCo co-receptor in termites, resulting in impairment of the trail-following behavior (DOI: 10.1093/jee/toaa248). Likewise, our previous experiments showed a decreased ORCo transcript abundance, lower sensitivity to neocembrene and reduced neocembrene trail following upon dsPsimORCo administration to P. simplex workers, while we did not succeed in reducing the transcript abundance of PsimOR14 upon dsPsimOR14 injection. We do not report these negative results in the present manuscript so as not to dilute the main message. In parallel, we are currently developing an alternative way of dsRNA delivery using nanoparticle coating, which may improve the RNAi experiments with ORs in termites.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors performed the functional analysis of odorant receptors (ORs) of the termite Prorhinotermes simplex to identify the receptor of trail-following pheromone. The authors performed single-sensillum recording (SSR) using the transgenic Drosophila flies expressing a candidate of the pheromone receptor and revealed that PsimOR14 strongly responds to neocembrene, the major component of the pheromone. Also, the authors found that one sensillum type (S I) detects neocembrene and also performed SSR for S I in wild termite workers. Furthermore, the authors revealed the gene, transcript, and protein structures of PsimOR14, predicted the 3D model and ligand docking of PsimOR14, and demonstrated that PsimOR14 is more highly expressed in workers than soldiers using RNA-seq for heads of workers and soldiers of P. simplex and that EAG response to neocembrene is higher in workers than soldiers. I consider that this study will contribute to further understanding of the molecular and evolutionary mechanisms of the chemoreception system in termites.

      Strength:

      The manuscript is well written. As far as I know, this study is the first study that identified a pheromone receptor in termites. The authors not only present a methodology for analyzing the function of termite pheromone receptors but also provide important insights in terms of the evolution of ligand selectivity of termite pheromone receptors.

      We thank the reviewer for the overall positive evaluation of the manuscript.

      Weakness:

      As you can see in the "Recommendations to the Authors" section below, there are several things in this paper that are not fully explained about experimental methods. Except for this point, this paper appears to me to have no major weaknesses.

      We address point by point the specific comments listed in the Recommendation to the authors chapter below.

      Reviewer #3 (Public review):

      Summary:

      Chemical communication is essential for the organization of eusocial insect societies. It is used in various important contexts, such as foraging and recruiting colony members to food sources. While such pheromones have been chemically identified and their function demonstrated in bioassays, little is known about their perception. Excellent candidates are the odorant receptors that have been shown to be involved in pheromone perception in other insects including ants and bees but not termites. The authors investigated the function of the odorant receptor PsimOR14, which was one of four target odorant receptors based on gene sequences and phylogenetic analyses. They used the Drosophila empty neuron system to demonstrate that the receptor was narrowly tuned to the trail pheromone neocembrene. Similar responses to the odor panel and neocembrene in antennal recordings suggested that one specific antennal sensillum expresses PsimOR14. Additional protein modeling approaches characterized the properties of the ligand binding pocket in the receptor. Finally, PsimOR14 transcripts were found to be significantly higher in worker antennae compared to soldier antennae, which corresponds to the worker's higher sensitivity to neocembrene.

      Strengths:

      The study presents an excellent characterization of a trail pheromone receptor in a termite species. The integration of receptor phylogeny, receptor functional characterization, antennal sensilla responses, receptor structure modeling, and transcriptomic analysis is especially powerful. All parts build on each other and are well supported with a good sample size.

      We thank the reviewer for these positive comments.

      Weaknesses:

      The manuscript would benefit from a more detailed explanation of the research advances this work provides. Stating that this is the first deorphanization of an odorant receptor in a clade is insufficient. The introduction primarily reviews termite chemical communication and deorphanization of olfactory receptors previously performed. Although this is essential background, it lacks a good integration into explaining what problem the current study solves.

      We understand the comment about the lack of an intelligible cue to highlight the motivation and importance of the present study. In the current version of the manuscript the introduction has been reworked. As suggested by Reviewer 3 in the Recommendations section below, the introduction now integrates some parts of the original discussion, especially the part discussing the OR evolution and emergence of eusociality in hymenopteran social insects and in termites, while underscoring the need for data from termites to compare the commonalities and idiosyncrasies in neurophysiological (pre)adaptations potentially linked with the independent evolution of eusociality in the two main social insect clades.

      Selecting target ORs for deorphanization is an essential step in the approach. Unfortunately, the process of choosing these ORs has not been described. Were the authors just lucky that they found the correct OR out of the 50, or was there a specific selection process that increased the probability of success?

      Indeed, we were extremely lucky. Our strategy was to first select a modest set of ORs to confirm the feasibility of the Empty Neuron Drosophila system and newly established SSR setup, while taking advantage of having a set of termite pheromones, including those previously identified in the P. simplex model, some of them de novo synthesized for this project. The selection criteria for the first set of four receptors were (i) to have a full-length ORF and at least 6 unambiguously predicted transmembrane regions, and (ii) to be represented on different branches (subbranches) of the phylogenetic tree. Then it was a matter of good luck to hit on PsimOR14, which responds selectively to the main component of the genuine P. simplex trail-following pheromone. In the revised version, we state these selection criteria in the results section (Phylogenetic reconstruction and candidate OR selection).

      The deorphanization attempts of additional P. simplex ORs are currently running.

      The authors assigned antennal sensilla into five categories. Unfortunately, they did not support their categories well. It is not clear how they were able to differentiate SI and SII in their antennal recordings.

      We agree that the classification of multiporous sensilla into five categories lacks robust discrimination cues. The identification of the neocembrene-responding sensillum was initially carried out by SSR measurements on individual olfactory sensilla of P. simplex workers one-by-one and the topology of each tested sensillum was recorded on optical microscope photographs taken during the SSR experiment. Subsequently, the SEM and HR-SEM were performed in which we localized the neocembrene sensillum and tried to find distinguishing characters. We admit that these are not robust. Therefore, in the revised version of the manuscript we decided to abandon the attempt of sensilla classification and only report the observations about the specific sensillum in which we consistently recorded the response to neocembrene (and geranylgeraniol). The modifications affect Fig. 4, its legend and the corresponding part of the results section (Identification of P. simplex olfactory sensillum responding to neocembrene).

      The authors used a large odorant panel to determine receptor tuning. The panel included volatile polar compounds and non-volatile non-polar hydrocarbons. Usually, some heat is applied to such non-volatile odorants to increase volatility for receptor testing. It is unclear how it is possible that these non-volatile compounds can reach the tested sensilla without heat application.

      The reviewer points to an important methodological error we made while designing the experiments. Indeed, the inclusion of long-chain hydrocarbons in Panel 1 without additional heat applied to the odor cartridges was inappropriate, even though the experiments were performed at 25–26 °C. We carefully considered the best solution to correct the mistake and finally decided to remove all tested ligands beyond C22 from Panel 1, i.e. altogether five compounds. These changes did not affect the remaining Panels 2-4 (containing compounds with sufficient volatility), nor did they affect the message of the manuscript on the highly selective response of PsimOR14 to neocembrene (and geranylgeraniol). In consequence, Figures 2, 3 and 5 were updated, along with the supplementary tables containing the raw data on SSR measurements. In addition, the tuning curve for PsimOR14 was re-built and the receptor lifetime sparseness value re-calculated (without any important change). We also exchanged squalene for limonene in the docking and molecular dynamics analysis and made new calculations.
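
      For readers unfamiliar with the lifetime sparseness metric mentioned above, the following is a minimal sketch of one commonly used definition, S = (1 - (Σr/N)² / (Σr²/N)) / (1 - 1/N), where r are the responses to the N odorants in the panel. The response does not state which formula was used, so the function name, the clipping of negative responses to zero, and the toy values are assumptions for illustration only, not the authors' data or code.

```python
def lifetime_sparseness(responses):
    """Lifetime sparseness of one receptor across an odorant panel.

    Returns a value in [0, 1]: 0 for a receptor responding equally to every
    odorant, 1 for a receptor responding to a single odorant only.
    Assumption: negative (inhibitory) responses are clipped to zero.
    """
    clipped = [max(float(r), 0.0) for r in responses]
    n = len(clipped)
    if n < 2:
        raise ValueError("need responses to at least two odorants")
    mean_of_squares = sum(r * r for r in clipped) / n
    if mean_of_squares == 0.0:
        return 0.0  # no excitatory response at all; defined as 0 here by convention
    mean = sum(clipped) / n
    return (1.0 - mean ** 2 / mean_of_squares) / (1.0 - 1.0 / n)


# Hypothetical spike-rate changes, not measured values.
narrow = lifetime_sparseness([120, 0, 0, 0])    # responds to one ligand only
broad = lifetime_sparseness([30, 30, 30, 30])   # responds equally to all ligands
```

      Under this definition, removing a handful of non-responding ligands from a panel changes N but leaves a narrowly tuned receptor close to 1, which is consistent with the statement that the re-calculated value did not change in any important way.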

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) L 208: "than" instead of "that"

      Corrected.

      (2) L 527+527 strange squares (•) before dimensions

      Apparently an error upon file conversion, corrected.

      (3) L553 "reconstructing" instead of "reconstruct"

      Corrected.

      (4) Two references (Chahda et al. and Chang et al.) appear too late in the alphabet.

      Corrected. Thank you for spotting this mistake. By our error, the reference list was ordered according to the Czech alphabet, which ranks CH after H.

      Reviewer #2 (Recommendations for the authors):

      (1) L148: Why did the authors select only four ORs (PsimOR9, 14, 30, and 31) though there are 50 ORs in P. simplex? I would like you to explain why you chose them.

      Our strategy was to first select a modest set of ORs to confirm the feasibility of the Empty Neuron Drosophila system and newly established SSR setup, while taking advantage of having a set of termite pheromones, including those previously identified in the P. simplex model, some of them de novo synthesized for this project. Then, it was a matter of good luck to hit on PsimOR14, which responds selectively to the main component of the genuine P. simplex trail-following pheromone, while the deorphanization attempts for a set of additional P. simplex ORs are currently running. In the revised version of the manuscript, we state the selection criteria for the four ORs studied in the Results section (Phylogenetic reconstruction and candidate OR selection).

      (2) L149: Where is Figure 1A? Does this mean Figure 1?

      Thank you for spotting this mistake. Fig. 1 is now properly labelled as Fig. 1A and 1B in the figure itself and in the legend. The text now also refers to either 1A or 1B.

      (3) Figure 1: The authors also showed the transcription abundance of all 50 ORs of P. simplex in the right bottom of Figure 1, but there is no explanation about it in the main text.

      The heatmap reporting the transcript abundances is now labelled as Fig. 1B and is referred to in the discussion section (in the original manuscript it was referred to in the same place as Fig. 1).

      (4) L260-265: The authors confirmed higher expression of PsimOR14 in workers than soldiers by using RNA-seq data and stronger EAG responses of PsimOR14 to neocembrene in workers than soldiers, but I think that confirming the expression levels of PsimOR14 in workers and soldiers by RT-qPCR would strengthen the authors' argument (it is optional).

      qPCR validation is a suitable complement to read count comparison of RNA Seq data, especially when the data comes from one-sample transcriptomes and/or low coverage sequencing. Yet, our RNA Seq analysis is based on sequencing of three independent biological replicates per phenotype (worker heads vs. soldier heads) with ~20 million reads per sample. Thus, the resulting differential gene expression analysis is a sufficient and powerful technique in terms of detection limit and dynamic range.

      We admit that the replicate numbers and origin of the RNA seq data should be better specified since the Methods section only referred to the GenBank accession numbers in the original manuscript. Therefore, we added more information in the Methods section (Bioinformatics) and make clear in the Methods that this data comes from our previous research and related bioproject.

      (5) L491: I think that "The synthetic processes of these fatty alcohols are ..." is better.

      We replaced the sentence with “The de novo organic synthesis of these fatty alcohols is described …”

      (6) L525 and 527: There are white squares between the number and the unit. Perhaps some characters have been garbled.

      Apparently an error upon file conversion, corrected.

      (7) L795: ORCo?

      Corrected.

      (8) L829-830 & Figure 4: Where is Figure 4D?

      Thank you for spotting this mistake from the older version of Figure 4. The SSR traces referred to in the legend are in fact a part of Figure 5. Moreover, Figure 4 is now reworked based on the comments by Reviewer 3.

      (9) L860-864: Why did the authors select the result of edgeR for the volcano plot in Figure 7 although the authors use both DESeq2 and edgeR? An explanation would be needed.

      Both algorithms, DESeq2 and edgeR, are routinely used for differential gene expression analysis. Since they differ in read count normalization method and statistical testing, we decided to use both of them independently in order to reduce false positives. Because the resulting fold changes were practically identical in both algorithms (results for both analyses are listed in Supplementary table S15), we only reported the edgeR outputs in Fig. 7 to avoid redundancy. We added to the Results section the information that both techniques listed PsimOR14 among the most upregulated ORs in workers.
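
      The idea of running two tools independently and trusting only what both report can be sketched as a simple intersection of their result tables. This is a minimal illustration, not the pipeline used in the manuscript: the gene names, fold changes, adjusted p-values, and both cutoffs below are made-up placeholders.

```python
from typing import Dict, List, Tuple

# Each result table maps gene -> (log2 fold change, adjusted p-value).
DEResult = Dict[str, Tuple[float, float]]


def consensus_hits(a: DEResult, b: DEResult,
                   lfc_min: float = 1.0, padj_max: float = 0.05) -> List[str]:
    """Return genes called significant, in the same direction, by both tools.

    lfc_min and padj_max are illustrative cutoffs, not values from the text.
    """
    hits = []
    for gene in sorted(a.keys() & b.keys()):
        lfc_a, padj_a = a[gene]
        lfc_b, padj_b = b[gene]
        significant = padj_a <= padj_max and padj_b <= padj_max
        large = abs(lfc_a) >= lfc_min and abs(lfc_b) >= lfc_min
        same_direction = (lfc_a > 0) == (lfc_b > 0)
        if significant and large and same_direction:
            hits.append(gene)
    return hits


# Hypothetical outputs of two tools for three hypothetical genes.
tool_a = {"geneA": (3.1, 1e-6), "geneB": (0.4, 0.60), "geneC": (1.5, 0.03)}
tool_b = {"geneA": (3.0, 2e-6), "geneB": (0.5, 0.55), "geneC": (1.4, 0.08)}

print(consensus_hits(tool_a, tool_b))
```

      In this toy input, geneC passes in one table but not the other, so only geneA survives the intersection; this is the sense in which using both methods reduces false positives.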

      Reviewer #3 (Recommendations for the authors):

      The discussion contains many descriptions that would fit better into the introduction, where they could be used to hint at the study's importance (e.g., 292-311, 381-412). The remaining parts often lack a detailed discussion of the results that integrates details from other insect studies. Although references were provided, no details were usually outlined. It would be helpful to see a stronger emphasis on what we learn from this study.

      Along with rewriting the introduction, we also modified the discussion. As suggested, the lines 292-311 were rewritten and placed in the introduction. By contrast, we preferred to keep the two paragraphs 381-412 in the discussion, since both of them outline the potential future interesting targets of research on termite ORs.

      As suggested, the discussion has been enriched and now includes comparative examples and relevant references about the broad/narrow selectivity of insect ORs, about the expected breadth of tuning of pheromone receptors vs. ORs detecting environmental cues, about the potential role of additional neurons housed in the neocembrene-detecting sensillum of P. simplex workers, etc. From both introduction and discussion the redundant details on the chemistry of termite communication have been removed.

      This includes explanations of the advantages of the specific methodologies the authors used and how they helped solve the manuscript's problem. What does the phylogeny solve? Was it used to select the ORs tested? It would be helpful to discuss what the phylogeny shows in comparison to other well-studied OR phylogenies, like those from the social Hymenoptera.

      We understand the comment. In fact, our motivation for including the phylogenetic tree of termite ORs was essentially to demonstrate (i) the orthologous nature of OR diversity, with few expansions at low taxonomic levels, and (ii) the relationship among the four selected sequences in graphical form. We do not attempt a comprehensive phylogenetic analysis here, because it would be redundant: we recently published a large OR phylogeny that includes all sequences used in the present manuscript and analyses them in the proper context of related (cockroaches) and unrelated insect taxa (Johny et al., 2023). That paper also compares the termite phylogenetic pattern with those observed in other Insecta. It is repeatedly cited at appropriate places in the present manuscript, and its main observations are provided in the Introduction section. Therefore, we feel that a thorough discussion of termite phylogeny would be redundant in the present paper.

      The authors categorized the sensilla types. Potential problems in the categorization aside, it would be helpful to know if it is expected that you have sensilla specialized in perceiving one specific pheromone. What is known about sensilla in other insects?

      We understand. In the discussion of the revised version, we elaborate on the features typical of, or expected for, a pheromone receptor and the sensillum housing this receptor together with two other olfactory sensory neurons, including examples from other insects.

      As the manuscript currently stands, specialist readers with their respective background knowledge would find this study very interesting. In contrast, the general reader would probably fail to appreciate the importance of the results.

      We hope that the re-organized and simplified introduction may now be more intelligible even for non-specialist readers.

      (1) L35: Should "workers" be replaced with "worker antennae"?

      Corrected.

      (2) L62: Should "conservativeness" be replaced by "conservation"?

      Replaced with “parsimony”.

      (3) L129: How and why did the authors choose four candidate ORs? I could not find any information about this in the manuscript. I wondered why they did not pick the more highly expressed PsimOr20 and 26 (Figure 7).

      As already replied above in the Weaknesses section, we selected for the first deorphanization attempts only a modest set of four ORs, while an additional set is currently being tested. We also explained above the inclusion criteria, i.e. (i) full-length ORF and at least 6 unambiguously predicted transmembrane regions, and (ii) presence on different branches (subbranches) of the OR phylogeny. For these reasons, we did not primarily consider the expression patterns of different ORs. As for Fig. 7, it shows differential expression between soldiers and workers, which was not the primary guideline either and the data was obtained only after having the ORs tested by SSR. Yet, even though we had data on P. simplex ORs expression (Fig. 1B), we did not presume that pheromone receptors should be among the most expressed ORs, given the richness of chemical cues detected by worker termites and unlike, e.g., male moths, where ORs for sex pheromones are intuitively highly expressed.

      The strategy of OR selection is specified in the results section of the revised manuscript under “Phylogenetic reconstruction and candidate OR selection”.

      (4) 198 to 200: SI, II, and III look very similar. Additional measurements rather than qualitative descriptions are required to consider them distinct sensilla. The bending of SIII could be an artifact of preparation. I do not see how the authors could distinguish between SI and SII under the optical microscope for recordings. A detailed explanation is required.

      As we responded above in the “Weaknesses” section, we admit that the sensilla classification is not intelligible. Therefore, we decided in the revised version to abandon the classification of sensilla types and to focus only on the observations made on the neocembrene-responding sensillum. To recognize the specific sensillum, we used its topology on the last antennal segment. Because termite antennae are not densely populated with sensilla, it is relatively easy to distinguish individual sensilla based on their topology on the antenna, both under the optical microscope and in SEM photographs. The modifications affect Fig. 4, its legend and the corresponding part of the results section (Identification of P. simplex olfactory sensillum responding to neocembrene).

      (5) 208: "Than" instead of "that"

      Corrected.

      (6) 280: I suggest replacing "demand" with "capabilities"

      Corrected.

      (7) 312: Why "nevertheless? It sounds as if the authors suggest that there is evidence that ORs are not important for communication. This should be reworded.

      We removed “Nevertheless” from the beginning of the sentence.

      (8) 321 to 323: This sentence sounds as if something is missing. I suggest rewriting it.

      This sentence simply says that empty neuron Drosophila is a good tool for termite OR deorphanization and that termite ORs work well with Drosophila ORCo. We reworded the sentence.

      (9) 323: I suggest starting a new paragraph.

      Corrected.

      (10) 421: How many colonies were used for each of the analyses?

      The data for this manuscript were collected from three different colonies collected in Cuba. We now describe in the Materials and Methods section which analyses were conducted with each of the colonies.

      (11) 430: Did the termites originate from one or multiple colonies and did the authors sample from the Florida and Cuba population?

      The data for this manuscript were collected from three different colonies collected in Cuba. We now describe in the Materials and Methods section which analyses were conducted with each of the colonies.

      (12) 501: How was the termite antenna fixated? The authors refer to the Drosophila methods, but given the large antennal differences between these species, more specific information would be helpful.

      Understood. We added the following information into the Methods section under “Electrophysiology”: “The grounding electrode was carefully inserted into the clypeus and the antenna was fixed on a microscope slide using a glass electrode. To avoid the antennal movement, the microscope slide was covered with double-sided tape and the three distal antennal segments were attached to the slide.”

      (13) 509: I want to confirm that the authors indicate that the outlet of the glass tube with the airstream and odorant is 4 cm away from the Drosophila or termite antenna. The distance seems to be very large.

      Thank you for spotting this obvious mistake. The 4 cm is the distance between the opening for Pasteur pipette insertion into the delivery tube and the outlet; the outlet itself is situated approx. 1 cm from the antenna. This information is now corrected.

      (14) 510/527: It looks like all odor panels were equally applied onto the filter paper despite the difference in solvent (hexane and paraffin oil). How was the solvent difference addressed?

      In our study we combine two types of odorant panels. First, we tested on all four studied receptors a panel containing several compounds relevant for termite chemical communication, including the C12 unsaturated alcohols, the diterpene neocembrene, the sesquiterpene (3R,6E)-nerolidol and other compounds. These compounds are stored in the laboratory as hexane solutions to prevent oxidation/polymerization, and it is not advisable to transfer them to another solvent. In the second step, to address the breadth of PsimOR14 tuning, we used three additional panels of frequently occurring insect semiochemicals, which are stored as paraffin oil solutions. We are aware that the evaporation dynamics differ between the two solvents, but we had no suitable way to solve this problem. We believe that the use of the two solvents does not compromise the general message on the receptor specificity. For each panel, the corresponding solvent is used as a control. The use of two different solvents for SSR can also be encountered in other studies, e.g. 10.1016/j.celrep.2015.07.031.

      (15) 518: delta spikes/sec works for all tables except for the wild type in Table S5. I could not figure out how the authors get to delta spikes/sec in that table.

      Thank you for your sharp eye. Due to our mistake, the values of Δ spikes per second reported in Table S5 for W1118 were erroneously calculated using the formula for 0.5 sec stimulation instead of 1 sec. We corrected this mistake, which does not affect the interpretation of the results in Table S5 and Fig. 2.
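
      To illustrate why the stimulation window matters, the sketch below uses one common convention for SSR responses: the spike count in a window after stimulus onset minus the count in an equally long window before onset, divided by the window length. This is an assumption for illustration; the response does not state the exact formula used for Table S5, and the spike counts are invented.

```python
def delta_spikes_per_sec(spikes_before: int, spikes_during: int,
                         window_s: float) -> float:
    """Response as a change in firing rate (spikes/s).

    spikes_before: count in the window preceding stimulus onset
    spikes_during: count in the equally long window after onset
    window_s:      length of each counting window in seconds
    Assumed convention, not necessarily the one used in the manuscript.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return (spikes_during - spikes_before) / window_s


# Invented counts from 1 s windows.
correct = delta_spikes_per_sec(12, 40, window_s=1.0)
# Same counts, but normalized as if the windows were 0.5 s long.
mistaken = delta_spikes_per_sec(12, 40, window_s=0.5)
print(correct, mistaken)
```

      Under this convention, applying the 0.5 s normalization to counts from 1 s windows doubles every value, so the ranking of odorants within a table would be unchanged.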

      522: Did the workers and soldiers originate from different colonies or different populations?

      We now clearly describe in the Materials and Methods section the origin of termites for different experiments. EAG measurements were made using individuals (workers, soldiers) from one Cuban colony.

      (16) Figure 6C/D: I suggest matching colors between the two figures. For example, instead of using an orange circle in C and a green coloration of the intracellular flap in D, I recommend using blue, which is not used for something else. In addition, the binding pocket could be separated better from anything else in a different color.

      We agree that the color match for the intracellular flap was missing. This figure is now reworked and the colors should have a better match and the binding region is better delineated.

      (17) Figure 7/Table S15: It is unclear where the transcriptome data originate and what they are based on. Are these antennal transcriptomes or head transcriptomes? Do these data come from previous data sets or data generated in this study? Figure 7 refers to heads, Table S15 to workers and soldiers, and the methods only refer to antennal extractions. This should be clarified in the text, the figure, and the table.

      We admit that the replicate numbers and origin of the RNA-Seq data should be better specified, and that the information that the RNA-Seq data originated from samples of heads+antennae of workers and soldiers should be provided at appropriate places. Therefore, we added more information on replicates and origin of the data in the Methods section (Bioinformatics), made clear that these data come from our previous research, and refer to the corresponding bioproject. Likewise, the Figure 7 legend and Table S15 heading have been updated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      The study was designed as a 6-month follow-up, with repeated behavioral and EEG measurements through disease development, providing valuable and interesting findings on AD progression and the effect of early-life choline supplementation. Moreover, the behavioral data that suggest an adverse effect of low choline in WT mice are interesting and important beyond the context of AD.

      Thank you for identifying several strengths.

      Weaknesses:

      (1) The multiple headings and subheadings, focusing on the experimental method rather than the narrative, reduce the readability.

      We have reduced the number of headings.

      (2) Quantification of NeuN and FosB in WT littermates is needed to demonstrate rescue of neuronal death and hyperexcitability by high choline supplementation and also to gain further insights into the adverse effect of low choline on the performance of WT mice in the behavioral test.

      We agree and have added WT data for the NeuN and ΔFosB analyses. These data are included in the text and figures. For NeuN, the Figure is Figure 6. For ΔFosB it is Figure 7. In brief, the high choline diet restored NeuN and ΔFosB to the levels of WT mice.

      Below is Figure 6 and its legend to show the revised presentation of data for NeuN. Afterwards is the revised figure showing data for ΔFosB. After that are the sections of the Results that have been revised.

      Author response image 1.

      Choline supplementation improved NeuN immunoreactivity (ir) in hilar cells in Tg2576 animals. A. Representative images of NeuN-ir staining in the anterior DG of Tg2576 animals. (1) A section from a Tg2576 mouse fed the low choline diet. The area surrounded by a box is expanded below. Red arrows point to NeuN-ir hilar cells. Mol=molecular layer, GCL=granule cell layer, HIL=hilus. Calibration for the top row, 100 µm; for the bottom row, 50 µm. (2) A section from a Tg2576 mouse fed the intermediate diet. Same calibrations as for 1. (3) A section from a Tg2576 mouse fed the high choline diet. Same calibrations as for 1. B. Quantification methods. Representative images demonstrate the thresholding criteria used to quantify NeuN-ir. (1) A NeuN-stained section. The area surrounded by the white box is expanded in the inset (arrow) to show 3 hilar cells. The 2 NeuN-ir cells above threshold are marked by blue arrows. The 1 NeuN-ir cell below threshold is marked by a green arrow. (2) After converting the image to grayscale, the cells above threshold were designated as red. The inset shows that the two cells that were marked by blue arrows are red while the cell below threshold is not. (3) An example of the threshold menu from ImageJ showing the way the threshold was set. Sliders (red circles) were used to move the threshold to the left or right of the histogram of intensity values. The final position of the slider (red arrow) was positioned at the onset of the steep rise of the histogram. C. NeuN-ir in Tg2576 and WT mice. Tg2576 mice had either the low, intermediate, or high choline diet in early life. WT mice were fed the standard diet (intermediate choline). (1) Tg2576 mice treated with the high choline diet had significantly more hilar NeuN-ir cells in the anterior DG compared to Tg2576 mice that had been fed the low choline or intermediate diet. 
      The values for Tg2576 mice that received the high choline diet were not significantly different from WT mice, suggesting that the high choline diet restored NeuN-ir. (2) There was no effect of diet or genotype in the posterior DG, probably because the low choline and intermediate diet did not appear to lower hilar NeuN-ir.

      Author response image 2.

      Choline supplementation reduced ∆FosB expression in dorsal GCs of Tg2576 mice. A. Representative images of ∆FosB staining in GCL of Tg2576 animals from each treatment group. (1) A section from a low choline-treated mouse shows robust ∆FosB-ir in the GCL. Calibration, 100 µm. Sections from intermediate (2) and high choline (3)-treated mice. Same calibration as 1. B. Quantification methods. Representative images demonstrating the thresholding criteria established to quantify ∆FosB. (1) A ∆FosB -stained section shows strongly-stained cells (white arrows). (2) A strict thresholding criteria was used to make only the darkest stained cells red. C. Use of the strict threshold to quantify ∆FosB-ir. (1) Anterior DG. Tg2576 mice treated with the choline supplemented diet had significantly less ∆FosB-ir compared to the Tg2576 mice fed the low or intermediate diets. Tg2576 mice fed the high choline diet were not significantly different from WT mice, suggesting a rescue of ∆FosB-ir. (2) There were no significant differences in ∆FosB-ir in posterior sections. D. Methods are shown using a threshold that was less strict. (1) Some of the stained cells that were included are not as dark as those used for the strict threshold (white arrows). (2) All cells above the less conservative threshold are shown in red. E. Use of the less strict threshold to quantify ∆FosB-ir. (1) Anterior DG. Tg2576 mice that were fed the high choline diet had less ΔFosB-ir pixels than the mice that were fed the other diets. There were no differences from WT mice, suggesting restoration of ∆FosB-ir by choline enrichment in early life. (2) Posterior DG. There were no significant differences between Tg2576 mice fed the 3 diets or WT mice.

      Results, Section C1, starting on Line 691:

      “To ask if the improvement in NeuN after MCS in Tg2576 restored NeuN to WT levels we used WT mice. For this analysis we used a one-way ANOVA with 4 groups: Low choline Tg2576, Intermediate Tg2576, High choline Tg2576, and Intermediate WT (Figure 5C). Tukey-Kramer multiple comparisons tests were used as the post hoc tests. The WT mice were fed the intermediate diet because it is the standard mouse chow, and this group was intended to reflect normal mice. The results showed a significant group difference for anterior DG (F(3,25)=9.20; p=0.0003; Figure 5C1) but not posterior DG (F(3,28)=0.867; p=0.450; Figure 5C2). Regarding the anterior DG, there were more NeuN-ir cells in high choline-treated mice than both low choline (p=0.046) and intermediate choline-treated Tg2576 mice (p=0.003). WT mice had more NeuN-ir cells than Tg2576 mice fed the low (p=0.011) or intermediate diet (p=0.003). Tg2576 mice that were fed the high choline diet were not significantly different from WT (p=0.827).”
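
      For readers less familiar with the F(df1, df2) notation, a one-way ANOVA can be sketched in a few lines. The function below is a generic textbook implementation, not the software used in the study, and the example values are invented; the Tukey-Kramer post hoc step is not included.

```python
from typing import List, Tuple


def one_way_anova(groups: List[List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented cell counts for three hypothetical groups.
f_stat, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 3), df1, df2)
```

      Because df_between is the number of groups minus one and df_within is the number of animals minus the number of groups, the reported F(3,25) for the anterior DG corresponds to 4 groups and 29 animals in total, and F(3,28) for the posterior DG to 32 animals.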

      Results, Section C2, starting on Line 722:

      “There was strong expression of ∆FosB in Tg2576 GCs in mice fed the low choline diet (Figure 7A1). The high choline diet and intermediate diet appeared to show less GCL ΔFosB-ir (Figure 7A2-3). A two-way ANOVA was conducted with the experimental group (Tg2576 low choline diet, Tg2576 intermediate choline diet, Tg2576 high choline diet, WT intermediate choline diet) and location (anterior or posterior) as main factors. There was a significant effect of group (F(3,32)=13.80, p<0.0001) and location (F(1,32)=8.69, p=0.006). Tukey-Kramer post-hoc tests showed that Tg2576 mice fed the low choline diet had significantly greater ΔFosB-ir than Tg2576 mice fed the high choline diet (p=0.0005) and WT mice (p=0.0007). Tg2576 mice fed the low and intermediate diets were not significantly different (p=0.275). Tg2576 mice fed the high choline diet were not significantly different from WT (p>0.999). There were no differences between groups for the posterior DG (all p>0.05).”

      “∆FosB quantification was repeated with a lower threshold to define ∆FosB-ir GCs (see Methods) and results were the same (Figure 7D). Two-way ANOVA showed a significant effect of group (F(3,32)=14.28, p<0.0001) and location (F(1,32)=7.07, p=0.0122) for anterior DG but not posterior DG (Figure 7D). For anterior sections, Tukey-Kramer post hoc tests showed that low choline mice had greater ΔFosB-ir than high choline mice (p=0.0024) and WT mice (p=0.005) but not Tg2576 mice fed the intermediate diet (p=0.275; Figure 7D1). Mice fed the high choline diet were not significantly different from WT (p=0.993; Figure 7D1). These data suggest that high choline in the diet early in life can reduce neuronal activity of GCs in offspring later in life. In addition, low choline has an opposite effect, suggesting low choline in early life has adverse effects.”

      (3) Quantification of the discrimination ratio of the novel object and novel location tests can facilitate the comparison between the different genotypes and diets.

      We have added the discrimination index for novel object location to the paper. The data are in a new figure: Figure 3. In brief, the results for the discrimination index are the same as the original results, which were based on the percentage of time spent exploring the novel object.

      Below is the new Figure and legend, followed by the new text in the Results.

      Author response image 3.

      Novel object location results based on the discrimination index. A. Results are shown for the 3 months-old WT and Tg2576 mice based on the discrimination index. (1) Mice fed the low choline diet showed object location memory only in WT. (2) Mice fed the intermediate diet showed object location memory only in WT. (3) Mice fed the high choline diet showed memory both for WT and Tg2576 mice. Therefore, the high choline diet improved memory in Tg2576 mice. B. The results for the 6 months-old mice are shown. (1-2) There was no significant memory demonstrated by mice that were fed either the low or intermediate choline diet. (3) Mice fed a diet enriched in choline showed memory whether they were WT or Tg2576 mice. Therefore, choline enrichment improved memory in all mice.

      Results, Section B1, starting on line 536:

      “The discrimination indices are shown in Figure 3 and results led to the same conclusions as the analyses in Figure 2. For the 3 months-old mice (Figure 3A), the low choline group did not show the ability to perform the task for WT or Tg2576 mice. Thus, a two-way ANOVA showed no effect of genotype (F(1,74)=0.027, p=0.870) or task phase (F(1,74)=1.41, p=0.239). For the intermediate diet-treated mice, there was no effect of genotype (F(1,50)=3.52, p=0.067) but there was an effect of task phase (F(1,50)=8.33, p=0.006). WT mice showed a greater discrimination index during testing relative to training (p=0.019) but Tg2576 mice did not (p=0.664). Therefore, Tg2576 mice fed the intermediate diet were impaired. In contrast, high choline-treated mice performed well. There was a main effect of task phase (F(1,68)=39.61, p<0.001) with WT (p<0.0001) and Tg2576 mice (p=0.0002) showing preference for the moved object in the test phase. Interestingly, there was a main effect of genotype (F(1,68)=4.50, p=0.038) because the discrimination index for WT training was significantly different from Tg2576 testing (p<0.0001) and Tg2576 training was significantly different from WT testing (p=0.0003).”

      “The discrimination indices of 6 months-old mice led to the same conclusions as the results in Figure 2. There was no evidence of discrimination in low choline-treated mice by two-way ANOVA (no effect of genotype, F(1,42)=3.25, p=0.079; no effect of task phase, F(1,42)=0.278, p=0.601). The same was true of mice fed the intermediate diet (genotype, F(1,12)=1.44, p=0.253; task phase, F(1,12)=2.64, p=0.130). However, both WT and Tg2576 mice performed well after being fed the high choline diet (effect of task phase, F(1,52)=58.75, p=0.0001, but not genotype, F(1,52)=1.197, p=0.279). Tukey-Kramer post-hoc tests showed that both WT (p<0.0001) and Tg2576 mice that had received the high choline diet (p=0.0005) had elevated discrimination indices for the test session.”
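
      The response does not spell out how the discrimination index was computed. A widely used definition, assumed here purely for illustration, is the difference in exploration time between the moved (or novel) object and the unmoved (or familiar) object, divided by the total exploration time. The sketch below compares it with the percent-time measure; the exploration times are invented.

```python
def discrimination_index(t_moved: float, t_unmoved: float) -> float:
    """(moved - unmoved) / (moved + unmoved); ranges from -1 to 1.

    Assumed definition, not confirmed by the source text.
    """
    total = t_moved + t_unmoved
    if total <= 0:
        raise ValueError("no exploration recorded")
    return (t_moved - t_unmoved) / total


def percent_time_moved(t_moved: float, t_unmoved: float) -> float:
    """Percentage of total exploration time spent on the moved object."""
    total = t_moved + t_unmoved
    if total <= 0:
        raise ValueError("no exploration recorded")
    return 100.0 * t_moved / total


# Invented exploration times in seconds.
di = discrimination_index(15.0, 5.0)
pct = percent_time_moved(15.0, 5.0)
print(di, pct)
```

      With this definition the index is a linear rescaling of the percent-time measure (index = 2 × percent/100 − 1), so chance performance maps from 50% to 0. That would be consistent with the two analyses leading to the same conclusions.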

      (4) The longitudinal analyses enable the performance of multi-level correlations between the discrimination ratio in NOR and NOL, NeuN and Fos levels, multiple EEG parameters, and premature death. Such analysis can potentially identify biomarkers associated with AD progression. These can be interesting in different choline supplementation, but also in the standard choline diet.

      We agree and added correlations to the paper in a new figure (Figure 9). Below is Figure 9 and its legend. Afterwards is the new Results section.

      Author response image 4.

      Correlations between IIS, Behavior, and hilar NeuN-ir. A. IIS frequency over 24 hrs is plotted against the preference for the novel object in the test phase of NOL. A greater preference is reflected by a greater percentage of time exploring the novel object. (1) The mice fed the high choline diet (red) showed greater preference for the novel object when IIS were low. These data suggest IIS impaired object location memory in the high choline-treated mice. The low choline-treated mice had very weak preference and very few IIS, potentially explaining the lack of correlation in these mice. (2) There were no significant correlations for IIS and NOR. However, there were only 4 mice for the high choline group, which is a limitation. B. IIS frequency over 24 hrs is plotted against the number of dorsal hilar cells expressing NeuN. The dorsal hilus was used because there was no effect of diet on the posterior hilus. (1) Hilar NeuN-ir is plotted against the preference for the novel object in the test phase of NOL. There were no significant correlations. (2) Hilar NeuN-ir was greater for mice that had better performance in NOR, both for the low choline (blue) and high choline (red) groups. These data support the idea that hilar cells contribute to object recognition (Kesner et al. 2015; Botterill et al. 2021; GoodSmith et al. 2022).

      Results, Section F, starting on Line 801:

      “F. Correlations between IIS and other measurements

      As shown in Figure 9A, IIS were correlated to behavioral performance in some conditions. For these correlations, only mice that were fed the low and high choline diets were included because mice that were fed the intermediate diet did not have sufficient EEG recordings in the same mouse where behavior was studied. IIS frequency over 24 hrs was plotted against the preference for the novel object in the test phase (Figure 9A). For NOL, IIS were significantly less frequent when behavior was the best, but only for the high choline-treated mice (Pearson’s r, p=0.022). In the low choline group, behavioral performance was poor regardless of IIS frequency (Pearson’s r, p=0.933; Figure 9A1). For NOR, there were no significant correlations (low choline, p=0.202; high choline, p=0.680) but few mice were tested in the high choline group (Figure 9B2).

      We also tested whether there were correlations between dorsal hilar NeuN-ir cell numbers and IIS frequency. In Figure 9B, IIS frequency over 24 hrs was plotted against the number of dorsal hilar cells expressing NeuN. The dorsal hilus was used because there was no effect of diet on the posterior hilus. For NOL, there was no significant correlation (low choline, p=0.273; high choline, p=0.159; Figure 9B1). However, for NOR, there were more NeuN-ir hilar cells when the behavioral performance was strongest (low choline, p=0.024; high choline, p=0.016; Figure 9B2). These data support prior studies showing that hilar cells, especially mossy cells (the majority of hilar neurons), contribute to object recognition (Botterill et al. 2021; GoodSmith et al. 2022).”
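
      Pearson's r and its test statistic can be computed as in the following generic sketch. It is not the authors' analysis code, and the paired values are invented. The p-value itself would come from a t distribution with n − 2 degrees of freedom, which is left to a statistics package.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented paired measurements for five hypothetical animals.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(round(r, 3), round(t, 3))
```

      The dependence on n − 2 shows why small groups are a limitation: with only 4 animals there are 2 degrees of freedom, and |r| has to reach roughly 0.95 before a two-tailed test at the 0.05 level is significant.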

      We also noted that not all mice could be included, either because they died or for other reasons, such as loss of the headset (Results, Section A, Lines 463-464): Some mice were not possible to include in all assays either because they died before reaching 6 months or for other reasons.

      Reviewer #2 (Public Review):

      Strengths:

      The strength of the group was the ability to monitor the incidence of interictal spikes (IIS) over the course of 1.2-6 months in the Tg2576 Alzheimer's disease model, combined with meaningful behavioral and histological measures. The authors were able to demonstrate MCS had protective effects in Tg2576 mice, which was particularly convincing in the hippocampal novel object location task.

      We thank the Reviewer for identifying several strengths.

      Weaknesses:

      Although choline deficiency was associated with impaired learning and elevated FosB expression, consistent with increased hyperexcitability, IIS was reduced with both low and high choline diets. Although not necessarily a weakness, it complicates the interpretation and requires further evaluation.

      We agree and we revised the paper to address the evaluations that were suggested.

      Reviewer #1 (Recommendations For The Authors):

      (1) A reference directing to genotyping of Tg2576 mice is missing.

      We apologize for the oversight and added that the mice were genotyped by the New York University Mouse Genotyping core facility.

      Methods, Section A, Lines 210-211: “Genotypes were determined by the New York University Mouse Genotyping Core facility using a protocol to detect APP695.”

      (2) Which software was used to track the mice in the behavioral tests?

      We manually reviewed videos. This has been clarified in the revised manuscript. Methods, Section B4, Lines 268-270: Videos of the training and testing sessions were analyzed manually. A subset of data was analyzed by two independent blinded investigators and they were in agreement.

      (3) Unexpectedly, a low choline diet in AD mice was associated with reduced frequency of interictal spikes yet increased mortality and spontaneous seizures. The authors attribute this to postictal suppression.

      We did not intend to suggest that postictal depression was the only cause. It was one of many potential explanations for why seizures would influence IIS frequency: postictal depression could transiently reduce IIS. We have clarified the text accordingly (Discussion, starting on Line 960):

      If mice were unhealthy, IIS might have been reduced due to impaired excitatory synaptic function. Another reason for reduced IIS is that the mice that had the low choline diet had seizures which interrupted REM sleep. Thus, seizures in Tg2576 mice typically started in sleep. Less REM sleep would reduce IIS because IIS occur primarily in REM. Also, seizures in the Tg2576 mice were followed by a depression of the EEG (postictal depression; Supplemental Figure 3) that would transiently reduce IIS. A different, radical explanation is that the intermediate diet promoted IIS rather than low choline reducing IIS. Instead of choline, a constituent of the intermediate diet may have promoted IIS.

      However, reduced spike frequency is already evident at 5 weeks of age, a time point with a low occurrence of premature death. A more comprehensive analysis of EEG background activity may provide additional information if the epileptic activity is indeed reduced at this age.

      We did not intend to suggest that premature death caused reduced spike frequency. We have clarified the paper accordingly. We agree that a more in-depth EEG analysis would be useful but is beyond the scope of the study.

      (4) Supplementary Fig. 3 depicts far more spikes / 24 h compared to Fig. 7B (at least 100 spikes/24h in Supplementary Fig. 3 and less than 10 spikes/24h in Fig. 7B).

      We would like to clarify that before and after a seizure the spike frequency is unusually high. Therefore, there are far more spikes than in prior figures.

      We clarified this issue by adding more data to the Supplemental Figure. The additional data are from mice without a seizure, showing that their spikes are low in frequency.

      All recordings lasted several days. We included the data from mice with a seizure on one of the days and mice without any seizures. For mice with a seizure, we graphed IIS frequency for the day before, the day of the seizure, and the day after. For mice without a seizure, IIS frequency is plotted for 3 consecutive days. When there was a seizure, the day before and after showed high numbers of spikes. When there was no seizure on any of the 3 days, spikes were infrequent on all days.

      The revised figure and legend are shown below. It is Supplemental Figure 4 in the revised submission.

      Author response image 5.

      IIS frequency before and after seizures. A. Representative EEG traces recorded from electrodes implanted in the skull over the left frontal cortex, right occipital cortex, left hippocampus (Hippo) and right hippocampus during a spontaneous seizure in a 5 months-old Tg2576 mouse. Arrows point to the start (green arrow) and end of the seizure (red arrow), and postictal depression (blue arrow). B. IIS frequency was quantified from continuous video-EEG for mice that had a spontaneous seizure during the recording period and mice that did not. IIS frequency is plotted for 3 consecutive days, starting with the day before the seizure (designated as day 1), and ending with the day after the seizure (day 3). A two-way RMANOVA was conducted with the day and group (mice with or without a seizure) as main factors. There was a significant effect of day (F(2,4)=46.95, p=0.002) and group (seizure vs no seizure; F(1,2)=46.01, p=0.021) and an interaction of factors (F(2,4)=46.68, p=0.002). Tukey-Kramer post-hoc tests showed that mice with a seizure had significantly greater IIS frequencies than mice without a seizure for every day (day 1, p=0.0005; day 2, p=0.0001; day 3, p=0.0014). For mice with a seizure, IIS frequency was higher on the day of the seizure than the day before (p=0.037) or after (p=0.010). For mice without a seizure, there were no significant differences in IIS frequency for day 1, 2, or 3. These data are similar to prior work showing that from one day to the next mice without seizures have similar IIS frequencies (Kam et al., 2016).

      In the text, the revised section is in the Results, Section C, starting on Line 772:

      “At 5-6 months, IIS frequencies were not significantly different in the mice fed the different diets (all p>0.05), probably because IIS frequency becomes increasingly variable with age (Kam et al. 2016). One source of variability is seizures, because there was a sharp increase in IIS during the day before and after a seizure (Supplemental Figure 4). Another reason that the diets failed to show differences was that the IIS frequency generally declined at 5-6 months. This can be appreciated in Figure 8B and Supplemental Figure 6B. These data are consistent with prior studies of Tg2576 mice where IIS increased from 1 to 3 months but then waxed and waned afterwards (Kam et al., 2016).”

      (5) The data indicating the protective effect of high choline supplementation are valuable, yet some of the claims are not completely supported by the data, mainly as the analysis of littermate WT mice is not complete.

      We added WT data to show that the high choline diet restored cell loss and ΔFosB expression to WT levels. These data strengthen the argument that the high choline diet was valuable. See the response to Reviewer #1, Public Review Point #2.

      • Line 591: "The results suggest that choline enrichment protected hilar neurons from NeuN loss in Tg2576 mice." A comparison to NeuN expression in WT mice is needed to make this statement.

      These data have been added. See the response to Reviewer #1, Public Review Point #2.

      • Line 623: "These data suggest that high choline in the diet early in life can reduce hyperexcitability of GCs in offspring later in life. In addition, low choline has an opposite effect, again suggesting this maternal diet has adverse effects." Also here, FosB quantification in WT mice is needed.

      These data have been added. See the response to Reviewer #1, Public Review Point #2.

      (7) Was the effect of choline associated with reduced tauopathy or Aβ levels?

      The mice have no detectable hyperphosphorylated tau. The mice do have intracellular Aβ before 6 months. This is especially the case in hilar neurons, but GCs have little (Criscuolo et al., eNeuro, 2023). However, in neurons that have reduced NeuN, we found previously that antibodies generally do not work well. We think it is because the neurons become pyknotic (Duffy et al., 2015), a condition associated with oxidative stress which causes antigens like NeuN to change conformation due to phosphorylation. Therefore, we did not conduct a comparison of hilar neurons across the different diets.

      (8) Since the mice were tested at 3 months and 6 months, it would be interesting to see the behavioral difference per mouse and the correlation with EEG recording and immunohistological analyses.

      We agree that would be valuable and this has been added to the paper. Please see response to Reviewer #1, Public Review Point #4.

      Reviewer #2 (Recommendations For The Authors):

      There were several areas that could be further improved, particularly in the areas of data analysis (particularly with images and supplemental figures), figure presentation, and mechanistic speculation.

      Major points:

      (1) It is understandable that, for the sake of labor and expense, WT mice were not implanted with EEG electrodes, particularly since previous work showed that WT mice have no IIS (Kam et al. 2016). However, from a standpoint of full factorial experimental design, there are several flaws - purists would argue are fatal flaws. First, the lack of WT groups creates underpowered and imbalanced groups, constraining statistical comparisons and likely reducing the significance of the results. Also, it is an assumption that diet does not influence IIS in WT mice. Secondly, with a within-subject experimental design (as described in Fig. 1A), 6-month-old mice are not naïve if they have previously been tested at 3 months. Such an experimental design may reduce effect size compared to non-naïve mice. These caveats should be included in the Discussion. It is likely that these caveats reduce effect size and that the actual statistical significance, were the experimental design perfect, would be higher overall.

      We agree and have added these points to the Limitations section of the Discussion. Starting on Line 1050: In addition, groups were not exactly matched. Although WT mice do not have IIS, a WT group for each of the Tg2576 groups would have been useful. Instead, we included WT mice for the behavioral tasks and some of the anatomical assays. Related to this point is that several mice died during the long-term EEG monitoring of IIS.

      (2) Since behavior, EEG, NeuN and FosB experiments seem to be done on every Tg2576 animal, it seems that there are missed opportunities to correlate behavior/EEG and histology on a per-mouse basis. For example, rather than speculate in the discussion, why not (for example) directly examine relationships between IIS/24 hours and FosB expression?

      We addressed this point above in responding to Reviewer #1, Public Review Point #4.

      (3) Methods of image quantification should be improved. Background subtraction should be considered in the analysis workflow (see Fig. 5C and Fig. 6C background). It would be helpful to have a Methods figure illustrating intermediate processing steps for both NeuN and FosB expression.

      We added more information to improve the methods of quantification. We did use a background subtraction approach: ImageJ provides a histogram of intensity values, which shows where there is a sharp rise in staining relative to background. That point is where we set the threshold. We think this procedure has the least subjectivity.

      We added these methods to the Methods section and expanded the first figure about image quantification, Figure 6B. That figure and legend are shown above in response to Reviewer #1, Point #2.

      This is the revised section of the Methods, Section C3, starting on Line 345:

      “Photomicrographs were acquired using ImagePro Plus V7.0 (Media Cybernetics) and a digital camera (Model RET 2000R-F-CLR-12, Q-Imaging). NeuN and ∆FosB staining were quantified from micrographs using ImageJ (V1.44, National Institutes of Health). All images were first converted to grayscale and in each section, the hilus was traced, defined by zone 4 of Amaral (1978). A threshold was then calculated to identify the NeuN-stained cell bodies but not background. Then NeuN-stained cell bodies in the hilus were quantified manually. Note that the threshold was defined in ImageJ using the distribution of intensities in the micrograph. A threshold was then set using a slider in the histogram provided by Image J. The slider was pushed from the low level of staining (similar to background) to the location where staining intensity made a sharp rise, reflecting stained cells. Cells with labeling that was above threshold were counted.”

      (4) This reviewer is surprised that the authors do not speculate more about ACh-related mechanisms. For example, choline deficiency would likely reduce ACh release, which could have the same effect on IIS as muscarinic antagonism (Kam et al. 2016), and could potentially explain the paradoxical effects of a low choline diet on reducing IIS. Some additional mechanistic speculation would be helpful in the Discussion.

      We thank the Reviewer for noting this so we could add it to the Discussion. We had not done so because we were concerned about space limitations.

      The Discussion has a new section starting on Line 1009:

      “Choline and cholinergic neurons

      There are many suggestions for the mechanisms that allow MCS to improve health of the offspring. One hypothesis that we are interested in is that MCS improves outcomes by reducing IIS. Reducing IIS would potentially reduce hyperactivity, which is significant because hyperactivity can increase release of Aβ. IIS would also be likely to disrupt sleep since it represents aberrant synchronous activity over widespread brain regions. The disruption to sleep could impair memory consolidation, since it is a notable function of sleep (Graves et al. 2001; Poe et al. 2010). Sleep disruption also has other negative consequences such as impairing normal clearance of Aβ (Nedergaard and Goldman 2020). In patients, IIS and similar events, IEDs, are correlated with memory impairment (Vossel et al. 2016).

      How would choline supplementation in early life reduce IIS of the offspring? It may do so by making BFCNs more resilient. That is significant because BFCN abnormalities appear to cause IIS. Thus, the cholinergic antagonist atropine reduced IIS in vivo in Tg2576 mice. Selective silencing of BFCNs reduced IIS also. Atropine also reduced elevated synaptic activity of GCs in young Tg2576 mice in vitro. These studies are consistent with the idea that early in AD there is elevated cholinergic activity (DeKosky et al. 2002; Ikonomovic et al. 2003; Kelley et al. 2014; Mufson et al. 2015; Kelley et al. 2016), while later in life there is degeneration. Indeed, the chronic overactivity could cause the degeneration.

      Why would MCS make BFCNs resilient? There are several possibilities that have been explored, based on genes upregulated by MCS. One attractive hypothesis is that neurotrophic support for BFCNs is retained after MCS but in aging and AD it declines (Gautier et al. 2023). The neurotrophins, notably nerve growth factor (NGF) and brain-derived neurotrophic factor (BDNF) support the health of BFCNs (Mufson et al. 2003; Niewiadomska et al. 2011).”

      Minor points:

      (1) The vendor is Dyets Inc., not Dyets.

      Thank you. This correction has been made.

      (2) Anesthesia chamber not specified (make, model, company).

      We have added this information to the Methods, Section D1, starting on Line 375: The animals were anesthetized by isoflurane inhalation (3% isoflurane, 2% oxygen for induction) in a rectangular transparent plexiglas chamber (18 cm long x 10 cm wide x 8 cm high) made in-house.

      (3) It is not clear whether software was used for the detection of behavior. Was position tracking software used or did blind observers individually score metrics?

      We have added the information to the paper. Please see the response to Reviewer #1, Recommendations for Authors, Point #2.

      (4) It is not clear why rat cages and not a true Open Field Maze were used for NOL and NOR.

      We used mouse cages because in our experience that is what is ideal to detect impairments in Tg2576 mice at young ages. We think it is why we have been so successful in identifying NOL impairments in young mice. Before our work, most investigators thought behavior only became impaired later. We would like to add that, in our experience, an Open Field Maze is not the most common cage that is used.

      (5) Figure 1A is not mentioned.

      It had been mentioned in the Introduction. Figure 1B-D was the first figure mentioned in the Results, which is why it might have been missed. We have now added it to the first section of the Results, Line 457, so it is easier to find.

      (6) Although Fig 7 results are somewhat complicated compared to Fig. 5 and 6 results, EEG comes chronologically earlier than NeuN and FosB expression experiments.

      We have kept the order as is because as the Reviewer said, the EEG is complex. For readability, we have kept the EEG results last.

      (7) Though the statistical analysis involved parametric and nonparametric tests, it is not clear which normality tests were used.

      We have added the name of the normality tests in the Methods, Section E, Line 443: Tests for normality (Shapiro-Wilk) and homogeneity of variance (Bartlett’s test) were used to determine if parametric statistics could be used. We also added a clarification after this sentence: When data were not normal, non-parametric statistics were used. When there was significant heteroscedasticity of variance, data were log transformed. If log transformation did not resolve the heteroscedasticity, non-parametric statistics were used. Because we added correlations and analysis of survival curves, we also added the following (starting on Line 451): For correlations, Pearson’s r was calculated. To compare survival curves, a Log rank (Mantel-Cox) test was performed.
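The selection rules quoted above can be written out as a small decision function. This is only a sketch of one reading of that text, not the authors' analysis code: the function name, the 0.05 cutoff, and the order in which the checks are applied are assumptions, and the p-values are taken as inputs that would come from Shapiro-Wilk and Bartlett routines in a statistics package.

```python
def choose_statistics(normality_p, bartlett_p, bartlett_p_log=None, alpha=0.05):
    """Pick a family of statistics from assumption-check p-values.

    normality_p    -- p-value of a normality test (e.g. Shapiro-Wilk)
    bartlett_p     -- p-value of Bartlett's test on the raw data
    bartlett_p_log -- p-value of Bartlett's test after log transformation,
                      or None if no transformation was tried
    alpha          -- significance cutoff (0.05 is an assumption here)
    """
    # Non-normal data: go straight to non-parametric statistics.
    if normality_p < alpha:
        return "non-parametric"
    # Normal data with homogeneous variances: parametric statistics.
    if bartlett_p >= alpha:
        return "parametric"
    # Heteroscedastic data: accept the log transform only if it fixes it.
    if bartlett_p_log is not None and bartlett_p_log >= alpha:
        return "parametric on log-transformed data"
    return "non-parametric"


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    print(choose_statistics(0.40, 0.02, bartlett_p_log=0.30))
```

Keeping the p-values as inputs separates the decision rule from any particular statistics library.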

      Figures:

      (1) In Fig. 1A, Anatomy should be placed above the line.

      We changed the figure so that the word “Anatomy” is now aligned, and the arrow that was angled is no longer needed.

      In Fig. 1C and 1D, the objects seem to be moved into the cage, not the mice. This schematic does not accurately reflect the Fig. 1C and 1D figure legend text.

      Thank you for the excellent point. The figure has been revised. We also updated it to show the objects more accurately.

      Please correct the punctuation in the Fig. 1D legend.

      Thank you for mentioning the errors. We corrected the legend.

      For ease of understanding, Fig. 1C and 1D should have training and testing labeled in the figure.

      Thank you for the suggestion. We have revised the figure as suggested.

      Author response image 6.

      (2) In Figure 2, error bars for population stats (bar graphs) are not obvious or missing. Same for Figure 3.

      We added two supplemental figures to show error bars, because adding the error bars to the existing figures made the symbols, colors, connecting lines and error bars hard to distinguish. For novel object location (Fig. 2) the error bars are shown in Supp. Fig. 2. For novel object recognition, the error bars are shown in Supplemental Fig. 3.

      (3) The authors should consider a Methods figure for quantification of NeuN and deltaFOSB (expansions of Fig. 5C and Fig. 6C).

      Please see Reviewer #1, Public Review Point #2.

      (4) In Figure 5, A should be omitted and mentioned in the Methods/figure legend. B should be enlarged. C should be inset, zoomed-in images of the hilus, with an accompanying analysis image showing a clear reduction in NeuN intensity in low choline conditions compared to intermediate and high choline conditions. In D, X axes could delineate conditions (figure legend and color unnecessary). Figure 5C should be moved to a Methods figure.

      We thank the Reviewer for the excellent suggestions. We removed A as suggested. We expanded B and included insets. We used different images to show a more obvious reduction of cells for the low choline group. We expanded the Methods schematics. The revised figure is Figure 6 and shown above in response to Reviewer 1, Public Review Point #2.

      (5) In Figure 6, A should be eliminated and mentioned in the Methods/figure legend. B should be greatly expanded with higher and lower thresholds shown on subsequent panels (3x3 design).

      We removed A as suggested. We expanded B as suggested. The higher and lower thresholds are shown in C. The revised figure is Figure 7 and shown above in response to Reviewer 1, Public Review Point #2.

      (6) In Figure 7, A2 should be expanded vertically. A3 should be expanded both vertically and horizontally. B 1 and 2 should be increased, particularly B1 where it is difficult to see symbols. Perhaps colored symbols offset/staggered per group so that the spread per group is clearer.

      We added a panel (A4) to show an expansion of A2 and A3. However, we did not see that a vertical expansion would add information so we opted not to add that. We expanded B1 as suggested but opted not to expand B2 because we did not think it would enhance clarity. The revised figure is below.

      Author response image 7.

      (7) Supplemental Figure 1 could possibly be combined with Figure 1 (use rounded corner rat cage schematic for continuity).

      We opted not to combine figures because it would make one extremely large figure. As a result, the parts of the figure would be small and difficult to see.

      (8) Supplemental Figure 2 - there does not seem to be any statistical analysis associated with A mentioned in the Results text.

      We added the statistical information. It is now Supplemental Figure 5:

      Author response image 8.

      Mortality was high in mice treated with the low choline diet. A. Survival curves are shown for mice fed the low choline diet and mice fed the high choline diet. The mice fed the high choline diet had a significantly less severe survival curve. B. Left: A photo of a mouse after sudden unexplained death. The mouse was found in a posture consistent with death during a convulsive seizure. The area surrounded by the red box is expanded below to show the outstretched hindlimb (red arrow). Right: A photo of a mouse that did not die suddenly. The area surrounded by the box is expanded below to show that the hindlimb is not outstretched.

      The revised text is in the Results, Section E, starting on Line 793:

      “The reason that low choline-treated mice appeared to die in a seizure was that they were found in a specific posture in their cage which occurs when a severe seizure leads to death (Supplemental Figure 5). They were found in a prone posture with extended, rigid limbs (Supplemental Figure 5). Regardless of how the mice died, there was greater mortality in the low choline group compared to mice that had been fed the high choline diet (Log-rank (Mantel-Cox) test, Chi square 5.36, df 1, p=0.021; Supplemental Figure 5A).”
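As a rough illustration of the survival comparison quoted above, the sketch below implements a two-group log-rank (Mantel-Cox) statistic and converts a chi-square value with 1 degree of freedom into a p-value. The survival records are invented for illustration and are not the study's data, and the function names are hypothetical. The one grounded check is that the reported statistic (chi-square 5.36, df 1) corresponds to p of about 0.021, as stated in the text.

```python
import math


def chi2_df1_p(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def logrank_chi2(group_a, group_b):
    """Log-rank (Mantel-Cox) chi-square for two groups.

    Each group is a list of (time, died) pairs; died=False marks an animal
    that was still alive at its last observation (censored).
    """
    everyone = group_a + group_b
    event_times = sorted({t for t, died in everyone if died})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Animals still under observation just before time t.
        n_a = sum(1 for time, _ in group_a if time >= t)
        n_b = sum(1 for time, _ in group_b if time >= t)
        # Deaths occurring exactly at time t.
        d_a = sum(1 for time, died in group_a if died and time == t)
        d_b = sum(1 for time, died in group_b if died and time == t)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


if __name__ == "__main__":
    # Invented records (age in months, died?) -- NOT data from the study.
    low_choline = [(2, True), (3, True), (4, True), (5, True), (6, False), (6, False)]
    high_choline = [(4, True), (6, False), (6, False), (6, False), (6, False), (6, False)]
    chi2 = logrank_chi2(low_choline, high_choline)
    print(f"hypothetical data: chi-square = {chi2:.2f}, p = {chi2_df1_p(chi2):.3f}")
    # Reported statistic from the text: chi-square 5.36 with df 1.
    print(f"reported statistic: p = {chi2_df1_p(5.36):.3f}")  # p = 0.021
```

The closed-form conversion holds only for 1 degree of freedom, which is the case when two survival curves are compared.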

      Also, why isn't intermediate choline also shown?

      We do not have the data from the animals. Records of death were not kept, regrettably.

      Perhaps labeling of male/female could also be done as part of this graph.

      We agree this would be very interesting but do not have all sex information.

      B is not very convincing, though it is understandable once one reads about posture.

      We have clarified the text and figure, as well as the legend. They are above.

      Are there additional animals that were seen to be in a specific posture?

      There are many examples, and we added them to hopefully make it more convincing.

      We also added posture in WT mice when there is a death to show how different it is.

      Is there any relationship between seizures detected via EEG, as shown in Supplemental Figure 3, and death?

      Several mice died during a convulsive seizure, which is the type of seizure that is shown in the Supplemental Figure.

      (9) Supplemental Figure 3 seems to display an isolated case in which EEG-detected seizures correlate with increased IIEs. It is not clear whether there are additional documented cases of seizures that could be assembled into a meaningful population graph. If this data does not exist or is too much work to include in this manuscript, perhaps it can be saved for a future paper.

      We have added other cases and revised the graph. This is now Supplemental Figure 4 and is shown above in response to Reviewer #1, Recommendation for Authors Point #4.

      Frontal is misspelled.

      We checked and our copy is not showing a misspelling. However, we are very grateful to the Reviewer for catching many errors and reading the manuscript carefully.

      (10) Supplemental Figure 4 seems incomplete in that it does not include EEG data from months 4, 5, and 6 (see Fig. 7B).

      We have added data for these ages to the Supplemental Figure (currently Supplemental Figure 6) as part B. In part A, which had been the original figure, only 1.2, 2, and 3 months-old mice were shown because there were insufficient numbers of each sex at other ages. However, by pooling 1.2 and 2 months (Supplemental Figure 6B1), 3 and 4 months (B2) and 5 and 6 months (B3) we could do the analysis of sex. The results are the same – we detected no sex differences.

      Author response image 9.

      IIS frequency was similar for each sex. A. IIS frequency was compared for females and males at 1.2 months (1), 2 months (2), and 3 months (3). Two-way ANOVA was used to analyze the effects of sex and diet. Female and male Tg2576 mice were not significantly different. B. Mice were pooled at 1.2 and 2 months (1), 3 and 4 months (2) and 5 and 6 months (3). Two-way ANOVA analyzed the effects of sex and diet. There were significant effects of diet for (1) and (2) but not (3). There were no effects of sex at any age. (1) There were significant effects of diet (F(2,47)=46.21, p<0.0001) but not sex (F(1,47)=0.106, p=0.746). Female and male mice fed the low choline diet or high choline diet were significantly different from female and male mice fed the intermediate diet (all p<0.05, asterisk). (2) There were significant effects of diet (F(2,32)=10.82, p=0.0003) but not sex (F(1,32)=1.05, p=0.313). Both female and male mice of the low choline group were significantly different from male mice fed the intermediate diet (both p<0.05, asterisk) but no other pairwise comparisons were significant. (3) There were no significant differences (diet, F(2,23)=1.21, p=0.317; sex, F(1,23)=0.844, p=0.368).

      The data are discussed in the Results, Section G, starting on Line 843:

      In Supplemental Figure 6B we grouped mice at 1-2 months, 3-4 months and 5-6 months so that there were sufficient females and males to compare each diet. A two-way ANOVA with diet and sex as factors showed a significant effect of diet (F(2,47)=46.21; p<0.0001) at 1-2 months of age, but not sex (F(1,47)=0.11, p=0.758). Post-hoc comparisons showed that the low choline group had fewer IIS than the intermediate group, and the same was true for the high choline-treated mice. Thus, female mice fed the low choline diet differed from the females (p<0.0001) and males (p<0.0001) fed the intermediate diet. Male mice that had received the low choline diet differed from females (p<0.0001) and males (p<0.0001) fed the intermediate diet. Female mice fed the high choline diet differed from females (p=0.002) and males (p<0.0001) fed the intermediate diet, and males fed the high choline diet differed from females (p<0.0001) and males (p<0.0001) fed the intermediate diet.

      For the 3-4 months-old mice there was also a significant effect of diet (F(2,32)=10.82, p=0.0003) but not sex (F(1,32)=1.05, p=0.313). Post-hoc tests showed that low choline females were different from males fed the intermediate diet (p=0.007), and low choline males were also significantly different from males that had received the intermediate diet (p=0.006). There were no significant effects of diet (F(2,23)=1.21, p=0.317) or sex (F(1,23)=0.84, p=0.368) at 5-6 months of age.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors set out to illuminate how legumes promote symbiosis with beneficial nitrogen-fixing bacteria while maintaining a general defensive posture towards the plethora of potentially pathogenic bacteria in their environment. Intriguingly, a protein involved in plant defence signalling, RIN4, is implicated as a type of 'gatekeeper' for symbiosis, connecting symbiosis signalling with defence signalling. Although questions remain about how exactly RIN4 enables symbiosis, the work opens an important door to new discoveries in this area.

      Strengths:

      The study uses a multidisciplinary, state-of-the-art approach to implicate RIN4 in soybean nodulation and symbiosis development. The results support the authors' conclusions.

      Weaknesses:

      No serious weaknesses, although the manuscript could be improved slightly from technical and communication standpoints.

      Reviewer #2 (Public Review):

      Summary:

      The study by Toth et al. investigates the role of RIN4, a key immune regulator, in the symbiotic nitrogen fixation process between soybean and rhizobium. The authors found that SymRK can interact with and phosphorylate GmRIN4. This phosphorylation occurs within a 15 amino acid motif that is highly conserved in N-fixation clades. Genetic studies indicate that GmRIN4a/b play a role in root nodule symbiosis. Based on their data, the authors suggest that RIN4 may function as a key regulator connecting symbiotic and immune signaling pathways.

      Overall, the conclusions of this paper are well supported by the data, although there are a few areas that need clarification.

      Strengths:

      This study provides important insights by demonstrating that RIN4, a key immune regulator, is also required for symbiotic nitrogen fixation.

      The findings suggest that GmRIN4a/b could mediate appropriate responses during infection, whether it is by friendly or hostile organisms.

      Weaknesses:

      The study did not explore the immune response in the rin4 mutant. Therefore, it remains unknown how GmRIN4a/b distinguishes between friend and foe.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript by Toth et al reveals a conserved phosphorylation site within the RIN4 (RPM1-interacting protein 4) R protein that is exclusive to two of the four nodulating clades, Fabales and Rosales. The authors present persuasive genetic and biochemical evidence that phosphorylation at the serine residue 143 of GmRIN4b, located within a 15-aa conserved motif with a core five amino acids 'GRDSP' region, by SymRK, is essential for optimal nodulation in soybean. While the experimental design and results are robust, the manuscript's discussion fails to clearly articulate the significance of these findings. Results described here are important to understand how the symbiosis signaling pathway prioritizes associations with beneficial rhizobia, while repressing immunity-related signals.

      Strengths:

      The manuscript asks an important question in plant-microbe interaction studies with interesting findings.

      Overall, the experiments are detailed, thorough, and very well-designed. The findings appear to be robust.

      The authors provide results that are not overinterpreted and are instead measured and logical.

      Weaknesses:

      No major weaknesses. However, a well-thought-out discussion integrating all the findings and interpreting them is lacking; in its current form, the discussion lacks 'boldness'. The primary question of the study - how plants differentiate between pathogens and symbionts - is not discussed in light of the findings. The concluding remark, "Taken together, our results indicate that successful development of the root nodule symbiosis requires cross-talk between NF-triggered symbiotic signaling and plant immune signaling mediated by RIN4," though accurate, fails to capture the novelty or significance of the findings, and left me wondering how this adds to what is already known. A clear conclusion, for eg, the phosphorylation of RIN4 isoforms by SYMRK at S143 modulates immune responses during symbiotic interactions with rhizobia, or similar, is needed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have no major criticism of the work, although it could be improved by addressing the following minor points:

      (1) Page 8, Figure 2 legend. Consider changing "proper symbiosis formation" to "normal nodulation" or something that better reflects control of nodule development/number.

      Thank you for the suggestion. The legend was changed to “...required for normal nodule formation” (see Page 10, revised manuscript).

      (2) Page 9. Cut "newly" from the first sentence of paragraph 2, as S143 phosphorylation was identified previously.

      Thank you for the suggestion, we removed “newly” from the sentence.

      (3) Page 10, Figure 3. Panels B showing green-fluorescent nodules are unnecessary given the quantitative data presented in the accompanying panel A. This goes for similar supplemental figures later.

      We appreciate the comment. Regarding Figure 3 (complementing the rin4b mutant), we updated the figure according to the other reviewer’s comment, and in Suppl. Figure 6 (OE phenotype of phospho-mimic/negative mutants) we removed the panels showing the micrographs. At the same time, we did not modify Figure 2 (where micrographs show transgenic roots carrying the silencing constructs) for the sake of figure completeness. (See Page 10, revised manuscript)

      (4) Consider swapping Figure 3 for Supplemental Figure S7, which I think shows more clearly the importance of RIN4 phosphorylation in nodulation.

      We appreciate the comment and have swapped the figures according to the reviewer’s suggestion. Legend, figure description, and manuscript text have been updated accordingly. (See page 12 and 38, revised manuscript)

      (5) Page 10. Replace "it will be referred to S143..." with "we refer to S143 instead of ....".

      We replaced it according to the comment.

      (6) Page 11, delete "While" from "While no interactions could be observed...".

      We deleted it according to the suggestion.

      (7) Page 33, Fig S5. How many biological replicates were performed to produce the data presented in panel C and what do the error bar and asterisk indicate? Check that this information is provided in all figures that show errors and statistical significance.

      Thank you for the remark. The experiment was repeated three times, and this note was added to the figure description. All other figure legends with error bars were checked to confirm that the number of replicates is indicated.

      (8) Page 37, Fig S11, panel B. Are averages of data from the 2 biological and 3 technical replicates shown? Add error bars and tests of significant difference.

      Averages of a total of 6 replicates (from 2 biological replicates, each run in triplicate) are shown. We thank the reviewer for pointing out the missing error bars and statistical test; we have updated the figure accordingly.

      (9) Fig S12. Why are panels A, C, E, and G presented? The other panels seem to show the same data more clearly- showing the linear relationship between peak area ratio and protein concentration.

      We have taken the reviewer’s comment into consideration and revised the figure, removing the calibration curves and showing only four panels. The figure legend has been corrected accordingly. (Please see page 43, revised manuscript). The original figure (unlike other revised figures) had to be deleted from the revised manuscript, as it caused technical issues when converting the document into PDF.

      Reviewer #2 (Recommendations For The Authors):

      Some small suggestions:

      (1) It's good to include a protein schematic for RIN4 in Figure 1.

      We appreciate the reviewer’s suggestion and we have drawn a protein schematic and added it to Figure 1. The figure legend was updated accordingly.

      (2) There appears to be incorrect labeling in Figure 2c; please double-check and make the necessary corrections.

      With respect, we do not understand the comment about incorrect labeling. Would the reviewer please help us out and give more explanation? In Figure 2C, RIN4a and RIN4b expression was checked in transgenic roots expressing either EV (empty vector) or different silencing constructs targeting RIN4a/b.

      Reviewer #3 (Recommendations For The Authors):

      I enjoyed the level of detail and precision in experimental design.

      A discussion point could be - What does it mean that nodule number but not fixation is affected? Is RIN4 only involved in the entry stage of infection but not in nodules during N-fixation?

      Our current data suggest that RIN4 does indeed appear to be involved in infection. This hypothesis is supported by the finding that RIN4a/b was phosphorylated in root hairs but not in the root (or phosphorylation was not detected in the root). The interaction with the early signaling RLKs also suggests that RIN4 is likely involved in the early stage of symbiosis formation.

      How would the authors explain their observation "However, the motif is retained in non-nodulating Fabales (such as C. canadensis, N. schottii; SI Appendix, Figure S2) and Rosales species as well." What does this imply about the role in symbiosis that the authors propose?

      We appreciate the reviewer’s question. The motif appears to be retained; however, in nodulating plants it may be not only the motif but also the protein structure that differs. We have not investigated the structure of RIN4, or how it might change upon interaction with another protein and/or post-translational modification(s). Griesman et al. (2018) showed that certain genes are absent in non-nodulating species within Fabales. We can speculate that, because the products of these genes are absent and cannot interact with RIN4 in those species, downstream signaling may be lacking despite the retained motif. At this point, there are not enough data to speculate further.

      qPCR analysis of symbiotic pathway genes showed that both NIN-dependent and NIN-independent branches of the symbiosis signaling pathway were negatively affected in the rin4b mutant. Please derive a conclusion from this.

      We appreciate the comment; it also prompted us to correct the following sentence, as ERN1 expression is independent of NIN (Kawaharada et al., 2017). Original: “Since NIN is responsible for induction of NF-YA and ERN1 transcription factors, their reduced expression in rin4b plants was not unexpected (Fig. 5).” The following sentence was also deleted, as it repeated a statement made above it: “Soybean NF-YA1 homolog responded significantly to rhizobial treatment in rin4b plants, whereas NF-YA3 induction did not show significant induction (Fig. 5).”

      We added the following conclusion/hypothesis: “Based on the results of the expression data presented above, it seems that both NIN-dependent and NIN-independent branches of the symbiotic signaling pathways are affected in the rin4b mutant background. This indicates that the role of RIN4 protein in the symbiotic pathway can be placed upstream of CYCLOPS, as the CYCLOPS transcription activating complex is responsible (directly or indirectly) for the activation of all TFs tested in our expression analysis (Singh et al, 2014/47, 48).” (Please see Page 16, revised manuscript)

      The authors are highly encouraged to write a thoughtful discussion that would accompany the detailed experimental work performed in this manuscript.

      We appreciate the comment, and we have revised the discussion section of the manuscript. (Please see Pages 17-19, revised manuscript)

      Some minor suggestions for overall readability are below.

      What about immune signaling genes? Given that authors hypothesize that "Absence of AtRIN4 leads to increased PTI responses and, therefore, it might be that GmRIN4b absence also causes enhanced PTI which might have contributed to significantly fewer nodules." Could check marker immune signaling gene expression FLS2 and others.

      We appreciate the reviewer’s comment, and while we believe those are very interesting questions/suggestions, answering them is outside the scope of the current manuscript. This is partly because several defense-responsive genes that were described in leaf immune responses could not be confirmed to respond in a similar manner in the root (Chuberre et al., 2018). It was also shown that plant immune responses are compartmentalized and specialized in roots (Chuberre et al., 2018). If we were looking at immune-responsive genes, the signal might be diluted because of this specialized and compartmentalized nature. Another reason why these questions cannot be answered as part of the current manuscript is that finding a suitable immune-responsive gene would require rigorous experiments, not only in the root but also in root hairs over a time course, which would be the groundwork for a separate study (root hair isolation is not a trivial experiment; it requires at least 250-300 seedlings per treatment per time point).

      Regarding FLS2, it is known in Arabidopsis that its expression is tissue-specific within the root, and it seems that FLS2 expression is restricted to the root vasculature (Wyrsch et al, 2015). In our manuscript, we showed that RIN4a/b is highly expressed in root hairs and that RIN4 phosphorylation was detectable in root hairs but not in the root; therefore, we do not see a reason to investigate FLS2 expression.

      "in our hands only ERN1a could be amplified. One possible explanation for this observation is that primers were designed based on Williams 82 reference genome, while our rin4b mutant was generated in the Bert cultivar background." Is the sequence between the two cultivars and the primers that bind to ERN1b in both cultivars so different? If not, this explanation is not very convincing.

      At the time of performing the experiment the genomic sequence of the Bert cultivar (used for generating rin4b edited lines) was not publicly available. In accordance with the reviewer’s comment, we removed the explanation, as it does not seem to be relevant. (See page 16, revised manuscript)

      The figures are clear and there is a logical flow. The images of fluorescing nodules in the panels of Figures 2 and 3 are not informative or unbiased.

      We appreciate the comment. As for Figure 3 (complementing the rin4b mutant), we updated the figure according to the other reviewer’s comment, and in Suppl. Figure 6 (OE phenotype of phospho-mimic/negative mutants) we removed the panels showing the micrographs. At the same time, we did not modify Figure 2 (where micrographs show transgenic roots carrying the silencing constructs) for the sake of figure completeness. (See pages 10, 12 and 38, revised manuscript)

      What does the exercise in isolation of rin4 mutants in lotus tell us? Is it worth including?

      Isolation of the Ljrin4 mutant suggests that RIN4 is so important that the mutant version is lethal for the plant (as in Arabidopsis, where most of the evidence regarding the role of RIN4 has been described). It provides an additional piece of evidence that RIN4 is similarly crucial across most land plant species.

      Sentence ambiguous. "Co-expression of RIN4a and b with SymRKßΔMLD and NFR1α resulted in YFP fluorescence detected by Confocal Laser Scanning Microscopy (SI Appendix, Figure S8) suggesting that RIN4a and b proteins closely associate with both RLKs." Were all 4 expressed together?

      Thank you for the remark. Not all 4 proteins were co-expressed together. We adjusted the sentence as follows: “Co-expression of RIN4a/b with SymRKßΔMLD as well as NFR1α resulted in YFP fluorescence…” We hope it is now phrased more clearly. (See page 13, revised manuscript)

      Minor spelling errors throughout. Costume-made (custom made?)

      Thank you for noticing. According to the Cambridge online dictionary, it is written with a hyphen, therefore, we added a hyphen and corrected the manuscript accordingly.

      CRISPR-cas9 or CRISPR/Cas9? Keep it consistent throughout. CRISPR-cas9 is the latest consensus.

      We corrected it to “CRISPR-Cas9” throughout the manuscript.

      References are missing for several 'obvious statements' but please include them to reach a broader audience. For example the first 5 sentences of the introduction. Also, statements such as 'Root hairs are the primary entry point for rhizobial infection in most legumes.'.

      Thank you for the comment. To make it clearer, we added reference #1 after the third sentence of the introduction, and we also added an additional review as a reference. This additional review was also cited as the source for the sentence “Root hairs are the primary…” (Please see page 2, revised manuscript)

      Can you provide a percent value? Silencing of RIN4a and RIN4b resulted in significantly reduced nodule numbers on soybean transgenic roots in comparison to transgenic roots carrying the empty vector control. Also, this wording suggests it was a double K.D. but from the images, it appears they were individually silenced.

      We appreciate the reviewer's comment. We observed a 50-70% reduction in the number of nodules. We adjusted the text according to the reviewer's remark. (See page 9, revised manuscript)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      This manuscript reports preliminary evidence of successful optogenetic activation of single retinal ganglion cells (RGCs) through the eye of a living monkey using adaptive optics (AO).

      Strengths

      The eventual goals of this line of research have enormous potential impact in that they will probe the perceptual impact of activating single RGCs. While I think more data should be included, the four examples shown look quite convincing.

      Weaknesses

      While this is undoubtedly a technical achievement and an important step along this group's stated goal to measure the perceptual consequences of single-RGC activations, the presentation lacks the rigor that I would expect from what is really a methods paper. In my view, it is perfectly reasonable to publish the details of a method before it has yielded any new biological insights, but in those publications, there is a higher burden to report the methodological details, full data sets, calibrations, and limitations of the method. There is considerable room for improvement in reporting those aspects. Specifically, more raw data should be shown for activations of neighboring RGCs to pinpoint the actual resolution of the technique, and more than two cells (one from each field of view) should be tested.

      We have expanded sections discussing both the methodology and limitations of this technique via a rewrite of the results and discussion section. The data used in the paper is available online via the link provided in the manuscript. We agree that a more detailed investigation of the strengths and limitations of the approach would have been a laudable goal. However, before returning to more detailed studies, we have shifted our effort to developing the monkey psychophysical performance we need to combine with the single cell stimulation approach described here. In addition, the optogenetic ChrimsonR used in this study is not the best choice for this experiment because of its poor sensitivity. We are currently exploring the use of ChRmine (as described in lines 93-97), which is roughly 2 orders of magnitude more sensitive. We have also been working on methods to improve probe stabilization to reduce tracking errors during eye movements. Once these improvements have been implemented, we will undertake the more detailed studies suggested here. Nonetheless, as a pragmatic matter, we submit that it is valuable to document proof-of-concept with this manuscript.

      Some information about the density of labeled RGCs in these animals would also be helpful to provide context for how many well-isolated target cells exist per animal.

      We agree. Getting reliable information about labeled cell density would be difficult without detailed histology of the retina, which we are reluctant to do because it would require sacrificing these precious and expensive monkeys from which we continue to get valuable information. We are actively exploring methods to reduce the cell density to make isolation easier, including the use of the CAMKII promoter as well as the use of intracranial injections via AAV.retro that would allow calcium indicator expression in the peripheral retina where RGCs form a monolayer. It may be that the rarity of isolated RGCs will not be a fundamental limitation of the approach in the future.

      Reviewer #2 (Public Review):

      This proof-of-principle study lays important groundwork for future studies. Murphy et al. expressed ChrimsonR and GCaMP6s in retinal ganglion cells of a living macaque. They recorded calcium responses and stimulated individual cells, optically. Neurons targeted for stimulation were activated strongly whereas neighboring neurons were not.

      The ability to record from neuronal populations while simultaneously stimulating a subset in a controlled way is a high priority for systems neuroscience, and this has been particularly challenging in primates. This study marks an important milestone in the journey towards this goal.

      The ability to detect stimulation of single RGCs was presumably due to the smallness of the light spot and the sparsity of transduction. Can the authors comment on the importance of the latter factor for their results? Is it possible that the stimulation protocol activated neurons nearby the targeted neuron that did not express GCaMP? Is it possible that off-target neurons near the targeted neuron expressed GCaMP, and were activated, but too weakly to produce a detectable GCaMP signal? In general, simply knowing that off-target signals were undetectable is not enough; knowing something about the threshold for the detection of off-target signals under the conditions of this experiment is critical.

      We agree with these points. We cannot rule out the possibility that some nearby cells were activated but we could not detect this because they did not express GCaMP. We also do not know whether cells responded but our recording methods were not sufficiently sensitive to detect them. A related limitation is that we do not know, of course, what the relationship is between the threshold for detection with calcium imaging and what the psychophysical detection threshold would have been in an awake, behaving monkey. Nonetheless, the data show that we can produce a much larger response in the target cell than in nearby cells whose response we can measure, and we suggest that this is a valuable contribution even if we can’t argue that the isolation is absolute. We’ve acknowledged these important limitations in the revised manuscript in lines 66-77.

      Minor comments:

      Did the lights used to stimulate and record from the retina excite RGCs via the normal light-sensing pathway? Were any such responses recorded? What was their magnitude?

      The recording light does activate the normal light-sensing pathway to some extent, although it does not fall upon the RGC receptive fields directly. There was a 30 second adaptation period at the beginning of each trial to minimize the impact of this on the recording of optogenetically mediated responses, as described in lines 222-224. The optogenetic probe does not appear to significantly excite the cone pathway, and we do not see the expected off-target excitations that would result from this.

      The data presented attest to a lack of crosstalk between targeted and neighboring cells. It is therefore surprising that lines 69-72 are dedicated to methods for "reducing the crosstalk problem". More information should be provided regarding the magnitude of this problem under the current protocol/instrumentation and the techniques that were used to circumvent it to obtain the data presented.

      The “crosstalk problem” referred to in this quote refers to crosstalk caused by targeting cells at higher eccentricities that are more densely packed, which are not represented in the data. The data presented is limited to the more isolated central RGCs.

      Optical crosstalk could be spatial or spectral. Laying out this distinction plainly could help the reader understand the issues quickly. The Methods indicate that cells were chosen on the basis that they were > 20 µm from their nearest (well-labeled) neighbor to mitigate optical crosstalk, but the following sentence is about spectral overlap.

      We have added a clearer explanation of what precisely we mean by crosstalk in lines 213-221.

      Figure 2 legend: "...even the nearby cell somas do not show significantly elevated response (p >> 0.05, unpaired t-test) than other cells at more distant locations." This sentence does not indicate how some cells were classified as "nearby" whereas others were classified as being "at more distant locations". Perhaps a linear regression would be more appropriate than an unpaired t-test here.

      The distinction here between “nearby” and “more distant” is 50 µm. We have clarified this in the figure caption. Performing a linear regression on cell response over distance shows a slight downward trend in two of the four cells shown here, but this trend does not reach the threshold of significance.
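
      As a rough illustration of the two analyses contrasted above, the following minimal sketch splits off-target cells at the 50 µm boundary and, alternatively, fits a least-squares line to response versus distance. The distances and response values are invented for illustration only and are not data from the study; the function names are hypothetical, and treating exactly 50 µm as "nearby" is an assumption.

```python
from statistics import mean

NEAR_LIMIT_UM = 50.0  # "nearby" vs "more distant" boundary stated in the response


def split_by_distance(distances_um, responses, limit_um=NEAR_LIMIT_UM):
    """Group responses into nearby (<= limit, an assumption) and distant cells."""
    near = [r for d, r in zip(distances_um, responses) if d <= limit_um]
    far = [r for d, r in zip(distances_um, responses) if d > limit_um]
    return near, far


def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical off-target cells: distance from the stimulated cell (µm)
# and response amplitude (arbitrary units). Not real measurements.
distances_um = [20.0, 35.0, 48.0, 60.0, 85.0, 120.0]
responses = [0.05, 0.02, 0.04, 0.01, 0.03, 0.00]

near, far = split_by_distance(distances_um, responses)
slope, intercept = ols_slope_intercept(distances_um, responses)
print(f"mean nearby = {mean(near):.3f}, mean distant = {mean(far):.3f}")
print(f"fitted slope = {slope:.5f} per µm")
```

      The regression uses every cell's distance rather than a single cutoff, which is why it can show a trend that a two-group comparison would miss; whether that trend is significant would still need a proper test on the real data.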

      Line 56: "These recordings were... acquired earlier in the session where no stimulus was present." More information should be provided regarding the conditions under which this baseline was obtained. I assume that the ChrimsonR-activating light was off and the 488 nm GCaMP excitation light was on, but this was not stated explicitly. Were any other lights on (e.g. room lights or cone-imaging lights)? If there was no spatial component to the baseline measurement, "where" should be "when".

      Your assumptions are correct. There was no spatial component to the baseline measurement, and these measurements are explained in more detail in lines 240-243.

      Please add a scalebar to Figure 1a to facilitate comparison with Figure 2.

      This has been done.

      Lines 165-173: Was the 488 nm light static or 10 Hz-modulated? The text indicates that GCaMP was excited with a 488 nm light and data were acquired using a scanning light ophthalmoscope, but line 198 says that "the 488 nm imaging light provides a static stimulus".

      The 488 nm light is effectively modulated at 25 Hz by the scanning action of the system. We believe the 10 Hz modulation you refer to is the closed-loop correction rate of the adaptive optics. The text has been updated in lines 217-219 to clarify this.

      A potential application of this technology is for the study of visually guided behavior in awake macaques. This is an exciting prospect. With that in mind, a useful contribution of this report would be a frank discussion of the hurdles that remain for such application (in addition to eye movements, which are already discussed).

      Lines 109-130 now offer an expanded discussion of this topic.

      Reviewer #3 (Public Review):

      This paper reports a considerable technical achievement: the optogenetic activation of single retinal ganglion cells in vivo in monkeys. As clearly specified in the paper, this is an important step towards causal tests of the role of specific ganglion cell types in visual perception. Yet this methodological advance is not described currently in sufficient detail to replicate or evaluate. The paper could be improved substantially by including additional methodological details. Some specific suggestions follow.

      The start of the results needs a paragraph or more to outline how you got to Figure 1. Figure 1 itself lacks scale bars, and it is unclear, for example, that the ganglion cells targeted are in the foveal slope.

      The results have been rewritten with additional explanation of methodology and the location of the RGCs has been clarified.

      The text mentions the potential difficulties targeting ganglion cells at larger eccentricities where the soma density increases. If this is something that you have tried it would be nice to include some of that data (whether or not selective activation was possible). Related to this point, it would be helpful to include a summary of the ganglion cell density in monkey retina.

      This is not something we tried, as we knew that the axial resolution allowed by the monkey’s eye would result in an axial PSF too large to hit only a single cell. The overall ganglion cell density is less relevant than the density of cells expressing ChrimsonR/GCaMP, about which we have only limited information without detailed histology.

      Related to the point in the previous paragraph - do you have any experiments in which you systematically moved the stimulation spot away from the target ganglion cell to directly test the dependence of stimulation on distance? This would be a valuable addition to the paper.

      We agree that this would have been a valuable addition to the paper, but we are reluctant to perform these experiments now. We are implementing an improved method to track the eye and a better optogenetic agent in an entirely new instrument, and we think that future experiments along these lines would be best done when those changes are completed.

      The activity in Figure 1 recovers from activation very slowly - much more slowly than the light response of these cells, and much more slowly than the activity elicited in most optogenetic studies. Can you quantify this time course and comment on why it might be so slow?

      We attribute the slow recovery to the calcium dynamics of the cell, and this slow recovery time is consistent with calcium responses seen in our lab elicited via the cone pathway. Similar time courses can be seen in Yin (2013) for RGCs excited via their cone inputs.

      Traces from non-targeted cells should be shown in Figure 1 along with those of targeted cells.

      We have added this as part of Figure 2.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Although we have no further revisions on the manuscript, we would like to respond to the remaining comments from the reviewers as follows.

      Reviewer 1:

      The authors have addressed some concerns raised in the initial review but some remain. In particular it is still unclear what conclusions can be drawn about task-related activity from scans that are performed 30 minutes after the behavioral task. I continue to think that a reorganization/analysis of the data according to event type would be useful and easier to interpret across the two brain areas, but the authors did not choose to do this. Finally, switching the cue-response association, I am convinced, would help to strengthen this study.

      As for the task-related activity, the strategy for the PET scan was explained in our response to comment 2 from Reviewer 2. Briefly, rats receive intravenous administration of 18F-FDG solution before the start of the behavioral session. The 18F-FDG uptake into the cells starts immediately, reaches the maximum level within 30 min, and is maintained for at least 1 h. A 30-min PET scan is executed 25 min after the session. Therefore, the brain activity reflects the metabolic state during task performance in rats.

      Regarding data presentation of the electrophysiological experiments, we described the subpopulations of event-related neurons showing notable neuronal activity patterns in the order of aDLS and pVLS, according to the procedure of explanations for the behavioral study.

      For switching the cue-response association, we mentioned the difference in firing activity between HR and LL trials, suggesting that different combinations between the stimulus and response may affect the level of firing activity. As suggested by the reviewer, an examination of switching the cue-response association is useful to confirm our interpretation. We will address this issue in our future studies.

      Reviewer 2:

      The authors have made important revisions to the manuscript and it has improved in clarity. They also added several figures in the rebuttal letter to answer questions by the reviewers. I would ask that these figures are also made public as part of the authors' response or if not, included in the manuscript.

      We will make the figures publicly available as part of our response.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes for both populations. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e., 13B onto 13A, or among each other, i.e., 13As onto other 13As, and/or onto leg motoneurons, i.e., 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, being called "generalists", while others project to a few motoneurons only, being called "specialists". Optogenetic activation and silencing of both subsets strongly affect leg grooming. In addition, activating or silencing subpopulations, i.e., 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including frequency and joint positions, and even interrupts leg grooming. The authors present a computational model with the four circuit motifs found, i.e., feed-forward inhibition, disinhibition, reciprocal inhibition, and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e., grooming. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition, and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects the generation of the motor behavior, thereby exemplifying their important role in generating grooming.

      We thank the reviewer for their thoughtful and constructive evaluation of our work. We are encouraged by their recognition of the major contributions of our study, including the identification of multiple inhibitory circuit motifs and their contribution to organizing rhythmic leg grooming behavior. We also appreciate the reviewer’s comments highlighting our use of connectomics, targeted manipulations, and modeling to reveal how distinct subsets of inhibitory interneurons contribute to motor behavior.

      Weaknesses:

      Effects of modulating activity in the interneuron populations by means of optogenetics were conducted in the so-called closed-loop condition. This does not allow for differentiation between direct and secondary effects of the experimental modification in neural activity, as feedforward and feedback effects cannot be disentangled. To do so, open loop experiments, e.g., in deafferented conditions, would be important. Given that many members of the two populations of interneurons do not show one, but two or more circuit motifs, it remains to be disentangled which role the individual circuit motif plays in the generation of the motor behavior in intact animals.

      We appreciate the reviewer’s point regarding the role of sensory feedback in our experimental design. We agree that reafferent (sensory) input from ongoing movements could contribute to the behavioral outcomes of our optogenetic manipulations. However, our aim was not to isolate central versus peripheral contributions, but rather to assess the role of 13A/B neurons within the intact, operational sensorimotor system during natural grooming behavior.

      These inhibitory neurons form recurrent loops, synapse onto motor neurons, and receive proprioceptive input—placing them in a position to both shape central motor output and process sensory feedback. As such, manipulating their activity engages both central control and sensory consequences.

      The finding that silencing 13A neurons in dusted flies disrupts rhythmic leg coordination highlights their role in organizing grooming movements. Prior studies (e.g., Ravbar et al., 2021) show that grooming rhythms persist when sensory input is reduced, indicating a central origin, while sensory feedback refines timing, coordination, and long-timescale stability. We concluded that rhythmicity arises centrally but is shaped and stabilized by mechanosensory or proprioceptive feedback. Our current results are consistent with this view and support a model in which inhibitory premotor neurons participate in a closed-loop control architecture that generates and tunes rhythmic output.

      While we agree that fully removing sensory feedback and parsing distinct roles for neurons that participate in multiple circuit motifs would be desirable, we do not see a plausible experimental path to accomplish this - we would welcome suggestions!

      We considered the method used by Mendes and Mann (eLife 2023) to assess sensory feedback to walking, 5-40-GAL4, DacRE-flp, UAS->stop>TNT + 13A/B-spGAL4 X UAS-csChrimson. This would require converting one targeting system to LexA and presents significant technical challenges. More importantly, we believe the core interpretation issue would remain: broadly silencing proprioceptors would produce pleiotropic effects and impair baseline coordination, making it difficult to distinguish whether observed changes reflect disrupted rhythm generation or secondary consequences of impaired sensory input.

      We will clarify in the revised manuscript that our behavioral experiments were performed in freely moving flies under closed-loop conditions. We thank the reviewer for highlighting these important considerations and will revise the manuscript to better communicate the scope and interpretation of our findings.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Syed et al. presents a detailed investigation of inhibitory interneurons, specifically from the 13A and 13B hemilineages, which contribute to the generation of rhythmic leg movements underlying grooming behavior in Drosophila. After performing a detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits, the authors build on this anatomical framework by performing optogenetic perturbation experiments to functionally test predictions derived from the connectome. Finally, they integrate these findings into a computational model that links anatomical connectivity with behavior, offering a systems-level view of how inhibitory circuits may contribute to grooming pattern generation.

      Strengths:

      (1) Performing an extensive and detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits.

      (2) Making sense of the largely uncharacterized 13A/13B nerve cord circuitry by combining connectomics and optogenetics is very impressive and will lay the foundation for future experiments in this field.

      (3) Testing the predictions from experiments using a simplified and elegant model.

      We thank the reviewer for their thoughtful and encouraging evaluation of our work. We are especially grateful for their recognition of our detailed connectome analysis and its contribution to understanding the organization of premotor inhibitory circuits. We appreciate the reviewer’s comments highlighting the integration of connectomics with optogenetic perturbations to functionally interrogate the 13A and 13B circuits, as well as their recognition of our modeling approach as a valuable framework for linking circuit architecture to behavior.

      Weaknesses:

      (1) In Figure 4, while the authors report statistically significant shifts in both proximal inter-leg distance and movement frequency across conditions, the distributions largely overlap, and only in Panel K (13B silencing) is there a noticeable deviation from the expected 7-8 Hz grooming frequency. Could the authors clarify whether these changes truly reflect disruption of the grooming rhythm?

      We are re-analyzing the whole dataset in the light of the reviews (specifically, we are now applying LMM to these statistics). For the panels in question (H-J), there is indeed a large overlap between the frequency distributions, but the box plots show median and quartiles, which partially overlap. (In the current analysis, as it stands, differences in means were small yet significant.) However, there is a noticeable (not yet quantified) difference in variability between the frequencies (the experimental group being the more variable one). If the activations/deactivations of 13A/B circuits disrupt the rhythm, we would indeed expect the frequencies to become more variable. So, in the revised version we will quantify the differences in both the means and the variabilities, and establish whether either shows significance after applying the LMM.

      More importantly, all this data would make the most sense if it were performed in undusted flies (with controls) as is done in the next figure.

      In our assay conditions, undusted flies groom infrequently. We used undusted flies for some optogenetic activation experiments, where the neuron activation triggers behavior initiation, but we chose to analyze the effect of silencing inhibitory neurons in dusted flies because dust reliably activates mechanosensory neurons and elicits robust grooming behavior, enabling us to assess how manipulation of 13A/B neurons alters grooming rhythmicity and leg coordination.

      (2) In Figure 4-Figure Supplement 1, the inclusion of walking assays in dusted flies is problematic, as these flies are already strongly biased toward grooming behavior and rarely walk. To assess how 13A neuron activation influences walking, such experiments should be conducted in undusted flies under baseline locomotor conditions.

      We agree that there are better ways to assay potential contributions of 13A/13B neurons to walking. We intended to focus on how normal activity in these inhibitory neurons affects coordination during grooming, and we included walking because we observed it in our optogenetic experiments and because it also involves rhythmic leg movements. The walking data is reported in a supplementary figure because we think this merits further study with assays designed to quantify walking specifically. We will make these goals clearer in the revised manuscript and we are happy to share our reagents with other research groups more equipped to analyze walking differences.

      (3) For broader lines targeting six or more 13A neurons, the authors provide specific predictions about expected behavioral effects-e.g., that activation should bias the limb toward flexion and silencing should bias toward extension based on connectivity to motor neurons. Yet, when using the more restricted line labeling only two 13A neurons (Figure 4 - Figure Supplement 2), no such prediction is made. The authors report disrupted grooming but do not specify whether the disruption is expected to bias the movement toward flexion or extension, nor do they discuss the muscle target. This is a missed opportunity to apply the same level of mechanistic reasoning that was used for broader manipulations.

      While we know which two neurons are labeled based on confocal expression, assigning their exact identity in the EM datasets has been challenging. One of these neurons appears absent from our 13A reconstructions of the right T1 neuropil in FANC, although we did locate it in MANC. However, its annotation in MANC has undergone multiple revisions, making confident assignment difficult at this time. Since we can’t be sure which motor neurons and muscles are most directly connected, we did not want to predict this line’s effect on leg movements.

      (4) Regarding Figure 5: The 70ms on/off stimulation with a slow opsin seems problematic. CsChrimson off kinetics are slow and unlikely to cause actual activity changes in the desired neurons with the temporal precision the authors are suggesting they get. Regardless, it is amazing that the authors get the behavior! It would still be important for the authors to mention the optogenetics caveat, and potentially supplement the data with stimulation at different frequencies, or using faster opsins like ChrimsonR.

      We were also surprised - and intrigued - by the behavioral consequences of activating these inhibitory neurons with CsChrimson. We tried several different activation paradigms: pulsed from 8Hz to 500Hz and with various on/off intervals. Because several of these different stimulation protocols resulted in grooming, and with different rhythmic frequencies, we think the phenotypes are a specific property of the neural circuits we have activated, rather than the kinetics of CsChrimson itself.

      We will include the data from other frequencies in a new Supplementary Figure, discuss the caveats that CsChrimson’s slow off-kinetics present for precise temporal control of neural activity, and try ChrimsonR in future experiments.

      Overall, I think the strengths outweigh the weaknesses, and I consider this a timely and comprehensive addition to the field.

      Thank you!

      Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study, in its current form, makes an important but overclaimed contribution to the literature due to a mismatch between the claims in the paper and the data presented.

      Strengths:

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      (1) They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      (2) They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      (3) They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      We appreciate the reviewer’s thorough and constructive feedback on our work. We are encouraged by their recognition of the complementary approaches used in our study.

      Weaknesses:

      The manuscript aims to reveal an instructive, rhythm-generating role for premotor inhibition in coordinating the multi-joint leg synergies underlying grooming. It makes a valuable contribution, but currently, the main claims in the paper are not well-supported by the presented evidence.

      Major points

      (1) Starting with the title of this manuscript, "Inhibitory circuits generate rhythms for leg movements during Drosophila grooming", the authors raise the expectation that they will show that the 13A and 13B hemilineages produce rhythmic output that underlies grooming. This manuscript does not show that. For instance, to test how they drive the rhythmic leg movements that underlie grooming requires the authors to test whether these neurons produce the rhythmic output underlying behavior in the absence of rhythmic input. Because the optogenetic pulses used for stimulation were rhythmic, the authors cannot make this point, and the modelling uses a "black box" excitatory network, the output of which might be rhythmic (this is not shown). Therefore, the evidence (behavioral entrainment; perturbation effects; computational model) is all indirect, meaning that the paper's claim that "inhibitory circuits generate rhythms" rests on inferred sufficiency. A direct recording (e.g., calcium imaging or patch-clamp) from 13A/13B during grooming - outside the scope of the study - would be needed to show intrinsic rhythmogenesis. The conclusions drawn from the data should therefore be tempered. Moreover, the "black box" needs to be opened. What output does it produce? How exactly is it connected to the 13A-13B circuit?

      We will modify the title to better reflect our strongest conclusions: “Inhibitory circuits coordinate rhythmic leg movements during Drosophila grooming”

      Our optogenetic activation was delivered in a patterned (70 ms on/off) fashion that entrains rhythmic movements but does not rule out the possibility that the rhythm is imposed externally. In the manuscript, we state that we used pulsed light to mimic a flexion-extension cycle and note that this approach tests whether inhibition is sufficient to drive rhythmic leg movements when temporally patterned. While this does not prove that 13A/13B neurons are intrinsic rhythm generators, it does demonstrate that activating subsets of inhibitory neurons is sufficient to elicit alternating leg movements resembling natural grooming and walking.

      Our goal with the model was to demonstrate that it is possible to produce rhythmic outputs with this 13A/B circuit, based on the connectome. The “black box” is a small recurrent neural network (RNN) consisting of 40 neurons in its hidden layer. The inputs are the “dust” levels from the environment (the green pixels in Figure 6I), the “proprioceptive” inputs (“efference copy” from motor neurons), and the amount of dust accumulated on both legs. The outputs (all positive) connect to the 13A neurons, the 13B neurons, and to the motor neurons. We refer to it as the “black box” because we make no claims about the actual excitatory inputs to these circuits. Its function is to provide input, needed to run the network, that reflects the distribution of “dust” in the environment as well as the information about the position of the legs.

      The output of the “black box” component of the model might be rhythmic. In fact, in most instances of the model implementation this is indeed the case. However, as mentioned in the current version of the manuscript: “But the 13A circuitry can still produce rhythmic behavior even without those external sensory inputs (or when set to a constant value), although the legs become less coordinated.” Indeed, when we refine the model (with the evolutionary training) without the “black box” (using a constant input of 0.1) the behavior is still rhythmic and sustained. Therefore, the rhythmic activity and behavior can emerge from the premotor circuitry itself without a rhythmic input.
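
      To make the description of the “black box” concrete, here is a minimal sketch of a recurrent network with a 40-unit hidden layer and strictly positive outputs. Only the hidden-layer size, the positive outputs, and the constant input of 0.1 come from the response; the number of inputs and outputs, the tanh and softplus nonlinearities, and the random untrained weights are assumptions made purely for illustration, and this is not the authors’ implementation.

```python
import math
import random

N_HIDDEN = 40   # hidden-layer size stated in the response
N_INPUTS = 4    # ASSUMPTION: e.g. dust level, efference copy, dust on each leg
N_OUTPUTS = 6   # ASSUMPTION: drives to 13A, 13B and motor neurons


def make_weights(rows, cols, rng, scale=0.1):
    """Random, untrained weights (hypothetical; the real model is trained)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def softplus(x):
    """Smooth nonlinearity that is strictly positive (ASSUMED choice)."""
    return math.log1p(math.exp(x))


class BlackBoxRNN:
    """Hypothetical stand-in for the excitatory 'black box' network."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_in = make_weights(N_HIDDEN, N_INPUTS, rng)
        self.w_rec = make_weights(N_HIDDEN, N_HIDDEN, rng)
        self.w_out = make_weights(N_OUTPUTS, N_HIDDEN, rng)
        self.h = [0.0] * N_HIDDEN

    def step(self, inputs):
        """Advance one time step and return the positive output drives."""
        drive = [a + b for a, b in zip(matvec(self.w_in, inputs),
                                       matvec(self.w_rec, self.h))]
        self.h = [math.tanh(d) for d in drive]
        return [softplus(o) for o in matvec(self.w_out, self.h)]


rnn = BlackBoxRNN()
constant_input = [0.1] * N_INPUTS  # constant drive, as in the control described above
outputs = [rnn.step(constant_input) for _ in range(5)]
```

      With random weights this sketch only shows the data flow (inputs, recurrent hidden state, positive outputs); it says nothing about whether the trained model’s output is rhythmic.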

      The context in which the 13A and 13B hemilineages sit also needs to be explained. What do we know about the other inputs to the motorneurons studied? What excitatory circuits are there?

      We agree that there are many more excitatory and inhibitory, direct and indirect, connections to motor neurons that will also affect leg movements for grooming and walking. Our goal was to demonstrate what is possible from a constrained circuit of inhibitory neurons that we mapped in detail, and we hope to add additional components to better replicate the biological circuit as behavioral and biomechanical data is obtained by us and others. We will add this clarification of the limits of the scope to the Discussion.

      Furthermore, the introduction ignores many decades of work in other species on the role of inhibitory cell types in motor systems. There is some mention of this in the discussion, but even previous work in Drosophila larvae is not mentioned, nor crustacean STG, nor any other cell types previously studied. This manuscript makes a valuable contribution, but it is not the first to study inhibition in motor systems, and this should be made clear to the reader.

      We thank the reviewer for this important reminder and we will expand our discussion of the relevant history and context in our revision. Previous work on the contribution of inhibitory neurons to invertebrate motor control certainly influenced our research and we should acknowledge this better.

      (2) The experimental evidence is not always presented convincingly, at times lacking data, quantification, explanation, appropriate rationales, or sufficient interpretation.

      We are committed to improving the clarity, rationale, and completeness of our experimental descriptions. We will revisit the statistical tests applied throughout the manuscript and expand the Methods.

      (3) The statistics used are unlike any I remember having seen, essentially one big t-test followed by correction for multiple comparisons. I wonder whether this approach is optimal for these nested, high‐dimensional behavioral data. For instance, the authors do not report any formal test of normality. This might be an issue given the often skewed distributions of kinematic variables that are reported. Moreover, each fly contributes many video segments, and each segment results in multiple measurements. By treating every segment as an independent observation, the non‐independence of measurements within the same animal is ignored. I think a linear mixed‐effects model (LMM) or generalized linear mixed model (GLMM) might be more appropriate.

      We thank the reviewer for raising this important point regarding the statistical treatment of our segmented behavioral data. Our initial analysis used independent t-tests with Bonferroni correction across behavioral classes and features, which allowed us to identify broad effects. However, we acknowledge that this approach does not account for the nested structure of the data. To address this, we will re-analyze key comparisons using linear mixed-effects models (LMMs) as suggested by the reviewer. This approach will allow us to more appropriately model within-fly variability and test the robustness of our conclusions. We will update the manuscript based on the outcomes of these analyses.
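
      The non-independence concern can be illustrated with a small sketch. It uses made-up numbers, not data from the study, and it does not fit an LMM: it simply compares a Welch t statistic computed on all segments pooled together with one computed on per-fly means, a simple and conservative stand-in for modelling fly as a random effect. The fly labels, frequencies, and function names are all hypothetical.

```python
import math
import statistics

# ILLUSTRATIVE, MADE-UP grooming frequencies (Hz); not data from the study.
# Each fly contributes several video segments.
segments = {
    "control": {
        "c1": [7.0, 7.2, 7.1, 6.9],
        "c2": [7.8, 8.0, 7.9, 8.1],
        "c3": [7.4, 7.5, 7.6, 7.5],
    },
    "experimental": {
        "e1": [6.5, 6.6, 6.4, 6.5],
        "e2": [7.6, 7.5, 7.7, 7.6],
        "e3": [7.0, 7.1, 6.9, 7.0],
    },
}


def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def pooled(group):
    """All segments of a group, treated as if they were independent."""
    return [x for fly in segments[group].values() for x in fly]


def fly_means(group):
    """One value per fly, so the fly (not the segment) is the unit of analysis."""
    return [statistics.mean(fly) for fly in segments[group].values()]


t_pooled = welch_t(pooled("control"), pooled("experimental"))
t_per_fly = welch_t(fly_means("control"), fly_means("experimental"))
print(f"pooled segments: t = {t_pooled:.2f}; per-fly means: t = {t_per_fly:.2f}")
```

      In this toy example the difference in means is the same in both analyses, but the pooled statistic is considerably larger because segments from the same fly are counted as independent observations. A mixed-effects model addresses the same problem while retaining the segment-level data.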

      (4) The manuscript mentions that legs are used for walking as well as grooming. While this is welcome, the authors then do not discuss the implications of this in sufficient detail. For instance, how should we interpret that pulsed stimulation of a subset of 13A neurons produces grooming and walking behaviours? How does neural control of grooming interact with that of walking?

      We do not know how the inhibitory neurons we investigated will affect walking or how circuits for control of grooming and walking might compete. We speculate that overlapping pre-motor circuits may participate in walking and grooming because both behaviors have extension-flexion cycles at similar frequencies, but we do not have hard experimental data to support this. This would be an interesting area for future research. Here, we focused on the consequences of activating specific 13A/B neurons during grooming because they were identified through a behavioral screen for grooming disruptions, and we had developed high-resolution assays and familiarity with the normal movements in this behavior. We will clarify this rationale in the revised discussion.

      (5) The manuscript needs to be proofread and edited as there are inconsistencies in labelling in figures, phrasing errors, missing citations of figures in the text, or citations that are not in the correct order, and referencing errors (examples: 81 and 83 are identical; 94 is missing in text).

      We will carefully proofread the manuscript to fix all figure labeling, citation order, and referencing errors.

    1. Brandt he made a statement that yesterday's breakthrough is today's graduate seminar is tomorrow's off-the-shelf home entertainment would that that were true but in fact yesterday's breakthrough is the thing that is most often forgotten it's not today's graduate seminar and it's definitely not tomorrow's home entertainment because what usually happens in the the grand tradition of Hollywood producers sitting around a table looking at a script and saying hey we've got ideas too too often people it's not a question of not invented here it's a question if I want to invent this myself and the if you look at the micro computers that are being sold today you'll see hardly anything that approximates the kind of total system design that the link had I think we all can realize that they are not sold as complete packages they're not sold as Honda's most of the things that you need can't even be plugged into them there are millions of wires to worry about and so forth but what is that may not be too surprising because after all that was a garage culture perhaps less forgivable

      We're not doing any better: a computer out of the box doesn't do anything today without a whole set of paid software and subscriptions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Work by Brosseau et al. combines NMR, biochemical assays, and MD simulations to characterize the influence of the C-terminal tail of EmrE, a model multi-drug efflux pump, on proton leak. The authors compare the WT pump to a C-terminal tail deletion, delta_107, finding that the mutant has increased proton leak in proteoliposome assays, shifted pH dependence with a new titratable residue, faster alternating access at high pH values, and reduced growth, consistent with proton leak of the PMF.

      Strengths:

      The work combines thorough experimental analysis of structural, dynamic, and electrochemical properties of the mutant relative to WT proteins. The computational work is well aligned in vision and analysis. Although all questions are not answered, the authors lay out a logical exploration of the possible explanations.

      Weaknesses:

      There are a few analyses that are missing and important data left out. For example, the relative rate of drug efflux of the mutant should be reported to justify the focus on proton leak. Additionally, the correlation between structural interactions should be directly analyzed and the mutant PMF also analyzed to justify the claims based on hydration alone. Some aspects of the increased dynamics at high pH due to a potential salt bridge are not clear.

      Reviewer #2 (Public review):

      Summary:

      This manuscript explores the role of the C-terminal tail of EmrE in controlling uncoupled proton flux. Leakage occurs in the wild-type transporter under certain conditions but is amplified in the C-terminal truncation mutant D107. The authors use an impressive combination of growth assays, transport assays, NMR on WT and mutants with and without key substrates, classical MD, and reactive MD to address this problem. Overall, I think that the claims are well supported by the data, but I am most concerned about the reproducibility of the MD data, initial structures used for simulations, and the stochasticity of the water wire formation. These can all be addressed in a revision with more simulations as I point out below. I want to point out that the discussion was very nicely written, and I enjoyed reading the summary of the data and the connection to other studies very much.

      Strengths:

      The Henzler-Wildman lab is at the forefront of using quantitative experiments to probe the peculiarities in transporter biophysics, and the MD work from the Voth lab complements the experiments quite well. The sheer number of different types of experimental and computational approaches performed here is impressive.

      Weaknesses:

      The primary weaknesses are related to the reproducibility of the MD results with regard to the formation of water wires in the WT and truncation mutant. This could be resolved with simulations starting from structures built using very different loops and C-terminal tails.

      The water wire gates identified in the MD should be tested experimentally with site-directed mutagenesis to determine if those residues do impact leak.

      We appreciate the reviewers’ thoughtful consideration of our manuscript, and their recognition of the variety of experimental and computational approaches we have brought to bear in probing the very challenging question of uncoupled proton leak through EmrE.

      We did record SSME measurements with MeTPP+, a small molecule substrate at two different protein:lipid ratios. These experiments report the rate of net flux when both proton-coupled substrate antiport and substrate-gated proton leak are possible. We will add this data to the revision, including data acquired with different lipid:protein ratio that confirms we are detecting transport rather than binding. In brief, this data shows that the net flux is highly dependent on both proton concentration (pH) and drug-substrate concentration, as predicted by our mechanistic model. This demonstrates that both types of transport contribute to net flux when small molecule substrates are present.

      In the absence of drug-substrate, proton leak is the only possible transport pathway. The pyranine assay directly assesses proton leak under these conditions and unambiguously shows faster proton entry into proteoliposomes through the ∆107-EmrE mutant than through WT EmrE, with the rate of proton entry into ∆107-EmrE proteoliposomes matching the rate of proton entry achieved by the protonophore CCCP. We have revised the text to more clearly emphasize how this directly measures proton leak independently of any other type of transport activity. The SSME experiments with a proton gradient only (no small molecule substrate present) provide additional data on shorter timescales that is consistent with the pyranine data. The consistency of the data across multiple LPRs and comparison of transport to proton leak in the SSME assays further strengthens the importance of the C-terminal tail in determining the rate of flux.

      None of the current structural models have good resolution (crystallography, EM) or sufficient restraints (NMR) to define the loop and tail conformations sufficiently for comparison with this work. We are in the process of refining an experimental structure of EmrE with better resolution of the loop and tail regions implicated in proton-entry and leak. Direct assessment of structural interactions via mutagenesis is complicated because of the antiparallel homodimer structure of EmrE. Any point mutation necessarily affects both subunits of the dimer, and mutations designed to probe the hydrophobic gate on the more open face of the transporter also have the potential to disrupt closure on the opposite face, particularly in the absence of sufficient resolution in the available structures. Thus, mutagenesis to test specific predicted structural features is deferred until our structure is complete so that we can appropriately interpret the results.

      In our simulation setup, the MD results can be considered representative and meaningful for two reasons. First, the C-terminal tail, not present in the prior structure and thus modeled by us, is only 4 residues long. We will show in the revision and detailed response that the system will lose memory of its previous conformation very quickly, such that velocity initialization alone is enough for a diverse starting point. Second, our simulation is more like simulated annealing, starting from a high free energy state to show that, given such random initialization, the tail conformation we get in the end is consistent with what we reported. It is also difficult to sample back-and-forth tail motion within a realistic MD timescale. Therefore, it can be inconclusive to causally infer the allosteric motions with unbiased MD of the wildtype alone. The best viable way is to look at the equilibrium statistics of the most stable states between WT- and ∆107-EmrE and compare the differences.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The work is well done and well presented. In my opinion, the authors must address the following questions.

      (1) It is unclear to a non-SSME-expert, why the net charge translocated in delta_107 is larger than in WT. For such small pH gradients (0.5-1pH unit), it seems that only a few protons would leave the liposome before the internal pH is adjusted to be the same as the external. This number can be estimated given the size of the liposomes. What is it? Once the pH gradient is dissipated, no more net proton transport should be observed. So, why would more protons flow out of the mutant relative to WT?

      We appreciate the complexity of both the system and assay and have made revisions to both the main text and SI to address these points more clearly. While we can estimate liposome size, we cannot easily quantify the number of liposomes on the sensor surface, so we cannot calculate the amount of charge movement as suggested by the reviewer. We have revised Fig. 3.2 and added additional data at low and high pH with different lipid to protein ratios to distinguish pre-steady state (proton release from the protein) and steady state processes (transport). An extended Fig. 3.2 caption and revised discussion in the main text clarify these points.

      We have also revised SI figure 3.2 to include an example of transport driven by an infinite drug gradient. Drug-proton antiport results in net charge build-up in the liposome since two protons will be driven out for every +1 drug transported in. This also creates a pH gradient (higher proton concentration outside). The negative-inside potential inhibits further antiport of drug. However, both the negative-inside potential and proton gradient will drive protons back into the liposome if there is a leak pathway available. This is clearly visible with a reversal of current from negative (antiport) to positive (proton backflow), and the magnitude of this backflow is larger for ∆107-EmrE, which lacks the regulatory elements provided by the C-terminal tail. We have amended the main text and SI to include this discussion.

      (2) Given the estimated rate of transport, size of liposomes, and pH gradient, how quickly would the SSME liposomes reach pH balance?

      Since SSME measurements are due to capacitive coupling and will represent the net charge movement, including pre-steady state contributions, the current values will be incredibly sensitive to individual rates of alternating access, proton and drug on- and off-rates. Time to pH balance would, therefore, differ based on the construct, LPR, absolute pH or drug concentrations as well as the magnitude of the given gradients. For this reason, we necessarily use integrated currents (transported charge over time) when comparing mutants, as this reflects kinetic differences inherent to the mutant without over-processing the data, for example, by normalizing to peak currents, which would overemphasize certain properties that will differ across mutants. This process allows for qualitative comparisons by subjecting mutants to the same pH and substrate gradients when the same density of transporter construct is present, and care is taken not to overstate the importance of the actual quantities of charges that are moving, as they will be highly context dependent. This is clearly seen in Fig 3.2 where the current is not zero and the net transported charge is still changing at the end of 1 second. We have amended SI figure 3.2 and the main text to include this discussion.

      (3) Given that H110 and E14 would deprotonate when the external pH is elevated above 7 and that these protons would be released to external bulk, the external bulk pH would decrease twice as much for WT compared to delta107. This would decrease the pH gradient for WT relative to the mutant. Can these effects be quantified and accounted for? Would this ostensibly decrease the amount of charge that transfers into the liposomes for WT? How would this impact the current interpretation that the two systems are driven by the same gradient?

      The reviewer is correct that there will be differences in deprotonation of WT and ∆107, and the amount of proton release will also change with pH. We have amended Figure 3.2 to clarify this difference and its significance. For the proton-gradient-only conditions in Figure 3, each set of liposomes was equilibrated to the starting pH by repeated washings and incubation before measurement. For example, for the pH 6.5 inside, pH 7 outside condition, both the inside and outside pH were equilibrated at 6.5, so both E14 residues will be predominantly protonated in WT and ∆107, and H110 will be predominantly protonated in WT-EmrE. Upon application of the external pH 7 solution, protons will be released from the E14 of either construct, with an additional proton being released from H110 for WT-EmrE, causing a large pre-steady-state negative contribution to the signal (Fig. 3.2A). Under this pH condition, the peak current correlates with the LPR, as this release of protons depends on the density of the transporter. However, we also see that the longer-time decay of the signal correlates with the construct (WT or ∆107) and is relatively independent of LPR, consistent with a transport process rather than a rapid pre-steady-state release of protons. Therefore, when we look at the actual transported charge over time, despite the higher contribution of proton release to the WT-EmrE signal, the significant increase in uncoupled proton transport for the C-terminal deletion mutant dominates the signal.

      As a contrast, we apply this same analysis to the pH 8 inside, pH 8.5 outside condition, where both sets of transporters will be deprotonated from the start (Fig. 3.2B). Now the peak currents, decay rates, and transported charge over time are all consistent for a given construct (WT or ∆107). The two LPRs for an individual construct match within error, as the differences in overall charge movement and transported charge over time are independent of pre-steady-state proton release from the transporter at high pH.

      (4) A related question, how does the protonation of H110 influence the potential rate of proton transport between the two systems? Does the proton on H110 transfer to E14?

      The protonation of H110 will only influence the rate of transport of WT-EmrE as its protonation is required for formation of the hydrogen bonding network that coordinates gating. However, protonation of both E14s will influence the rate of proton transport of both systems as protonation state affects the rate of alternating access which is necessary for proton turnover. This is another reason we use the transported charge over time metric to compare mutants as it allows for a common metric for mutants with altered rates which are present in the same density and under the same gradient conditions. We do not have any evidence to support transfer of proton from H110 to E14, but there is also no evidence to exclude this possibility. We do not discuss this in the manuscript because it would be entirely speculative.

      (5) Is the pKa in the simulations (Figure 6B) consistent with the experiment?

      We calculated the pKa from this WT PMF and obtained a value of 7.1, which is close to the experimental value of 6.8.
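
      To put the 0.3-unit difference between the two pKa values in energetic terms, the standard relation ΔΔG = ln(10)·RT·ΔpKa can be evaluated as in the sketch below. The temperature of 298.15 K is an assumption made for illustration (the simulation temperature is not stated here), and the function name is hypothetical. Under that assumption the difference corresponds to roughly 0.4 kcal/mol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pka_shift_to_free_energy(delta_pka, temperature_k=298.15):
    """Free-energy difference (kcal/mol) equivalent to a pKa difference.

    temperature_k defaults to 298.15 K, an assumed value for illustration.
    """
    return math.log(10) * R_KCAL * temperature_k * delta_pka


# Computed (7.1) versus experimental (6.8) pKa quoted in the response
ddg = pka_shift_to_free_energy(7.1 - 6.8)
```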

      (6) Why isn't the PMF for delta_107 compared to WT to corroborate the prediction that hydration sufficiently alters both the rate and pKa of E14?

      We appreciate the reviewer’s suggestion and agree that a direct comparison would be valuable. However, several factors limit the interpretability of such an analysis in this context:

      (a) Our data indicate that the primary difference in free energy barriers between WT and Δ107 lies in the hydration step rather than in proton transport itself. To fully resolve this, a 2D PMF calculation via 2D umbrella sampling would be required, which is computationally very expensive. Looking only at the proton-transport coordinate of this PMF would show little difference between the two constructs.

      (b) Given this, our aim in calculating this PMF was to support our conjecture that the bottleneck for such transport is the hydrophobic gate.

      (7) The authors suggest that A61 rotation 'controls the water wire formation' by measuring the distribution of water connectivity (water-water distances via logS) and average distances between A61 and I68/I67. Delta_107 has a larger inter-residue distance (Figure 6A) more probable small log S closer waters connecting E14 and two residues near the top of the protein (Figure 5A). However, it strikes me that looking at average distances and the distribution of log S is not the best way to do this. Why not quantify the correlation between log S and A61 orientation and/or A61-I68/I71 distances as well as their correlation to the proposed tail interactions (D84-R106 interactions) to directly verify the correlation (and suggest causation) of these interactions on the hydration in this region. Additionally, plotting the RMSD or probability of waters below I68 and I171 as a function of A61-I68 distances and/or numbers over time would support the log S analysis.

      The reviewer requested that we provide direct correlation analyses between A61 orientation, residue distances (A61-I68/I71), and water connectivity (logS) to better support the claim about water wire formation, rather than relying solely on average distances and distributions.

      We appreciate the reviewer’s suggestion to strengthen our analysis with direct correlations. However, due to the slow kinetics of hydration/dehydration events, unbiased simulation timescales do not permit sufficient sampling of multiple transitions to perform statistically robust dynamic correlation analyses. Instead, our approach focuses on equilibrium statistics, which reveal the dominant conformational states of WT- and Δ107-EmrE and provide meaningful insights into shifts in hydration patterns.

      (8) It looks like the D84-R106 salt bridge controls this A61-I68 opening. Could this also be quantifiably correlated?

      As discussed in response to the previous question, the unbiased simulation timescales do not permit sufficient sampling of multiple transitions to perform statistically robust dynamic correlation analyses.

      (9) The NMR results show that alternating access increases in frequency from ~4/s for WT at low and high pH to ~17/s for delta_107 only at high pH. They then go on to analyze potential titration changes in the delta_107 mutant, finding two residues with approximate pKa values of 5.6 and 7.1. The former is assigned to E14, consistent with WT. But the latter is suggested to be either D84, which salt bridges to R106, or the C-terminal carboxylate. If it is D84, why would deprotonation, which would be essential to form the salt bridge, increase the rate of alternating access relative to WT?

      We note that the faster alternating access rate was observed for TPP+-bound ∆107-EmrE, not the transporter in the absence of substrate. In the absence of substrate, the relatively broad lines preclude quantitative determination of the alternating access rate by NMR, making it difficult to judge the validity of the reviewer's reasoning. Identification of which residue (D84 or H110) corresponds to the shifted pKa is ultimately of little consequence, as this mutant does not reflect the native conditions of the transporter. It is far more important to acknowledge that both R106 and D84 are sensitive to this deprotonation, as it indicates these residues are close in space and provides experimental support for the existence of the salt bridge identified in the MD simulations, as discussed in the manuscript.

      (10) In a more general sense, can the authors speculate why an efflux pump would evolve this type of secondary gate that can be thrown off by tight binding in the allosteric site such as that demonstrated by Harmane? What potential advantage is there to having a tail-regulated gate?

      This was likely a necessity to allow for better coupling as these transporters evolved to be more promiscuous. The C-terminal tail is absent in tightly coupled family members such as Gdx, which are specific for a single substrate and have a better-defined transport stoichiometry. We have included this discussion in the main text and are currently investigating this phenomenon further. Those experiments are beyond the scope of the current manuscript.

      (11) It is hard to visualize the PT reaction coordinate. Is the e_PT unit vector defined for each window separately based on the initial steered MD pathway? If so, how reliant is the PT pathway on this initial approximate path? Also, how does this position for each window change if/when E14 rotates? This could be checked by plotting the x,y,z distributions for each window and quantifying the overlap between windows in cartesian space. These clouds of distributions could also be plotted in the protein following alignment so the reader can visualize the reaction coordinate. Does the CEC localization ever stray to different, disconnected regions of cartesian phase space that are hidden by the reaction coordinate definition?

      The unit vector e_PT is the same across all windows and is based on unbiased MD. Therefore, the reaction coordinate (a scalar) is the vector from the starting point to the CEC, projected onto this unit vector. E14 rotation does not significantly change the window definition unless the CEC is very close to E14, where we found this to be a better CV. For a detailed discussion of this CV, especially a comparison with a curvilinear CV, please see J. Am. Chem. Soc. 2018, 140, 48, 16535–16543 “Simulations of the Proton Transport” and its SI Figure S1. In the Supplementary Information, we added figure 6.1 to show the average X, Y, Z coordinates of each umbrella window.
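
      The projection described above can be written out in a few lines. This is a minimal sketch rather than the simulation code: the function names, the coordinates and the direction vector are all hypothetical, and in practice the CEC position would come from the simulation engine.

```python
import math


def unit_vector(v):
    """Normalize a 3D vector given as a tuple of floats."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def pt_coordinate(cec, start, e_pt):
    """Scalar reaction coordinate: (r_CEC - r_start) projected onto e_PT."""
    return sum((c - s) * e for c, s, e in zip(cec, start, e_pt))


# Hypothetical values: transport direction along z, starting point at z = 1
e_pt = unit_vector((0.0, 0.0, 2.0))
xi = pt_coordinate((1.0, 2.0, 5.0), (0.0, 0.0, 1.0), e_pt)
```

      In this example the CEC is displaced in x and y as well as z, but only the z component enters the coordinate. Lateral motion of this kind is invisible to a linear projection, which is why the per-window X, Y, Z averages are a useful check.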

      (12) Lastly, perhaps I missed it, but it's unclear if the rate of substrate efflux is also increased in the delta_107 mutant. If this is also increased, then the overall rate of exchange is faster, including proton leak. This would be important to distinguish since the focus now is entirely on proton leaks. I.e., is it only leak or is it overall efflux and leak?

      We have amended SI figure 3.2 to include a gradient condition where an infinite drug gradient is created across the liposome. The infinite gradient allows for rapid transport of drug into the liposomes until charge build-up opposes further transport. This peak occurs at the same time for both LPRs of WT- and ∆107-EmrE, suggesting the rate of substrate transport is similar. Differences in the peak heights across LPRs can be attributed to competition between drug and proton for the primary binding site, such that more proton will be released for the higher-density constructs, as described above. This process also creates a proton gradient, as drug moving in is coupled to two protons moving out. As charge build-up inhibits further drug movement, the building proton gradient will begin to drive protons back in, which is another example of uncoupled leak. Here again, we see that this backflow of protons, or leak, is of greater magnitude for ∆107-EmrE proteoliposomes than for those with WT-EmrE. We have included this discussion in the SI and main text.

      Minor

      (1) Introduction - the authors describe EmrE as a model system for studying the molecular mechanism of proton-coupled transport. This is a rather broad categorization that could include a wide range of phenomena distal from drug transport across membranes or through efflux pumps. I suggest further specifying to not overgeneralize.

      We revised to note the context of multidrug efflux.

      Reviewer #2 (Recommendations for the authors):

      Simulations. The initial water wire analysis is based on 4 different 1 ms simulations presented in Figure 5. The 3 WT replicates show similar results for the tail-blocking water wire formation, but the details of the system build and loop/C-terminal tail placement are not clear. It does appear that a single C-terminal tail model was created for all WT replicates. Was there also modeling for any parts of the truncation mutant? Regardless, since these initial placements and uncertainties in the structures may impact the results and subsequent water wire formation, I would like a discussion of how these starting structures impacted the formation or not of wires. I think that another WT replicate should be run starting from a completely new build that places the tail in a different (but hopefully reasonable location). This could be built with any number of tools to generate reasonable starting structures. It's critical to ensure that multiple independent simulations across different initial builds show the same water wire behavior so that we know the results are robust and insensitive to the starting structure and stochastic variation.

      We thank Reviewer 2 for their suggestion regarding the discussion of the initial structure. In our simulations, the C-terminal tail was initially modeled in an extended conformation (solvent-exposed) to mimic its disordered state prior to folding. This approach resembles an annealing process, where the system evolves from a higher free-energy state toward equilibrium. Notably, across all three replicas, we observed consistent folding of the tail onto the protein surface, supporting the robustness of this conformational preference.

      For the Δ107 truncation mutant, minimal modeling was required, as most experimental structures resolve residues up to S105 or R106. To rigorously assess the influence of the starting configuration, we analyzed the tail’s dynamics using backbone dihedral angle auto- and cross-correlation functions (new Supplementary Figures 10.1 and 10.2). These analyses reveal rapid decay of correlations—consistent with the tail’s short length (5 residues) and high flexibility—indicating that the system "forgets" its initial configuration well within the simulation timescale. Thus, we conclude that our sampling is sufficient to capture equilibrium behavior, independent of the starting structure.
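
      For readers unfamiliar with this kind of analysis, one common way to define an autocorrelation function for a periodic variable such as a backbone dihedral is C(τ) = ⟨cos(φ(t+τ) − φ(t))⟩. The sketch below implements that definition; it is an assumption made for illustration and may differ from the exact estimator used for the Supplementary Figures. The function name and the example series are hypothetical.

```python
import math


def dihedral_autocorrelation(angles_rad, max_lag):
    """Circular autocorrelation C(lag) = <cos(phi(t + lag) - phi(t))>.

    angles_rad: dihedral time series in radians (hypothetical input)
    max_lag: largest lag in frames; must be smaller than len(angles_rad)
    """
    n = len(angles_rad)
    acf = []
    for lag in range(max_lag + 1):
        terms = [
            math.cos(angles_rad[t + lag] - angles_rad[t])
            for t in range(n - lag)
        ]
        acf.append(sum(terms) / len(terms))
    return acf


# A dihedral that never moves stays perfectly correlated at every lag
frozen = dihedral_autocorrelation([0.5] * 10, 3)
```

      With this definition the function starts at 1 and decays as the dihedral loses memory of its earlier value. The mean is not subtracted, so a dihedral confined to a narrow range plateaus at a nonzero value rather than at zero.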

      What does the size of the barrier in the PMF (Figure 6B) imply about the rate of proton transfer/leak and can the pKa shift of the acidic residue be estimated with this energy value compared to bulk?

      We noticed this point aligns with a related concern raised by Reviewer 1. For a detailed discussion please refer to Point 5 in our response to Reviewer 1.

      Experimental validation. The hypotheses generated by this work would be better buttressed if there were some mutation work at the hydrophobic gate (61, 68, 71) to support it. I realize that this may be hard, but it would significantly improve the quality.

      Due to the small size of the transporter, any mutagenesis of EmrE should necessarily be accompanied by functional characterization to fully assess the effects of the mutation on rate-limiting steps. We have revised the manuscript to add a discussion of the challenges with analyzing simple point mutants and citing what is known from prior scanning mutagenesis studies of EmrE.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors present a novel CRISPR/Cas9-based genetic tool for the dopamine receptor dop1R2. Based on the known function of the receptor in learning and memory, they tested the efficacy of the genetic tool by knocking out the receptor specifically in mushroom body neurons. The data suggest that dop1R2 is necessary for longer-lasting memories through its action on ⍺/ß and ⍺'/ß' neurons but is dispensable for short-term memory and thus in ɣ neurons. The experiments impressively demonstrate the value of such a genetic tool and illustrate the specific function of the receptor in subpopulations of KCs for longer-term memories. The data presented in this manuscript are significant.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript examines the role of the dopamine receptor, Dop1R2, in memory formation. This receptor has complex roles in supporting different stages of memory, and the neural mechanisms for these functions are poorly understood. The authors are able to localize Dop1R2 function to the vertical lobes of the mushroom body, revealing a role in later (presumably middle-term) aversive and appetitive memory. In general, the experimental design is rigorous, and statistics are appropriately applied. While the manuscript provides a useful tool, it would be strengthened further by additional mechanistic studies that build on the rich literature examining the roles of dopamine signaling in memory formation. The claim that Dop1R2 is involved in memory formation is strongly supported by the data presented, and this manuscript adds to a growing literature revealing that dopamine is a critical regulator of olfactory memory. However, the manuscript does not necessarily extend much beyond our understanding of Dop1R2 in memory formation, and future work will be needed to fully characterize this reagent and define the role of Dop1R2 in memory.

      Strengths:

      (1) The FRT lines generated provide a novel tool for temporal and spatially precise manipulation of Dop1R2 function. This tool will be valuable to study the role of Dop1R2 in memory and other behaviors potentially regulated by this gene.

      (2) Given the highly conserved role of Dop1R2 in memory and other processes, these findings have a high potential to translate to vertebrate species.

      Weaknesses:

      (1) The authors state Dop1R2 associates with two different G-proteins. It would be useful to know which one is mediating the loss of aversive and appetitive memory in Dop1R2 knockout flies.

      We thank you for the insightful comment. We agree that it would be very useful to know which G-proteins are transmitting Dop1R2 signaling. To that end, we examined single-cell transcriptomics data to check the level of co-expression of Dop1R2 with G-proteins that are of interest to us (Figure 1 S1).

      Lines 312-325

      “Some RNA binding proteins and immediate early genes help maintain identities of Mushroom body cells and are regulators of local transcription and translation (de Queiroz et al., 2025; Raun et al., 2025). So, the availability of different G-proteins may change in different lobes and during different phases of memory. The G-protein via which GPCRs signal may depend on the pool of available G-proteins in the cell/sub-cellular region (Hermans, 2003). Therefore, Dop1R2 may signal via different G-proteins in different compartments of the Mushroom body and also different compartments of the neuron. We looked at Gαo and Gαq as they are known to have roles in learning and forgetting (Ferris et al., 2006; Himmelreich et al., 2017). We found that Dop1R2 co-expresses more frequently with Gαo than with Gαq (Figure 1 S1). While there is evidence for Dop1R2 to act via Gαq (Himmelreich et al., 2017), it is difficult to determine whether this interaction is exclusive, or if Dop1R2 can also be coupled to other G-proteins. It will be interesting to determine the breadth of G-proteins that are involved in Dop1R2 signaling.”

      (2) It would be interesting to examine 24hr aversive memory, in addition to 24hr appetitive memory.

      This is indeed an important point and we agree that it will complete the assessment of temporally distinct memory traces. We therefore performed the Aversive LTM experiments and include them in the results.

      Lines 208-228

      “24h memory is impaired by loss of Dop1R2

      Next, we wanted to see if later memory forms are also affected. One cycle of reward training is sufficient to create LTM (Krashes & Waddell, 2008), while for aversive memory, 5-6 cycles of electroshock training are required to obtain robust long-term memory scores (Tully et al., 1994). So, we looked at both 24h aversive and appetitive memory. For aversive LTM, the flies were tested on the Y-Maze apparatus as described in Mohandasan et al. (2022).

      Flipping out Dop1R2 in the whole MB causes a reduced 24h memory performance (Figure 4A, E). No phenotype was observed when Ddop1R2 was flipped out in the γ-lobe (Figure 4B, F). However, similar to 2h memory, loss of Ddop1R2 in the α/β-lobes (Figure 4C, G) or the α’/β’-lobes (Figure 4D, H) causes a reduction in memory performance. Thus, Dop1R2 seems to be involved in aversive and appetitive LTM in the α/β-lobes and the α’/β’-lobes.

      Previous studies have shown that mutation of the Dop1R2 receptor leads to improvement in LTM when a single shock training paradigm is used (Berry et al., 2012). As we found that it disrupts LTM, we wanted to verify if the absence of Dop1R2 outside the MB is what leads to an improvement in memory. To that end, we tested panneuronal flip-out of Dop1R2 flies for 6hr and 24hr memory upon single shock using the elav-Gal4 driver. We found that it did not improve memory at either time point (Figure 4 S1), confirming that flipping out Dop1R2 panneuronally does not improve LTM (Figure 4 S1C) and highlighting its irrelevance in memory outside the MB.”

      (3) The manuscript would be strengthened by added functional analysis. What are the DANs that signal through Dop1R. How do these knockouts impact MBONs?

      We thank you for this question. We agree that how distinct DANs signal via distinct dopamine receptors is a highly relevant and open question. Our work here focuses specifically on Dop1R2 within the MB. We aim to investigate other DopRs and the connection between DANs in the future using similar approaches.

      (4) Also in Figure 2, the lobe-specific knockouts might be moved to supplemental since there is no effect. Instead, consider moving the control sensory tests into the main figure.

      We thank you for this suggestion and understand that in Figure 2 no significant difference is seen. However, we have emphasized in the text that the results from the supplementary figures are just to confirm that the modifications made at the Dop1R2 locus did not alter its normal function.

      Lines 156-162

      “We wanted to see if flipping out Dop1R2 in the MB affects memory acquisition and STM by using classical olfactory conditioning. In short, a group of flies is presented with an odor coupled to an electric shock (aversive) or sugar (appetitive) followed by a second odor without stimulus. For assessing their memory, flies can freely choose between the odors either directly after training (STM) or at a later timepoint.

      To ensure that the introduced genetic changes to the Dop1R2 locus do not interfere with behavior we first checked the sensory responses of that line”

      (5) Can the single-cell atlas data be used to narrow down the cell types in the vertical lobes that express Dop1R2? Is it all or just a subset?

      This is indeed an interesting question, and we thank you for mentioning it. To address this as best as we could, we analyzed the single cell transcriptomic data from (Davie et al., 2018) and presented it in Figure 1 S1.

      Reviewer #3 (Public Review):

      Summary:

      Kaldun et al. investigated the role of Dopamine Receptor Dop1R2 in different types and stages of olfactory associative memory in Drosophila melanogaster. Dop1R2 is a type 1 Dopamine receptor that can act both through Gs-cAMP and Gq-ERCa2+ pathways. The authors first developed a very useful tool, where tissue-specific knock-out mutants can be generated, using Crispr/Cas9 technology in combination with the powerful Gal4/UAS gene-expression toolkit, very common in fruit flies.

      They direct the K.O. mutation to intrinsic neurons of the main associative memory centre of the fly brain, the mushroom body (MB). There are three main types of MB-neurons, or Kenyon cells, according to their axonal projections: a/b; a'/b', and g neurons.

      Kaldun et al. found that flies lacking dop1R2 all over the MB displayed impaired appetitive middle-term (2h) and long-term (24h) memory, whereas appetitive short-term memory remained intact. Knocking-out dop1R2 in the three MB neuron subtypes also impaired middle-term, but not short-term, aversive memory.

      These memory defects were recapitulated when the loss of the dop1R2 gene was restricted to either a/b or a'/b', but not when the loss of the gene was restricted to g neurons, showcasing a compartmentalized role of Dop1R2 in specific neuronal subtypes of the main memory centre of the fly brain for the expression of middle and long-term memories.

      Strengths:

      (1) The conclusions of this paper are very well supported by the data, and the authors systematically addressed the requirement of a very interesting type of dopamine receptor in both appetitive and aversive memories. These findings are important for the fields of learning and memory and dopaminergic neuromodulation among others. The evidence in the literature so far was generated in different labs, each using different tools (mutants, RNAi knockdowns driven in different developmental stages...), different time points (short, middle, and long-term memory), different types of memories (Anesthesia resistant, which is a type of protein synthesis independent consolidated memory; anesthesia sensitive, which is a type of protein synthesis-dependent consolidated memory; aversive memory; appetitive memory...) and different behavioral paradigms. A study like this one allows for direct comparison of the results, and generalized observations.

      (2) Additionally, Kaldun and collaborators addressed the requirement of different types of Kenyon cells, that have been classically involved in different memory stages: g KCs for memory acquisition and a/b or a'/b' for later memory phases. This systematical approach has not been performed before.

      (3) Importantly, the authors of this paper produced a tool to generate tissue-specific knock-out mutants of dop1R2. Although this is not the first time that the requirement of this gene in different memory phases has been studied, the tools used here represent the most sophisticated genetic approach to induce a loss of function phenotypes exclusively in MB neurons.

      Weaknesses:

      (1) Although the paper does have important strengths, the main weakness of this work is that the advancement in the field could be considered incremental: the main findings of the manuscript had been reported before by several groups, using tissue-specific conditional knockdowns through interference RNAi. The requirement of Dop1R2 in MB for middle-term and long-term memories has been shown both for appetitive (Musso et al 2015, Sun et al 2020) and aversive associations (Plaçais et al 2017).

      Thank you for this comment. We believe that the main takeaway from the paper is the tool we developed to study the role of Dop1R2 in fruit flies by flipping it out with spatial and temporal control. Additionally, we studied its role in all types of olfactory associative memory to establish it as a robust tool that can be used for further research in place of RNAi knockouts, which are shown to be less efficient in insects, as mentioned in the text in lines 394-398.

      “The genetic tool we generated here to study the role of the Dop1R2 dopamine receptor in cells of interest, is not only a good substitute for RNAi knockouts, which are known to be less efficient in insects (Joga et al., 2016), but also provides versatile possibilities as it can be used in combination with the powerful genetic tools of Drosophila.”

      (2) The approach used here to genetically modify memory neurons is not temporally restricted. Considering the role of dopamine in the correct development of the nervous system, one must consider the possible effects that this manipulation can have in the establishment of memory circuits. However, previous studies addressing this question restricted the manipulation of Dop1R2 expression to adulthood, leading to the same findings than the ones reported in this paper for both aversive and appetitive memories, which solidifies the findings of this paper.

      We thank you for this comment and we agree that it would be important to show a temporally restricted effect of Dop1R2 knockout. To assess this and rule out potential developmental defects we decided to restrict the knockout to the post-eclosion stage and to include these results.

      Lines 230-250

      “Developmental defects are ruled out in a temporally restricted Dop1R2 conditional knockout.

      To exclude developmental defects in the MB caused by flip-out of Dop1R2, we stained fly brains with a FasII antibody. Compared to genetic controls, flies lacking Dop1R2 in the mushroom body had unaltered lobes (Figure 4 S2C).

      Regardless, we wanted to control for developmental defects leading to memory loss in flip-out flies. So, we generated a Gal80ts-containing line, enabling the temporal control of Dop1R2 knockout in the entire mushroom body (MB). Given that the half-life of the receptor remains unknown, we assessed both aversive short-term memory (STM) and long-term memory (LTM) to determine whether post-eclosion ablation of Dop1R2 in the MB produced differences compared to our previously tested line, in which Dop1R2 was constitutively knocked out from fertilization. To achieve this, flies were maintained at 18°C until eclosion and subsequently shifted to 30°C for five to seven days. On the fifth day, training was conducted, followed by memory testing. Our results indicate that aversive STM was not significantly impaired in Dop1R2-deficient MBs compared to control flies (Figure 4 S3), consistent with our previous findings (Figure 2). However, aversive LTM was significantly impaired relative to control lines (Figure 4 S3), which also aligned with prior observations. These findings strongly indicate that memory loss caused by Dop1R2 flip-out is not due to developmental defects.”

      (3) The authors state that they aim to resolve disparities of findings in the field regarding the specific role of Dop1R2 in memory, offering a potent tool to generate mutants and addressing systematically their effects on different types of memory. Their results support the role of this receptor in the expression of long-term memories; however, the experiments performed here do not address the temporal resolution of the genetic manipulations that could shed light on the mechanisms of action of Dop1R2 in memory. Several hypotheses have been proposed, from stabilization of memory, effects on forgetting, or integration of sequences of events (sensory experiences and dopamine release).

      We thank you for this comment. We agree that it would be interesting to dissect the memory stages by knocking out the receptor selectively in some of them (encoding, consolidation, retrieval). However, our tool irreversibly flips out Dop1R2 preventing us from investigating the receptor’s role in retrieval. Our results show that the receptor is dispensable for STM formation (Figure 2, Figure 4 Supplement 3), suggesting that it is not involved in encoding new information. On the other hand, it is instead involved in consolidation and/or retrieval of long-term and middle-term memories (Figure 3, Figure 4, Figure 5B).

      Overall, the authors generated a very useful tool to study dopamine neuromodulation in any given circuit when used in combination with the powerful genetic toolkit available in Drosophila. The reports in this paper confirmed a previously described role of Dop1R2 in the expression of aversive and appetitive LTM and mapped these effects to two specific types of memory neurons in the fly brain, previously implicated in the expression and consolidation of long-term associative memories.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) At first glance, the results shown here are different from studies published earlier, while in the same line with others (e.g. Sun et al, for appetitive 24h memories). For example, Berry et al showed that the loss of dop1R2 impairs immediate memory, while memory scores are enhanced 3h, 6h, and 24h after training. Further, they showed data that shock avoidance, at least for higher shock intensities, is reduced in mutant (damb) flies. All in all, this underscores how important it is to improve the genetic tools for tissue-specific manipulation. Despite the authors nicely discussing their data with respect to the previous studies, I wondered whether it would be suitable to use the new tool and knock out dop1R2 panneuronally to see whether the obtained data match the results published by Berry et al. Further, as stated in line 105ff: "As these studies used different learning assays - aversive and appetitive respectively as well as different methods, it is unclear if Dop1R2 has different functions for the different reinforcement stimulus" I wondered why the authors tested aversive and appetitive learning for STM and 2h memory, but only appetitive memory for 24h.

      Thank you for this comment. To that extent, as mentioned above in response to reviewer #2, we included in the results the aversive LTM experiment (Figure 4). Moreover, we performed experiments along the line of Berry et al. using our tool as shown in Figure 4 S1. Our results support that Dop1R2 is required for LTM, rather than to promote forgetting.

      (2) Line 165ff: I can't find any of the supplementary data mentioned here. Please add the corresponding figures.

      Thank you for pointing this out. In that line we don’t refer to any supplementary data, but to Figure 1F, which shows the absence of the HA-tag in our MB knock-out line. We have clarified this in the text (lines 151-153).

      (3) I can't imagine that the scale bar in Figure 1D-F is correct. I would also like to suggest to show a more detailed analysis of the expression pattern. For example, both anterior and posterior views would be appropriate, perhaps including the VNC. This would allow the expression pattern obtained with this novel tool to be better compared with previously published results. Also, in relation to my comment above (1), it may help to understand the functional differences with previous studies, especially as the authors themselves state that the receptor is "mainly" expressed in the mushroom body (line 99). It would be interesting to see where else it is expressed (if so). This would also be interesting for the panneuronal knockdown experiment suggested under (1). If the receptor is indeed expressed outside the mushroom body, this may explain the differences to Berry et al.

      Thank you for noting this. There was indeed a mistake in the scale bar, which we have now fixed. Since our HA-tag immunostaining did not detect any noticeable signal outside of the MB, we analyzed existing single-cell transcriptomics data, which showed expression of the receptor in 7.99% of cells in the VNC and in 13.8% of cells outside the MB (lines 98-100), confirming its sparse expression in the nervous system. The lack of detection of these cells is likely due to the sparse and low expression of the protein. The HA-tag reports the endogenous expression level of the locus (it is possible that Gal4/UAS amplification of the signal would allow these cells to be detected).

      Regarding the panneuronal knockout, we decided to try to replicate the experiment shown in Berry et al. in Figure 4 S1 and found that Dop1R2 is required for LTM.

      (4) Related to learning data shown in Figures 2-4, the authors should show statistical differences between all groups obtained in the ANOVA + PostHoc tests. Currently, only an asterisk is placed above the experimental group, which does not adequately reflect the statistical differences between the groups. In addition, I would like to suggest adding statistical tests to the chance level as it may be interesting to know whether, for example, scores of knockout flies in 3C and 3D are different from the chance level.

      Many thanks for this correction. We agree that the way significance scores were shown was not informative enough. We now show significance between all the control groups and the experimental ones. We also added the chance-level results to the figure legends.

      (5) Unfortunately, the manuscript has some typing errors, so I would like to ask the authors to check the manuscript again carefully.

      Some Examples:

      Line 31: the the

      Line 56: G-Protein

      Line 64: c-AMP

      Line 68: Dopamine

      Line 70: G-Protein (It alternates between G-protein and G-Protein)

      Line 76: References are formatted incorrectly

      Line 126: Ha-Tag (It alternates between Ha and HA)

      Line 248: missing space before the bracket...is often found

      Thank you for noticing these errors, we have now corrected the spelling throughout the manuscript.

      (6) In the figures the axes are labelled Preference Index (Pref"I"). In the methods, however, the calculation formula is defined as "PREF".

      We thank you for drawing attention to this. To avoid confusion, we changed the definition in the methods section so that it could be clear and coherent (“Memory tests” paragraph in the methods section).

      “PREF = ((N<sub>arm1</sub> - N<sub>arm2</sub>) × 100) / N<sub>total</sub>. The two preference indices were calculated from the two reciprocal experiments. The average of these two PREFs gives a learning index (LI): LI = (PREF<sub>1</sub> + PREF<sub>2</sub>) / 2.

      In case of all long-term aversive memory experiments, the Y-Maze protocol was adapted to test flies 24 hours post training. Testing using the Y-Maze was done following the protocol described in (Mohandasan et al., 2022), where flies were loaded at the bottom of 20-minutes odorized 3D-printed Y-Mazes from where they would climb up to a choice point and choose between the two odors. The learning index was then calculated after counting the flies in each odorized vial as follows: LI = ((N<sub>CS-</sub> - N<sub>CS+</sub>) × 100) / N<sub>total</sub>, where N<sub>CS-</sub> and N<sub>CS+</sub> are the number of flies that were found trapped in the untrained and trained odor tube, respectively.”
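
      As a reading aid, the index definitions quoted above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function names are hypothetical, the total count is passed in explicitly because the quoted text does not say how N<sub>total</sub> is obtained, and the fly counts used in the usage note are made up.

```python
def preference_index(n_arm1: int, n_arm2: int, n_total: int) -> float:
    """PREF = ((N_arm1 - N_arm2) * 100) / N_total, as quoted from the methods."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_arm1 - n_arm2) * 100 / n_total


def learning_index(pref_1: float, pref_2: float) -> float:
    """LI = (PREF_1 + PREF_2) / 2: mean of the two reciprocal experiments."""
    return (pref_1 + pref_2) / 2


def ymaze_learning_index(n_cs_minus: int, n_cs_plus: int, n_total: int) -> float:
    """Y-Maze variant: LI = ((N_CS- - N_CS+) * 100) / N_total."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_cs_minus - n_cs_plus) * 100 / n_total
```

      With hypothetical counts, `learning_index(preference_index(60, 40, 100), preference_index(70, 30, 100))` averages the two reciprocal preference indices into one learning index, and a value near zero would correspond to chance-level choice.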

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figures 2 and 3, having the legends run across two different subfigures is confusing. It would be helpful to find a different way to present them.

      Thank you for your suggestion. We modified how we present legends, placing them vertically so that it is clearer.

      (2) Use additional drivers to verify middle and long-term memory phenotypes.

      We agree that it would be interesting to see the role of Dop1R2 in other neurons. To that end, we looked at long-term aversive memory in flies where the receptor was panneuronally flipped out, and did not find evidence suggesting involvement of Dop1R2 in memory processes outside the MB (Figure 4 S1).

      (3) Additional discussion of genetic background for fly lines would be helpful.

      Thank you for your advice. We have mentioned the genetic background of flies in the key resources table of the methods sections. Additionally, we also included further explanation on how the lines were created and their genetic background (see “Fly Husbandry” paragraph in the methods section).

      “UAS-flp;;Dop1R2<sup>cko</sup> flies and Gal4;Dop1R2<sup>cko</sup> flies were crossed back with ;;Dop<sup>cko</sup> flies to obtain appropriate genetic controls which were heterozygous for UAS and Gal4 but not Dop1R2<sup>cko</sup>.”

      Reviewer #3 (Recommendations For The Authors):

      Line 109 states that to resolve the problem a tool is developed to knock down Dop1R2 in a spatially and temporally specific manner. While I agree that this is within the potential of the tool, there is no temporal control of the flippase action in this study; at least I cannot find references to the use of target/gene switch to control stages of development or different memory phases. However, the version available for download is missing supplementary information, so I did not have access to supplementary figures and tables.

      Thank you for the comment. As mentioned before, it would be great to be able to dissect the memory phases. We show in lines 232-250 and Figure 4 S3 that restricting the flip-out to the post-eclosion life stage gave results consistent with the previous findings, ruling out potential developmental defects.

      In relation to my comment on the possible developmental effects of the loss of the gene, Figure 1F could showcase an underdeveloped γ lobe when looking at the lobe profiles. I understand this is not within the scope of the figure, but maybe a different z projection can be provided to confirm there are no obvious anatomical alterations due to the loss of the receptor.

      We understand the concern about the correct development of the MB and thank you for your insightful comment. To address it, we performed a FasII immunostaining to visualize the MB in the different lines (Figure 4 S2), and there appear to be no notable differences in lobe development in our knockout line.

      It seems that the obvious missing piece of the puzzle would be to address the effects of knocking out Dop1R2 in aversive LTM. The idea of systematically addressing different types of memory at different time points and in different KCs is the most attractive aspect of this study beyond the technical sophistication, and it feels that the aim of the study is not delivered without that component.

      We agree and we thank you for the clarification. As mentioned above in response to Reviewer #2, we decided to test aversive LTM as described in lines 208-228, Figure 4, Figure 4 S1.

      Some statements of the discussion seem too vague, and I think could benefit from editing:

      Line 284 "however other receptors could use Gq and mediate forgetting"- does this refer to other dopamine receptors? Other neuromodulators? Examples?

      Thank you for pointing this out. We agree and therefore decided to omit this line.

      Line 289 "using a space training protocol and a Dop1R2 line" - this refers to RNAi lines, but it should be stated clearly.

      That is correct, we thank you for bringing attention to this and clarified it in the manuscript.

      Lines 329-330

      “Interestingly, using a spaced training protocol and a Dop1R2 RNAi knockout line another study showed impaired LTM (Placais et al., 2017).”

      The paragraph starting in line 305 could be re-written to improve clarity and flow. Some statements seem disconnected and require specific citations. For example "In aversive memory formation, loss of Dop1R2 could lead to enhanced or impaired memory, depending on the activated signaling pathways and the internal state of the animal...". This is not accurate. Berry et al 2012 report enhanced LTM performance in dop1R2 mutants whereas Plaçais et al 2017 report LTM defects in Dop1R2 knock-downs, but these different findings do not seem to rely on different internal states or signaling pathways. Maybe further elaboration can help the reader understand this speculation.

      We agree and we thank you for this advice. We decided to add additional details and citations to support our speculation.

      Lines 350-353

      “In aversive memory formation, loss of Dop1R2 could lead to enhanced or impaired memory, depending on the activated signaling pathways. The signaling pathway that is activated further depends on the available pool of secondary messengers in the cell (Hermans, 2003) which may be regulated by the internal state of the animal.”

      "...for reward memory formation, loss of Dop1R2 seems to impair memory", this seems redundant at this point, as it has been discussed in detail, however, citations should be provided in any case (Musso 2015, Sun 2020)

      Thank you for noting this. We recognize the redundancy and decided to exclude the line.

      Finally, it would be useful to additionally refer to the anatomical terminology when introducing neuron names; for example MBON MVP2 (MBON-γ1pedc>α/β), etc.

      Thank you for this suggestion. We understand the importance of anatomical terminologies for the neurons. Therefore, we included them when we introduce neurons in the paper.

      We thank you for your observations. We recognize their value, so we have made appropriate changes in the discussion to sound less vague and more comprehensive.

    1. photography

      Living over 80 years in the future, photography is so normal we don't even consider it, but a forward-looking person could see the implications.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The resubmitted version of the manuscript adequately addressed several initial comments made by reviewing editors, including a more detailed analysis of the results (such as those of bilayer thickness). This version was seen by 2 reviewers. Both reviewers recognize this work as being an important contribution to the field of BK and voltage-dependent ion channels in general. The long trajectories and the rigorous/novel analyses have revealed important insights into the mechanisms of voltage-sensing and electromechanical coupling in the context of a truncated variant of the BK channel. Many of these observations are consistent with structural and functional measurements of the channel, available thus far. The authors also identify a novel partially expanded state of the channel pore that is accessed after gating-charge displacement, which informs the sequence of structural events accompanying voltage-dependent opening of BK.

      However, there are key concerns regarding the use of the truncated channel in the simulations. While many gating features of BK are preserved in the truncated variant, studies have suggested that the coupling of channel pore opening to voltage-sensing domain rearrangement is impaired upon gating-ring deletion. So the inferences made here might only represent a partial view of the mechanism of electromechanical coupling.

      It is also not entirely clear whether the partially expanded pore represents a functionally open, sub-conductance, or another closed state. Although the authors provide evidence that the inner pore is hydrated in this partially open state, in the absence of additional structural/functional restraints, a confident assignment of a functional state to this structural state is difficult. Functional measurements of the truncated channel seem to suggest that not only is their single channel conductance lower than full-length channels, but they also appear to have a voltage-independent step that causes the gates to open. It is unclear whether it is this voltage-independent step that remains to be captured in these MD trajectories. A clear-cut resolution of this conundrum might not be feasible at this time, but it could help to present the various possibilities to the readers.

      We appreciate the positive comments and agree that there will likely be important differences between the mechanistic details of voltage activation between the Core-MT and full-length constructs of BK channels. We also agree that the dilated pore observed in the simulation may not be the fully open state of Core-MT.

      Nonetheless, the notion that the simulation may not have captured the full pore opening transition or the contribution of the CTD should not render the current work “incomplete”, because a complete understanding of BK activation would be an unrealistic goal beyond the scope of this work. We respectfully emphasize that the main insights of the current simulations are the mechanisms of voltage sensing (e.g., the nature of VSD movements, contributions of various charged residues, how small charge movements allow voltage sensing, etc.) as well as the role of the S4-S5-S6 interface in VSD-pore coupling. As noted by the Editor and reviewers, these insights represent important steps towards establishing a more complete understanding of BK activation.

      Below are the specific comments of the two experts who have assessed the work and made specific suggestions to improve the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Although the successful simulation of V-dependent K+ conduction through the BK channel pore and analysis of associated state dependent VSD/pore interactions and coupling analysis is significant, there are two related questions that are relevant to the conclusions and of interest to the BK channel community which I think should be addressed or discussed.

      One key feature of BK channels is their extraordinarily large conductance compared to other K+ selective channels. Do the simulations of K+ conductance provide any insight into this difference? Is the predicted conductance of BK larger than that of other K+ channels studied by similar methods? Is there any difference in the conductance mechanism (e.g., the hard and soft knock-on effects mentioned for BK)?

      The molecular basis of the large conductance of BK channels is indeed an interesting and fundamental question. Unfortunately, this is beyond the scope of this work and the current simulation does not appear to provide any insight into the basis of large conductance. It is interesting to note, though, that the conductance is apparently related to the level of pore dilation and the pore hydration level, as increasing hydration level from ~30 to ~40 waters in the pore increases the simulated conductance from ~1.5 to 6 pS (page 8). This is consistent with previous atomistic simulations (Gu and de Groot, Nature Communications 2023; ref. 33) showing that the pore hydration level is strongly correlated with observed conductance. As noted in the manuscript, the conductance mechanism through the filter appears highly similar to previous simulations of other K+ channels (Page 8). Given the limited number of conductance events observed in the current simulations, we will refrain from discussing the possible basis of the large conductance in BK channels except commenting on the role of pore hydration (page 8; also see below in response to #5).

      The pore in the MD simulations does not open as wide as the Ca-bound open structure, which (as the authors note) may mean that full opening requires longer than 10 us. I think that is highly likely given that the two 750 mV simulations yielded different degrees of opening and that in BK channels opening is generally much slower than charge movement. Therefore, a question is - do any of the conclusions illustrated in Figures 6, S5, S6 differ if the Ca-bound structure is used as the open state? For example, I expect the interactions between S5 and S6 might at least change to some extent as S6 moves to its final position. In this case, would conclusions about which residues interact, and get stronger or weaker, be the same as in Figures S6 b,c? Providing a comparison may help indicate to what extent the conclusions are dependent on achieving a fully open conformation.

      We appreciate the reviewer’s suggestion and have further analyzed the information flow and coupling pathways using the simulation trajectory initiated from the Ca2+-bound cryo-EM structure (sim 7, Table S1). The new results are shown in two new SI Figures S7 and S8, and new discussion has been added to pages 14-15. Comparing Figures 5 and S7, we find that dynamic community, coupling pathways, and information flow are highly similar between simulations of the open and closed states, even though there are significant differences in S5 contacts in the simulated open state vs Ca2+-bound open state (Figure S8). Interestingly, there are significant differences in S4-S5 packing in the simulated and Ca2+-bound open states (Figure S8 top panel), which likely reflect important differences in VSD/pore interactions during voltage vs Ca2+ activation.

      (2) P4 Significance -"first, successful direct simulation of voltage-activation"

      This statement may need rewording. As noted above Carrasquel-Ursulaez et al.,2022 (reference 39) simulated voltage sensor activation under comparable conditions to the current manuscript (3.9 us simulation at +400 mV), and made some similar conclusions regarding R210, R213 movement, and electric field focusing within the VSD. However, they did not report what happens to the pore or simulate K+ movement. So do the authors here mean something like "first, successful direct simulation of voltage-dependent channel opening"?

      We agree with the reviewer and have revised the statement to “ … the first successful direct simulation of voltage-dependent activation of the big potassium (BK) channel, ..”

      (3) P5 "We compare the membrane thickness at 300 and 750 mV and the results reveal no significant difference in the membrane thickness (Figure S2)" The figure also shows membrane thickness at 0 mV and indicates it is 1.4 Angstroms less than that at 300 or 750 mV. Whether or not this difference is significant should be stated, as the question being addressed is whether the structure is perturbed owing to the use of non-physiological voltages (which would include both 300 and 750 mV).

      We have revised the Figure S2 caption to clarify that one-way ANOVA suggests the difference is not significant.

      (4) P7 "It should be noted that the full-length BK channel in the Ca2+ bound state has an even larger intracellular opening (Figure 2f, green trace), suggesting that additional dilation of the pore may occur at longer timescales."

      As noted above, I agree it is likely that additional pore dilation may occur at longer timescales. However, for completeness, I suppose an alternative hypothesis should be noted, e.g. "...suggesting that additional dilation of the pore may occur at longer timescales, or in response to Ca-binding to the full length channel."

      This is a great suggestion. Revised as suggested.

      (5) Since the authors raise the possibility that they are simulating a subconductance state, some more discussion on this point would be helpful, especially in relation to the hydrophobic gate concept. Although the Magleby group concluded that the cytoplasmic mouth of the (fully open) pore has little impact on single channel conductance, that doesn't rule out that it becomes limiting in a partially open conformation. The simulation in Figure 3A shows an initial hydration of the pore with ~15 waters with few conductance events, suggesting that hydration per se may not suffice to define a fully open state. Indeed, the authors indicate that the simulated open state (w/ ~30-40 waters) has 1/4th the simulated conductance of the open structure (w/ ~60 waters). So is it the degree of hydration that limits conductance? Or is there a threshold of hydration that permits conductance and then other factors that limit conductance until the pore widens further? Addressing these issues might also be relevant to understanding the extraordinarily large conductance of fully open BK compared to other K channels.

      We agree with the reviewer’s proposal that pore hydration seems to be a major factor that can affect conductance. This is also well in line with the previous computational study by Gu and de Groot (2023). We have now added a brief discussion on page 8, stating “Besides the limitation of the current fixed charge force fields in quantitatively predicting channel conductance, we note that the molecular basis for the large conductance of BK channels is actually poorly understood (78). It is noteworthy that the pore hydration level appears to be an important factor in determining the apparent conductance in the simulation, which has also been proposed in a previous atomistic simulation study of the Aplysia BK channel (33).”

      Minor points

      (1) P5 "the fully relaxed pore profile (red trace in Figure S1d, top row) shows substantial differences compared to that of the Ca2+-free Cryo-EM structure of the full-length channel." For clarity, I suggest indicating which is the Ca-free profile - "... Ca2+-free Cryo-EM structure of the full-length channel (black trace)."

      We greatly appreciate the thoughtful suggestion. Revised as suggested.

      (2) P8 "Consistent with previous simulations (78-80), the conductance follows a multi-ion mechanism, where there are at least two K+ ions inside the filter" For clarity, I suggest indicating these are not previous simulations of BK channels (e.g., "previous simulations of other K+ channels ...").

      Revised as suggested. Thank you.

      (3) Figure 2, S1 - grey traces representing individual subunits are very difficult to see (especially if printed). I wonder if they should be made slightly darker. Similar traces in Figure 3 are easier to see.

      The traces in Figure S1 are actually the same thickness as in Figure 3; they appear lighter due to the size of the figure. Figure 2 panels a-c have been updated to improve the resolution.

      (4) Figure 2 - suggest labeling S6 as "S6 313-324" (similar to S4 notation) to indicate it is not the entire segment.

      Figure 2 panel d) has been updated as suggested.

      (5) Figure 2 legend - "Voltage activation of Core-MT BK channels. a-d)..."

      It would be easier to find details corresponding to individual panels if they were referenced individually. For example:

      "a-d) results from a 10-μs simulation under 750 mV (sim2b in Table S1). Each data point represents the average of four subunits for a given snapshot (thin grey lines), and the colored thick lines plot the running average. a) z-displacement of key side chain charged groups from initial positions. The locations of charged groups were taken as those of guanidinium CZ atoms (for Arg) and sidechain carboxyl carbons (for Asp/Glu), b) z-displacement of centers-of-mass of VSD helices from initial positions, c) backbone RMSD of the pore-lining S6 (F307-L325) to the open state, and d) tilt angles of all TM helices. Only residues 313-324 of S6 were included in the tilt angle calculation, and the values in the open and closed Cryo-EM structures are marked using purple dashed lines."

      We appreciate the thoughtful suggestion and have revised the caption as suggested.

      (6) Figure S1 - column labels a,b,c, and d should be referenced in the legend.

      The references to column labels have been added to Figure S1 caption.

      (7) References need to be double-checked for duplicates and formatting.

      a) I noticed several duplicate references, but did not do a complete search: Budelli et al 2013 (#68, 100), Horrigan Aldrich 2002 (#22,97), Sun Horrigan 2022 (#40, 86), Jensen et al 2012 (#56,81).

      b) Reference #38 is incorrectly cited with the first name spelled out and the last name abbreviated.

      We appreciate the careful proofreading of the reviewer. The duplicated references were introduced by mistake due to the use of multiple reference libraries. We have gone through the manuscript and removed a total of 5 duplicated references.

      Reviewer #2 (Recommendations for the authors):

      This manuscript has been through a previous level of review. The authors have provided their responses to the previous reviewers, which appear to be satisfactory, and I have no additional comments, beyond the caveats concerning interpretations based on the truncated channel, which are noted above.

      We greatly appreciate the constructive comments and insightful advice. Please see above response to the Reviewing Editor’s comments for response and changes regarding the caveats concerning interpretations of the current simulations.

    1. Indeed it turns out the number of available job opportunities for translators and interpreters has actually been increasing. This is not to say that the technology isn’t good, I think it’s pretty close to as good as it can be at what it does. It’s also not to say that machine translation hasn’t changed the profession of translation: in the article linked above, Bridget Hylak, a representative from the American Translators Association, is quoted as saying “Since the advent of neural machine translation (NMT) around 2016, which marked a significant improvement over traditional machine translation like Google Translate, we [translators and interpreters] have been integrating AI into our workflows.” To explain this apparent contradiction, we need to understand what it is translators actually do because, like us programmers, they suffer from having the nature of their work consistently misunderstood by non-translators. The laity’s image of a translator is a walking dictionary and grammar reference, who substitutes words and grammatical structures from one language to another with ease; the reality is that translators’ and interpreters’ work is mostly about ensuring context, navigating ambiguity, and handling cultural sensitivity. This is what Google Translate cannot currently do.

      Shitty text being available in more languages may make people want more good text in their languages, too.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Wu et al presents interesting data on bacterial cell organization, a field that is progressing now, mainly due to the advances in microscopy. Based mainly on fluorescence microscopy images, the authors aim to demonstrate that the two structures that account for bacterial motility, the chemotaxis complex and the flagella, colocalize to the same pole in Pseudomonas aeruginosa cells and to expose the regulation underlying their spatial organization and functioning.

      Strengths:

      The subject is of importance.

      Weaknesses:

      The conclusions are too strong for the presented data. The lack of statistical analysis makes this paper incomplete. The novelty of the findings is not clear.

      We have strengthened the data analysis by including appropriate statistical tests to support our conclusions more convincingly. Additionally, we have refined the description of the research background to better emphasize the novelty and significance of our findings. Please see the detailed responses below for further information.

      Major issues:

      (1) The novelty is in question since in the Abstract the authors highlight their main finding, which is that both the chemotaxis complex and the flagella localize to the same pole, as surprising. However, in the Introduction they state that "pathway-related receptors that mediate chemotaxis, as well as the flagellum are localized at the same cell pole17,18". I am not a Pseudomonas researcher and from my short glance at these references, I could not tell whether they report colocalization of the two structures to the same pole. However, I trust the authors that they know the literature on the localization of the chemotaxis complex and flagella in their organism. See also major issue number 5 on the novelty regarding the involvement of c-di-GMP.

      We thank the reviewer for this valuable comment and appreciate the opportunity to clarify our statements.

      Kazunobu et al. (ref. 18) used scanning electron microscopy to preliminarily characterize the flagellation pattern of Pseudomonas aeruginosa during cell division, showing that existing flagella are located at the old pole. Zehra et al. (ref. 17), through fluorescence microscopy, observed that CheA and CheY proteins in dividing cells are typically also present at the old pole. Based on these observations, we inferred in the Introduction that the chemotaxis complex and flagellum may localize to the same cell pole.

      However, this inference is indirect and lacks direct live-cell evidence of colocalization, leaving its validity to be confirmed. This uncertainty was indeed the starting point and motivation for our study.

      In our work, we simultaneously visualized flagellar filaments and core chemoreceptor proteins at the single-cell level in P. aeruginosa. We characterized the assembly and spatial coordination of the chemotaxis network and flagellar motor throughout the cell cycle, providing direct evidence of their colocalization and coordinated assembly. This represents a significant advance beyond prior indirect observations and supports the novelty of our study.

      Accordingly, we have revised the relevant statements in lines 71-75 of the manuscript to better reflect the current state of the literature and emphasize the novelty of our direct observations.

      (2) Statistics for the microscopy images, on which most conclusions in this manuscript are based, are completely missing. Given that most micrographs present one or very few cells, together with the fact that almost all conclusions depend on whether certain macromolecules are at one or two poles and whether different complexes are in the same pole, proper statistics, based on hundreds of cells in several fields, are absolutely required. Without this information, the results are anecdotal and do not support the conclusions. Due to the importance of statistics for this manuscript, strict statistical tests should be used and reported. Moreover, representative large fields with many cells should be added as supportive information.

      We thank the reviewer for this important comment, which significantly improves the rigor and persuasiveness of our manuscript.

      For the colocalization analyses presented in Fig. 1D and Fig. 2B, we quantified 145 and 101 cells with fluorescently labeled flagella, respectively, and observed consistent colocalization of the chemoreceptor complexes and flagella in all examined cells (now added in the figure legends). Regarding the distribution patterns of chemoreceptors shown in Fig. 3A, we have now included comprehensive statistical analyses for both wild-type and mutant strains. For each strain, more than 300 cells were analyzed across at least three independent microscopic fields, providing robust statistical power (detailed data are presented in Fig. 3C).

      To further strengthen the evidence, statistical tests were applied to confirm the significance and reproducibility of our findings (Fig. 3C). In addition, representative large-field fluorescence images containing numerous cells have been added to the supplementary materials (Fig. S1 and Fig. S3).

      The problem is more pronounced when the authors make strong statements, as in lines 157-158: "The results revealed that the chemoreceptor arrays no longer grow robustly at the cell pole (Figure 2A)". Looking at the seven cells shown in Figure 2A, five of them show polar localization of the chemoreceptors. The question is then: what is the percentage of cells that show precise polar, near-polar, or mid cell localization (the three patterns shown here) in the mutant and in the wild type? Since I know that these three patterns can also be observed in WT cells, what counts is the difference, and whether it is statistically significant.

      We thank the reviewer for raising this important point. Following the reviewer's suggestion, we have now analyzed and categorized the distribution of the chemotaxis complex in both wild-type and flhF mutant strains into three patterns: precise-polar, near-polar, and mid-cell localization. For each strain, more than 200 cells across three independent fields of view were quantified.

      Our statistical analysis shows that in the wild-type strain, approximately 98% of cells exhibit precise polar localization of the chemotaxis complex. In contrast, the ΔflhF mutant displays a clear shift in distribution, with about 5% of cells showing mid-cell localization and 9.5% showing near-polar localization. These differences demonstrate a significant alteration in the spatial pattern upon flhF deletion.
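
      As a rough illustration of how a difference in localization proportions between two strains could be tested, the sketch below implements a pooled two-proportion z-test. This is only one possible approach and is not necessarily the test used in the manuscript. The function name `two_proportion_z_test` and the cell counts in the example are hypothetical placeholders, chosen only to match the reported percentages (98% versus 85.5% precise-polar); they are not the actual data.

```python
import math


def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical placeholder counts (NOT the authors' data):
# 196 of 200 wild-type cells and 171 of 200 mutant cells with precise-polar foci.
z_stat, p_val = two_proportion_z_test(196, 200, 171, 200)
print(f"z = {z_stat:.2f}, two-sided p = {p_val:.2g}")
```

      With only a few hundred cells per strain and small counts in the non-polar categories, an exact test (for example Fisher's exact test) may be a safer choice than this normal approximation.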

      We have revised the relevant text in lines 166-170 accordingly and included the detailed statistical data in the newly added Fig. S4.

      Even for the graphs shown in Figures 3C and 3D, where the proportion of cells with obvious chemoreceptor arrays and absolute fluorescence brightness of the chemosensory array are shown, respectively, the questions that arise are: for how many individual cells these values hold and what is the significance of the difference between each two strains?

      The number of cells analyzed for each strain is indicated in the original manuscript: 372 wild-type cells (line 123), 221 ΔflhF cells (line 172), 234 ΔfliG cells (line 197), 323 ΔfliF cells (line 200), 672 ΔflhFΔfliF cells (line 202), and 242 ΔmotAΔmotCD cells (line 207). For each strain, data were collected from three independent fields of view. We have now also provided the number of cells in Fig. 3 legend.

      We have now performed statistical comparisons using t-tests between strains. Notably, the measured values in Fig. 3C exhibit a clear, monotonic decrease with successive gene knockouts, supporting the robustness of the observed trend.

      Regarding the absolute fluorescence intensity shown in the original Fig. 3D, the mutants did not display consistent directional changes compared to the wild type. Reliable comparison of absolute fluorescence intensity requires consistent fluorescent protein maturation levels across strains. Given the likely variability in maturation levels between strains, we concluded that this data may not accurately reflect true differences in protein concentrations. Therefore, we have removed the fluorescence intensity graph from the revised manuscript to avoid potential misinterpretation.

      (3) The authors conclude that "Motor structural integrity is a prerequisite for chemoreceptor self-assembly" based on the reduction in cells with chemoreceptor clusters in mutants deleted for flagellar genes, despite the proper polar localization of the chemotaxis protein CheY. They show that the level of CheY in the WT and the mutant strains is similar, based on Western blot, which in my opinion is over-exposed. "To ascertain whether it is motor integrity rather than functionality that influences the efficiency of chemosensory array assembly", they constructed a mutant deleted for the flagella stator and found that the motor is stalled while CheY behaves like in WT cells. The authors further "quantified the proportion of cells with receptor clusters and the absolute fluorescence intensity of individual clusters (Figures 3C-D)". While Figure 3C suggests that, indeed, the flagella mutants show fewer cells with a chemotaxis complex, Figure 3D suggests that the differences in fluorescence intensity are not statistically significant. Since it is obvious that the regulation of both structures' production and localization is codependent, I think that it takes more than a Western blot to make such a decision.

      We thank the reviewer for the suggestions. To further clarify that the assembly of flagellar motors and chemoreceptor clusters occurs in an orderly manner rather than being merely codependent, we performed additional experiments. Specifically, we constructed a ΔcheA mutant strain, in which chemoreceptor clusters fail to assemble. Using in vivo fluorescent labeling of flagellar filaments, we observed that the proportion of cells with flagellar filaments in the ΔcheA strain was comparable to that of the wild type (Fig. S5).

      In contrast, mutants lacking complete motor structures, such as ΔfliF and ΔfliG, showed a significant reduction in the proportion of cells with obvious receptor clusters (Fig. 3C). Based on these results, we conclude that the structural integrity of the flagellar motor is, to a certain extent, a prerequisite for the self-assembly of chemoreceptor clusters.

      Accordingly, we have revised the relevant statement in lines 213-217 of the manuscript to reflect this clarification.

      (4) I wonder why the authors chose to label CheY, which is the only component of the chemotaxis complex that shuttles back and forth to the base of the flagella. In any case, I think that they should strengthen their results by repeating some key experiments with labeled CheW or CheA.

      We thank the reviewer for this valuable suggestion. In our study, we initially focused on the positional relationship between chemoreceptor clusters and flagella, then investigated factors influencing cluster distribution and assembly efficiency. The physiological significance of motor and cluster co-localization was ultimately proposed with CheY as the starting point.

      Previous work by Harwood's group demonstrated that both CheY-YFP and CheA-GFP localize to the old poles of dividing Pseudomonas aeruginosa cells. Since our physiological hypothesis centers on CheY, we chose to label CheY-EYFP in our experiments.

      To further strengthen our conclusions, we constructed a plasmid expressing CheA-CFP and introduced it into the cheY-eyfp strain via electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP (Fig. S2), confirming that CheY-EYFP accurately marks the location of the chemoreceptor complex.

      We have revised the manuscript accordingly (lines 119-123) and added these data as Fig. S2.

      (5) The last section of the results is very problematic, regarding the rationale, the conclusions, and the novelty. As far as the rationale is concerned, I do not understand why the authors assume that "a spatial separation between the chemoreceptors and flagellar motors should not significantly impact the temporal comparison in bacterial chemotaxis". Is there any proof for that?

      We apologize for the lack of clarity in our original explanation. The rationale behind the statement was initially supported by comparing the timescales of CheY-P diffusion and temporal comparison in chemotaxis. Specifically, the diffusion time for CheY-P to traverse the entire length of a bacterial cell is approximately 100 ms (refs 39&40), whereas the timescale for bacterial chemotaxis temporal comparison is on the order of seconds (ref 41).

      To clarify and strengthen this argument, we have expanded the discussion as follows:

      The diffusion coefficient of CheY in bacterial cells is about 10 µm²/s, which corresponds to an estimated end-to-end diffusion time on the order of 100 ms (refs 40&41). If the chemotaxis complexes were randomly distributed rather than localized, diffusion times would be even shorter. In contrast, the timescale for the chemotaxis temporal comparison is on the order of seconds (ref. 42). Additionally, a study by Fukuoka and colleagues reported that intracellular chemotaxis signal transduction requires approximately 240 ms beyond CheY or CheY-P diffusion time (ref. 41). Moreover, the intervals of counterclockwise (CCW) and clockwise (CW) rotation of the P. aeruginosa flagellar motor under normal conditions are 1-2 seconds, as determined by tethered cell or bead assays (refs. 30&43).

      Taken together, these indicate that for P. aeruginosa, which moves via a run-reverse mode, the potential 100 ms reduction in response time due to co-localization of the chemotaxis complex and motor has a limited effect on overall chemotaxis timing.
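
      The order-of-magnitude estimate above can be reproduced with a short Python sketch. It assumes simple one-dimensional diffusion, for which the characteristic time to cover a distance L is t = L²/(2D). The cell length of 1.5 µm and the helper name `diffusion_time_s` are illustrative assumptions, not values or code taken from the manuscript.

```python
def diffusion_time_s(length_um: float, diff_coeff_um2_per_s: float, dims: int = 1) -> float:
    """Characteristic diffusion time t = L^2 / (2 * dims * D), in seconds."""
    if length_um <= 0 or diff_coeff_um2_per_s <= 0:
        raise ValueError("length and diffusion coefficient must be positive")
    if dims not in (1, 2, 3):
        raise ValueError("dims must be 1, 2, or 3")
    return length_um ** 2 / (2 * dims * diff_coeff_um2_per_s)


D_CHEY_UM2_PER_S = 10.0    # CheY diffusion coefficient quoted in the response
CELL_LENGTH_UM = 1.5       # assumed pole-to-pole distance (illustrative only)
MOTOR_INTERVAL_S = 1.0     # lower end of the 1-2 s CCW/CW intervals quoted above

t_diff = diffusion_time_s(CELL_LENGTH_UM, D_CHEY_UM2_PER_S)
print(f"end-to-end diffusion time: about {t_diff * 1000:.0f} ms")
print(f"fraction of a motor interval: {t_diff / MOTOR_INTERVAL_S:.2f}")
```

      Under these assumptions the diffusion time is roughly a tenth of the shortest motor interval, in line with the argument that colocalization saves little response time.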

      We have revised the corresponding text accordingly (lines 238-245) to better explain this rationale.

      More surprising for me was to read that "The signal transduction pathways in E. coli are relatively simple, and the chemotaxis response regulator CheY-P affects only the regulation of motor switching". There are degrees of complexity among signal transduction pathways in E. coli, but the chemotaxis seems to be ranked at the top. CheY is part of the adaptation. Perfect adaptation, as many other issues related to the chemotaxis pathway, which include the wide dynamic range, the robustness, the sensitivity, and the signal amplification (gain), are still largely unexplained. Hence, such assumptions are not justified.

      We apologize for the confusion and imprecision in our original statements. Our intention was to convey that the chemotaxis pathway in E. coli is relatively simple compared to the more complex chemosensory systems in P. aeruginosa. We did not mean to generalize this simplicity to all signal transduction pathways in E. coli.

      We acknowledge that E. coli chemotaxis is a highly sophisticated system, involving processes such as perfect adaptation, wide dynamic range, robustness, sensitivity, and signal amplification, many aspects of which remain incompletely understood. CheY indeed plays a crucial role in adaptation and motor switching regulation.

      Accordingly, we have revised the original text (lines 249-255) to avoid any misunderstanding.

      More perplexing is the novelty of the authors' documentation of the effect of the chemotaxis proteins on the c-di-GMP level. In 2013, Kulasekara et al. published a paper in eLife entitled "c-di-GMP heterogeneity is generated by the chemotaxis machinery to regulate flagellar motility". In the same year, Kulasekara published a paper entitled "Insight into a Mechanism Generating Cyclic di-GMP Heterogeneity in Pseudomonas aeruginosa". The authors did not cite these works and I wonder why.

      We apologize for having been unaware of these important references and thank the reviewer for bringing them to our attention. We have now cited the eLife paper and the PhD thesis titled "Insight into a Mechanism Generating Cyclic di-GMP Heterogeneity in Pseudomonas aeruginosa" by Kulasekara et al.

      Regarding novelty, there are key differences between our findings and those reported by Kulasekara et al. While they proposed that CheA influences c-di-GMP heterogeneity through interaction with a specific phosphodiesterase (PDE), our results demonstrate that overexpression of CheY leads to an increase in intracellular c-di-GMP levels.

      We have revised the original text accordingly (lines 358-362) to clarify these distinctions.

      (6) Throughout the manuscript, the authors refer to foci of fluorescent CheY as "chemoreceptor arrays". If anything, these foci signify the chemotaxis complex, not the membrane-traversing chemoreceptors.

      We thank the reviewer for this clarification. We have revised the manuscript accordingly to refer to the fluorescent CheY foci as representing the chemotaxis complex rather than the chemoreceptor arrays.

      Conclusions:

      The manuscript addresses an interesting subject and contains interesting, but incomplete, data.

      Reviewer #2 (Public Review):

      Summary:

      Here, the authors studied the molecular mechanisms by which the chemoreceptor cluster and flagella motor of Pseudomonas aeruginosa (PA) are spatially organized in the cell. They argue that FlhF is involved in localizing the receptors-motor to the cell pole, and even without FlhF, the two are colocalized. FlhF is known to cause the motor to localize to the pole in a different bacterial species, Vibrio cholerae, but it is not involved in receptor localization in that bacterium. Finally, the authors argue that the functional reason for this colocalization is to insulate chemotactic signaling from other signaling pathways, such as cyclic-di-GMP signaling.

      Strengths:

      The experiments and data look to be high-quality.

      Weaknesses:

      However, the interpretations and conclusions drawn from the experimental observations are not fully justified in my opinion.

      I see two main issues with the evidence provided for the authors' claims.

      (1) Assumptions about receptor localization:

      The authors rely on YFP-tagged CheY to identify the location of the receptor cluster, but CheY is a diffusible cytoplasmic protein. In E. coli, CheY has been shown to localize at the receptor cluster, but the evidence for this in PA is less strong. The authors refer to a paper by Guvener et al 2006, which showed that CheY localizes to a cell pole, and CheA (a receptor cluster protein) also localizes to a pole, but my understanding is that colocalization of CheY and CheA was not shown. My concern is that CheY could instead localize to the motor in PA, say by binding FliM. This "null model" would explain the authors' observations, without colocalization of the receptors and motor. Verifying that CheY and CheA are colocalized in PA would be a very helpful experiment to address this weakness.

      We thank the reviewer for this valuable suggestion. We agree that verifying the colocalization of CheY and CheA would strengthen our conclusions. To address this, we constructed a plasmid expressing CheA-CFP and introduced it into the CheY-EYFP strain by electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP signals, indicating that CheY-EYFP indeed marks the location of the chemoreceptor complex rather than the flagellar motor.

      We have revised the manuscript accordingly (lines 118-123) and included these results in the new Fig. S2.

      (2) Argument for the functional importance of receptor-motor colocalization at the pole:

      The authors argue that colocalization of the receptors and motors at the pole is important because it could keep phosphorylated CheY, CheY-p, restricted to a small region of the cell, preventing crosstalk with other signaling pathways. Their evidence for this is that overexpressing CheY leads to higher intracellular cdG levels and cell aggregation. Say that the receptors and motors are colocalized at the pole. In E. coli, CheY-p rapidly diffuses through the cell. What would prevent this from occurring in PA, even with colocalization?

      We appreciate the reviewer's insightful question. The colocalization of both the signaling source (the kinase) and sink (the phosphatase) at the chemoreceptor complex at the cell pole results in a rapid decay of CheY-P concentration within approximately 0.2 µm from the cell pole, leading to a nearly uniform distribution elsewhere in the cell, as demonstrated by Vaknin and Berg (ref. 46). This spatial arrangement effectively confines high CheY-P levels to the pole region. When the motor is also localized at the cell pole, this reduces the need for elevated CheY-P concentrations throughout the cytoplasm, thereby minimizing potential crosstalk with other signaling pathways.

      We have revised the manuscript accordingly (lines 280-286) to clarify this point.

      Elevating CheY concentration may increase the concentration of CheY-p in the cell, but might also stress the cells in other unexpected ways. It is not so clear from this experiment that elevated CheY-p throughout the cell is the reason that they aggregate, or that this outcome is avoided by colocalizing the receptors and motor at the same pole. If localization of the receptor array and motor at one pole were important for keeping CheY-p levels low at the opposite pole, then we should expect cells in which the receptors and motor are not at the pole to have higher CheY-p at the opposite pole. According to the authors' argument, it seems like this should cause elevated cdG levels and aggregation in the delta flhF mutants with wild-type levels of CheY. But it does not look like this happened. Instead of varying CheY expression, the authors could test their hypothesis that receptor-motor colocalization at the pole is important for preventing crosstalk by measuring cdG levels in the flhF mutant, in which the motor (and maybe the receptor cluster) are no longer localized in the cell pole.

      We thank the reviewer for raising the important point regarding potential cellular stress caused by elevated CheY concentrations, as well as for the suggestion to test the hypothesis using ΔflhF mutants.

      First, as noted above, CheY-P concentration rapidly decreases away from the receptor complex. While deletion of flhF alters the position of the receptor complex, thereby shifting the region of high CheY-P concentration, it does not increase CheY-P levels elsewhere in the cell. Importantly, in the ΔflhF strain, the receptor complex and the motor still colocalize, so this mutant may not effectively test the role of receptor-motor colocalization in preventing crosstalk as suggested.

      Regarding the possibility that elevated CheY levels stress the cells independently of CheY-P signaling, prior work in E. coli by Cluzel et al. (ref. 11) showed that overexpressing CheY several-fold did not cause phenotypic changes, indicating that simple CheY overexpression alone may not be generally stressful. Furthermore, our data indicate that the increase in c-di-GMP levels and subsequent cell aggregation upon CheY overexpression is not an all-or-none switch but occurs progressively as CheY concentration rises.

      To further confirm that CheY overexpression promotes aggregation through increased c-di-GMP levels, we performed additional experiments co-overexpressing CheY and a phosphodiesterase (PDE) from E. coli to reduce intracellular c-di-GMP. These experiments showed that PDE expression mitigates cell aggregation caused by CheY overexpression (Fig. S8).

      We have revised the manuscript accordingly (lines 290-294) and added these new results in Fig. S8.

      Reviewer #3 (Public Review):

      Summary:

      The authors investigated the assembly and polar localization of the chemosensory cluster in P. aeruginosa. They discovered that a certain protein (FlhF) is required for the polar localization of the chemosensory cluster while a fully-assembled motor is necessary for the assembly of the cluster. They found that flagella and chemosensory clusters always co-localize in the cell; either at the cell pole in wild-type cells or randomly-located in the cell in FlhF mutant cells. They hypothesize that this co-localization is required to keep the level of another protein (CheY-P), which controls motor switching, at low levels as the presence of high levels of this protein (if the flagella and chemosensory clusters were not co-localized) is associated with high-levels of c-di-GMP and cell aggregations.

      Strengths:

      The manuscript is clearly written and straightforward. The authors applied multiple techniques to study the bacterial motility system including fluorescence light microscopy and gene editing. In general, the work enhances our understanding of the subtlety of interaction between the chemosensory cluster and the flagellar motor to regulate cell motility.

      Weaknesses:

      The major weakness in this paper is that the authors never discussed how the flagellar gene expression is controlled in P. aeruginosa. For example, in E. coli there is a transcriptional hierarchy for the flagellar genes (early, middle, and late genes, see Chilcott and Hughes, 2000). Similarly, Campylobacter and Helicobacter have a different regulatory cascade for their flagellar genes (See Lertsethtakarn, Ottemann, and Hendrixson, 2011). How does the expression of flagellar genes in P. aeruginosa compare to other species? How many classes are there for these genes? Is there a hierarchy in their expression and how does this affect the results of the FliF and FliG mutants? In other words, if FliF and FliG are in class I (as in E. coli) then their absence might affect the expression of other later flagellar genes in subsequent classes (i.e., chemosensory genes). Also, in both FliF and FliG mutants no assembly intermediates of the flagellar motor are present in the cell as FliG is required for the assembly of FliF (see Hiroyuki Terashima et al. 2020, Kaplan et al. 2019, Kaplan et al. 2022). It could be argued that when the motor is not assembled then this will affect the expression of the other genes (e.g., those of the chemosensory cluster) which might play a role in the decreased level of chemosensory clusters the authors find in these mutants.

      We thank the reviewer for the insightful comments. P. aeruginosa possesses a four-tiered transcriptional regulatory hierarchy controlling flagellar biogenesis. Within this system, fliF and fliG belong to class II genes and are regulated by the master regulator FleQ. In contrast, chemotaxis-related genes such as cheA and cheW are regulated by intracellular free FliA, and currently, there is no evidence that FliA activity is influenced by proteins like FliG.

      To verify that the expression of core chemotaxis proteins was not affected by deletion of fliG, we performed Western blot analyses to compare CheY levels in wild-type, ΔfliF, and ΔfliG strains. We observed no significant differences, indicating that the reduced presence of receptor clusters in these mutants is not due to altered expression of chemotaxis proteins.

      Accordingly, we have revised the manuscript (lines 341-348) and updated Fig. 3B to reflect these findings.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The reviewers comment on several important aspects that should be addressed, namely: the lack of statistical analysis; the need for clarifications regarding assumptions made regarding receptor localization; the functional importance of receptor-motor colocalization; and the need for an elaborate discussion of flagellar gene expression. Also, two reviewers pointed out the need to prove the co-localization of CheY and CheA; this is important since CheY is dynamic, shuttling back and forth from the chemotaxis complex to the base of the flagella, whereas CheA (or CheW or, even better, the receptors) is considered less dynamic and an integral part of the chemotaxis complex.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      Line 43: "ubiquitous" - I would choose another word.

      We changed "ubiquitous" to "widespread".

      Line 49: "order" - change to organize.

      We changed "order" to "organize".

      Line 52: "To grow and colonize within the host, bacteria have evolved a mechanism for migrating...". Motility "towards more favorable environments" is an important survival strategy of bacteria in various ecological niches, not only within the host.

      We revised it to "grow and colonize in various ecological niches".

      Line 72: Define F6 in "F6 pathway-related receptors".

      The proteins encoded by chemotaxis-related genes collectively constitute the F6 pathway, which we have now explained in the manuscript text.

      Line 72-73: Do references 17 &18 really report colocalization of the chemotaxis receptor and flagella to the same pole? If these or other reports document such colocalization, then the sentence in the Abstract "Surprisingly, we found that both are located at the same cell pole..." is not correct.

      Kazunobu et al. (ref. 18) used scanning electron microscopy to preliminarily characterize the flagellation pattern of Pseudomonas aeruginosa during cell division, showing that existing flagella are located at the old pole. Zehra et al. (ref. 17), through fluorescence microscopy, observed that CheA and CheY proteins in dividing cells are typically also present at the old pole. Based on these observations, we inferred in the Introduction that the chemotaxis complex and flagellum may localize to the same cell pole.

      However, this inference is indirect and lacks direct live-cell evidence of colocalization, leaving its validity to be confirmed. This uncertainty was indeed the starting point and motivation for our study.

      In our work, we simultaneously visualized flagellar filaments and core chemoreceptor proteins at the single-cell level in P. aeruginosa. We characterized the assembly and spatial coordination of the chemotaxis network and flagellar motor throughout the cell cycle, providing direct evidence of their colocalization and coordinated assembly. This represents a significant advance beyond prior indirect observations and supports the novelty of our study.

      Accordingly, we have revised the relevant statements in lines 71-75 of the manuscript to better reflect the current state of the literature and emphasize the novelty of our direct observations.

      Line 108: "CheY has been shown to colocalize with chemoreceptors". The authors rely here (reference 29) and in other places on findings in E. coli. However, in the Introduction, they describe the many differences between the motility systems of P. aeruginosa and E. coli, e.g., the number of chemosensory systems and their spatial distribution (E. coli is a peritrichous bacterium, as opposed to the monotrichous bacterium P. aeruginosa). There seem to be proofs for colocalization of the Che and MCP proteins in P. aeruginosa, which should be cited here.

      Thank you for pointing this out. Harwood's group reported that a cheY-YFP fusion strain exhibited bright fluorescent spots at the cell pole, which disappeared upon knockout of cheA or cheW, genes encoding structural proteins of the chemotaxis complex. This strongly suggests colocalization of CheY with MCP proteins in P. aeruginosa. We have now cited this study as reference 17 in the manuscript.

      Figure 1B: Please replace the order of the schematic presentations, so that the cheY-egfp fusion, which is described first in the text, is at the top.

      We have modified the order of related images in Fig. 1B.

      Line 127: "by introducing cysteine mutations". Replace either by "by introducing cysteines" or by "by substituting several residues with cysteines".

      We changed the relevant statement to "by introducing cysteines".

      Line 144-145: "Given that the physiological and physical environments of both cell poles are nearly identical.". I think that also the physical, but certainly the physiological environment of the two poles is not identical. First, one is an old pole, and the other a new pole. Second, many proteins and RNAs were detected mainly or only in one of the poles of rod-shaped Gram-negative bacteria that are regarded as symmetrically dividing. Although my intuition is that the authors are correct in assuming that "it is unlikely that the unipolar distribution of the chemoreceptor array can be attributed to passive regulatory factors", relating it to the (false) identity between the poles is incorrect.

      We thank the reviewer for this important correction. We agree that the physiological environments of the two poles are not identical, given that one is the old pole and the other the new pole, and that many proteins and RNAs show polar localization in rod-shaped Gram-negative bacteria. Accordingly, we have revised the original text (lines 150-152) to read:

      “Despite potential differences in the physical and especially physiological environments at the two cell poles, it is unlikely that the unipolar distribution of the chemotaxis complex can be attributed to passive regulatory factors.”

      Lines 151-154: "Considering the consistent colocalization pattern between chemosensory arrays and flagellar motors in P. aeruginosa". Does the word consistent relate to different reports on such colocalization or to the results in Figure 1D? In case it is the latter, then what is the word consistent based on? All together only 7 cells are presented in the 5 micrographs that compose Figure 1D (back to statistics...).

      We thank the reviewer for raising this point. To clarify, the word "consistent" refers to the observation of colocalization shown in Figure 1D & Figure S3. As noted in the revised figure legend for Figure 1D, a total of 145 cells with labeled flagella were analyzed, all exhibiting consistent colocalization between flagella and chemosensory arrays. Additionally, we have included a new image showing a large field of co-localization in the wild-type strain as Figure S3 to better illustrate this consistency.

      Figure 2A: Omit "Subcellular localization of" from the beginning of the caption.

      We removed the relevant expression from the caption.

      Reviewer #2 (Recommendations For The Authors):

      I strongly recommend checking that CheY localizes to the receptor cluster in PA. This could be done by tagging cheA with a different fluorophore and demonstrating their colocalization. It would also be helpful to check that they are colocalized in the delta flhF mutant.

      We thank the reviewer for this valuable suggestion. We constructed a plasmid expressing CheA-CFP and introduced it into the CheY-EYFP strain by electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP signals, indicating that CheY-EYFP indeed marks the location of the chemoreceptor complex.

      We have revised the manuscript accordingly (lines 118-123) and included these results in the new Fig. S2.

      The experiments under- and over-expressing CheY part seemed too unrelated to receptor-motor colocalization. I think the authors should think about a more direct way of testing whether colocalization of the motor and receptors is important for preventing signaling crosstalk. One way would be to measure cdG levels in WT and in delta flhF mutants and see if there is a significant difference.

      We thank the reviewer for raising the important point regarding potential cellular stress caused by elevated CheY concentrations, as well as for the suggestion to test the hypothesis using flhF mutants.

      First, as noted in the response to your 2nd comment in Public Review, CheY-P concentration rapidly decreases away from the receptor complex. While deletion of flhF alters the position of the receptor complex, thereby shifting the region of high CheY-P concentration, it does not increase CheY-P levels elsewhere in the cell. Importantly, in the ΔflhF strain, the receptor complex and the motor still colocalize, so this mutant may not effectively test the role of receptor-motor colocalization in preventing crosstalk as suggested.

      Regarding the possibility that elevated CheY levels stress the cells independently of CheY-P signaling, prior work in E. coli by Cluzel et al. (ref. 11) showed that overexpressing CheY several-fold did not cause phenotypic changes, indicating that simple CheY overexpression alone may not be generally stressful. Furthermore, our data indicate that the increase in c-di-GMP levels and subsequent cell aggregation upon CheY overexpression is not an all-or-none switch but occurs progressively as CheY concentration rises.

      To further confirm that CheY overexpression promotes aggregation through increased c-di-GMP levels, we performed additional experiments co-overexpressing CheY and a phosphodiesterase (PDE) from E. coli to reduce intracellular c-di-GMP. These experiments showed that PDE expression mitigates cell aggregation caused by CheY overexpression (Fig. S8).

      We have revised the manuscript accordingly (lines 290-294) and added these new results in Fig. S8.

      Reviewer #3 (Recommendations For The Authors):

      (1) Can the authors elaborate more on the hierarchy of flagellar gene expression in P. aeruginosa and how this relates to their work?

      We thank the reviewer for the suggestion. We have now described the hierarchy of flagellar gene expression in P. aeruginosa in lines 341-348.

      (2) I would suggest that the authors check other flagellar mutants (than FliF and FliG) where the motor is partially assembled (e.g., any of the rod proteins or the P-ring protein), together with FlhF mutant, to see how a partially assembled motor affects the assembly of the chemosensory cluster.

      We thank the reviewer for this valuable suggestion. The P ring, primarily composed of FlgI, acts as a bushing for the peptidoglycan layer, and its absence leads to partial motor assembly. We constructed a ΔflgI mutant and observed that the proportion of cells exhibiting distinct chemotactic complexes was similar to that of the wild-type strain, suggesting that the assembly of the receptor complex is likely influenced mainly by the C-ring and MS-ring structures rather than by the P ring. We have revised the original text accordingly (lines 217-220) and added the corresponding data as Figure S6.

      (3) I would suggest that the authors check the levels of CheY in cells induced with different concentrations of arabinose (i.e., using western blotting just like they did in Figure 3B).

      We have assessed the levels of CheY in cells induced with different concentrations of arabinose using western blotting, as suggested. The results have been incorporated into the manuscript (lines 274-275) and are presented in Figure S7.

      (4) To my eyes, most of the foci in FliF-FlhF mutant in Figure 3A are located at the pole (which is unlike the FlhF mutant in Figure 2). Is this correct? I would suggest that the authors also investigate this to see where the chemosensory cluster is located.

      We thank the reviewer for pointing this out. The distribution of the chemotaxis complex in the ΔflhFΔfliF strain was investigated and is shown in Fig. S4. Indeed, most of the chemoreceptor foci in this mutant are located at the pole. This suggests that, in the absence of both FlhF and an assembled motor, the position of the receptor complex may be largely influenced by passive factors such as membrane curvature. This interesting possibility warrants further investigation in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this work, the authors recorded the dynamics of 5-HT with fiber photometry from CA1 in one hemisphere and LFP from CA1 in the other hemisphere. They observed an ultra-slow oscillation in the 5-HT signal during both wakefulness and NREM sleep. The authors studied different phases of the ultra-slow oscillation to examine potential differences in the occurrence of some behavioral state-related physiological phenomena (hippocampal ripples, EMG, and inter-area coherence).

      Strengths

      The relation between the falling/rising phase of the ultra-slow oscillation and the ripples is sufficiently shown. There are some minor concerns about the observed relations that should be addressed with some further analysis.

      Systematic observations have started to establish a strong relation between the dynamics of neural activity across the brain and measures of behavioral arousal. Such relations span a wide range of temporal scales that are heavily inter-related. Ultra-slow time-scales are specifically under-studied due to technical limitations and neuromodulatory systems are the strongest mechanistic candidates for controlling/modulating the neural dynamics at these time-scales. The hypothesis of the relation between a specific time-scale and one certain neuromodulator (5-HT in this manuscript) could have a significant impact on the understanding of the hierarchy in the temporal scales of neural activity.

      Weaknesses:

      One major caveat of the study is that different neuromodulators are strongly correlated across all time scales. Related to this, the authors need to discuss this point further and provide more evidence from the literature (if any) suggesting that similar ultra-slow oscillations are weaker in, or absent from, similar signals recorded for other neuromodulators such as ACh and NA.

      The reviewer is correct to point out that the levels of different neuromodulators are often correlated. For example, most monoaminergic neurons, including serotonergic neurons of the raphe nuclei, show similar firing rates across behavioral states, firing most during wake behavior, less during NREM, and ceasing firing during ‘paradoxical sleep’ or REM (Eban-Rothschild et al. 2018). Notably, other neuromodulators, such as acetylcholine (ACh), show the opposite pattern across states, with highest levels observed during REM, an intermediate level during wake behavior, and the lowest level during NREM (Vazquez et al. 2001). Despite these differences, ultraslow oscillations of both monoaminergic and non-monoaminergic neuromodulators have been described, albeit only during NREM sleep (Zhang et al. 2021, Zhang et al. 2024, Osorio-Forero et al. 2021, Kjaerby et al. 2022). How ultraslow oscillations of different neuromodulators are related has been only recently explored (Zhang et al. 2024). In that study, dual recording of oxytocin (Oxt) and ACh with GRAB sensors showed that the levels of the two neuromodulators were indeed correlated at ultraslow frequencies with a 2 s temporal shift. Furthermore, this shift could be explained by a hippocampal-to-lateral septum intermediate pathway, in which the level of ACh causally impacts hippocampal activity, which then in turn controls Oxt levels. Given the known temporal relationship between ripples, ACh and Oxt, and now with our work, between ripples and 5-HT, one could infer the relative timing of ultraslow oscillations of ACh, Oxt and 5-HT. While dual recordings of norepinephrine (NE) and 5-HT have not been performed, a similar correlation with temporal shift could be hypothesized given the parallel relationships between NE and spindles (Osorio-Forero et al. 2021), and 5-HT and ripples, with the known temporal delay between ripples and spindles (Staresina et al. 2023). The fact that the locus coeruleus receives particularly dense projections from the dorsal raphe nucleus (Kim et al. 2004) further suggests that 5-HT ultraslow oscillations could drive NE oscillations. How exactly ultraslow oscillations of serotonin are related to ultraslow oscillations of different neuromodulators in different brain regions remains to be studied.
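      As a purely illustrative aside, a temporal shift between two ultraslow signals, like the 2 s shift mentioned above, can in principle be estimated by finding the lag that maximizes their correlation. The sketch below uses synthetic sine waves; the sampling rate, the 50 s period, and the helper names `pearson` and `best_lag` are assumptions made for illustration and do not come from the cited studies.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def best_lag(x, y, max_lag):
    """Lag (in samples) that maximizes the correlation between x and y.
    A positive lag means y trails x."""
    def corr_at(lag):
        if lag >= 0:
            return pearson(x[:len(x) - lag], y[lag:])
        return pearson(x[-lag:], y[:len(y) + lag])
    return max(range(-max_lag, max_lag + 1), key=corr_at)

# Synthetic example (all values hypothetical): two 0.02 Hz oscillations,
# the second delayed by 2 s relative to the first.
FS_HZ = 10       # assumed sampling rate
PERIOD_S = 50    # assumed oscillation period
SHIFT_S = 2      # imposed delay
t = [i / FS_HZ for i in range(200 * FS_HZ)]
lead = [math.sin(2 * math.pi * s / PERIOD_S) for s in t]
trail = [math.sin(2 * math.pi * (s - SHIFT_S) / PERIOD_S) for s in t]

lag = best_lag(lead, trail, max_lag=5 * FS_HZ)
print(lag / FS_HZ)  # estimated delay in seconds
```

      On real recordings the correlation peak would be broader and noisier, so such an estimate would need to be checked against shuffled or surrogate data.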

      We have further addressed this question and how it relates to the issue of causality in the Discussion section of the manuscript (p. 13):

      “In addition to the difficulties involved with typical causal interventions already mentioned, the fact that the levels of different neuromodulators are interrelated and affected by ongoing brain activity makes it very hard to pinpoint ultraslow oscillations of one specific neuromodulator as controlling specific activity patterns, such as ripple timing. While a recent paper purported to show a causative effect of norepinephrine levels on ultraslow oscillations of sigma band power, the fact that optogenetic inhibition of locus coeruleus (LC) cells, but also excitation, only caused a minor reduction of the ultraslow sigma power oscillation suggests that other factors also contribute (Osorio-Forero et al., 2021). Generally, it is thought that many neuromodulators together determine brain states in a combinatorial manner, and it is probable that the 5-HT oscillations we measure, like the similar oscillations in NE, are one factor among many.

      Nevertheless, given the known effects of 5-HT on neurons, it is not unlikely that the 5-HT fluctuations we describe have some impact on the timing of ripples, MAs, hippocampal-cortical coherence, or EMG signals that correlate with either the rising or descending phase. In fact, causal effects of 5-HT on ripple incidence (Wang et al. 2015, ul Haq et al. 2016 and Shiozaki et al. 2023), MA frequency (Thomas et al. 2022), sensory gating (Lee et al. 2020), which is subserved by inter-areal coherence (Fisher et al. 2020), and movement (Takahashi et al. 2000, Alvarez et al. 2022, Jacobs et al. 1991 and Luchetti et al. 2020) have all been shown. Our added findings that serotonin affects ripple incidence in hippocampal slices in a dose-dependent manner (Figure S1) further suggest that the relationship between ultraslow 5-HT oscillations and ripples we report may indeed result, at least in part, from a direct effect of serotonin on the hippocampal network.

      Whether these ‘causal’ relationships between 5-HT and the different activity measures we describe can be used to support a causal link between ultraslow 5-HT oscillations and the correlated activity we report remains an open question. To that point, some studies have described changes in ultraslow oscillations due to manipulation of serotonin signaling. Specifically, reduction of 5-HT1a receptors in the dentate gyrus was recently shown to reduce the power of ultraslow oscillations of calcium activity in the same region (Turi et al. 2024). Furthermore, psilocin, which largely acts on the 5-HT2a receptor, decreased NREM episode length from around 100 s to around 60 s, and increased the frequency of brief awakenings (Thomas et al. 2022). While ultraslow oscillations were not explicitly measured in this study, the change in the rhythmic pattern of NREM sleep episodes and brief awakenings, or microarousals, suggests an effect of psilocin on ultraslow oscillations during NREM. Although these studies do not necessarily point to an exclusive role for 5-HT in controlling ultraslow oscillations of different brain activity patterns, they show that changes in 5-HT can contribute to changes in brain activity at ultraslow frequencies.”

      A major question that has been left out from the study and discussion is how the same level of serotonin before and after the peak could be differentially related to the opposite observed phenomenon. What are the possible parallel mechanisms for distinguishing between the rising and falling phases? Any neurophysiological evidence for sensing the direction of change in serotonin concentration (or any other neuromodulator), and is there any physiological functionality for such mechanisms?

      We have added a paragraph in the discussion to address how this differentiation of the 5-HT signal may be carried out (Discussion, paragraph #3, p. 10):

      “In order for the ultraslow oscillation phase to segregate brain activity, as we have observed, the hippocampal network must somehow be able to sense the direction of change of serotonin levels. While single-cell mechanisms related to membrane potential dynamics are typically too fast to explain this calculation, a theoretical work has suggested that feedback circuits can enable such temporal differentiation, also on the slower timescales we observe (Tripp and Eliasmith, 2010). Beyond the direction of change in serotonin levels, temporal differentiation could also enable the hippocampal network to discern the steeper rising slope versus the flatter descending slope that we observe in the ultraslow 5-HT oscillations (Figure S2), which may also be functionally relevant (Cole and Voytek, 2017). The distinction between the rising and falling phase of ultraslow oscillations is furthermore clearly discernible at the level of unit responses, with many units showing preferences for either half of the ultraslow period (Figure S6). Another factor that could help distinguish the rising from the falling phase is the level of other neuromodulators, as it is likely the combination of many neuromodulators at any given time that defines a behavioral substate. Given the finding that ACh and Oxt exhibit ultraslow oscillations with a temporal shift (Zhang et al. 2024), one could posit that distinct combinations of different levels of neuromodulators could segregate the rising from the falling phase via differential effects of the combination of neuromodulators on the hippocampal network.”

      Functionally, the ability to distinguish between the rising and falling phases of an oscillatory cycle is a form of phase coding. A well-known example of this can be seen in hippocampal place cells, which fire relative to the ongoing theta oscillations. The key advantage of phase coding is that it introduces an additional dimension, i.e. phase of firing, beyond the simple rate of neural firing. This allows for the multiplexing of information (Panzeri et al., 2010), enabling the brain to encode more complex patterns of activity. Moreover, phase coding is metabolically more efficient than traditional spike-rate coding (Fries et al., 2007).
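      To make the reviewer's question concrete, the following minimal sketch shows how two time points with the same signal level can be told apart by the direction of change. It uses a synthetic sine wave and a simple forward difference; this is an illustrative assumption and not the phase-extraction method used in the manuscript. The 50 s period and the helper name `label_phase` are hypothetical.

```python
import math

def label_phase(signal):
    """Label each sample (except the last) as 'rising' or 'falling'
    from the sign of the forward difference."""
    return ["rising" if nxt > cur else "falling"
            for cur, nxt in zip(signal, signal[1:])]

# Hypothetical ultraslow oscillation: 50 s period, sampled at 1 Hz.
PERIOD_S = 50
trace = [math.sin(2 * math.pi * t / PERIOD_S) for t in range(100)]
phase = label_phase(trace)

# t = 5 s and t = 20 s sit at the same level on either side of the peak,
# yet they receive opposite phase labels.
print(round(trace[5], 3), phase[5])
print(round(trace[20], 3), phase[20])
```

      A noisy recorded signal would need smoothing or band-pass filtering before differencing, since sample-to-sample noise would otherwise dominate the sign of the difference.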

      Reviewer #2 (Public review):

      Summary:

      In their study, Cooper et al. investigated the spontaneous fluctuations in extracellular 5-HT release in the CA1 region of the hippocampus using GRAB5-HT3.0. Their findings revealed the presence of ultralow frequency (less than 0.05 Hz) oscillations in 5-HT levels during both NREM sleep and wakefulness. The phase of these 5-HT oscillations was found to be related to the timing of hippocampal ripples, microarousals, electromyogram (EMG) activity, and hippocampal-cortical coherence. In particular, ripples were observed to occur with greater frequency during the descending phase of 5-HT oscillations, and stronger ripples were noted to occur in proximity to the 5-HT peak during NREM. Microarousal and EMG peaks occurred with greater frequency during the ascending phase of 5-HT oscillations. Additionally, the strongest coherence between the hippocampus and cortex was observed during the ascending phase of 5-HT oscillations. These patterns were observed in both NREM sleep and the awake state, with a greater prevalence in NREM. The authors posit that 5-HT oscillations may temporally segregate internal processing (e.g., memory consolidation) and responsiveness to external stimuli in the brain.

      Strengths:

      The findings of this research are novel and intriguing. Slow brain oscillations lasting tens of seconds have been suggested to exist, but to my knowledge they have never been analyzed in such a clear way. Furthermore, although it is likely that ultra-slow neuromodulator oscillations exist, this is the first report of such oscillations, and the greatest strength of this study is that it has clarified this phenomenon both statistically and phenomenologically.

      Weaknesses:

      As with any paper, this one has some limitations. While there is no particular need to pursue them, I will describe ten of them below, including future directions:

      (1) Contralateral recordings: 5-HT levels and electrophysiological recordings were obtained from opposite hemispheres due to technical limitations. Ipsilateral simultaneous recordings may show more direct relationships.

      Although we argue that bilateral symmetry defines both the serotonin system and many hippocampal activity patterns (Methods: Dual fiber photometry and silicon probe recordings), we agree that ipsilateral recordings would be superior to describe the link between serotonin and electrophysiology in the hippocampus. In addition to noting that a recent study has adopted the same contralateral design (Zhang et al. 2024), we add a reference further supporting bilateral hippocampal synchrony, specifically of dentate spikes (Farrell et al. 2024). However, as functional lateralization has been recently proposed to underlie certain hippocampal functions in the rodent (Jordan 2020), future studies should ideally include both imaging and electrophysiology in a single hemisphere to guarantee local correlations rather than assuming inter-hemispheric synchrony. This could be accomplished using an integrated probe with attached optical fibers, as described in Markowitz et al. 2018, which is however technically more challenging and has, to our knowledge, not yet been implemented with fiber photometry recordings with GRAB sensors. Given the required separation of a few hundred micrometers between the probe shanks and the optical fiber cannula, it is important to consider whether the recordings are capturing the same neuronal populations. For example, there is a risk of recording electrical activity from dorsal hippocampal neurons while simultaneously measuring light signals from neurons in the intermediate hippocampus, which are functionally distinct populations (Fanselow and Dong 2009).

      (2) Sample size: The number of mice used in the experiments is relatively small (n=6). Validation with a larger sample size would be desirable.

      While larger sample sizes generally reduce the influence of random variability and minimize the impact of outliers on conclusions, our use of mixed-effects models mitigates these concerns by accounting for both inter-session and inter-mouse variability. With this approach, we explicitly model random effects, such as the variability between individual mice and sessions, alongside fixed effects (such as treatment), which ensures that our results are not driven by random fluctuations in a few individual mice or sessions. Furthermore, the inclusion of random intercepts and slopes in the models allows for the possibility that different animals and/or sessions have different baseline characteristics and respond to the treatment to different degrees. In summary, while validating these findings with a larger sample size would certainly help detect more subtle effects, we are confident in the robustness of the conclusions presented.
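      For readers less familiar with this model class, a generic linear mixed-effects model with random intercepts and slopes for mouse and session can be written as below. This is a textbook form given only for orientation; the symbols, the single predictor, and the grouping structure are assumptions, and the exact specification used in the manuscript may differ.

```latex
% Observation i from mouse m in session s; x is a fixed-effect predictor (e.g., treatment).
y_{msi} = (\beta_0 + u_{0m} + v_{0s}) + (\beta_1 + u_{1m} + v_{1s})\, x_{msi} + \varepsilon_{msi},
\qquad
(u_{0m}, u_{1m}) \sim \mathcal{N}(0, \Sigma_{\mathrm{mouse}}),\quad
(v_{0s}, v_{1s}) \sim \mathcal{N}(0, \Sigma_{\mathrm{session}}),\quad
\varepsilon_{msi} \sim \mathcal{N}(0, \sigma^2)
```

      Here $\beta_0$ and $\beta_1$ are the fixed intercept and slope shared by all animals, while the $u$ and $v$ terms let each mouse and each session deviate in baseline and in response magnitude.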

      (3) Lack of causality: The observed associations show correlations, not direct causal relationships, between 5-HT oscillations and neural activity patterns.

      We agree that the data we present in this study is largely correlational and generally avoid claims of causality in the manuscript. In the Discussion section, we discuss barriers to interpreting typical causal interventions in vivo, such as optogenetic activation of raphe nuclei: “The two previously mentioned in vivo studies showing reduced ripple incidence…” (paragraph #10, pg. 12), as well as an added section on further causality considerations in the Discussion section of the manuscript (paragraph #12, pg. 13): “In addition to the difficulties involved with…”

      Due to these barriers, as a first step, we wanted to describe how physiological changes in serotonin levels are correlated to changes in the hippocampal activity. Equipped with a deeper understanding of physiological serotonin dynamics, future studies could explore interventions that modulate serotonin in keeping with the natural range of serotonin fluctuations for a given state. On that point, another challenge which we have not mentioned in the manuscript is that modulating serotonin, or any neuromodulator’s levels, has the potential, depending on the degree of modulation, to transition the brain to an entirely different behavioral state. This then complicates interpretation, as one is not sure whether effects observed are due to the changes in the neuromodulator itself, or secondary to changes in state. At the same time, 5-HT activity drives networks which in return can change the release of other neurotransmitters, leading to indirect effects.

      The results of our in vitro experiments suggest that a causal relationship between serotonin and ripples is possible (Figure S1). Though the hippocampal slice preparation is clearly an artificial model, it provides a controlled environment to isolate the effects of serotonin manipulation on the hippocampal formation, without the confounding influence of systemic 5-HT fluctuations in other brain regions. Notably, the dose-dependent effects of serotonin (5-HT) wash-in on ripple incidence observed in vitro closely mirror the inverted-U dose-response curve seen in our in vivo experiments across states, where small increases in serotonin lead to the highest ripple incidence, and both lower and higher levels correspond to reduced ripple activity. This parallel suggests that the gradual washing of serotonin in our in vitro system may mimic the tonic firing changes in serotonergic neurons that occur during state transitions in vivo. These findings underscore the importance of studying how different dynamics of serotonin modulation can differentially affect hippocampal network activity.

      (4) Limited behavioral states: The study focuses primarily on sleep and quiet wakefulness. Investigation of 5-HT oscillations during a wider range of behavioral states (e.g., exploratory behavior, learning tasks) may provide a more complete understanding.

      We agree that future studies should investigate a broader range of behavioral states. For this study, as we were focused on general sleep and wake patterns, our recordings were done in the home cage, and we limited ourselves to the basic behavioral states described in the paper. Future studies should be designed to investigate ultraslow 5-HT oscillations during different behaviors, such as continuous treadmill running. Specifically, a finer segregation of extended wake behaviors by level of arousal could greatly add to our understanding of the role of ultraslow serotonin oscillations.

      (5) Generalizability to other brain regions: The study focuses on the CA1 region of the hippocampus. It's unclear whether similar 5-HT oscillation patterns exist in other brain regions.

      Given the reported ultraslow oscillations of population activity in serotonergic neurons of the dorsal raphe nucleus (Kato et al. 2022) as well as the widespread projections of the serotonergic nuclei, we would expect a broad expression of ultraslow 5-HT oscillations throughout the brain. So far, ultraslow 5-HT oscillations have been described in the basal forebrain, as well as in the dentate gyrus, in addition to what we have shown in CA1 (Deng et al. 2024 and Turi et al. 2024). Furthermore, our results showing that hippocampal-cortical coherence changes according to the phase of hippocampal ultraslow 5-HT oscillations suggest that 5-HT can affect oscillatory activity either indirectly by modulating hippocampal cells projecting to the cortical network or directly by modulating the cortical postsynaptic targets. Given the heterogeneity in projection strength, as well as in pre- and postsynaptic serotonin receptor densities across brain regions (de Filippo & Schmitz, 2024), it would be interesting to see whether local ultraslow 5-HT oscillations are differentially modulated, e.g. in terms of oscillation power. Future studies investigating different brain regions via implantation of multiple optic fibers in different brain areas or using the mesoscopic imaging approach adopted in Deng et al. 2024, will be needed to examine the extent of spatial heterogeneity in this ultraslow oscillation.

      (6) Long-term effects not assessed: Long-term effects of ultra-low 5-HT oscillations (e.g., on memory consolidation or learning) were not assessed.

      While beyond the scope of our current study, we agree that an important next step would involve modulating the ultraslow serotonin oscillation after learning, and then examining potential effects on memory consolidation, presumably via changes in ripple dynamics, though many possibilities could explain potential effects. There, our results suggest it would be important to isolate effects due to the change in ultraslow oscillation features, rather than simply overall levels of 5-HT. To that end, it would be important to test different modulation dynamics, specifically modulating the oscillation strength, around a constant mean 5-HT level by carefully timed optogenetic stimulation/inhibition. Afterwards, showing a clear correlation between the strength of the 5-HT modulation and memory performance would be important to establishing the relationship, as done in Lecci et al 2017, where more prominent ultraslow oscillations of sigma power in the cortex during sleep, alongside a higher density of spindles, were correlated with better memory consolidation. Given the tight coupling of spindles and ripples during sleep, it is possible that a similar effect on memory consolidation would be observed following changes in ultraslow 5-HT oscillation power.

      (7) Possible species differences: It's uncertain whether the findings in mice apply to other mammals, including humans.

      We agree that the experiments should ultimately be replicated in humans. In the 2017 study by Lecci et al., the authors highlighted the shared functional requirements for sleep across species, despite apparent differences, such as variations in sleep volume. To explore these commonalities, the researchers conducted parallel experiments in both mice and humans, aiming to identify a universal organizing structure. They discovered that the ultraslow oscillation of sigma power serves this role, enabling both species to balance the competing demands of arousability and sleep imperviousness. Based on this finding, it is plausible that ultraslow oscillations of serotonin, which similarly modulate activity according to arousal levels, would serve a comparable function in humans.

      (8) Technical limitations: The temporal resolution and sensitivity of the GRAB5-HT3.0 sensor may not capture faster 5-HT dynamics.

      The kinetics of the GRAB5-HT3.0 sensor used in this study limit the range of serotonin dynamics we can observe. However, the ultraslow oscillations we measure reflect temporal changes on the scale of 20 s and greater, whereas the GRAB sensor we use has sub-second on kinetics and below 2 s off kinetics (Deng et al. 2024). Therefore, the sensor is capable of reporting much faster activity than the ultraslow oscillations we observe, indicating that the ultraslow 5-HT signal accurately reflects the dynamics on this time scale. Furthermore, the presence of ultraslow oscillations in spiking activity, observed in the hippocampal formation (Gonzalo Cogno et al., 2024; Aghajan et al., 2023; Penttonen et al., 1999) and in the dorsal raphe (Mlinar et al., 2016), which is not affected by the same temporal smoothing, suggests that the oscillations we record are not likely due to signal aliasing, but instead reflect genuine oscillatory activity. Of course, this does not preclude that other, faster serotonin dynamics are also present in our signal, some of which may be too fast to be observed. For instance, rapid serotonin signaling via the ionotropic 5-HT3a receptors could be missed in our recordings. Additionally, with the fiber photometry approach we adopted, we are limited to capturing spatially broad trends in serotonin levels, potentially overlooking more localized dynamics.
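      The time-scale argument above can be checked with a back-of-the-envelope calculation. The sketch below treats the sensor as a first-order low-pass filter with a 2 s time constant. This is a simplifying assumption: the response states only that off kinetics are below 2 s, and real sensor kinetics need not be first order. The function name `lowpass_gain` and the chosen periods are illustrative.

```python
import math

def lowpass_gain(freq_hz, tau_s):
    """Amplitude gain of a first-order low-pass filter with time constant tau_s."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * tau_s) ** 2)

TAU_OFF_S = 2.0  # assumed upper bound on the off time constant

# Oscillation periods of 20 s and longer correspond to 0.05 Hz and below.
for period_s in (20, 50, 100):
    gain = lowpass_gain(1.0 / period_s, TAU_OFF_S)
    print(f"period {period_s:>3} s: gain {gain:.2f}")
```

      Under this assumption, even the fastest oscillations considered (20 s period) retain most of their amplitude, whereas a 1 Hz signal would be attenuated more than tenfold, which is consistent with the point that faster serotonin dynamics may be missed while ultraslow ones are preserved.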

      (9) Interactions with other neuromodulators: The study does not explore interactions with other neuromodulators (e.g., norepinephrine, acetylcholine) or their potential ultraslow oscillations.

      We agree that the interaction between neuromodulators in the context of ultraslow oscillations is an important issue, which we have addressed in our response to reviewer #1 under ‘Weaknesses.’

      (10) Limited exploration of functional significance: While the study suggests a potential role for 5-HT oscillations in memory consolidation and arousal, direct tests of these functional implications are not included.

      We agree and reference our answer to (6) regarding memory consolidation. Regarding arousal, direct tests of arousability to different sensory stimuli during different phases of the ultraslow 5-HT oscillation during sleep would be beneficial, in addition to the indirect measures of arousal we examine in the current study, e.g. degree of movement (icEMG) and long-range coherence. In line with what we have shown, Cazettes et al. (2021) have demonstrated a direct relationship between 5-HT levels and pupil size, an indicator of arousal level, which, like our findings, is consistent across behavioral states.

      Reviewer #3 (Public review):

      Summary:

      The activity of serotonin (5-HT) releasing neurons as well as 5-HT levels in brain structures targeted by serotonergic axons are known to fluctuate substantially across the animal's sleep/wake cycle, with high 5-HT levels during wakefulness (WAKE), intermediate levels during non-REM sleep (NREM) and very low levels during REM sleep. Recent studies have shown that during NREM, the activity of 5-HT neurons in raphe nuclei oscillates at very low frequencies (0.01 - 0.05 Hz) and this ultraslow oscillation is negatively coupled to broadband EEG power. However, how exactly this 5-HT oscillation affects neural activity in downstream structures is unclear.

      The present study addresses this gap by replicating the observation of the ultraslow oscillation in the 5-HT system, and further observing that hippocampal sharp wave-ripples (SWRs), biomarkers of offline memory processing, occur preferentially in barrages on the falling phase of the 5-HT oscillation during both wakefulness and NREM sleep. In contrast, the rising phase of the 5-HT oscillation is associated with microarousals during NREM and increased muscular activity during WAKE. Finally, the rising 5-HT phase was also found to be associated with increased synchrony between the hippocampus and neocortex. Overall, the study constitutes a valuable contribution to the field by reporting a close association between rising 5-HT and arousal, as well as between falling 5-HT and offline memory processes.

      Strengths:

      The study makes compelling use of the state-of-the-art methodology to address its aims: the genetically encoded 5-HT sensor used in the study is ideal for capturing the ultraslow 5-HT dynamics and the novel detection method for SWRs outperforms current state-of-the-art algorithms and will be useful to many scientists in the field. Explicit validation of both of these methods is a particular strength of this study.

      The analytical methods used in the article are appropriate and are convincingly applied; the use of a general linear mixed model for statistical analysis is a particularly welcome choice, as it guards against pseudoreplication while preserving statistical power.

      Overall, the manuscript makes a strong case for distinct sub-states across WAKE and NREM, associated with different phases of the 5-HT oscillation.

      Weaknesses:

      All of the evidence presented in the study is correlational. While the study mostly avoids claims of causality, it would still benefit from establishing whether the 5-HT oscillation has a direct role in the modulation of SWR rate via e.g. optogenetic activation/inactivation of 5-HT axons. As it stands, the possibility that 5-HT levels and SWRs are modulated by the same upstream mechanism cannot be excluded.

      We agree that causality claims cannot be made with our data, and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      One major question in the presented data is the nature of the asymmetrical shape of the targeted slow events. How much does it reflect the 5-HT concentration and how much is this shape affected by the dynamics of the designed 5-HT sensor? This needs to be addressed in more detail referencing the original paper for the used sensor.

      We have added a paragraph in the Results section of the manuscript to address the asymmetric waveform of the ultraslow 5-HT oscillations and whether it could be affected by the asymmetric kinetics of the GRAB sensor we use: “The waveform of these ultraslow 5-HT oscillations…” (Results, paragraph #4, pg. 5). We include an extended answer to the question here:

      Indeed, the GRAB5-HT3.0 sensor we use in the study shows activation kinetics that are faster than its deactivation kinetics, with time constants of 0.25 s and 1.39 s, respectively (Deng et al. 2024). Likewise, the slope of the rising phase of the ultraslow serotonin oscillation we measure is steeper than the slope of the falling phase, and the ratio of time spent in the rising phase versus the falling phase is less than 1, indicating longer falling phases (Figure S2). Although we cannot completely rule out that the asymmetric shape of the ultraslow serotonin oscillations we record is affected by this asymmetry in the 5-HT sensor kinetics, we believe this is unlikely, as the 5-HT signal clearly contains reductions in 5-HT levels that are much faster than the descending phase of the ultraslow oscillation. Although it is difficult to directly compare the different-sized signals, the reported timescales of off kinetics, on the order of a few seconds (Deng et al. 2024), are far below the tens of seconds timescale of the ultraslow oscillation. Furthermore, the finding that some dorsal raphe neurons modulate their firing rate at ultraslow frequencies, and moreover that all examples of such ultraslow oscillations shown display clear asymmetry in rising time versus decay, suggests that the asymmetry we observe in our data could be due to neural activity rather than temporal smoothing by the sensor (Mlinar et al. 2016). In the same vein, another study found similar asymmetry in extracellular 5-HT levels measured with fast scan cyclic voltammetry (FSCV), a technique with greater temporal resolution (sampling rate of 10 Hz) than GRAB sensors, after single pulse stimulation (Bunin and Wightman 1998). In this study, 5-HT was shown to be released extrasynaptically, making the longer clearing time compared to the release time intuitive. Finally, the observation that the onsets and offsets of ripple clusters, recorded with a sampling rate of 20 kHz, are precisely aligned with the peaks and troughs of ultraslow serotonin oscillations (Figure 1, H1-2, columns 2-3) suggests that the duration of the falling phase is not artificially distorted by the temporal smoothing of the sensor dynamics.
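
      As a rough plausibility check of the timescale argument above, the sensor can be treated as a first-order low-pass filter. This is a simplifying assumption made only for illustration: it is not a model taken from Deng et al. 2024, the real sensor has asymmetric on and off kinetics, and the function names are hypothetical. Under that assumption, the sketch below computes how much a sinusoid at the edges of the ultraslow band (0.01 and 0.06 Hz) would be attenuated and delayed by the slower, 1.39 s off time constant.

```python
import math

def first_order_gain(freq_hz: float, tau_s: float) -> float:
    """Amplitude gain of a first-order low-pass filter (assumed sensor model)."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * tau_s) ** 2)

def first_order_lag_s(freq_hz: float, tau_s: float) -> float:
    """Time lag in seconds at freq_hz: phase lag divided by angular frequency."""
    omega = 2.0 * math.pi * freq_hz
    return math.atan(omega * tau_s) / omega

TAU_OFF_S = 1.39  # off time constant quoted in the response (Deng et al. 2024)

for f in (0.01, 0.06):  # edges of the ultraslow band used in the study
    gain = first_order_gain(f, TAU_OFF_S)
    lag = first_order_lag_s(f, TAU_OFF_S)
    print(f"{f:.2f} Hz: gain = {gain:.3f}, lag = {lag:.2f} s")
```

      With these numbers the gain stays at approximately 0.996 at 0.01 Hz and 0.886 at 0.06 Hz, and the lag stays below the time constant itself, which is consistent with the statement that the sensor kinetics are far faster than the tens-of-seconds oscillation.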

      Regardless of the dynamics of the serotonin concentration, it should be noted that the elicited neuronal effect might have different dynamics compared to the 5-HT concentration, which needs further study. To address this, one can either examine the average of the broadband LFP (not high-pass filtered by the amplifier) or the distribution of simultaneously recorded spiking activity around the peak of ultraslow oscillations.

      We have added Figure S6, showing unit activity relative to the phase of ultraslow serotonin oscillations.

      From this analysis, we uncover three groups of units which are largely preserved across states (Figure S6, E vs. F), albeit with a slight temporal shift rightward from NREM to WAKE (Figure S6, C vs. D). Namely, some units spike preferentially during the rising phase, some during the falling phase, and a third group has no clear phase preference. Unit activity during the falling phase is unsurprising, as it is where ripples largely occur, which themselves are associated with spike bursts. During the rising phase, the unit activity we observe could correspond to firing of the hippocampal subpopulation known to be active during NREM interruption states (Jarosiewicz et al. 2002, Miyawaki et al. 2017). While the units’ phase preference was tested based on the category of rising vs. falling phase, as this division described most variation in the data, a few units in the ‘No preference’ group showed heightened activity near the oscillation peak. However, given the very small number of units with this preference, more unit data are needed to describe this group, ideally with high-density recordings. Overall, most units showed a falling vs. rising phase preference, indicating a phase coding of hippocampal activity by 5-HT ultraslow oscillations.

      Related to the previous point, it would be helpful to show the average cycle shape of these oscillations (relative to the phase 0 extracted in Figure 3) and do the shape comparison across sessions and also wake/NREM

      We agree, and to this end we have added Figure S2. From this waveform analysis, we show that the ultraslow serotonin oscillation is asymmetric, with the rising phase having a greater slope, but shorter length, than the falling phase. While this asymmetry is observed both in NREM and WAKE, the slope difference and length ratio difference in rising vs. falling phase is greater in NREM (Figure S2. B).
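
      The two asymmetry measures mentioned above (length ratio and slope ratio of rising vs. falling phase) can be illustrated with a minimal sketch. It is not the analysis used for Figure S2: the function name, the strict local-extremum detection and the synthetic triangle wave are assumptions for illustration, and real photometry traces would first need band-pass filtering or smoothing.

```python
def cycle_asymmetry(trace, dt):
    """Per trough-to-trough cycle, return (rise/fall length ratio, rise/fall slope ratio).

    Extrema are strict local minima/maxima, which is only adequate for a
    smooth (already filtered) trace.
    """
    inner = range(1, len(trace) - 1)
    troughs = [i for i in inner if trace[i] < trace[i - 1] and trace[i] < trace[i + 1]]
    peaks = [i for i in inner if trace[i] > trace[i - 1] and trace[i] > trace[i + 1]]
    results = []
    for t0, t1 in zip(troughs, troughs[1:]):
        between = [p for p in peaks if t0 < p < t1]
        if len(between) != 1:
            continue  # skip ambiguous cycles with zero or several peaks
        p = between[0]
        rise_len = (p - t0) * dt
        fall_len = (t1 - p) * dt
        rise_slope = (trace[p] - trace[t0]) / rise_len
        fall_slope = (trace[p] - trace[t1]) / fall_len
        results.append((rise_len / fall_len, rise_slope / fall_slope))
    return results

# Synthetic asymmetric wave (hypothetical data): 10 s rise, 30 s fall, 1 s sampling.
wave = [(k % 40) / 10 if (k % 40) <= 10 else 1 - ((k % 40) - 10) / 30
        for k in range(200)]
asymmetry = cycle_asymmetry(wave, dt=1.0)
```

      A length ratio below 1 together with a slope ratio above 1 corresponds to the pattern described in the response: a shorter, steeper rising phase and a longer, shallower falling phase.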

      In Figure 3D, there seem to be oscillatory rhythms with faster cycles on top of the targeted oscillations. That would make the phase estimation less accurate; e.g., in the left panel, in the second cycle, it is not clear whether there are two faster cycles or one slow cycle as targeted, and in the rising phase of the second fast cycle there are no ripples. This might suggest that, regardless of the specific oscillation frequency, whenever 5-HT starts to be released the ripples are suppressed, and once the 5-HT is no longer synaptically effective the ripples start to be generated while the photometry signal wanes as the serotonin is cleared. Still, if there is any rhythmicity between bouts of no ripples, it would suggest an ultraslow regularity in the 5-HT release.

      The reviewer is correct to point out that some faster increases in serotonin, which occur on top of the ultraslow oscillations we measure, seem to be associated with decreased ripple incidence, as in the example referenced. The dominance of ultraslow frequencies in the power spectrum of the 5-HT signal suggests, however, that oscillations faster than the ultraslow oscillations we describe are far less prevalent in the data. While there may be some coupling of ripples and other measures to serotonin oscillations of different frequencies, this may be hard or impossible to detect with phase analysis because of their infrequent occurrence and nonstationary nature. In fact, we show in Figure S3 that the strongest phase modulation of ripples by ultraslow serotonin oscillations is observed in the frequencies we use (0.01-0.06 Hz). Methodologically, phase analysis indeed assumes stationary signals, which are rare if not absent in physiological data (Lo et al. 2009); in general, however, the narrower the frequency band, the better the phase estimation. The narrow frequency band we use provides phase estimates that are largely robust and unaffected by the presence of faster oscillations, as can be seen in the example phase traces shown in Figure 4.

      The hypothesis that the rising phase burst of synaptic serotonin is what silences ripples, and that with the clearing of serotonin from the synapses, ripples recover, is a possible explanation of our findings. However, if this were the case, one could expect the ripple rate to increase over the course of the falling phase of ultraslow 5-HT oscillations, as 5-HT decreases, and peak at the trough. This is at odds with what we observe, namely a fairly uniform distribution of ripples along the falling phase (Figure 3F2,F4). Furthermore, the Mlinar et al. 2016 study describes a subpopulation of raphe neurons whose firing rates themselves oscillate at ultraslow frequencies, rather than on-off bursting at ultraslow frequencies, which would argue against this hypothesis. However, as this study looks at a small number of neurons in slices, further in vivo experiments examining firing rates of median raphe neurons are required to understand how the ultraslow oscillation of extracellular serotonin that we measure is generated as well as how it is related to ripple rates.

      In Figure 3B, it is not clear why IRI is z-scored. It would be informative to have the actual value of IRI. What is the z relative to? Is it the mean value of IRI in each recording session? Is this to reduce the variability across sessions?

      We have now included in Figure 3D a box plot displaying the IRI distributions across different states and sessions. To minimize inter-session variability, data were z-scored within each session for visualization purposes. However, all general linear models were based on raw data, and as a result, the raw differences in IRI are shown in Figure 3C.
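
      The within-session z-scoring described above can be sketched as follows. The session names and IRI values are hypothetical, and the use of the population standard deviation is an assumption, since the response does not specify which estimator was used.

```python
from statistics import mean, pstdev

def zscore_within_session(iri_by_session):
    """Standardize each session's IRIs by that session's own mean and SD.

    This removes between-session offsets and scale differences, so the
    z-scored values are suited to visualization rather than to comparing
    absolute IRI durations across sessions.
    """
    zscored = {}
    for session, values in iri_by_session.items():
        mu = mean(values)
        sd = pstdev(values)  # population SD: an assumption, see lead-in
        if sd == 0:
            raise ValueError(f"session {session!r} has zero variance")
        zscored[session] = [(v - mu) / sd for v in values]
    return zscored

# Hypothetical inter-ripple intervals (seconds) from two sessions.
example = {"session_a": [1.0, 2.0, 3.0], "session_b": [10.0, 20.0, 30.0, 40.0]}
z = zscore_within_session(example)
```

      After this step every session has mean 0 and unit variance, which is why raw IRI differences are reported separately from the z-scored display.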

      Figure 3E, panel labels don't match with the caption

      We are grateful to the reviewer for pointing out this mistake, which we have corrected in the updated version of the manuscript.

      In the text related to Figure 3E, the related analysis can be more clearly described. "phase preference of individual ripples" does not immediately suggest that the occurring phase of each ripple relative to the targeted oscillation is extracted. I suggest performing this analysis individually for each session and summarizing the results across the sessions.

      We have reworded the sentence in Results: 5-HT and ripples to better reflect the analysis performed: “Next, we calculated the ultraslow 5-HT phases at which individual ripples occurred during both NREM and WAKE (3E-F) ...”. Regarding session-level data, we have added Figure S3, which shows session level mean phase vectors, as well as the grand mean across sessions for both NREM and WAKE. Included in this figure are session level means for frequency bands outside of the ultraslow band we used in our study, intended to show that ripples are most strongly timed by the ultraslow band (0.01-0.06 Hz), reflected by the greater amplitude of the mean phase vector for this band.
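
      For readers unfamiliar with the mean phase vector, the following sketch shows the standard circular-statistics definition: each ripple phase is mapped to a unit vector, and the amplitude of the average vector measures how tightly the phases cluster. The function name and the example phases are hypothetical, and this is not claimed to be the exact procedure used for Figure S3.

```python
import cmath
import math

def mean_phase_vector(phases_deg):
    """Return (mean direction in degrees, mean resultant length in [0, 1])."""
    if not phases_deg:
        raise ValueError("at least one phase is required")
    unit_vectors = [cmath.exp(1j * math.radians(p)) for p in phases_deg]
    mean_vec = sum(unit_vectors) / len(unit_vectors)
    return math.degrees(cmath.phase(mean_vec)), abs(mean_vec)

# Hypothetical ripple phases: tightly clustered vs. spread evenly over the cycle.
clustered_dir, clustered_len = mean_phase_vector([80.0, 90.0, 100.0])
uniform_dir, uniform_len = mean_phase_vector([0.0, 90.0, 180.0, -90.0])
```

      A resultant length near 1 indicates strong phase locking and a length near 0 indicates no preferred phase, so a larger amplitude for one frequency band than for others is what "most strongly timed" refers to. The mean direction is not meaningful when the length is close to 0.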

      Figure 3E2, based on the result of ripple-triggered 5-HT in left panels of 2H1-2, one would expect to see a preferred phase closer to 180 (toward the end of the falling phase), it would be helpful to compare and discuss the results of these two analyses.

      The reviewer is correct to point out the apparent discrepancy in where the mean ripple falls with respect to the ongoing serotonin oscillation between the two figures mentioned. We have addressed this point in Results: 5-HT and ripples, paragraph #4: “This result appear to be at odds with…”.

      Regarding the analysis in 3F, please also compare the power distribution of ripples between NREM and wake. This will help to better understand the potential difference behind the observed difference: how much the strong ripples are comparable between wake and NREM. It is also necessary to report the ripple detection failure rate across ripples with different strengths.

      We have added a figure showing analysis done on a subset of the data in which ripples were manually curated in order to evaluate the performance of the ripple detection model (Figure S7) and explanatory text in Methods: Model performance: ‘To ensure that our model …’. In summary, while missed ripples did tend to have lower power than correctly detected ripples, including them did not change the distribution of ripples by the phase of the ultraslow serotonin oscillation (Figure S7C). We would also note that while the phase preference is noisier than what is presented in Figure 3F because this analysis was done with a small subset of all recorded ripples, the fact that ripples occur more clearly on the falling phase is visible for both detected ripples and detected + false negative ripples.

      The mixed-effects model examining the influence of 5-HT ultraslow oscillation phase on ripple power revealed no significant effect of state (p = 0.088). This indicates that whether the data were collected during NREM or wake periods did not significantly impact ripple power and that the lack of a significant effect (in Figure 3G,H) in WAKE is probably not due to a difference in the distribution of ripple power between states.

      4D, y label is z?

      We are grateful to the reviewer for pointing this out. Yes, the y label should be ‘z-score’, as the two traces represent z-scored 5-HT (blue) and z-scored shuffled data (orange). Figure 4D2 and Figure 2H1-2, which show similar data, have been corrected to address this oversight.

      Relating to Figure 4, EMG comparison across phases of the oscillations is insightful. Two related and complementary analyses are to compare the theta and gamma power between the falling and rising phases.

      We have addressed this suggestion in Figure S5 A-C. While low gamma, high gamma and theta power are modulated identically in NREM, with higher power observed during the falling phase than the rising phase, during WAKE, different patterns can be seen. Specifically, low gamma power shows no phase preference, while high gamma shows a peak near the center of the ultraslow 5-HT oscillation. Theta power, as in NREM, is higher during the falling phase of ultraslow 5-HT oscillations. Increased power across many frequency bands was shown to coincide with decreases in DRN population activity during NREM, which matches with what we report here (Kato et al. 2022). In summary, while NREM patterns are consistent in all frequency bands tested, aligning with the pattern of ripple incidence, in WAKE low and high gamma power show different relationships to ultraslow 5-HT phase.

      In the manuscript, we have used the data in both Figure S5 and S6 (unit activity relative to ultraslow 5-HT oscillations) to argue against the idea that our coherence findings result from a lack of activity in the rising phase (see next question), which would have the effect of ‘artificially’ reducing coherence in the falling phase relative to the rising phase. The text can be found in Results: 5-HT and hippocampal-cortical coherence, paragraph #2.

      The results presented in Figure 5 could be puzzling and need to be further discussed: if the ripple band activity is weak during the rising phase, in what circumstances the coherence between cortex and CA1 is specifically very strong in this band?

      As mentioned in the previous answer, we have addressed this concern in Results: 5-HT and hippocampal-cortical coherence, paragraph #2. In summary, it is true that the higher coherence in the rising phase than in the falling phase for the highest frequency band (termed ‘high frequency oscillation’ (HFO), 100-150 Hz) could be unexpected, given that ripples occur largely during the falling phase. A few points could help explain this finding. Firstly, it should be noted that power in the 100-150 Hz band can arise from physiological activity outside of ripples, such as filtered non-rhythmic spike bursts (Liu et al. 2022), whose coherent occurrence in the rising phase could explain the coherence findings. Secondly, coherence is a compound measure which is affected by both phase consistency and amplitude covariation (Srinath and Ray 2014); thus, coherence cannot be predicted from amplitude alone. Furthermore, HFO power in the cortex is highest near the peak of ultraslow 5-HT oscillations (Figure S5D), as opposed to the falling phase peak in the hippocampus. This shows a lack of covariation in amplitude by phase between the hippocampus and cortex at this frequency band. An alternative explanation of our findings regarding coherence could be that in the rising phase, there is simply little to no activity, which is easier to ‘synchronize’ than bouts of high activity. Hippocampal unit activity in the rising phase (Figure S6) suggests, however, that it is not likely to be the absence of activity supporting higher coherence in the rising phase across frequencies. Additional experiments using high-density recordings should be conducted to examine 5-HT ultraslow oscillations and their role in gating activity across brain regions, though these results strongly suggest some role exists.

      Reviewer #2 (Recommendations for the authors):

      I would like to offer two comments. I believe that these are not unusual requests, and thus I would like the authors to respond.

      (1) It would be prudent to investigate the possibility that the observed correlation between ultraslow oscillations and hippocampal ripples/microarousals is merely superficial and that there are unidentified confounding factors at play. For example, it would be beneficial to provide evidence that administering a serotonin receptor inhibitor results in the disappearance of the slow oscillation of ripples and microarousals, or that the correlation with ultraslow oscillations is no longer present. Please note that the former experiments do not require GRAB5-HT3.0 imaging.

      We agree that causality claims cannot be made with our data and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3. We would further like to note that given the large number of serotonin receptors and the lack of selectivity of many serotonin receptor antagonists, a pharmacological approach would be difficult, though the results would certainly be useful. Finally, we highlight the psilocin study, which reported changes in the rhythmic occurrence of microarousals, and therefore likely ultraslow oscillations, after administering a 5-HT2a receptor agonist, suggesting a potential causal effect of 5-HT (via the 5-HT2a receptor) on MA occurrence (Thomas et al. 2022).

      (2) The slow frequency appears to be associated with the default mode network as observed in fMRI signals. The neural basis of the default mode network remains unclear; therefore, a more detailed examination of this possibility would be beneficial.

      We agree that it would be interesting to investigate the role of 5-HT in the neural basis of the DMN.

      The DMN as described in humans (Raichle et al. 2001) and rodents (Lu et al. 2012) may indeed include some parts of the hippocampus, and perhaps some of our neocortical recordings could also be considered part of the DMN. The fact that the activity across the inter-connected brain structures of the DMN is correlated at ultraslow time scales (Gutierrez-Barragan et al. 2019, Mantini et al. 2007), as well as serotonin’s ability to modulate the DMN (Helmbold et al. 2016), is intriguing. Further studies simultaneously recording DMN activity via fMRI and electrical activity via silicon probes, as done in Logothetis et al. 2001, could further elucidate a potential link between ultraslow oscillations and the DMN, with serotonergic modulation as a means to understand any potential contribution of serotonin.

      Reviewer #3 (Recommendations for the authors):

      (1) The impact of the study would benefit from an experiment causally testing the effect of hippocampal 5-HT levels on hippocampal physiology, e.g. using optogenetic manipulations.

      We agree that causality claims cannot be made with our data and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3.

      (2) Data presentation: the figures are of poor resolution, making some diagram details and, more importantly, some example traces (e.g. Figure 1A, right) impossible to see. This should be corrected by either increasing figure resolution or making important figure elements large enough to be readable.

      We apologize for the poor resolution and have corrected it in the updated version of the manuscript.

      (3) Differences in some figure panels are not statistically assessed: Figure 1H (differences in spectrum peak power), Figure 3E1 & Figure 3E3 (directional bias of the circular distributions), Figure 4C (difference from 0 mean).

      We acknowledge this oversight and have added statistical tests for all three figures, as well as further information regarding the models used in Methods: Statistics.

      (4) Lines 279-280: the claim that the study shows "organization of activity by ultraslow oscillations of 5-HT" implies a causal role of 5-HT in organizing hippocampal activity. I suggest that this statement be toned down to reflect the correlational nature of the presented evidence.

      We have rephrased the sentence in question to the following: “In our study, including both NREM and WAKE periods allowed us to additionally show that the temporal organization of activity relative to ultraslow 5-HT oscillations operates according to the same principles in both states...”, which we believe better reflects the temporal correlation we describe.

      (5) While the study claims to use the EMG (i.e. electromyograph) signal, it does not describe any electrodes placed inside the muscle in the methods section. The SleepScoreMaster toolbox used in the study estimates the EMG using high-frequency activity correlated across recording channels, so I assume this is how this signal was obtained. While such activity may well reflect muscular noise to some degree, it is an indirect measure as the electrodes are not in the muscle. Since the EMG signal is central to the message of the manuscript, the method for calculating it should be described in the methods section and it should be explicitly labelled as an indirect measure in the main text, e.g. by referring to this signal as pseudo-EMG.

      We agree and have added explanatory text to the State Scoring subsection in Methods. Given that the EMG we refer to is derived from intracranial data, and not from traditional EMG probes, we now refer to the EMG as intracranial EMG, or icEMG for short, throughout the main text.

      (6) Is ripple frequency or ripple duration different across the rising and falling phases of the ultraslow oscillation?

      We have now investigated this suggestion in Figure S4, where we show that ripple frequency is higher in the falling phase than rising phase, while ripple duration appears to show no phase preference.

      (7) Lines 315-317: I am not sure why the manuscript refers to the coupling between EMG and 5-HT levels as 'puzzling' given that, as stated, the locomotion-inducing effects of 5-HT are well documented. While the fact that even non-locomotory motor activity may be associated with 5-HT rise is certainly interesting (although not sure if 'puzzling'), the manuscript does not directly compare the association of 5-HT levels with locomotory and non-locomotory EMG spikes. Thus, I think this discussion point is not fully warranted.

      We agree and have rephrased the discussion point in question to reflect that the EMG link to serotonin oscillations is not necessarily surprising, given both the literature linking 5-HT and spontaneous movement in the hippocampus, as well as the involvement of 5-HT in repetitive movements, where the role for a regularly-occurring oscillation is perhaps more intuitive.

      (8) Line 441: Reference #67 does not describe the use of fiber photometry.

      The reviewer is correct to point out this typo, which has now been corrected. The reference in question should be 64, where fiber photometry experiments are described. For further clarity, we have changed our referencing scheme to include authors and years in in-text references.

      (9) In Figures 3E1-3, the phase has different bounds than in the other Figures in the manuscript (0:360 vs -180:180), this should be corrected for consistency.

      We agree and have made changes so that all figures have a phase range of -180 to 180°.

      References

      (1) Z. M Aghajan, G. Kreiman, I. Fried, Minute-scale periodicity of neuronal firing in the human entorhinal cortex. Cell Rep 42, 113271 (2023).

      (2) M. A. Bunin, R. M. Wightman, Quantitative Evaluation of 5-Hydroxytryptamine (Serotonin) Neuronal Release and Uptake: An Investigation of Extrasynaptic Transmission. J. Neurosci. 18, 4854–4860 (1998).

      (3) F. Cazettes, D. Reato, J. P. Morais, A. Renart, Z. F. Mainen, Phasic Activation of Dorsal Raphe Serotonergic Neurons Increases Pupil Size. Curr Biol 31, 192-197.e4 (2021).

      (4) S. R. Cole, B. Voytek, Brain Oscillations and the Importance of Waveform Shape. Trends Cogn Sci 21, 137–149 (2017).

      (5) F. Deng, et al., Improved green and red GRAB sensors for monitoring spatiotemporal serotonin release in vivo. Nat Methods 21, 692–702 (2024).

      (6) C. Dong, et al., Psychedelic-inspired drug discovery using an engineered biosensor. Cell 184, 2779-2792.e18 (2021).

      (7) A. Eban-Rothschild, L. Appelbaum, L. de Lecea, Neuronal Mechanisms for Sleep/Wake Regulation and Modulatory Drive. Neuropsychopharmacol. 43, 937–952 (2018).

      (8) M. S. Fanselow, H.-W. Dong, Are the dorsal and ventral hippocampus functionally distinct structures? Neuron 65, 7–19 (2010).

      (9) J. S. Farrell, E. Hwaun, B. Dudok, I. Soltesz, Neural and behavioural state switching during hippocampal dentate spikes. Nature 1–6 (2024). https://doi.org/10.1038/s41586-024-07192-8.

      (10) R. De Filippo, D. Schmitz, Transcriptomic mapping of the 5-HT receptor landscape. Patterns (New York, N.Y.) 5, 101048 (2024).

      (11) M. J. Fisher, et al., Neural mechanisms of sensory gating: Insights from human and animal studies. NeuroImage 207, 116374 (2020).

      (12) P. Fries, D. Nikolić, W. Singer, The gamma cycle. Trends in Neurosciences 30, 309–316 (2007).

      (13) S. Gonzalo Cogno, et al., Minute-scale oscillatory sequences in medial entorhinal cortex. Nature 625, 338–344 (2024).

      (14) D. Gutierrez-Barragan, M. A. Basson, S. Panzeri, A. Gozzi, Infraslow State Fluctuations Govern Spontaneous fMRI Network Dynamics. Current Biology 29, 2295-2306.e5 (2019).

      (15) K. Helmbold, et al., Serotonergic modulation of resting state default mode network connectivity in healthy women. Amino Acids 48, 1109–1120 (2016).

      (16) B. Jarosiewicz, B. L. McNaughton, W. E. Skaggs, Hippocampal Population Activity during the Small-Amplitude Irregular Activity State in the Rat. J. Neurosci. 22, 1373–1384 (2002).

      (17) J. T. Jordan, The rodent hippocampus as a bilateral structure: A review of hemispheric lateralization. Hippocampus 30, 278–292 (2020).

      (18) T. Kato, et al., Oscillatory Population-Level Activity of Dorsal Raphe Serotonergic Neurons Is Inscribed in Sleep Structure. J. Neurosci. 42, 7244–7255 (2022).

      (19) M. A. Kim, H. S. Lee, B. Y. Lee, B. D. Waterhouse, Reciprocal connections between subdivisions of the dorsal raphe and the nuclear core of the locus coeruleus in the rat. Brain Research 1026, 56–67 (2004).

      (20) C. Kjaerby, et al., Memory-enhancing properties of sleep depend on the oscillatory amplitude of norepinephrine. Nat Neurosci 25, 1059–1070 (2022).

      (21) S. Lecci, et al., Coordinated infraslow neural and cardiac oscillations mark fragility and offline periods in mammalian sleep. Sci Adv 3, e1602026 (2017).

      (22) A. A. Liu, et al., A consensus statement on detection of hippocampal sharp wave ripples and differentiation from other fast oscillations. Nat Commun 13, 6000 (2022).

      (23) M.-T. Lo, P.-H. Tsai, P.-F. Lin, C. Lin, Y. L. Hsin, The nonlinear and nonstationary properties in EEG signals: probing the complex fluctuations by Hilbert–Huang transform. Adv. Adapt. Data Anal. 01, 461–482 (2009).

      (24) N. K. Logothetis, J. Pauls, M. Augath, T. Trinath, A. Oeltermann, Neurophysiological investigation of the basis of the fMRI signal. Nature 412, 150–157 (2001).

      (25) H. Lu, et al., Rat brains also have a default mode network. Proc Natl Acad Sci U S A 109, 3979–3984 (2012).

      (26) D. Mantini, M. G. Perrucci, C. Del Gratta, G. L. Romani, M. Corbetta, Electrophysiological signatures of resting state networks in the human brain. Proc Natl Acad Sci U S A 104, 13170–13175 (2007).

      (27) J. E. Markowitz, et al., The striatum organizes 3D behavior via moment-to-moment action selection. Cell 174, 44-58.e17 (2018).

      (28) H. Miyawaki, Y. N. Billeh, K. Diba, Low Activity Microstates During Sleep. Sleep 40, zsx066 (2017).

      (29) B. Mlinar, A. Montalbano, L. Piszczek, C. Gross, R. Corradetti, Firing Properties of Genetically Identified Dorsal Raphe Serotonergic Neurons in Brain Slices. Front Cell Neurosci 10, 195 (2016).

      (30) A. Osorio-Forero, et al., Noradrenergic circuit control of non-REM sleep substates. Current Biology 31, 5009-5023.e7 (2021).

      (31) S. Panzeri, N. Brunel, N. K. Logothetis, C. Kayser, Sensory neural codes using multiplexed temporal scales. Trends in Neurosciences 33, 111–120 (2010).

      (32) M. E. Raichle, et al., A default mode of brain function. Proc Natl Acad Sci U S A 98, 676–682 (2001).

      (33) R. Srinath, S. Ray, Effect of amplitude correlations on coherence in the local field potential. J Neurophysiol 112, 741–751 (2014).

      (34) B. P. Staresina, J. Niediek, V. Borger, R. Surges, F. Mormann, How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nat Neurosci 26, 1429–1437 (2023).

      (35) C. W. Thomas, et al., Psilocin acutely alters sleep-wake architecture and cortical brain activity in laboratory mice. Transl Psychiatry 12, 77 (2022).

      (36) G. F. Turi, et al., Serotonin modulates infraslow oscillation in the dentate gyrus during Non-REM sleep. eLife 13 (2025).

      (37) J. Vazquez, H. A. Baghdoyan, Basal forebrain acetylcholine release during REM sleep is significantly greater than during waking. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 280, R598–R601 (2001).

      (38) J. Wan, et al., A genetically encoded sensor for measuring serotonin dynamics. Nat Neurosci 24, 746–752 (2021).

      (39) Y. Zhang, et al., Cholinergic suppression of hippocampal sharp-wave ripples impairs working memory. Proc. Natl. Acad. Sci. U.S.A. 118, e2016432118 (2021).

      (40) Y. Zhang, et al., Interaction of acetylcholine and oxytocin neuromodulation in the hippocampus. Neuron (2024).

    1. One play would probably seldom occupy more than an hour and a half; but often three plays were connected together in one grand whole called a trilogy, somewhat as the several parts of Shakespeare's historical plays are connected; and these were followed by a comic piece by the same poet, which might relieve the seriousness of so much tragedy. Each competitor, therefore, produced in these cases not one play, but a series of four, and several competitors followed one another throughout the day. Wearisome, dry, unimpassioned, all this may seem to us; but we must remember that to the Greek it meant religious service, literary culture, and the celebration of the national greatness. As he sat in the theatre, the gods of his country looked down approvingly from the Acropolis above, and his fellow-citizens, whom he loved with intense patriotism, were all about him. He might say of the assembly, what an old poet had said of the Ionians gathered for festival at Delos, that you would think them blessed with endless youth, so glorious they were and so blooming; and as the rocks under which he sat re-echoed to the applause of that great assembly, he must indeed have felt the thrill of sympathetic enthusiasm which Plato describes as produced by such occasions.

      The trilogy described here is still alive today in all forms of entertainment, and I like that we can draw comparisons with practices from ages ago. The comparison to Shakespeare's historical plays is a helpful point of reference. Each competitor produced not one play but a series of four, and several competitors followed one another throughout the day. For the Greeks, this was a religious service, literary culture, and a celebration of national greatness.

    2. All these facts—that the theatre was national, and religious, and rarely open—combined to make the audience on each occasion very numerous. It was a point of national pride, of religious duty, and of common prudence on the part of every citizen, not to miss the two great dramatic festivals of the year when their season came. Accordingly, we hear that thirty thousand people used to be present together; and we may infer from this, as well as from other indisputable evidence, the vast size of the theatre itself. The performance took place in the day-time, and lasted nearly all day, for several plays were presented in succession; and the theatre was open to the sky and to the fields, so that when a man looked away from the solemn half-mysterious representation of the legendary glories of his country, his eye would fall on the city itself, with its temples and its harbours, or on the rocky cliffs of Salamis and the sunny islands of the Ægean. Finally, the performance was musical, and so more like an opera than an ordinary play, though we shall see that even this resemblance is little more than superficial.

      I am impressed by how the theatre became a national and religious success even though it was rarely open. Attending was a point of national pride, a religious duty, and a matter of common prudence, which suggests that the theatre was a part of practically everyone's life at that time, however large or small its role for any one person. Picturing an audience of thirty thousand lets me imagine the size of the theatre itself and the influence it had at the time. Knowing that the performances lasted nearly all day adds to my sense of the commitment these people made to something that was such an important and successful part of their lives.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      We would like to thank Reviewer 1 for recognising the importance of our findings on the heterogeneity in bacterial responses to tachyplesin.

      (1) A double deletion of acrA and tolC (two out of the three components of the major constitutive RND efflux pump) reduces the appearance of the low accumulator phenotype, but interestingly, the single deletions have no effect, and a well-characterised inhibitor of RND efflux pumps also has no effect. The authors identify a two-component system, qseCB, that appears necessary for the appearance of low accumulators, but this system has pleiotropic effects on many cellular systems, with only tenuous connections to efflux. The selected pharmacological agents that could prevent the appearance of low accumulators do not offer clear insight into the mechanism by which low accumulators arise, because they have diverse modes of action.

      We have added that “QseBC, was previously inferred to mediate resistance to a tachyplesin analogue by upregulating efflux genes based on transcriptomic analysis and hyper susceptibility of ΔqseBΔqseC mutants[113]”. However, we have also acknowledged that “it is conceivable that the deletion of QseBC has pleiotropic effects on other cellular mechanisms involved in tachyplesin accumulation” and that “it is also conceivable that sertraline prevented the formation of the low accumulator phenotype via efflux independent mechanisms”.

      These amendments are reported on lines 525-527, 532-534 and 539-541 of our revised manuscript.

      (2) The transcriptomics data collected for low and high accumulator sub-populations are interesting, but in my opinion, the conclusions that can be drawn from these data remain overstated. It is not possible to make any claims about the total amount of "protein synthesis, energy production, and gene expression" on the basis of RNA-Seq data. The reads from each sample are normalised, so there is no information about the total amount of transcript. Many elements of total cellular activity are post-transcriptionally regulated, so it is impossible to assess from transcriptomics alone. Finally, the transcriptomic data are analysed in aggregated clusters of genes that are enriched for biological processes, for example: "Cluster 2 included processes involved in protein synthesis, energy production, and gene expression that were downregulated to a greater extent in low accumulators than high accumulators". However, this obscures the fact that these clusters include genes that are generally inhibitory of the process named, as well as genes that facilitate the process.

      We have now acknowledged “that our data do not take into account post-transcriptional modifications that represent a second control point to survive external stressors.”

      These amendments are reported on lines 534-535 of our revised manuscript.

      The raw transcript counts can be found in Figure 3 – Source Data; we added these data to the previous version of our manuscript, as requested by this reviewer.

      We would also like to clarify that we have analysed our transcriptomic data via both clustering (i.e. Figure 3) and direct comparison of genes of interest (Table S1) and transcription factors (i.e. genes that are generally inhibitory of the process named, as well as genes that facilitate the process, Figure S12).

      Finally, we would like to point out that in our revised manuscript (both this and its previous version) we state that “Cluster 2 included processes involved in protein synthesis, energy production, and gene expression that were downregulated to a greater extent in low accumulators than high accumulators”. We do not think this is an overstatement, because we do not use these data to draw conclusions about the total amount of "protein synthesis, energy production, and gene expression".
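      The reviewer's normalisation point in comment (2) can be illustrated with a minimal sketch. It assumes counts-per-million (CPM) scaling as a stand-in for whatever normalisation was applied to the RNA-Seq data; the gene names and counts are hypothetical and are not taken from the manuscript. The sketch shows that a uniform, global drop in transcript abundance is invisible once each sample is scaled to its own library size.

```python
def counts_per_million(counts):
    """Scale raw counts so that each sample sums to one million (CPM)."""
    total = sum(counts.values())
    return {gene: 1e6 * value / total for gene, value in counts.items()}


# Hypothetical raw counts for three genes in one subpopulation.
high_accumulators = {"geneA": 400, "geneB": 600, "geneC": 1000}

# Hypothetical second subpopulation with every transcript halved (a global drop).
low_accumulators = {gene: value // 2 for gene, value in high_accumulators.items()}

cpm_high = counts_per_million(high_accumulators)
cpm_low = counts_per_million(low_accumulators)

if __name__ == "__main__":
    for gene in high_accumulators:
        print(gene, cpm_high[gene], cpm_low[gene])
```

      Under these assumptions the two normalised profiles are identical, so only relative differences between genes survive normalisation. Any statement about total transcript abundance would have to come from the raw counts or from an independent measurement.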

      (3) The authors have added an experiment to attempt to assess overall metabolic activity in the low accumulator and high accumulator populations, which is a welcome addition. They apply the redox dye resazurin and observe lower resorufin (reduced form) fluorescence in the low accumulator population, which they take to indicate a lower respiration rate. This seems possible, however, an important caveat is that they have shown the low accumulator population to retain substantially lower amounts of multiple different fluorescent molecules (tachyplesin-NBD, propidium iodide, ethidium bromide) intracellularly compared to the high accumulator population. It seems possible that the low accumulator population is also capable of removing resazurin or resorufin from the intracellular space, regardless of metabolic rate. Indeed, it has previously been shown that efflux by RND efflux pumps influences resazurin reduction to resorufin in both P. aeruginosa and E. coli. By measuring only the retained redox dye using flow cytometry, the results may be confounded by the demonstrated ability of the low accumulator population to remove various fluorescent dyes. More work is needed to strongly support broad conclusions about the physiological states of the low and high accumulator populations. The phenomenon of the emergence of low accumulators, which are phenotypically tolerant to the antimicrobial peptide tachyplesin, is interesting and important even if there is still work to be done to understand the mechanism by which it occurs.

      We have now clarified that these assays were performed in the presence of 50 μM CCCP and that “CCCP was included to minimise differences in efflux activity and preserve resorufin retention between low and high accumulators, though some variability in efflux may still persist.” We have now added this information on lines 401-406. This information was only present in the caption of Figure S16 of our previous version of this manuscript.

      We agree with the reviewers that more work needs to be done to fully understand this new phenomenon and we had already acknowledged in our previous version of this manuscript that other mechanisms could play a role in this new phenomenon, see lines 489-517 of the current manuscript.

      Reviewer 2:

      We would like to thank the reviewer for recognising that all their previous comments have now been satisfactorily addressed.

      (1) Some mechanistic questions regarding tachyplesin-accumulation and survival remain. One general shortcoming of the setup of the transcriptomics experiment is that the tachyplesin-NBD probe itself has antibiotic efficacy and induces phenotypes (and eventually cell death) in the ‘high accumulator’ cells. As the authors state themselves, this makes it challenging to interpret whether any differences seen between the two groups are causative for the observed accumulation pattern or if they are a consequence of differential accumulation and downstream phenotypic effects.

      We agree with the reviewer and we had explicitly acknowledged this possibility on lines 281-285 (of the previous and current version of this manuscript).

      (2) The statement ‘Moreover, we found that the fluorescence of low accumulators decreased over time when bacteria were treated with 20 μg mL’ is, in my opinion, not supported by the data shown in Figure S4C. That figure shows that the abundance of ‘low accumulator’ cells decreases over time. Following the rationale that proteinase K treatment may cleave surface-associated/extracellular tachyplesin-NBD, this should lead to a shift of the ‘low accumulator’ population to the left, indicating reduced fluorescence intensity per cell. This is not the case; instead, the population just disappears. However, after 120 min of treatment more cells appear in the ‘high accumulator’ state. This result is somewhat puzzling.

      We agree with the reviewer that our previous discussion of these data could have been misleading. We have now reworded this part of the text as follows: “We found that the fluorescence of high accumulators did not decrease over time when tachyplesin-NBD was removed from the extracellular environment and bacteria were treated with 20 μg mL⁻¹ (0.7 μM) proteinase K, a widely-occurring serine protease that can cleave the peptide bonds of AMPs [43–45] (Figure S4B and C). These data suggest that tachyplesin-NBD primarily accumulates intracellularly in high accumulators.”

      It is conceivable that extended exposure to proteinase K (i.e. we see a decrease in the abundance of low accumulators after 90 min treatment with proteinase K) increased the permeability to tachyplesin-NBD of low accumulators allowing tachyplesin-NBD to move from either the extracellular space or the membrane to the cell interior. However, we do not have data to prove this point.

      Therefore, we have now removed our claim that the data obtained using proteinase K suggest that tachyplesin-NBD accumulates primarily in the membranes of low accumulators. We believe that our two separate microscopy analyses provide more direct, stronger and less ambiguous evidence that tachyplesin-NBD accumulates primarily in the membranes of low accumulators.

      (3) The authors used the metabolic dye resazurin to measure the metabolic activity of low vs. high accumulators. I am not entirely convinced that the lower resorufin fluorescence in low tachyplesin-NBD accumulators really indicates lower metabolic activity, since a cell's fluorescence levels would also be affected by cellular uptake and efflux. It appears plausible that the lower resorufin fluorescence may result from reduced accumulation/increased efflux in the ‘low tachyplesin-NBD’ population.

      We have now clarified that these assays were performed in the presence of 50 μM CCCP and that “CCCP was included to minimise differences in efflux activity and preserve resorufin retention between low and high accumulators, though some variability in efflux may still persist.” We have now added this information on lines 401-406. This information was only present in the caption of Figure S16 of our previous version of this manuscript.

      (4) P8 line 343. The text should refer to Figure 13B instead of 14B.

      We have now changed the text accordingly on line 337.

      Reviewer 3:

      We would like to thank the reviewer for recognising that we have done a very impressive job in taking care of their comments.

      (1) Despite these advances, the contribution of efflux may require more direct evidence to further dissect whether efflux is necessary, sufficient, or contributory. The facts that the key low efflux mutant still retains a small fraction of survivors and that the inhibitors used may cause other physiological changes leading to higher efflux are still unaccounted for. The lipidomic and vesicle findings, while intriguing, remain descriptive, and direct tests of their functional relevance would further solidify the mechanistic models.

      We agree with the reviewers that more work needs to be done to fully understand this new phenomenon and we had already acknowledged in our previous version of this manuscript that other mechanisms could play a role in this new phenomenon, see lines 489-517 of the current manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this interesting and original paper, the authors examine the effect that heat stress can have on the ability of bacterial cells to evade infection by lytic bacteriophages. Briefly, the authors show that heat stress increases the tolerance of Klebsiella pneumoniae to infection by the lytic phage Kp11. They also argue that this increased tolerance facilitates the evolution of genetically encoded resistance to the phage. In addition, they show that heat can reduce the efficacy of phage therapy. Moreover, they define a likely mechanistic reason for both tolerance and genetically encoded resistance. Both lead to a reorganization of the bacterial cell envelope, which reduces the likelihood that phage can successfully inject their DNA.

      Strengths:

      I found large parts of this paper well-written and clearly presented. I also found many of the experiments simple yet compelling. For example, the experiments described in Figure 3 clearly show that prior heat exposure can affect the efficacy of phage therapy. In addition, the experiments shown in Figures 4 and 6 clearly demonstrate the likely mechanistic cause of this effect. The conceptual Figure 7 is clear and illustrates the main ideas well. I think this paper would work even without its central claim, namely that tolerance facilitates the evolution of resistance. The reason is that the effect of environmental stressors on stress tolerance has to my knowledge so far only been shown for drug tolerance, not for tolerance to an antagonistic species.

      Weaknesses:

      I did not detect any weaknesses that would require a major reorganization of the paper, or that may require crucial new experiments. However, the paper needs some work in clarifying specific and central conclusions that the authors draw. More specifically, it needs to improve the connection between what is shown in some figures, how these figures are described in the caption, and how they are discussed in the main text. This is especially glaring with respect to the central claim of the paper from the title, namely that tolerance facilitates the evolution of resistance. I am sympathetic to that claim, especially because this has been shown elsewhere, not for phage resistance but for antibiotic resistance. However, in the description of the results, this is perhaps the weakest aspect of the paper, so I'm a bit mystified as to why the authors focus on this claim. As I mentioned above, the paper could stand on its own even without this claim.

      Thank you for your feedback. We understand your concern regarding the central claim that tolerance facilitates the evolution of resistance. While the paper can stand on its own without this claim, we think it adds an important layer to the interpretation of our findings. In light of your comments, we plan to revise the title to “Heat Stress Induces Phage Tolerance in Bacteria”.

      More specific examples where clarification is needed:

      (1) A key figure of the paper seems to be Figure 2D, yet it was one of the most confusing figures. This results from a mismatch between the accompanying text starting on line 92 and the figure itself. The first thing that the reader notices in the figure itself is the huge discrepancy between the number of viable colonies in the absence of phage infection at the two-hour time point. Yet this observation is not even mentioned in the main text. The exclusive focus of the main text seems to be on the right-hand side of the figure, labeled "+Phage". It is from this right-hand panel that the authors seem to conclude that heat stress facilitates the evolution of resistance. I find this confusing, because there is no difference between the heat-treated and non-treated cells in survivorship, and it is not clear from this data that survivorship is caused by resistance, not by tolerance/persistence. (The difference between tolerance and resistance has only been shown in the independent experiments of Figure 1B.)

      Thank you for your helpful comment. Figure 2d presents colony counts from a plating assay following the phage killing experiment in Figure 2c. Bacteria collected after 0 and 2 hours of phage exposure were plated on both phage-free (−phage) and phage-containing (+phage) plates. The “−phage” condition reflects total survivors, while the “+phage” condition indicates the resistant subset.

      As seen in the left part of Figure 2d, heat-treated bacteria showed markedly higher survival on phage-free plates than untreated cells, which were largely eliminated by phage. However, resistant colony counts on phage-containing plates were similar between the two groups (right part of Figure 2d), suggesting that heat stress increased survival but did not promote resistance.

      To clarify, we have revised the labels in Figure 2d as follows: “Total” will replace “-phage” to indicate the total survivors from the phage killing assay, and “Resisters” will replace “+phage” to indicate the resistant survivors, which are detected on phage-containing plates. This adjustment should eliminate any confusion and better reflect the experimental design.

      Figure 2F supports the resistance claim, but it is not one of the strongest experiments of the paper, because the authors used only "turbidity" as an indicator of resistance. In addition, the authors performed the experiments described therein at small population sizes to avoid the presence of resistance mutations. But how do we know that the turbidity they describe does not result from persisters?

      I see three possibilities to address these issues. First, perhaps this is all a matter of explaining and motivating this particular experiment better. Second, the central claim of the paper may require additional experiments. For example, is it possible to block heat-induced tolerance through specific mutations, and show that phage resistance does not evolve as rapidly if tolerance is blocked? A third possibility is to tone down the claim of the paper and make it about heat tolerance rather than the evolution of heat resistance.

      Thank you for your thoughtful comment. We appreciate the opportunity to clarify the interpretation of Figure 2f and the rationale behind the experimental design. We agree that turbidity alone cannot fully distinguish resistance from persistence. However, our earlier experiments (Figures 2d and 2e) demonstrated that heat-treated survivors remained largely susceptible to phage, indicating that heat stress does not directly induce resistance. This led us to hypothesize that heat enhances phage tolerance, which in turn increases the likelihood of resistance emergence during subsequent infection.

      To test this, we used a low initial bacterial population (~10³ CFU per well) to minimize the chance of pre-existing resistance. Bacteria were exposed to phages at MOIs of 1, 10, and 100 and incubated for 24 hours in 100 µL volumes. This setup ensured:

      (1) The low initial population minimizes the presence of pre-existing resistant mutants, ensuring that any phage-resistant bacteria observed arise during the infection process.

      (2) The high MOI (≥ 1) ensures that each bacterial cell has a high probability of infection by at least one phage.

      (3) The small volume (100 µL per well) maximizes the interaction between bacteria and phages, ensuring rapid infection of susceptible bacteria, which leads to clear wells. If resistant mutants arise, they will grow and cause turbidity.

      Thus, the turbidity observed in heat-treated samples reflects de novo emergence and outgrowth of resistant mutants from a tolerant population. This assay supports the idea that heat-induced tolerance increases the probability of resistance evolution, rather than directly causing resistance.
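      The MOI reasoning in point (2) above can be made concrete with a minimal sketch. It assumes that phage adsorption follows a Poisson distribution with mean equal to the MOI, which is a common simplification and is not stated in the response. The function names are hypothetical, and the calculation covers only the first round of adsorption, ignoring phage amplification and any heat-induced block to DNA injection.

```python
import math


def fraction_infected(moi: float) -> float:
    """Fraction of cells adsorbing at least one phage, assuming Poisson adsorption."""
    return 1.0 - math.exp(-moi)


def expected_uninfected(cells: float, moi: float) -> float:
    """Expected number of cells that adsorb no phage in the first round."""
    return cells * math.exp(-moi)


if __name__ == "__main__":
    initial_cells = 1_000  # ~10^3 CFU per well, as described above
    for moi in (1, 10, 100):
        print(
            f"MOI {moi:>3}: infected fraction = {fraction_infected(moi):.4f}, "
            f"expected uninfected cells = {expected_uninfected(initial_cells, moi):.3g}"
        )
```

      Under this assumption roughly 63% of cells adsorb at least one phage at an MOI of 1, so about a third of the initial population escapes the first round, whereas at an MOI of 10 or 100 essentially every cell is expected to adsorb a phage. The higher MOIs therefore fit the stated rationale most directly; at an MOI of 1 the argument also relies on phage amplification over the 24-hour incubation.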

      We have revised the text to better explain this experimental logic and adjust the framing of our conclusions accordingly.

      A minor but general point here is that in Figure 2D and in other figures, the labels "-phage" and "+phage" do not facilitate understanding, because they suggest that cells in the "-phage" treatment have not been exposed to phage at all, but that is not the case. They have survived previous phage treatment and are then replated on media lacking phage.

      Thank you for your valuable comment. To clarify, we have revised the labels in Figure 2d as follows: “Total” will replace “-phage” to indicate the total survivors from the phage killing assay, and “Resisters” will replace “+phage” to indicate the resistant survivors, which are detected on phage-containing plates.

      (2) Another figure with a mismatch between text and visual materials is Figure 5, specifically Figures 5B-F. The figure is about two different mutants, and it is not even mentioned in the text how these mutants were identified, for example in different or the same replicate populations. What is more, the two mutants are not discussed at all in the main text. That is, the text, starting on line 221 discusses these experiments as if there was only one mutant. This is especially striking as the two mutants behave very differently, as, for example, in Figure 5C. Implicitly, the text talks about the mutant ending in "...C2", and not the one ending in "...C1". To add to the confusion, the text states that the (C2) mutant shows a change in the pspA gene, but in Figure 5f, it is the other (undiscussed) mutant that has a mutation in this gene. Only pspA is discussed further, so what about the other mutants? More generally, it is hard to believe that these were the only mutants that occurred in the genome during experimental evolution. It would be useful to give the reader a 2-3 sentence summary of the genetic diversity that experimental evolution generated.

      Thank you for your thoughtful comment. In our heat treatment evolutionary experiment, we isolated six distinct bacterial clones, of which two are highlighted in the manuscript as representative examples. One clone, BC2G11C1, acquired both heat tolerance and phage resistance, while another clone, BC3G11C2, became heat-tolerant but did not develop resistance to phage infection. This variation highlights the inherent diversity in evolutionary responses when exposed to selective pressures. It demonstrates that not all evolutionary pathways lead to the same outcome, even under similar stress conditions. This variability is a key observation in our study, illustrating that different genetic adaptations may arise depending on the specific mutations or genetic context, and not every strain will evolve phage resistance in parallel with heat tolerance. We have updated the manuscript to better reflect this diversity in the evolutionary trajectories observed.

      Reviewer #2 (Public review):

      Summary:

      An initial screen in which K. pneumoniae was pretreated with different stresses identified heat stress as a protective factor against infection by the lytic phage Kp11. Subsequent experiments show that this protection is mediated not by an increase in phage-resistant bacteria but by an increase in a transiently phage-tolerant population, which the authors describe as bacteriophage persistence, by analogy with antibiotic persistence. The authors then show that phage persistence mediated by heat shock enhanced the evolution of bacterial resistance against the phage. The same trait was observed using other lytic phages, their combinations, and two clinical strains, as well as E. coli and two T phages; hence the phenomenon may be widespread in enterobacteria.

      Next, the elucidation of heat-induced phage persistence was done, determining that phage adsorption was not affected but phage DNA internalization was impaired by the heat pretreatment, likely due to alterations in the bacterial envelope, including the downregulation of envelope proteins and of LPS; furthermore, heat treated bacteria were less sensitive to polymyxins due to the decrease in LPS.

      Finally, cyclic exposure to heat stress allowed the isolation of a mutant that was resistant to heat treatment, polymyxins, and lytic phage. That mutant had alterations in the PspA protein that conferred a gain of function and promoted the reduction of capsule production and the loss of its structure. Nevertheless, this mutant was severely impaired in immune evasion, as it was easily cleared from mouse blood, evidencing the tradeoffs between phage/heat and antibiotic resistance and the ability to counteract the immune response.

      Strengths:

      The experimental design and the sequence in which the experiments are presented are ideal for understanding the study, and the conclusions are supported by the findings. The discussion also points out the relevance of the work, particularly for the effectiveness of phage therapy, and allows the design of strategies to improve that effectiveness.

      Weaknesses:

      In its present form, it lacks the incorporation of some relevant previous work that explored the role of heat stress in phage susceptibility, antibiotic susceptibility, tradeoffs between phage resistance and resistance against other kinds of stress, virulence, etc., and the fact that exposure to lytic phages induces antibiotic persistence.

      Thank you for your insightful comments. I appreciate your suggestion regarding the inclusion of relevant previous works. I have now incorporated additional citations to discuss these points, including studies on the relationship between heat stress and antibiotic resistance, as well as the tradeoffs between phage resistance and other stress factors.

      Reviewer #3 (Public review):

      PspA, a key regulator in the phage shock protein system, functions as part of the envelope stress response system in bacteria, preventing membrane depolarization and ensuring envelope stability. This protein has been associated with the quorum sensing network and biofilm formation. (Moscoso M., Garcia E., Lopez R. 2006. Biofilm formation by Streptococcus pneumoniae: role of choline, extracellular DNA, and capsular polysaccharide in microbial accretion. J. Bacteriol. 188:7785-7795; Vidal JE, Ludewick HP, Kunkel RM, Zähner D, Klugman KP. The LuxS-dependent quorum-sensing system regulates early biofilm formation by Streptococcus pneumoniae strain D39. Infect Immun. 2011 Oct;79(10):4050-60.)

      It is interesting and very well-developed.

      (1) Could the authors develop experiments about the relationship between Quorum Sensing and this protein?

      (2) It would be interesting to analyze the link to phage infection and heat stress in relation to Quorum. The authors could study QS regulators or AI2 molecules.

      Thank you for your insightful comments and for bringing up the role of PspA in quorum sensing and biofilm formation. However, we would like to clarify a potential misunderstanding: the PspA discussed in our manuscript refers to phage-shock protein A, a key regulator in the bacterial envelope stress response system. This is distinct from the pneumococcal surface protein A, which has been associated with quorum sensing and biofilm formation in Streptococcus pneumoniae (as referenced in your comment).

      To avoid any confusion for readers, we will ensure that our manuscript explicitly states “phage-shock protein A (PspA)” at its first mention. We appreciate your feedback and hope this clarification addresses your concern.

      (3) Include the proteins or genes in a table or figure from lytic phage Kp11 (GenBank: ON148528.1).

      Thank you for your helpful suggestion. We have now included a figure summarizing the proteins of the lytic phage Kp11 (GenBank: ON148528.1) as supplementary Figure S1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Issues unrelated to those discussed in the public review

      (1) Figure 4a and its caption describe an evolution experiment, but they do not mention how many cycles of high-temperature treatment and growth this experiment lasted. I assume it lasted for more than one cycle, because the methods section mentions "cycles", but the number is not provided.

      Thank you for pointing this out. The evolutionary experiment shown in Figure 5a involved 11 cycles of high-temperature treatment and growth. We have now explicitly stated this in the figure legend to ensure clarity: BC: Batch culture, G: Evolution cycle number, C: Colony. BC2G11C1 refers to the first colony from batch culture 2 after 11 rounds of heat treatment.

      (2) It is not clear what Figure 5F is supposed to show. What are the gray boxes? The caption claims that the figure shows non-synonymous mutations, but the only information it contains is about genes that seem to be affected by mutation. Judging from the mismatch between the main text and the figure, the mutants with these mutations may actually be mislabeled.

      Thank you for your careful review. Figure 5f highlights the non-synonymous mutations identified in the evolved strains. The gray boxes represent the ancestral strain’s whole genome without mutations, serving as a control. The corresponding labels indicate the specific mutations found in each evolved strain. We have clarified this in the figure caption to improve clarity. Additionally, we have carefully reviewed the labeling to ensure accuracy and consistency between the figure, main text, and sequencing data.

      (3) I think that the acronym NC, which is used in just about every figure, is explained nowhere in the paper. Spell out all acronyms at first use.

      Thank you for pointing this out. We have ensured that NC is clearly defined at its first mention in the text and figure legends to improve clarity. Additionally, we have reviewed the manuscript to ensure that all acronyms are properly introduced when first used.

      (4) The same holds for the acronym N.D. This is an especially important oversight because N.D. could mean "not determined" or "not detectable", which would lead to very different interpretations of the same figure.

      Thank you for your careful review. We have clarified the meaning of N.D., which stands for non-detectable, at its first use to avoid ambiguity and ensure accurate interpretation in the figure legend. Additionally, we have reviewed the manuscript to ensure that all acronyms are clearly defined.

      (5) The panel labels (a,b, etc.) in all figure captions are very difficult to distinguish from the rest of the text, and should be better highlighted, for example by using a bold font. However, this is a matter of journal style and will probably be fixed during typesetting.

      Thank you for your suggestion. We have adjusted the figure captions to better distinguish panel labels by using bold font, which improves readability. Final formatting will follow the journal’s style during typesetting.

      (6) Line 224: enhanced insusceptibility -> reduced susceptibility.

      Thank you for your suggestion. We have revised “enhanced insusceptibility” to “reduced susceptibility” for clarity and precision.

      (7) Line 259: mice -> mouse.

      Thank you for catching this. We have corrected “mice” to “mouse”.

      Reviewer #2 (Recommendations for the authors):

      I have no concerns about the experimental design and conclusions of your work; however, I strongly recommend incorporating several relevant pieces of the literature related to your work, in the discussion of your manuscript, specifically:

      (1) Previous studies about the role of heat stress in phage infections, see:

      Greenrod STE, Cazares D, Johnson S, Hector TE, Stevens EJ, MacLean RC, King KC. Warming alters life-history traits and competition in a phage community. Appl Environ Microbiol. 2024 May 21;90(5):e0028624. doi: 10.1128/aem.00286-24. Epub 2024 Apr 16. PMID: 38624196; PMCID: PMC11107170.

      Thank you for your thoughtful comment. We have incorporated the study by Greenrod et al. (2024) into the discussion to enrich the context of our findings. As this article pointed out, a temperature of 42°C can indeed limit phage infection in bacteria, acting as a barrier from the phage’s perspective. Our study builds on this by demonstrating that bacteria pre-treated with high temperatures exhibit tolerance to phage infection. These findings, together with the work you referenced, underscore the importance of heat stress or elevated temperature in host-phage interactions, with 42°C being particularly relevant in the context of fever. We will make sure to clarify this connection in our revised manuscript.

      (2) The effect of heat stress and the tolerance/resistance against other antibiotics besides polymyxins, see:

      Lv B, Huang X, Lijia C, Ma Y, Bian M, Li Z, Duan J, Zhou F, Yang B, Qie X, Song Y, Wood TK, Fu X. Heat shock potentiates aminoglycosides against gram-negative bacteria by enhancing antibiotic uptake, protein aggregation, and ROS. Proc Natl Acad Sci U S A. 2023 Mar 21;120(12):e2217254120. doi: 10.1073/pnas.2217254120. Epub 2023 Mar 14. PMID: 36917671; PMCID: PMC10041086.

      Thank you for bringing this study to our attention. We have incorporated the findings from Lv et al. (2023) into the discussion of our manuscript, highlighting how sublethal temperatures may facilitate the killing of bacteria by antibiotics like kanamycin. This is consistent with our data showing enhanced susceptibility of heat-shocked bacteria to kanamycin. The study also provides insights into the potential role of PMF, which is relevant to our work on PspA, and strengthens the broader context of heat stress influencing both antibiotic resistance and tolerance.

      (3) Perhaps the most relevant overlooked fact was that recently it was demonstrated for E. coli, Klebsiella and Pseudomonas that pretreatment with lytic phages induced antibiotic persistence! Please discuss this finding and its implications for your work, see:

      Fernández-García L, Kirigo J, Huelgas-Méndez D, Benedik MJ, Tomás M, García-Contreras R, Wood TK. Phages produce persisters. Microb Biotechnol. 2024 Aug;17(8):e14543. doi: 10.1111/1751-7915.14543. PMID: 39096350; PMCID: PMC11297538.

      Sanchez-Torres V, Kirigo J, Wood TK. Implications of lytic phage infections inducing persistence. Curr Opin Microbiol. 2024 Jun;79:102482. doi: 10.1016/j.mib.2024.102482. Epub 2024 May 6. PMID: 38714140.

      Thank you for suggesting this important reference. We agree that the phenomenon of phage-induced bacterial persistence is highly relevant to our study. While our manuscript focuses on the role of heat stress in bacterial tolerance and resistance, we acknowledge that bacterial persistence against phages is an established concept. We have incorporated this finding into our discussion, emphasizing how persistence and tolerance can overlap in their effects on bacterial survival, especially under stress conditions like heat treatment. This will provide a more comprehensive understanding of how phage interactions with bacteria can lead to both persistence and resistance.

      (4) Finally, you observed a tradeoff in the pspA* mutant: increased phage/heat/polymyxin resistance and decreased immune evasion (perhaps by being unable to counteract phagocytosis). Such tradeoffs, in which gaining phage resistance is accompanied by losing resistance to the immune system, virulence impairment, and reduced resistance against some antibiotics, have been extensively documented, see:

      Majkowska-Skrobek G, Markwitz P, Sosnowska E, Lood C, Lavigne R, Drulis-Kawa Z. The evolutionary trade-offs in phage-resistant Klebsiella pneumoniae entail cross-phage sensitization and loss of multidrug resistance. Environ Microbiol. 2021 Dec;23(12):7723-7740. doi: 10.1111/1462-2920.15476. Epub 2021 Mar 27. PMID: 33754440.

      Gordillo Altamirano F, Forsyth JH, Patwa R, Kostoulias X, Trim M, Subedi D, Archer SK, Morris FC, Oliveira C, Kielty L, Korneev D, O'Bryan MK, Lithgow TJ, Peleg AY, Barr JJ. Bacteriophage-resistant Acinetobacter baumannii are resensitized to antimicrobials. Nat Microbiol. 2021 Feb;6(2):157-161. doi: 10.1038/s41564-020-00830-7. Epub 2021 Jan 11. PMID: 33432151.

      García-Cruz JC, Rebollar-Juarez X, Limones-Martinez A, Santos-Lopez CS, Toya S, Maeda T, Ceapă CD, Blasco L, Tomás M, Díaz-Velásquez CE, Vaca-Paniagua F, Díaz-Guerrero M, Cazares D, Cazares A, Hernández-Durán M, López-Jácome LE, Franco-Cendejas R, Husain FM, Khan A, Arshad M, Morales-Espinosa R, Fernández-Presas AM, Cadet F, Wood TK, García-Contreras R. Resistance against two lytic phage variants attenuates virulence and antibiotic resistance in Pseudomonas aeruginosa. Front Cell Infect Microbiol. 2024 Jan 17;13:1280265. doi: 10.3389/fcimb.2023.1280265. Erratum in: Front Cell Infect Microbiol. 2024 Mar 06;14:1391783. doi: 10.3389/fcimb.2024.1391783. PMID: 38298921; PMCID: PMC10828002.

      Thank you for highlighting these important studies. We have incorporated the work by Majkowska-Skrobek et al. (2021), Gordillo Altamirano et al. (2021), and García-Cruz et al. (2024) into the discussion to provide further context to the evolutionary trade-offs observed in our study. The findings in these studies, which describe the cross-sensitization to antimicrobials and the loss of multidrug resistance in phage-resistant bacteria, align with our observations of trade-offs in the pspA mutant. Specifically, our results show that while the pspA mutant exhibits increased resistance to phage, heat, and polymyxins, it also experiences a decrease in immune evasion and potential virulence. These trade-offs are significant in understanding the broader consequences of developing resistance to phages and other stressors.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Overall, the data presented in this manuscript is of good quality. Understanding how cells control RPA loading on ssDNA is crucial to understanding DNA damage responses and genome maintenance mechanisms. The authors used genetic approaches to show that disrupting PCNA binding and SUMOylation of Srs2 can rescue the CPT sensitivity of rfa1 mutants with reduced affinity for ssDNA. In addition, the authors find that SUMOylation of Srs2 depends on binding to PCNA and the presence of Mec1.

      Comments on revisions:

      I am satisfied with the revisions made by the authors, which helped clarify some points that were confusing in the initial submission.

      Thank you.

      Reviewer #2 (Public Review):

      This revised manuscript mostly addresses previous concerns by doubling down on the model without providing additional direct evidence of interactions between Srs2 and PCNA, and that "precise sites of Srs2 actions in the genome remain to be determined." One additional Srs2 allele has been examined, showing some effect in combination with rfa1-zm2. Many of the conclusions are based on reasonable assumptions about the consequences of various mutations, but direct evidence of changes in Srs2 association with PCNA or other interactors is still missing. There is an assumption that a deletion of a Rad51-interacting domain or a PCNA-interacting domain has no pleiotropic effects, which may not be the case. How SLX4 might interact with Srs2 is unclear to me, again assuming that the SLX4 defect is "surgical" - removing only one of its many interactions.

      Previous studies have already provided direct evidence for the interaction between Srs2 and PCNA through Srs2’s PIM region (Armstrong et al, 2012; Papouli et al, 2005); we have added these citations in the text. Similarly, Srs2 associations with SUMO and Rad51 have also been demonstrated (Colavito et al, 2009; Kolesar et al, 2016; Kolesar et al., 2012), and these studies were cited in the text.

      We did not state that a deletion of a Rad51-interacting domain or a PCNA-interacting domain has no pleiotropic effects. We only assessed whether these previously characterized mutant alleles could mimic srs2∆ in rescuing rfa1-zm2 defects.

      We assessed the genetic interaction between slx4-RIM and srs2-∆PIM mutants, and not the physical interaction between the two proteins. As we described in the text, our rationale for this genetic test is based on reports that both slx4 and srs2 mutants impair recovery from the Mec1-induced checkpoint; thus, they may affect parallel pathways of checkpoint dampening.

      One point of concern is the use of t-tests without some sort of correction for multiple comparisons - in several figures. I'm quite sceptical about some of the p < 0.05 calls surviving a Bonferroni correction. Also in 4B, which comparison is **? Also, admittedly by eye, the changes in "active" Rad53 seem much greater than 5x. (also in Fig. 3, normalizing to a non-WT sample seems odd).

      Claims made in this work were based only on pairwise comparisons, not multiple comparisons. We have now made this point clearer in the graphs and in the Methods. As the values were compared between a wild-type strain and a specific mutant strain, or between two mutants, we believe that the t-test is suitable for statistical analysis.
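
      As a generic illustration of the correction the reviewer refers to (and not the analysis used in the manuscript), a Bonferroni adjustment divides the significance level by the number of comparisons in a family of tests. The minimal sketch below assumes four comparisons; the function name `bonferroni` and the p-values are hypothetical and chosen only to show the arithmetic.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the Bonferroni-adjusted threshold and, for each p-value,
    whether it remains significant at that threshold."""
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one p-value is required")
    threshold = alpha / m  # per-comparison significance level
    return threshold, [p < threshold for p in p_values]


# Hypothetical p-values for illustration; not taken from the manuscript.
example_p = [0.004, 0.03, 0.04, 0.20]
threshold, significant = bonferroni(example_p)
print(f"adjusted threshold: {threshold:.4f}")
for p, keep in zip(example_p, significant):
    print(f"p = {p:.3f} -> {'significant' if keep else 'not significant'}")
```

      Under these assumed numbers, only p-values below 0.05/4 would still be called significant, which is the concern raised about calls made at p < 0.05.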

      In Figure 4B, ** indicates that the WT value is significantly different from that of the slx4-RIM srs2-∆PIM double mutant and from that of the srs2-∆PIM single mutant. We have modified the graph to indicate the pairwise comparisons. The 5-fold change of active Rad53 levels was derived by comparing the values between the srs2-∆PIM slx4-RIM-TAP double mutant and wild-type Slx4-TAP. In Figure 3, normalization to the lowest value affords better visualization. This is rather a stylistic issue; we would like to maintain it as the other reviewers had no issues.

      What is the WT doubling time for this strain? From the FACS it seems as if in 2 h the cells have completed more than 1 complete cell cycle. Also in 5D. Seems fast...

      The wild-type W303 strain has a doubling time of less than 90 min, as shown by many labs, and our data are consistent with this. The FACS profiles for wild-type cells shown in Figures 3C, 4C, and 5C are consistent with each other, showing that after G1 cells entered the cell cycle, they were in G2 phase at the 1-hour time points, and then a percentage of the cells exited the first cell cycle by two hours.

      I have one over-arching confusion. Srs2 was shown initially to remove Rad51 from ssDNA and the suppression of some of srs2's defects by deleting rad51 made a nice, compact story, though exactly how srs2's "suppression of rad6" fit in isn't so clear (since Rad6 ties into Rad18 and into PCNA ubiquitylation and into PCNA SUMOylation). Now Srs2 is invoked to remove RPA. It seems to me that any model needs to explain how Srs2 can be doing both. I assume that if RPA and Rad51 are both removed from the same ssDNA, the ssDNA will be "trashed" as suggested by Symington's RPA depletion experiments. So building a model that accounts for selective Srs2 action at only some ssDNA regions might be enhanced by also explaining how Rad51 fits into this scheme.

      While the anti-recombinase function of Srs2 was better studied, its “anti-RPA” role in checkpoint dampening was recently described by us (Dhingra et al, 2021) following the initial report by the Haber group some time ago (Vaze et al, 2002). A better understanding of this new role is required before we can generate a comprehensive picture of how Srs2 integrates the two functions (and possibly other functions). Our current work addresses this issue by providing a more detailed understanding of this new role of Srs2.

      Single-molecule data showed that Srs2 strips both RPA and Rad51 from ssDNA, but this effect is highly dynamic (i.e. RPA and Rad51 can rebind ssDNA after being displaced) (De Tullio et al, 2017). As such, generation of “deserted” ssDNA regions lacking RPA and Rad51 in cells is likely a rare event. Rather, Srs2 can foster RPA and Rad51 dynamics on ssDNA. Additional studies will be needed to generate a model that integrates the anti-recombinase and the anti-RPA roles of Srs2.

      As a previous reviewer has pointed out, CPT creates multiple forms of damage. Foiani showed that 4NQO would activate the Mec1/Rad53 checkpoint in G1-arrested cells, presumably because there would be single-strand gaps but no DSBs. Whether this would be a way to look specifically at one type of damage is worth considering; but UV might be a simpler way to look. As also noted, the effects on the checkpoint and on viability are quite modest. Because it isn't clear (at least to me) why rfa1 mutants are so sensitive to CPT, it's hard for me to understand how srs2-zm2 has a modest suppressive effect: is it by changing the checkpoint response or facilitating repair or both? Or how srs2-3KR or srs2-dPIM differ from rfa1-zm2 in this respect. The authors seem to lump all these small suppressions under the rubric of "proper levels of RPA-ssDNA" but there are no assays that directly get at this. This is the biggest limitation.

      CPT treatment is an ideal condition to examine how cells dampen the DNA damage checkpoint, because while most genotoxic conditions (e.g. 4NQO, MMS) induce both the DNA replication checkpoint and the DNA damage checkpoint, CPT was shown to induce only the latter (Menin et al, 2018; Minca & Kowalski, 2011; Redon et al, 2003; Tercero et al, 2003). Future studies examining 4NQO and UV conditions can further expand our understanding of checkpoint dampening in different conditions.

      We have previously provided evidence to support the conclusion that srs2 suppression of rfa1-zm is partly mediated by changing checkpoint levels (Dhingra et al., 2021). We cannot exclude the possibility that the suppression may also be related to changes of DNA repair; we have now added this note in the text.

      Regarding direct testing of RPA levels on DNA, we have previously shown that srs2∆ increased the levels of chromatin-associated Rfa1 and this is suppressed by rfa1-zm2 (Dhingra et al., 2021). We have now included chromatin fractionation data to show that srs2-∆PIM also led to an increase of Rfa1 on chromatin, and this was suppressed by rfa1-zm2 (new Fig. S2).

      Srs2 has also been implicated as a helicase in dissolving "toxic joint molecules" (Elango et al. 2017). Whether this activity is changed by any of the mutants (or by mutations in Rfa1) is unclear. In their paper, Elango writes: "Rare survivors in the absence of Srs2 rely on structure-specific endonucleases, Mus81 and Yen1, that resolve toxic joint-molecules" Given the involvement of SLX4, perhaps the authors should examine the roles of structure-specific nucleases in CPT survival?

      Srs2 has several roles, and its role in RPA antagonism can be genetically separated from its role in Rad51 regulation, as we have shown in our previous work (Dhingra et al., 2021); this notion is further supported by evidence presented in the current work. Srs2’s role in dissolving "toxic joint molecules” was mainly observed during BIR (Elango et al, 2017). Whether it is related to checkpoint dampening will be interesting to address in the future but is beyond the scope of the current work, which seeks to answer the question of how Srs2 regulates RPA during checkpoint dampening. Similarly, determining the roles of Mus81, Yen1, and other structure-specific nucleases in CPT survival is a worthwhile task, but it is a research topic well separated from the focus of this work.

      Experiments that might clarify some of these ambiguities are proposed to be done in the future. For now, we have a number of very interesting interactions that may be understood in terms of a model that supposes discriminating among gaps and ssDNA extensions by the presence of PCNA, perhaps modified by SUMO. As noted above, it would be useful to think about the relation to Rad6.

      Several studies have shown that Srs2’s functional interaction with Rad6 is based on Srs2-mediated recombination regulation (reviewed by Niu & Klein, 2017). Given that recombinational regulation by Srs2 is genetically separable from the Srs2 and RPA antagonism (Dhingra et al., 2021), we do not see a strong rationale to examine Rad6 in this work, which addresses how Srs2 regulates RPA. With this said, this study has provided a basis for future studies of possible crosstalk among different Srs2-mediated pathways.

      Reviewer #3 (Public Review):

      The superfamily I 3'-5' DNA helicase Srs2 is well known for its role as an anti-recombinase, stripping Rad51 from ssDNA, as well as an anti-crossover factor, dissociating extended D-loops and favoring non-crossover outcome during recombination. In addition, Srs2 plays a key role in ribonucleotide excision repair. Besides DNA repair defects, srs2 mutants also show a reduced recovery after DNA damage that is related to its role in downregulating the DNA damage signaling or checkpoint response. Recent work from the Zhao laboratory (PMID: 33602817) identified a role of Srs2 in downregulating the DNA damage signaling response by removing RPA from ssDNA. This manuscript reports further mechanistic insights into the signaling downregulation function of Srs2.

      Using the genetic interaction with mutations in RPA1, mainly rfa1-zm2, the authors test a panel of mutations in Srs2 that affect CDK sites (srs2-7AV), potential Mec1 sites (srs2-2SA), known sumoylation sites (srs2-3KR), Rad51 binding (delta 875-902), PCNA interaction (delta 1159-1163), and SUMO interaction (srs2SIMmut). All mutants were generated by genomic replacement and the expression level of the mutant proteins was found to be unchanged. This alleviates some concern about the use of deletion mutants compared to point mutations. Double mutant analysis identified that PCNA interaction and SUMO sites were required for the Srs2 checkpoint dampening function, at least in the context of the rfa1-zm2 mutant. There was no effect of these mutants in an RFA1 wild-type background. This latter result is likely explained by the activity of the parallel pathway of checkpoint dampening mediated by Slx4, and genetic data with an Slx4 point mutation affecting Rtt107 interaction and checkpoint downregulation support this notion. Further analysis of Srs2 sumoylation showed that Srs2 sumoylation depended on PCNA interaction, suggesting sequential events of Srs2 recruitment by PCNA and subsequent sumoylation. Kinetic analysis showed that sumoylation peaks after maximal Mec1 induction by DNA damage (using the Top1 poison camptothecin (CPT)) and depended on Mec1. These data are consistent with a model in which Mec1 hyperactivation ultimately leads to signaling downregulation by Srs2 through Srs2 sumoylation. Mec1-S1964 phosphorylation, a marker for Mec1 hyperactivation and a site found to be needed for checkpoint downregulation after DSB induction, did not appear to be involved in checkpoint downregulation after CPT damage. The data are in support of the model that Mec1 hyperactivation, when targeted to RPA-covered ssDNA by its Ddc2 (human ATRIP) targeting factor, favors Srs2 sumoylation after Srs2 recruitment to PCNA to disrupt the RPA-Ddc2-Mec1 signaling complex. Presumably, this allows gap filling and disappearance of long-lived ssDNA as the initiator of checkpoint signaling, although the study does not extend to this step.

      Strengths:

      (1) The manuscript focuses on the novel function of Srs2 to downregulate the DNA damage signaling response and provide new mechanistic insights.

      (2) The conclusions that PCNA interaction and ensuing Srs2-sumoylation are involved in checkpoint downregulation are well supported by the data.

      Weaknesses:

      (1) Additional mutants of interest could have been tested, such as the recently reported Pin mutant, srs2-Y775A (PMID: 38065943), and the Rad51 interaction point mutant, srs2-F891A (PMID: 31142613).

      (2) The use of deletion mutants for PCNA and RAD51 interaction is inferior to using specific point mutants, as done for the SUMO interaction and the sites for post-translational modifications.

      (3) Figure 4D and Figure 5A report data with standard deviations, which is unusual for n=2. Maybe the individual data points could be plotted with a color for each independent experiment to allow the reader to evaluate the reproducibility of the results.

      Comments on revisions:

      In this revision, the authors adequately addressed my concerns. The only issue I see remaining is the site of Srs2 action. The authors argue in favor of gaps and against R-loops and ssDNA resulting from excessive supercoiling. The authors do not discuss ssDNA resulting from processing of one-sided DSBs, which are expected to result from replication run-off after CPT damage but are not expected to provide the 3'-junction for preferred PCNA loading. Can the authors exclude PCNA at the 5'-junction at a resected DSB?

      We have now added a sentence stating that we cannot exclude the possibility that PCNA may be positioned at a 5’-junction, as this can be observed in vitro, although PCNA loading was seen exclusively at a 3’-junction in the presence of RPA (Ellison & Stillman, 2003; Majka et al, 2006).

      Recommendations For the authors:

      Reviewer #2 (Recommendations For the authors):

      A Bonferroni correction should be made for the multiple comparisons in several figures.

      Specific comments:

      l. 41. This is a too long and confusing sentence.

      Sentence shortened: “These data suggest that Srs2 recruitment to PCNA proximal ssDNA-RPA filaments followed by its sumoylation can promote checkpoint recovery, whereas Srs2 action is minimized at regions with no proximal PCNA to permit RPA-mediated ssDNA protection”.

      l. 60. Identify Ddc2 and Mec1 as ATRIP and ATR.

      Done.

      l. 125 "fails to downregulate RPA levels on chromatin and Mec1-mediated DDC..." fails to downregulate RPA and fails to reduce Mec1-mediated DDC?

      Sentence modified: “fails to downregulate both the RPA levels on chromatin and the Mec1-mediated DDC”

      l. 204 "consistent with the notion that Srs2 has roles beyond RPA regulation"... What other roles? It's stripping of Rad51? Removing toxic joint molecules? Something else?

      Sentence modified: “consistent with the notion that Srs2 has roles beyond RPA regulation, such as in Rad51 regulation and removing DNA joint molecules”.

      l. 249 "Significantly, srs2-ΔPIM and -3KR increased the percentage of rfa1-zm2 cells transitioning into the G1 phase" No. Just back to normal. As stated in l. 258: "We found that srs2-ΔPIM and srs2-3KR mutants on their own behaved normally in the two DDC assays described above." All of these effects are quite small.

      Sentence modified: “Compared with rfa1-zm2 cells, srs2-∆PIM rfa1-zm2 and srs2-3KR rfa1-zm2 cells showed increased percentages of cells transitioning into the G1 phase”.

      l. 468 "Our previous work has provided several lines of evidence to support that Rad51 removal by Srs2 is separable from the Srs2-RPA antagonism (Dhingra et al., 2021)." What evidence? See my comment above about not having both proteins removed at the same time.

      We have addressed this point in our initial rebuttal and some key points are summarized below. In our previous report (Dhingra et al., 2021), we provided several lines of evidence to support the conclusion that Rad51 is not relevant to the Srs2-RPA antagonism. For example, while rad51∆ rescues the hyper-recombination phenotype of srs2∆ cells, rad51∆ did not affect the hyper-checkpoint phenotype of srs2∆. In contrast, rfa1-zm1/zm2 have the opposite effects, that is, rfa1-zm1/zm2 suppressed the hyper-checkpoint, but not the hyper-recombination, phenotype of srs2∆ cells. The differential effects of rad51∆ and rfa1-zm1/zm2 were also seen for the ATPase dead allele of Srs2 (srs2-K41A). For example, rfa1-zm2 rescued hyper-checkpoint and CPT sensitivity of srs2-K41A cells, while rad51∆ had neither effect. These and other data described by Dhingra et al (2021) suggest that Srs2’s effects on checkpoint vs. recombination can be separated genetically. Consistent with our conclusion summarized above, deleting the Rad51 binding domain in Srs2 (srs2-∆Rad51BD) has no effect on the rfa1-zm2 phenotype in CPT (Fig. 2D). These data provide further evidence that Srs2 regulation of Rad51 is separable from the Srs2-RPA antagonism.

      l. 525 "possibility, we tested the separation pin of Srs2 (Y775), which was shown to enables its in vitro helicase activity during the revision of our work..." ?? there was helicase activity during the revision of your work? Please fix the sentence.

      Sentence modified: “we tested the separation pin of Srs2 (Y775). This residue was shown to be key for Srs2’s helicase activity in vitro in a report that was published during the revision of our work (Meir et al, 2023).”

      Fig. 3. "srs2-ΔPIM and -3KR allow better G1 entry of rfa1-zm2 cells." is it better entry or less arrest at G2/M? One implies better turning off of a checkpoint, the other suggests less activation of the checkpoint.

      This is a correct statement. For all strains examined in Figure 3, cells were seen in G2/M phase after 1-hour CPT treatment, suggesting proper arrest.

      References:

      Armstrong AA, Mohideen F, Lima CD (2012) Recognition of SUMO-modified PCNA requires tandem receptor motifs in Srs2. Nature 483: 59-63

      Colavito S, Macris-Kiss M, Seong C, Gleeson O, Greene EC, Klein HL, Krejci L, Sung P (2009) Functional significance of the Rad51-Srs2 complex in Rad51 presynaptic filament disruption. Nucleic Acids Res 37: 6754-6764

      De Tullio L, Kaniecki K, Kwon Y, Crickard JB, Sung P, Greene EC (2017) Yeast Srs2 helicase promotes redistribution of single-stranded DNA-bound RPA and Rad52 in homologous recombination regulation. Cell Rep 21: 570-577

      Dhingra N, Kuppa S, Wei L, Pokhrel N, Baburyan S, Meng X, Antony E, Zhao X (2021) The Srs2 helicase dampens DNA damage checkpoint by recycling RPA from chromatin. Proc Natl Acad Sci U S A 118: e2020185118

      Elango R, Sheng Z, Jackson J, DeCata J, Ibrahim Y, Pham NT, Liang DH, Sakofsky CJ, Vindigni A, Lobachev KS et al (2017) Break-induced replication promotes formation of lethal joint molecules dissolved by Srs2. Nat Commun 8: 1790

      Ellison V, Stillman B (2003) Biochemical characterization of DNA damage checkpoint complexes: clamp loader and clamp complexes with specificity for 5' recessed DNA. PLoS Biol 1: E33

      Kolesar P, Altmannova V, Silva S, Lisby M, Krejci L (2016) Pro-recombination Role of Srs2 Protein Requires SUMO (Small Ubiquitin-like Modifier) but Is Independent of PCNA (Proliferating Cell Nuclear Antigen) Interaction. J Biol Chem 291: 7594-7607

      Kolesar P, Sarangi P, Altmannova V, Zhao X, Krejci L (2012) Dual roles of the SUMO-interacting motif in the regulation of Srs2 sumoylation. Nucleic Acids Res 40: 7831-7843

      Majka J, Binz SK, Wold MS, Burgers PM (2006) Replication protein A directs loading of the DNA damage checkpoint clamp to 5'-DNA junctions. J Biol Chem 281: 27855-27861

      Meir A, Raina VB, Rivera CE, Marie L, Symington LS, Greene EC (2023) The separation pin distinguishes the pro- and anti-recombinogenic functions of Saccharomyces cerevisiae Srs2. Nat Commun 14: 8144

      Menin L, Ursich S, Trovesi C, Zellweger R, Lopes M, Longhese MP, Clerici M (2018) Tel1/ATM prevents degradation of replication forks that reverse after Topoisomerase poisoning. EMBO Rep 19: e45535

      Minca EC, Kowalski D (2011) Replication fork stalling by bulky DNA damage: localization at active origins and checkpoint modulation. Nucleic Acids Res 39: 2610-2623

      Niu H, Klein HL (2017) Multifunctional roles of Saccharomyces cerevisiae Srs2 protein in replication, recombination and repair. FEMS Yeast Res 17: fow111

      Papouli E, Chen S, Davies AA, Huttner D, Krejci L, Sung P, Ulrich HD (2005) Crosstalk between SUMO and ubiquitin on PCNA is mediated by recruitment of the helicase Srs2p. Mol Cell 19: 123-133

      Redon C, Pilch DR, Rogakou EP, Orr AH, Lowndes NF, Bonner WM (2003) Yeast histone 2A serine 129 is essential for the efficient repair of checkpoint-blind DNA damage. EMBO Rep 4: 678-684

      Tercero JA, Longhese MP, Diffley JFX (2003) A central role for DNA replication forks in checkpoint activation and response. Mol Cell 11: 1323-1336

      Vaze MB, Pellicioli A, Lee SE, Ira G, Liberi G, Arbel-Eden A, Foiani M, Haber JE (2002) Recovery from checkpoint-mediated arrest after repair of a double-strand break requires Srs2 helicase. Mol Cell 10: 373-385

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Jiao D et al. reported the induction of synthetic lethality by combined inhibition of anti-apoptotic BCL-2 family proteins and WSB2, a substrate receptor in the CRL5 ubiquitin ligase complex. Mechanistically, WSB2 interacts with NOXA to promote its ubiquitylation and degradation. Cancer cells deficient in WSB2, as well as heart and liver tissues from Wsb2-/- mice, exhibit high susceptibility to apoptosis induced by inhibitors of BCL-2 family proteins. The anti-apoptotic activity of WSB2 is partially dependent on NOXA.

      Overall, the finding, that WSB2 disruption triggers synthetic lethality to BCL-2 family protein inhibitors by destabilizing NOXA, is rather novel. The manuscript is largely hypothesis-driven, with experiments that are adequately designed and executed. However, there are quite a few issues for the authors to address, including those listed below.

      Specific comments:

      (1) At the beginning of the Results section, a clear statement is needed as to why the authors are interested in WSB2 and what brought them to analyze "the genetic co-dependency between WSB2 and other proteins".

      We thank the reviewer for raising this important point. We agree that a clear rationale should be provided at the beginning of the Results section. As reported in previous studies [Ref: 1, 2, 3], strong synthetic interactions have been observed between WSB2 and several mitochondrial apoptosis-related factors, including MCL-1, BCL-xL, and MARCH5. We have referenced these findings in the Discussion section. Motivated by these studies, we became interested in the role of WSB2 and aimed to investigate the specific mechanisms underlying its synthetic lethality with anti-apoptotic BCL-2 family members. We will revise the beginning of the Results section to clearly state this rationale.

      (1) McDonald, E.R., 3rd et al. Project DRIVE: A Compendium of Cancer Dependencies and Synthetic Lethal Relationships Uncovered by Large-Scale, Deep RNAi Screening. Cell 170, 577-592 e510 (2017).

      (2) DeWeirdt, P.C. et al. Genetic screens in isogenic mammalian cell lines without single cell cloning. Nat Commun 11, 752 (2020).

      (3) DeWeirdt, P.C. et al. Optimization of AsCas12a for combinatorial genetic screens in human cells. Nat Biotechnol 39, 94-104 (2021).

      (2) In general, the biochemical evidence supporting the role of WSB2 as a SOCS box-containing substrate-binding receptor of CRL5 E3 in promoting NOXA ubiquitylation and degradation is relatively weak. First, since NOXA binds to WSB2 on its SOCS box, which consists of a BC box for Elongin B/C binding and a CUL5 box for CUL5 binding, it is crucial to determine whether the binding of NOXA on the SOCS box affects the formation of the CRL5<sup>WSB2</sup> complex. The authors should demonstrate the endogenous binding between NOXA and the CRL5<sup>WSB2</sup> complex. Additionally, the authors may also consider manipulating CUL5, SAG, or ElonginB/C to assess if it would affect NOXA protein turnover in two independent cell lines.

      We thank the reviewer for raising this important point. To determine whether endogenous NOXA binds to the intact CRL5<sup>WSB2</sup> complex, we performed co-immunoprecipitation assays using an antibody against NOXA. Indeed, NOXA co-immunoprecipitated with all subunits of the CRL5<sup>WSB2</sup> complex (Figure 2—figure supplement 1D), suggesting that NOXA binding to WSB2 does not disrupt interactions between WSB2 and the other CRL5 subunits. Moreover, depletion of CRL5 complex components (RBX2/SAG, CUL5, ELOB, or ELOC) through siRNAs in C4-2B or Huh-7 cells also resulted in a marked increase in NOXA protein levels.

      Second, in all the experiments designed to detect NOXA ubiquitylation in cells, the authors utilized immunoprecipitation (IP) with FLAG-NOXA/NOXA, followed by immunoblotting (IB) with HA-Ub. However, it is possible that the observed poly-Ub bands could be partly attributed to the ubiquitylation of other NOXA binding proteins. Therefore, the authors need to consider performing IP with HA-Ub and subsequently IB with NOXA. Alternatively, they could use Ni-beads to pull down all His-Ub-tagged proteins under denaturing conditions, followed by the detection of FLAG-tagged NOXA using anti-FLAG Ab. The authors are encouraged to perform one of these suggested experiments to exclude the possibility of this concern. Furthermore, an in vitro ubiquitylation assay is crucial to conclusively demonstrate that the polyubiquitylation of NOXA is indeed mediated by the CRL5<sup>WSB2</sup> complex.

      We appreciate the reviewer for raising these important considerations regarding our ubiquitylation assays. We fully acknowledge the reviewer's concern that classical ubiquitination assays could potentially detect ubiquitination of proteins interacting with NOXA. However, we would like to clarify that our experimental conditions effectively mitigate this issue. Specifically, cells were lysed using buffer containing 1% SDS followed by boiling at 105°C for 5 minutes. These rigorous denaturing conditions ensure disruption of non-covalent protein interactions, thereby effectively eliminating the possibility of detecting ubiquitination signals from NOXA-associated proteins.

      Regarding the suggestion to perform an in vitro ubiquitination assay, we agree this experiment would indeed provide additional evidence. However, due to significant technical complexities associated with reconstituting CRL5-based E3 ubiquitin ligase activity in vitro—which would require the expression and purification of at least six recombinant proteins—such experiments are rarely performed in this context. Furthermore, NOXA is uniquely localized as a membrane protein on the mitochondrial outer membrane, posing additional significant challenges for protein expression and purification. Given the robustness of our current in vivo ubiquitylation assay under stringent denaturing conditions, we believe our existing data sufficiently and conclusively demonstrate NOXA ubiquitination mediated by the CRL5<sup>WSB2</sup> complex.

      (3) In their attempt to map the binding regions between NOXA and WSB2, the authors utilized exogenous proteins of both WSB2 and NOXA. To strengthen their findings, it would be more convincing to perform IP with exogenous wt/mutant WSB2 or NOXA and subsequently perform IB to detect endogenous NOXA or WSB2, respectively. Additionally, an in vitro binding assay using purified proteins would provide further evidence of a direct binding between NOXA and WSB2.

      We thank the reviewer for raising these important issues. In response to the reviewer’s suggestion to map the binding regions between NOXA and WSB2 more convincingly, we have indeed performed semi-endogenous Co-IP assays, which yielded results consistent with our exogenous protein experiments (Figure 3—figure supplement 1A, B). Concerning the recommendation to further validate direct interaction using purified recombinant proteins, we encountered substantial technical difficulties in obtaining pure and soluble recombinant WSB2 protein. Additionally, given that NOXA is an outer mitochondrial membrane protein and the interaction occurs on mitochondria, we believe that an in vitro binding assay may have limited physiological relevance. We hope the reviewer can appreciate these practical challenges and our current evidence supporting the strong interaction between NOXA and WSB2.

      Reviewer #2 (Public Review):

      Summary:

      Exploring the DEP-MAP database and two drug-screen databases, the authors identify WSB2 as an interactor of several BCL2 proteins. In follow-up experiments, they show that CRL5/WSB2 controls NOXA protein levels via K48 ubiquitination following direct protein-protein interaction, and cell death sensitivity in the context of BH3 mimetic treatment, where WSB2 depletion synergizes with drug treatment.

      Strengths:

      The authors use a set of orthogonal methods across different model cell lines and a new WSB2 KO mouse model to confirm their findings. They also manage to correlate WSB2 expression with poor prognosis in prostate and liver cancer, supporting the idea that targeting WSB2 may sensitize cancers for treatment with BH3 mimetics.

      Weaknesses:

      The conclusions drawn based on the findings in cancer patients are very speculative, as regulation of NOXA cannot be the sole function of CRL5/WSB2 and it is hence unclear what causes correlation with patient survival. Moreover, the authors do not provide a clear mechanistic explanation of how exactly higher levels of NOXA promote apoptosis in the absence of WSB2. This would be important knowledge, as usually high NOXA levels correlate with high MCL1, as they are turned over together, but in situations like this, or loss of other E3 ligases, such as MARCH, the buffering capacity of MCL1 is outrun, allowing excess NOXA to kill (likely by neutralizing other BCL2 proteins it usually does not bind to, such as BCLX). Moreover, a necroptosis-inducing role of NOXA has been postulated. Neither of these options is interrogated here.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 2J. The authors showed that "the mRNA levels of NOXA were even reduced in WSB2-KO cells compared to parental cells". What is the possible mechanism? This point should at least be discussed.

      We thank the reviewer for raising this important issue. The underlying mechanisms for the significantly lower mRNA levels of NOXA following the KO of WSB2 are not fully understood at present. However, we propose that this could represent a form of negative feedback regulation at the level of gene expression. Specifically, when NOXA protein levels rise sharply, this may activate mechanisms that suppress its own mRNA synthesis or stability, serving as a buffering system to prevent further protein accumulation. Such negative feedback loops may be critical for maintaining cellular homeostasis and avoiding excessive protein production. Moreover, this phenomenon is frequently observed in other studies investigating substrates targeted by E3 ubiquitin ligases for degradation. We have elaborated on this point in the Discussion section.

      (2) Figure 2M. A previous study has clearly demonstrated that NOXA is subjected to ubiquitylation and degradation by CRL5 E3 ligase (PMID: 27591266). This paper should be cited. Also, in that publication, NOXA ubiquitylation is via the K11 linkage, not the K48 linkage. The authors should include K11R mutant in their assay.

      We thank the reviewer for raising this important issue and for suggesting the relevant reference (PMID: 27591266), which we have now cited. Additionally, we would like to clarify that our new in vivo ubiquitination assays included the K11R and K11-only ubiquitin mutants, and our data demonstrate that WSB2-mediated NOXA ubiquitination indeed involves K11-linked ubiquitination (Figure 2—figure supplement 1E).

      (3) Figure 3H, J. The authors stated, "By mutating these lysine residues to arginine, we found that WSB2-mediated NOXA ubiquitination was completely abolished". Which one of the three lysine residues is playing the dominant role?

      We thank the reviewer for raising this important issue. To address this, we generated FLAG-NOXA mutants individually substituting lysine residues K35, K41, and K48 with arginine. In vivo ubiquitination assays demonstrated that lysine 48 (K48) is the predominant residue responsible for WSB2-mediated NOXA ubiquitination (Figure 3—figure supplement 1C).

      (4) Figure 3N. The authors need to show that the fusion peptide containing C-terminal NOXA peptide competitively inhibits the interaction between endogenous WSB2 and NOXA and extends the protein half-life of NOXA, leading to NOXA accumulation.

      We sincerely thank the reviewer for raising these important issues. As suggested, we investigated whether the fusion peptide containing the C-terminal NOXA sequence competitively disrupts the interaction between endogenous WSB2 and NOXA, subsequently influencing NOXA stability. Our results demonstrated that treatment with this fusion peptide indeed significantly reduced the endogenous interaction between WSB2 and NOXA (Figure 3—figure supplement 1D). Furthermore, we observed that the peptide dose-dependently increased endogenous NOXA protein levels and prolonged its protein half-life, thereby resulting in the accumulation of NOXA (Figure 3N; Figure 3—figure supplement 1E, F). These findings collectively indicate that the fusion peptide competitively inhibits the WSB2-NOXA interaction, stabilizes NOXA protein, and enhances its accumulation.

      (5) Figure 4. a) It would be better to investigate whether WSB2 knockdown can sensitize cancer cells to the treatment with ABT-737 or AZD5991, evidenced by a decrease in both IC50 values and clonogenic survival rates and whether such sensitization is dependent on NOXA. b) The authors need to show the levels of cleaved caspase-3/7/9 and the percentages of apoptotic cells in shNC cells upon silencing of WSB2 in Figure 4A-F. c) It will be more convincing to repeat the experiment to show synthetic lethality by WSB2 disruption and MCL-1 inhibitor AZD5991 treatment using another cell line, such as WSB2-deficient Huh-7 cells in Figure 4 I&J.

      We sincerely thank the reviewer for these valuable and constructive suggestions. Regarding point (a): We believe that our current Western blot and flow cytometry data (Figure 4G–L) have already provided strong evidence that WSB2 depletion enhances apoptosis in response to ABT-737 and AZD5991. Therefore, we consider that additional IC50 and clonogenic survival assays, while informative, may not be essential for supporting our conclusion. Furthermore, as shown in Figure 5A–F, we found that silencing NOXA largely, though not completely, reversed the enhanced apoptosis triggered by these inhibitors in WSB2-deficient cells, suggesting that the sensitization effect is at least partially dependent on NOXA.

      Regarding point (b): We have shown that WSB2 knockout alone had no impact on the levels of cleaved caspase-3/7/9 or the percentages of apoptotic cells in Huh-7 and C4-2B cells (Figure 4G-L and Figure 4—figure supplement 1A-D), indicating that WSB2 loss does not induce apoptosis on its own under basal conditions.

      Regarding point (c): We appreciate the reviewer’s suggestion and have now repeated the experiment in WSB2 knockout Huh-7 cells. The new results further support the synthetic lethality between WSB2 loss and AZD5991 treatment (Figure 4—figure supplement 1C, D).

      (6) Figure 5A/C/E. The effect of siNOXA is minor, if any, for cleavage of caspases. The same thing for Figure 6F/H.

      We appreciate the reviewer’s insightful observation regarding the relatively modest effect of shNOXA on caspase cleavage in Figures 5A/C/E and Figures 6F/H. Indeed, we acknowledge that the reduction in caspase cleavage following NOXA knockdown is moderate. However, consistent with our discussions in the manuscript, NOXA knockdown significantly—but not completely—rescued the increased apoptosis observed in WSB2-deficient cells treated with BCL-2 family inhibitors. This suggests that while NOXA plays a notable role, additional mechanisms or unidentified targets may also be involved in WSB2-mediated regulation of apoptosis.

      (7) Figure 5 I&J. The authors may consider performing IHC staining, immunofluorescence, or WB analysis to show the levels of NOXA and cleaved caspases or PARP in xenograft tumors. This would provide in vivo evidence of significant apoptosis induction resulting from the co-administration of ABT-737 and R8-C-terminal NOXA peptide.

      We appreciate the reviewer's thoughtful suggestion regarding additional immunohistochemical or immunofluorescence analyses in xenograft tumors. However, due to current limitations in available antibodies suitable for reliable detection of NOXA by IHC and IF, we are unable to perform these experiments. We greatly appreciate the reviewer's understanding of this technical constraint. Nevertheless, our existing data collectively support the conclusion that the combination of ABT-737 and R8-C-terminal NOXA peptide significantly enhances apoptosis in vivo.

      (8) Figure 7. Does an inverse correlation exist between the protein levels of WSB2 and NOXA in PRAD or LIHC tissue microarrays? On page 12, in the first paragraph, Figure 7M-P was cited incorrectly.

      We sincerely thank the reviewer for raising this important issue. As mentioned above, due to current limitations regarding the availability of suitable antibodies that can reliably detect NOXA by IHC, we regret that it is not feasible to experimentally address this question at this time.

      Additionally, we have carefully corrected the citation error involving Figure 7M-P on page 12, as pointed out by the reviewer.

      (9) Figure S1D. BCL-W levels were reduced upon WSB2 overexpression, which should be acknowledged.

      We sincerely thank the reviewer for raising this important issue. We acknowledge that BCL-W protein levels were slightly reduced upon WSB2 overexpression in Figure S1D. However, this effect is distinct from the pronounced reduction observed in NOXA protein levels. We have revised the manuscript to clarify this point. Additionally, we recognize that transient overexpression systems may occasionally lead to non-specific or artifactual changes. Our exogenous expression and co-immunoprecipitation experiments did not support an interaction between BCL-W and WSB2. Therefore, the observed reduction of BCL-W under these conditions may not reflect a physiologically relevant regulation.

      (10) Figure S4. Given that WSB2 KO mice are viable, the authors may consider determining whether these mice are more sensitive to radiation-induced tissue damage but more resistant to radiation-induced tumorigenesis.

      We sincerely thank the reviewer for this insightful and biologically meaningful suggestion. We agree that investigating the potential role of WSB2 in radiation-induced tissue damage and tumorigenesis would be of great interest. However, conducting such experiments requires access to specialized irradiation facilities, which are currently unavailable to us. Nevertheless, we recognize the value of this line of investigation and plan to explore it in our future studies.

      (11) All data were displayed as mean ± SD. However, for data from three independent experiments, it is more appropriate to present the results as mean ± SEM, not mean ± SD.

      We sincerely thank the reviewer for highlighting this important issue. In line with the reviewer's suggestion, we have revised the manuscript accordingly and now present data from three independent experiments as mean ± SEM.
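
      To illustrate the distinction the reviewer draws, the following minimal sketch computes both statistics for three replicate values. The values and the helper name `summarize` are hypothetical and are not taken from the manuscript; the only relationship relied on is that the SEM equals the sample SD divided by the square root of the number of replicates.

```python
import math
import statistics

def summarize(values):
    """Return (mean, SD, SEM) for a list of replicate measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(n)         # standard error of the mean
    return mean, sd, sem

# Hypothetical values from three independent experiments (illustration only).
replicates = [1.0, 2.0, 3.0]
mean, sd, sem = summarize(replicates)
print(f"mean = {mean:.3f}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

      With only three replicates the SEM is always smaller than the SD by a factor of about 1.73, so error bars shrink accordingly when switching from one to the other.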

      (12) The figure legends require careful review: i) The low dose of ABT-199 (Figure 6H) and the dose of ABT-199 used in Figure 6I are missing. ii) The legends for Figure S1D-E are incorrect. iii) The name of the antibody in the legend of Figure S3C is incorrect.

      We sincerely thank the reviewer for raising these important issues. We have carefully corrected all the errors mentioned. In addition, we have thoroughly reviewed the manuscript to prevent similar errors.

      Reviewer #2 (Recommendations For The Authors):

      The authors focus on NOXA, after initially identifying WSB2 to interact with several BCL2 proteins. The rationale behind this is that WSB2 depletion or overexpression affects NOXA levels, but none of the other BCL2 proteins tested, as stated in the text. Yet, BCLW is also depleted upon overexpression of WSB2 (Supplementary Figure 1). How does this phenomenon relate to the sensitization noted, is BCL-W higher in WSB2 KO cells? It does not seem so though. This warrants discussion.

      We thank the reviewer for raising this important issue. Our results showed that overexpression of WSB2 markedly reduced NOXA levels, while the levels of other BCL-2 family proteins, such as BCL-W, remained unaffected or only minimally affected (Figure 2—figure supplement 1A). Furthermore, depletion of WSB2 through shRNA-mediated KD or CRISPR/Cas9-mediated KO in C4-2B cells or Huh-7 cells led to a marked increase in the steady-state levels of endogenous NOXA, without affecting the other BCL-2 family proteins examined, including BCL-W (Figure 2A-C, Figure 2—figure supplement 2A, B).

      If WSB2 depletion does not affect MCL1 levels, how does excess NOXA actually kill? Does it bind to any (other) prosurvival proteins under conditions of WSB2 depletion? Is the MCL1 half-life changed?

      We thank the reviewer for raising this important point. NOXA is a BH3-only protein known to promote apoptosis primarily by binding to and neutralizing anti-apoptotic BCL-2 family members, especially MCL-1, via its BH3 domain. It can inhibit MCL-1 either through competitive binding or by facilitating its ubiquitination and subsequent proteasomal degradation. In our system, the total protein levels of MCL-1 remained unchanged in WSB2 knockout cells, suggesting that NOXA may not be promoting apoptosis through enhanced MCL-1 degradation. Instead, we speculate that the accumulation of NOXA in WSB2-deficient cells enhances apoptosis by sequestering MCL-1 through direct binding, thereby freeing pro-apoptotic effectors such as BAK and BAX. In line with our observations, Nakao et al. reported that deletion of the mitochondrial E3 ligase MARCH5 led to a pronounced increase in NOXA expression, while leaving MCL-1 protein levels unchanged in leukemia cell lines (Leukemia. 2023;37:1028-1038; PMID: 36973350).

      Additionally, NOXA has been reported to interact with other anti-apoptotic proteins, including BCL-XL. It is therefore possible that under conditions of WSB2 depletion, excess NOXA may also bind to BCL-XL and relieve its inhibition of BAX/BAK, further contributing to apoptosis. Future experiments assessing NOXA binding partners in WSB2-deficient cells would help clarify this mechanism.

      I think some initial insights into the mechanism underlying the sensitization would add a lot to this study. Is there a role of BFL1/A1 in any of these cell lines, as it can also rather selectively bind to NOXA and is sometimes deregulated in cancer?

      We appreciate the reviewer for raising this important issue. While BFL1/A1 is indeed another anti-apoptotic BCL-2 family member that can selectively bind to NOXA and has been implicated in cancer, our study primarily focuses on the WSB2-NOXA axis. However, given its potential involvement in apoptosis regulation, it would be an interesting direction for future studies to explore whether BFL1/A1 contributes to NOXA-mediated sensitization in specific cellular contexts.

      Otherwise, this is a very nice and convincing study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript focuses on the olfactory system of Pieris brassicae larvae and the importance of olfactory information in their interactions with the host plant Brassica oleracea and the major parasitic wasp Cotesia glomerata. The authors used CRISPR/Cas9 to knockout odorant receptor coreceptors (Orco), and conducted a comparative study on the behavior and olfactory system of the mutant and wild-type larvae. The study found that Orco-expressing olfactory sensory neurons in antennae and maxillary palps of Orco knockout (KO) larvae disappeared, and the number of glomeruli in the brain decreased, which impairs the olfactory detection and primary processing in the brain. Orco KO caterpillars show weight loss and loss of preference for optimal food plants; KO larvae also lost weight when attacked by parasitoids with the ovipositor removed, and mortality increased when attacked by untreated parasitoids. On this basis, the authors further studied the responses of caterpillars to volatiles from plants attacked by the larvae of the same species and volatiles from plants on which the caterpillars were themselves attacked by parasitic wasps. Lack of OR-mediated olfactory inputs prevents caterpillars from finding suitable food sources and from choosing spaces free of enemies.

      Strengths:

      The findings help to understand the important role of olfaction in caterpillar feeding and predator avoidance, highlighting the importance of odorant receptor genes in shaping ecological interactions.

      Weaknesses:

      There are the following major concerns:

      (1) Possible non-targeted effects of Orco knockout using CRISPR/Cas9 should be analyzed and evaluated in Materials and Methods and Results.

      Thank you for your suggestion. In the Materials and Methods, we mention how we selected the target region and evaluated potential off-target sites by Exonerate and CHOPCHOP. Neither of these methods found potential off-target sites with a more-than-17-nt alignment identity. Therefore, we assumed no off-target effect in our Orco knockout. Furthermore, we did not find any developmental differences between wildtype and knockout caterpillars when these were reared on leaf discs in Petri dishes (Fig S4). We will further highlight this information on the off-target evaluation in the Results section.

      (2) Figure 1E: Only one olfactory receptor neuron was marked in WT. There are at least three olfactory sensilla at the top of the maxillary palp. Therefore, to explain the loss of Orco-expressing neurons in the mutant (Figure 1F), a more rigorous explanation of the photo is required.

      Thank you for pointing this out. The figure shows only a qualitative comparison between WT and KO and we did not aim to determine the total number of Orco positive neurons in the maxillary palps or antennae of WT and KO caterpillars, but please see our previous work for the neuron numbers in the caterpillar antennae (Wang et al., 2024). We did indeed find more than one neuron in the maxillary palps, but as these were in very different image planes it was not possible to visualize them together. However, we will add a few sentences in the Results and Discussion section to explain the results of the maxillary palp Orco staining.

      (3) In Figure 1G, H, the four glomeruli are circled by dotted lines: their corresponding relationship between the two figures needs to be further clarified.

      Thank you for pointing this out. The four glomeruli in Figure 1G and 1H are not strictly corresponding. We circled these glomeruli to highlight them, as they are the best visualized and clearly shown in this view. In this study, we only counted the number of glomeruli in both WT and KO, however, we did not clarify which glomeruli are missing in the KO caterpillar brain. We will further clarify this in the figure legend.

      (4) Line 130: The main topic of this study is the olfactory system of larvae, but the experimental results in this part all concern antennal electrophysiological responses, mating frequency, and egg production of female and male adults of the wild type and the Orco KO mutant. The authors may consider moving this part to the supplementary files. It would be better to include some data about the olfactory responses of larvae.

      Thank you for your suggestion. We do agree with your suggestion, and we will consider moving this part to the supplementary information. Regarding larval olfactory response, we unfortunately failed to record any spikes using single sensillum recordings due to the difficult nature of the preparation; however we do believe that this would be an interesting avenue for further research.

      (5) Line 166: The sentences in the text are about the choice test between "healthy plant vs. infested plant", while in Fig 3C, it is "infested plant vs. no plant". The content in the text does not match the figure.

      Thank you for pointing this out. The sentence is “We compared the behaviors of both WT and Orco KO caterpillars in response to clean air, a healthy plant and a caterpillar-infested plant”. We tested these three stimuli in two comparisons: healthy plant vs no plant, infested plant vs no plant. The two comparisons are shown in Figure 3C separately. We will aim to describe this more clearly in the revised version of this manuscript.

      (6) Lines 174-178: Figure 3A showed that the body weight of Orco KO larvae in the absence of parasitic wasps also decreased compared with that of WT. Therefore, in the experiments of Figure 3A and E, the difference in the body weight of Orco KO larvae in the presence or absence of parasitic wasps without ovipositors should also be compared. The current data cannot determine whether the reduced weight of the KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      Thank you for pointing this out. We did not make a comparison between the data of Figures 3A and 3E since the two experiments were not conducted at the same time due to the limited space in our BioSafety III greenhouse. We do agree that the weight decrease in Figure 3E is partly due to the reduced caterpillar growth shown in Figure 3A. However, we are confident that the additional decrease in caterpillar weight shown in Figure 3E is mainly driven by the presence of disarmed parasitoids. To be specific, the average weight in Figure 3A is 0.4544 g for WT and 0.4230 g for KO, so KO weight is 93.1% of WT weight. In Figure 3E, the average weight is 0.4273 g for WT and 0.3637 g for KO, so KO weight is 85.1% of WT weight. We will discuss this interaction between caterpillar growth and the effect of the parasitoid attacks more extensively in the revised version of the manuscript.
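
      The percentages quoted above follow directly from the reported mean weights, as this short check shows. The variable and function names are illustrative only; the four weights are those stated in the response.

```python
# Mean caterpillar weights (g) as reported in the response.
weights = {
    "Figure 3A (no parasitoids)": {"WT": 0.4544, "KO": 0.4230},
    "Figure 3E (disarmed parasitoids)": {"WT": 0.4273, "KO": 0.3637},
}

def ko_percent_of_wt(wt, ko):
    """Express the KO mean weight as a percentage of the WT mean weight."""
    return 100.0 * ko / wt

ratios = {
    label: ko_percent_of_wt(w["WT"], w["KO"]) for label, w in weights.items()
}
for label, pct in ratios.items():
    print(f"{label}: KO is {pct:.1f}% of WT")
```

      The difference between the two ratios, about 8 percentage points, is the additional reduction the authors attribute to the disarmed parasitoids, with the caveat they note that the two experiments were not run at the same time.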

      (7) Lines 179-181: Figure 3F shows that the survival rate of larvae of Orco KO mutant decreased in the presence of parasitic wasps, and the difference in survival rate of larvae of WT and Orco KO mutant in the absence of parasitic wasps should also be compared. The current data cannot determine whether the reduced survival of the KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      We are happy that you highlight this point. When conducting these experiments, we selected groups of caterpillars and carefully placed them on a leaf with minimal disturbance of the caterpillars, which minimized hurting and mortality. We did test the survival of caterpillars in the absence of parasitoid wasps from the experiment presented in Figure 3A, although this was missing from the manuscript. There is no significant difference in the survival rate of caterpillars between the two genotypes in the absence of wasps (average mortality WT = 8.8 %, average mortality KO = 2.9 %; P = 0.088, Wilcoxon test), so the decreased survival rate is most likely due to the attack of the wasps. We will add this information to the revised version of the manuscript.

      (8) In Figure 4B, why do the compounds tested have no volatiles derived from plants? Cruciferous plants have the well-known mustard bomb. In the behavioral experiments, the larvae responses to ITC compounds were not included, which is suggested to be explained in the discussion section.

      Thank you for the suggestion. We assume you mean Figure 4D/4E instead of Figure 4B. In Figure 4B, many of the identified compounds are essentially plant volatiles, especially those from caterpillar frass and caterpillar spit. In Figure 4D/4E, most of the tested chemicals are also derived from plants. We did not include ITCs, however, based on the EAG results in Figures 2A and 2B: butterfly antennae did not respond strongly to the tested ITCs, so we expected that Pieris brassicae caterpillars would be unlikely to respond well to them. Instead, the chemicals tested in Figure 4D/4E either elicited high EAG responses in butterflies or were identified as “important” by VIP scores in the chemical analyses. The ITCs we tested in the EAG experiments were adopted from a study of Plutella xylostella (Liu et al., 2020), in which moths responded well to a few ITCs, except for those that were not available to us. We will add this explanation to the revised version of our manuscript.

      (9) The custom-made setup and the relevant behavioral experiments in Figure 4C need to be described in detail (Line 545).

      We will add more detailed descriptions for the setup and method in the Materials and Methods.

      (10) Materials and Methods Line 448: 10 μL paraffin oil should be used for negative control.

      Thank you for pointing this out. We used both clean filter paper and clean filter paper with 10 μL paraffin oil as negative controls, but we did not find a significant difference between the two controls. Therefore, in the EAG results of Figure 2A/2B, we presented paraffin oil as one of the tested chemicals. We will re-run our statistical tests with paraffin oil as negative control, although we do not expect any major differences to the previous tests.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigated the effect of olfactory cues on caterpillar performance and parasitoid avoidance in Pieris brassicae. The authors knocked out Orco to produce caterpillars with significantly reduced olfactory perception. These caterpillars showed reduced performance and increased susceptibility to a parasitoid wasp.

      Strengths:

      This is an impressive piece of work and a well-written manuscript. The authors have used multiple techniques to investigate not only the effect of the loss of olfactory cues on host-parasitoid interactions, but also the mechanisms underlying this.

      Weaknesses:

      (1) I do have one major query regarding this manuscript - I agree that the results of the caterpillar choice tests in a y-maze give weight to the idea that olfactory cues may help them avoid areas with higher numbers of parasitoids. However, the experiments with parasitoids were carried out on a single plant. Given that caterpillars in these experiments were very limited in their potential movement and source of food - how likely is it that avoidance played a role in the results seen from these experiments, as opposed to simply the slower growth of the KO caterpillars extending their period of susceptibility? While the two mechanisms may well both take place in nature - only one suggests a direct role of olfaction in enemy avoidance at this life stage, while the other is an indirect effect, hence the distinction is important.

      We do agree with your comment that both mechanisms may be at work in nature and we do address this in the Discussion section. In our study, we did find that wildtype caterpillars were more efficient in locating their food source and did grow faster on full plants than knockout caterpillars. This faster growth will enable wildtype caterpillars to more quickly outgrow the life-stages most vulnerable to the parasitoids (L1 and L2). The olfactory system therefore supports the escape from parasitoids indirectly by enhancing feeding efficiency directly.

      Figure 3D shows that WT caterpillars prefer infested plants without parasitoids to infested plants with parasitoids. In addition, we observed that caterpillars move frequently between different leaves. Therefore, we speculate that WT caterpillars make use of volatiles from the plant or from (parasitoid-exposed) conspecifics via their spit or faeces to avoid parts of the plant potentially attracting natural enemies. Knockout caterpillars are unable to use these volatile danger cues and therefore do not avoid plant parts that are most attractive to their natural enemies, making KO caterpillars more susceptible and leading to more natural enemy harassment. Through this, olfaction also directly impacts the ability of a caterpillar to find an enemy-free feeding site.

      We think that olfaction supports the enemy avoidance of caterpillars via both these mechanisms, although at different time scales. Unfortunately, our analysis was not detailed enough to discern the relative importance of the two mechanisms we found. However, we feel that this would be an interesting avenue for further research. Moreover, we will sharpen our discussion on the potential importance of the two different mechanisms in the revised version of this manuscript.

      (2) My other issue was determining sample sizes used from the text was sometimes a bit confusing. (This was much clearer from the figures).

      We will revise the sample size in the text to make it more clear.

      (3) I also couldn't find the test statistics for any of the statistical methods in the main text, or in the supplementary materials.

      Thank you for pointing this out. We will provide more detailed test statistics in the main text and in the supplementary materials of the revised version of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Abstract

      Line 24: "optimal food plant" should be changed to "optimal food plants"

      Thank you for the suggestion, we will revise it.

      (2) Introduction

      Lines 44-46: The sentence should be rephrased.

      Thank you for the suggestion, we will revise it.

      Line 50: "are" should be changed to "is".

      Thank you for the suggestion, we will revise it.

      Lines 57 and 58: Please provide the Latin names of "brown planthoppers" and "striped stem borer".

      Thank you for the suggestion, we will revise it.

      Line 85: "investigate the influence of odor-guided behavior by this primary herbivore on the next trophic levels"; similarly, Line 160: "investigate if caterpillars could locate the optimal host-plant when supplied with differently treated plants". These sentences are not very accurate in describing the relevant experiments.

      Thank you for the suggestion, we will revise them.

      Reviewer #2 (Recommendations for the authors):

      (1) L53 Remove the "the" from "Under the strong selection pressure"

      Thank you for the suggestion, we will revise it.

      (2) L80 I suggest adding a reference for the spitting behaviour, e.g. Muller et al 2003.

      Thank you for the suggestion, we will add it.

      (3) L89 establishing a homozygous KO insect colony.

      Thank you for the suggestion, we will revise it.

      (4) L107 perhaps this goes against the journal style but I always like to see acronyms explained the first time they are used.

      Thank you for the suggestion, we will try to make it more understandable.

      (5) L146-148 sentence difficult to read - consider rephrasing.

      Thank you for the suggestion, we will revise it.

      (6) L230 do you mean still produce? Rather than still reproduce?

      Thank you for the suggestion, we will revise it.

      (7) L233 missing an and before "a greater vulnerability to the parasitoid wasp".

      Thank you for pointing this out, we will revise it.

      (8) L238 malfunctional is a strange word choice.

      Thank you for pointing this out, we will revise it.

      (9) L181 - can the authors confirm that this lower survival was due to parasitism by the wasps?

      This question is similar to Q(7) of Reviewer 1, so we quote our answer for Q(7) here:

      When conducting these experiments, we selected groups of caterpillars and carefully placed them on a leaf with minimal disturbance of the caterpillars, which minimized hurting and mortality. We did test the survival of caterpillars in the absence of parasitoid wasps from the experiment presented in Figure 3A, although this was missing from the manuscript. There is no significant difference in the survival rate of caterpillars between the two genotypes in the absence of wasps (average mortality WT = 8.8 %, average mortality KO = 2.9 %; P = 0.088, Wilcoxon test), so the decreased survival rate is most likely due to the attack of the wasps. We will add this information to the revised version of the manuscript.

      (10) L474 - has it been tested if wasps still behave similarly after their ovipositor has been removed?

      Thank you for pointing out this issue. We did not strictly compare whether disarmed and untreated wasps behave similarly. However, before releasing a disarmed wasp into a cage, we checked that it could actively move or fly after recovering from anesthesia; otherwise, we replaced it with an active one.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to identify the proteins that compose the electrical synapse, which are much less understood than those of the chemical synapse. Identifying these proteins is important to understand how synaptogenesis and conductance are regulated in these synapses. The authors identified more than 50 new proteins and used immunoprecipitation and immunostaining to validate their interaction or localization. One new protein, a scaffolding protein, shows particularly strong evidence of being an integral component of the electrical synapse. However, many key experimental details are missing (e.g. mass spectrometry), making it difficult to assess the strength of the evidence.

      Strengths:

      One newly identified protein, SIPA1L3, has been validated both by immunoprecipitation and immunohistochemistry. The localization at the electrical synapse is very striking.

      A large number of candidate interacting proteins were validated with immunostaining in vivo or in vitro.

      Weaknesses:

      There is no systematic comparison between the zebrafish and mouse proteome. The claim that there is "a high degree of evolutionary conservation" was not substantiated.

      We have added a table as supplementary figure 3 that shows a comparison of all candidates. While there are differences in both proteomes, components such as ZO proteins and the endocytosis machinery are clearly conserved.

      No description of how mass spectrometry was done and what type of validation was done.

      We have contacted the mass spec facility we worked with and added a paragraph explaining the mass spec. procedure in the material and methods section.

      The threshold for enrichment seems arbitrary.

      Yes, the thresholds are somewhat arbitrary. This is due to the fact that experiments that captured larger total amounts of protein (mouse retina samples) had higher signal-to-noise ratio than those that captured smaller total amounts of protein (zebrafish retina). This allowed us to use a more stringent threshold in the mouse dataset to focus on high probability captured proteins.

      Inconsistent nomenclature and punctuation usage.

      We have scanned through the manuscript and updated terms that were used inconsistently in the interim revision of the manuscript.

      The description of figures is very sparse and error-prone (e.g. Figure 6).

      In Figure 1B, there is very broad non-specific labeling by avidin in zebrafish (In contrast to the more specific avidin binding in mice, Figure 2B). How are the authors certain that the enrichment is specific at the electrical synapse?

      The enrichment of the proteins we identified is specific for electrical synapses because we compared the abundance of all candidates between Cx35b-V5-TurboID and wildtype retinas. Proteins that are components of electrical synapses will only show up in the Cx35b-V5-TurboID condition. The western blot (Strep-HRP) in figure 1C shows the differences in the streptavidin labeling and hence the enrichment of proteins that are part of electrical synapses. Moreover, while the background appears to be quite abundant in sections, biotinylation is a rare posttranslational modification and mainly occurs in carboxylases, which correspond to the two intense bands that show up above 50 and 75 kDa. The background mainly originates from these two proteins. Therefore, it is easy to distinguish specific hits from non-specific background.

      In Figure 1E, there is very little colocalization between Cx35 and Cx34.7. More quantification is needed to show that it is indeed "frequently associated."

      We agree that “frequently associated” is too strong as a statement. We corrected this and instead wrote “that Cx34.7 was only expressed in the outer plexiform layer (OPL) where it was associated with Cx35b at some gap junctions” in line 151. There are many gap junctions at which Cx35b is not colocalized with Cx34.7.

      Expression of GFP in HCs would potentially be an issue, since GFP is fused to Cx36 (regardless of whether HC expresses Cx36 endogenously) and V5-TurboID-dGBP can bind to GFP and biotinylate any adjacent protein.

      Thank you for this suggestion! There should be no Cx36-GFP expression in horizontal cells, which means that the nanobody cannot bind to anything in these cells. Moreover, to recognize specific signals from non-specific background, we included wild type retinas throughout the entire experiments. This condition controls for non-specific biotinylation.

      Figure 7: the description does not match up with the figure regarding ZO-1 and ZO-2.

      It appears that a portion of the figure legend was left out of the submitted version of the manuscript. We have put the legend for panels A through C back into the manuscript in the interim revision.

      Reviewer #2 (Public review):

      Summary:

      This study aimed to uncover the protein composition and evolutionary conservation of electrical synapses in retinal neurons. The authors employed two complementary BioID approaches: expressing a Cx35b-TurboID fusion protein in zebrafish photoreceptors and using GFP-directed TurboID in Cx36-EGFP-labeled mouse AII amacrine cells. They identified conserved ZO proteins and endocytosis components in both species, along with over 50 novel proteins related to adhesion, cytoskeleton remodeling, membrane trafficking, and chemical synapses. Through a series of validation studies (including immunohistochemistry, in vitro interaction assays, and immunoprecipitation), they demonstrate that the novel scaffold protein SIPA1L3 interacts with both Cx36 and ZO proteins at electrical synapses. Furthermore, they identify and localize proteins ZO-1, ZO-2, CGN, SIPA1L3, Syt4, SJ2BP, and BAI1 at AII/cone bipolar cell gap junctions.

      Strengths:

      The study demonstrates several significant strengths in both experimental design and validation approaches. First, the dual-species approach provides valuable insights into the evolutionary conservation of electrical synapse components across vertebrates. Second, the authors compare two different TurboID strategies in mice and demonstrate that the HKamac promoter and GFP-directed approach can successfully target the electrical synapse proteome of mouse AII amacrine cells. Third, they employed multiple complementary validation approaches (including retinal section immunohistochemistry, in vitro interaction assays, and immunoprecipitation), providing evidence supporting the presence and interaction of these proteins at electrical synapses.

      Weaknesses:

      The conclusions of this paper are supported by data; however, some aspects of the quantitative proteomics analysis require clarification and more detailed documentation. The differential threshold criteria (>3 log2 fold for mouse vs >1 log2 fold for zebrafish) will benefit from biological justification, particularly given the cross-species comparison. Additionally, providing details on the number of biological or technical replicates used in this study, along with analyses of how these replicates compare to each other, would strengthen the confidence in the identification of candidate proteins. Furthermore, including negative controls for the histological validation of proteins interacting with Cx36 could increase the reliability of the staining results.

      While the study successfully characterized the presence of candidate proteins at the electrical synapses between AII amacrine cells and cone bipolar cells, it did not compare protein compositions between the different types of electrical synapses within the circuit. Given that AII amacrine cells form both homologous (AII-AII) and heterologous (AII-cone bipolar cell) electrical synapses, connections that serve distinct functional roles in retinal signal processing, a comparative analysis of their molecular compositions could have provided important insights into synapse specificity.

      Reviewer #3 (Public review):

      Summary:

      This study by Tetenborg S et al. identifies proteins that are physically closely associated with gap junctions in retinal neurons of mice and zebrafish using BioID, a technique that labels and isolates proteins proximal to a protein of interest. These proteins include scaffold proteins, adhesion molecules, chemical synapse proteins, components of the endocytic machinery, and cytoskeleton-associated proteins. Using a combination of genetic tools and meticulously executed immunostaining, the authors further verified the colocalizations of some of the identified proteins with connexin-positive gap junctions. The findings in this study highlight the complexity of gap junctions. Electrical synapses are abundant in the nervous system, yet their regulatory mechanisms are far less understood than those of chemical synapses. This work will provide valuable information for future studies aiming to elucidate the regulatory mechanisms essential for the function of neural circuits.

      Strengths:

      A key strength of this work is the identification of novel gap junction-associated proteins in AII amacrine cells and photoreceptors using BioID in combination with various genetic tools. The well-studied functions of gap junctions in these neurons will facilitate future research into the functions of the identified proteins in regulating electrical synapses.

      Thank you for these comments.

      Weaknesses:

      I do not see major weaknesses in this paper. A minor point is that, although the immunostaining in this study is beautifully executed, the quantification to verify the colocalization of the identified proteins with gap junctions is missing. In particular, endocytosis component proteins are abundant in the IPL, making it unclear whether their colocalization with gap junction is above chance level (e.g. EPS15l1, HIP1R, SNAP91, ITSN in Figure 3B).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) It would be helpful to include a comprehensive summary of the results from the quantitative proteomics analyses, such as the number of proteins detected in each species and the number of proteins associated with each GO term. Additionally, a clear figure or table highlighting the specific proteins conserved between zebrafish and mice would strengthen the evidence for evolutionary conservation of proteins at electrical synapses.

      We have added the raw data we received from our mass spec facility, including a comparison of all the candidates for the different species (Supplementary figure 3).

      (2) A more detailed description of the number of experimental and/or technical replicates would improve the technical rigor of the study. For example, what was the rationale for using different log2 fold-change cutoffs in mice versus zebrafish? Are the replicates consistent in terms of protein enrichment?

      We have added raw data from individual experiments as a supplement (Excel spreadsheet). We have two replicates from zebrafish and two from mice. The first experiment in mice was conducted with fewer retinas and a different promoter (human synapsin promoter) and didn’t yield nearly as many candidates. We are currently running a third experiment with 35 mouse retinas, which will most likely detect more candidates than we have identified so far. We can update the proteome in this paper once the analysis is complete. It is not feasible to conduct these experiments with multiple replicates at the same time, since the number of animals that have to be used is simply too high, especially since very specific genotypes are required that are difficult to obtain.

      (3) It would be interesting to determine whether there are differences in the presence of candidate proteins between AII-AII gap junctions and AII-cone bipolar cell gap junctions. Given that the subcellular localization of AII-AII gap junctions differs from that of AII-cone bipolar cell gap junctions (with most AII-AII gap junctions located below AII-cone ones), histological validations of the proteins shown in Figure 6 can be repeated for AII-AII gap junctions. This would help reveal similarities or differences in the protein compositions of these two types of gap junctions.

      Thank you for this suggestion. We had similar plans. However, we realized that homologous gap junctions are difficult to recognize with GFP. The dense GFP labeling in the proximal IPL, where AII-AII gap junctions are formed, does not allow us to clearly trace the location of individual dendrites from different cells. Detecting AII-AII gap junctions would require intracellular dye injections of neighboring AII cells. Unfortunately, we don’t have a setup that would allow this. Bipolar cell terminals, on the contrary, are a lot easier to detect with markers such as SCGN, which is why we decided to focus on AII/ONCB gap junctions.

      (4) In Figures 1 and 2, it would be helpful to clarify in the figure legends whether the proteins in the interaction networks represent all detected proteins or only those selected based on log2 fold-change or other criteria.

      Thank you for this suggestion! We have added a description in lines 643 and 662.

      (5) In Figure 1A (bottom panel), please include a negative control for the Neutravidin staining result from the non-labeling group.

      We only tested the biotinylation for wild type retinas in cell lysates and western blots as shown in figure 1C, which shows an entirely different biotinylation pattern.

      (6) In Figure 2B, please include the results of Neutravidin staining for both the labeling and non-labeling groups.

      Same comment: We see the differences in the biotinylation pattern on western blots, which is distinct for Cx36-EGFP and wild type retinas, although both genotypes were injected with the same AAV construct and the same dose of biotin. We hope that this provides sufficient evidence for the specificity of our approach.

      (7) In Figure 5B, the sizes of multiple proteins detected by Western blotting are inconsistent and confusing. For example, the size of Cx36 in the "FLAG-SJ2BP" panel differs from that in the other three panels. Additionally, in the "Myc-SIPA1L3+" panel, the size of SIPA1l3 appears different between the input and IP conditions.

      Thank you for pointing this out! The differences in the molecular weight can be explained by dimerization. We have indicated the position of the dimer and the monomer bands with arrows. In particular, when larger amounts of Cx36 are coprecipitated, Cx36 preferentially occurs as a dimer. This can also be seen in our previous publication:

      S. Tetenborg et al., Regulation of Cx36 trafficking through the early secretory pathway by COPII cargo receptors and Grasp55. Cellular and Molecular Life Sciences 81, 1-17 (2024). Figure 1D

      The band that occurs above 150 kDa in the SIPA1L3 input is most likely a non-specific product. The specific band for SIPA1L3 can be seen in the IP sample, which has the appropriate molecular weight. We often see much better immunoreactivity for the protein of interest in IP samples, because the protein is concentrated in these experiments, which facilitates its detection.

      (8) How specific are the antibodies used for validating the proteins in this study? Given that many proteins, such as EPS15l1, HIP1R, SNAP91, GPrin1, SJ2BP, Syt4, show broad distribution in the IPL (Figure 3B, 4A, 6D), it is important to validate the specificity of these antibodies. Additionally, including negative controls in the histological validation would strengthen the reliability of the results.

      We carefully selected the antibodies based on western blot data that confirmed that each antibody detected an antigen of appropriate size. Moreover, the distribution of the proteins mentioned is consistent with the function of each protein described in the literature. EPS15L1 and GPrin1, for instance, are both membrane-associated, which is evident in HEK cells (Figure 5C).

      A true negative control would require KO tissue and we don’t think that this is feasible at this point.

      (9) In Figure 7F, the model could be improved by highlighting which components may be conserved between zebrafish and mice, as well as which components are conserved between the AII-AII junction and AII-cone bipolar cell junction?

      Thank you for this suggestion. However, we don’t think that this is necessary as our study primarily focuses on the AII amacrine cell.

      Currently we are unable to distinguish differences in the composition of AII-AII and AII-ONCB junctions as described above.

      (10) Are there any functional measurements that could support the conclusion that "loss of Cx36 resulted in a quantitative defect in the formation of electrical synapse density complex"?

      The loss of electrical synapse density proteins is shown by these immunostaining comparisons. Functional measurements necessarily depend on the function of the electrical synapse itself, which is gone in the case of the Cx36 KO. It is not clear that a different functional measurement can be devised.

      Reviewer #3 (Recommendations for the authors):

      (1) It would be very helpful if there were page and line numbers on the manuscript.

      Line and page numbers have been added.

      (2) Typos in the 3rd paragraph, the sentence 'which is triggered by the influx of Calcium though non-synaptic NMDA...'

      Should it read '... Calcium THROUGH non-synaptic NMDA'?

      We have corrected this typo.

      (3) Figure 1B: please add a description of the top panels, 'Cx36 S293'.

      A description of the top panels has been added to the figure legend (line 639).

      (4) Figure 1C: what do the arrows indicate?

      We apologize for the confusion. The arrows in the western blot indicate the position of the Cx35-V5-TurboID construct, which can be detected with streptavidin-HRP and the V5 antibody. We have added a description for these arrows to the figure legend. See line 641.

      (5) Related to the point in the 'Weakness', there are some descriptions of how well some of the gap junction-associated proteins colocalize with Cx36 in immunostaining. For example, 'In comparison to the scaffold proteins, however, the colocalization of Cx36 with each of these endocytic components, was clearly less frequent and more heterogenous, which appears to reflect different stages in the life cycle of Cx36' and 'All of these proteins showed considerable colocalization with Cx36 in AII amacrine cell dendrites'. It would be nice to see quantification data to support these claims.

      Thank you for this suggestion. We have added a colocalization analysis to figure 3 (C & D). We quantified the colocalization for the endocytosis proteins Eps15l1 and Hip1r. This quantification included a flipped control to rule out random overlap. For both proteins we confirmed true colocalization (Figure 3D).
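
      The idea behind a flipped control can be sketched in a few lines. The following is a minimal illustration only, not the analysis used for Figure 3: it assumes two registered single-channel intensity images held as NumPy arrays and uses the Pearson correlation as the colocalization measure (the response does not state which measure was used). The function names `pearson_coloc` and `coloc_with_flipped_control` and the synthetic images are hypothetical.

```python
import numpy as np


def pearson_coloc(a, b):
    """Pearson correlation between two same-shaped intensity images."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def coloc_with_flipped_control(ch1, ch2):
    """Return (observed, control) colocalization values.

    The control flips ch2 horizontally: its intensity distribution is
    unchanged, but its spatial registration with ch1 is broken, so the
    control estimates the overlap expected by chance.
    """
    observed = pearson_coloc(ch1, ch2)
    control = pearson_coloc(ch1, np.fliplr(ch2))
    return observed, control


# Synthetic example (hypothetical data): ch2 is a noisy copy of ch1.
rng = np.random.default_rng(0)
ch1 = rng.random((64, 64))
ch2 = ch1 + 0.1 * rng.random((64, 64))
observed, control = coloc_with_flipped_control(ch1, ch2)
```

      If the observed value clearly exceeds the flipped-control value across many regions of interest, random overlap is an unlikely explanation for the colocalization.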

      (6) In Figure 5B, it would be helpful if there were arrows or some kind in western blottings to indicate which bands are supposed to be the targeted proteins.

      We have added arrows in IP samples to indicate bands representing the corresponding protein.

      (7) In the sentence including 'for the PBM of Cx36, as it is the case for ZO-1', what is PBM?

      The PBM means PDZ binding motif. We have added an explanation for this abbreviation in line 244.

      (8) Please add a description of the Cx35b promoter construct in the Method section.

      The Cx35b promoter is a 6.5 kb fragment. We will make the clone available via Addgene to ensure that all details of the clone can be accessed via SnapGene or alternative software.

    1. Author response:

      Reviewer #1:

      As this code was developed for use with a 4096 electrode array, it is important to be aware of double-counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas. Firstly, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code.

      We thank the reviewer for this insightful comment. We agree that signals from the same neuron may be collected by adjacent channels. To address this concern in our software, we plan to add a routine to SpikeMAP that allows users to discard nearby channels where spike count correlations exceed a pre-determined threshold. Because there is no ground truth to map individual cells to specific channels on the hd-MEA, a statistical approach is warranted.
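
      One way such a routine could look is sketched below. This is an illustrative assumption, not the SpikeMAP implementation: it takes binned spike counts per channel and electrode coordinates, and flags one channel of every nearby pair whose spike count correlation exceeds a user-chosen threshold, keeping the channel with more spikes. The function name `redundant_channels`, its parameters, and the synthetic data are hypothetical.

```python
import numpy as np


def redundant_channels(spike_counts, positions, max_dist, corr_threshold):
    """Flag channels that likely duplicate a nearby channel.

    spike_counts: array (n_channels, n_bins) of binned spike counts.
    positions:    array (n_channels, 2) of electrode coordinates.
    Returns a sorted list of channel indices to discard.
    """
    counts = np.asarray(spike_counts, dtype=float)
    pos = np.asarray(positions, dtype=float)
    corr = np.corrcoef(counts)  # channel-by-channel correlation matrix
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    totals = counts.sum(axis=1)
    discard = set()
    n = counts.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if i in discard or j in discard:
                continue
            if dist[i, j] <= max_dist and corr[i, j] > corr_threshold:
                # Keep the channel with more spikes, discard the other.
                discard.add(j if totals[i] >= totals[j] else i)
    return sorted(discard)


# Synthetic example (hypothetical data): channel 1 is a thinned copy of
# channel 0 on an adjacent electrode; channel 2 is independent and far away.
rng = np.random.default_rng(1)
base = rng.poisson(5, size=500)
counts = np.vstack([base, rng.binomial(base, 0.9), rng.poisson(5, size=500)])
positions = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
to_discard = redundant_channels(counts, positions, max_dist=2.0, corr_threshold=0.8)
```

      The distance cut-off and the correlation threshold are free parameters in this sketch and would need to be chosen for the array geometry and recording at hand.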

      Secondly, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      This is a valid concern. To ensure that firing rates are relatively constant over the duration of a recording, we will plot average spike rates using rolling windows of a fixed duration. We expect that population firing rates will remain relatively stable across the duration of recordings.
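
      A rolling-window rate estimate of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not code from SpikeMAP: spike times are given in seconds, and the window length and step are free parameters. The function name `rolling_firing_rate` and the example spike train are hypothetical.

```python
import numpy as np


def rolling_firing_rate(spike_times, duration, window, step):
    """Firing rate (Hz) in fixed-length windows slid across a recording.

    spike_times: spike times in seconds.
    duration:    total recording length in seconds.
    window:      window length in seconds.
    step:        offset between consecutive window starts in seconds.
    Returns (window_centers, rates).
    """
    spike_times = np.sort(np.asarray(spike_times, dtype=float))
    starts = np.arange(0.0, duration - window + 1e-9, step)
    # Count spikes falling in [start, start + window) for every window.
    lo = np.searchsorted(spike_times, starts, side="left")
    hi = np.searchsorted(spike_times, starts + window, side="left")
    rates = (hi - lo) / window
    return starts + window / 2.0, rates


# Example (hypothetical data): a perfectly regular 2 Hz spike train.
spike_times = np.arange(0.0, 10.0, 0.5)
centers, rates = rolling_firing_rate(spike_times, duration=10.0, window=2.0, step=1.0)
```

      Plotting `rates` against `centers` for each unit, or for the population average, would show whether firing stays approximately constant or whether high-rate epochs occur in which spike shapes may change.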

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      We agree that further cycles of experiments could be performed with SOM, VIP, and other neuronal subtypes, and we hope that researchers will take advantage of SpikeMAP too. We will clarify this possibility in the Discussion section of the manuscript.

      Reviewer #2:

      Summary:

      While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      We thank the reviewer for this comment. As detailed in Table 1, SpikeMAP is the only method that performs E/I sorting on large-scale multielectrodes, hence a comparison to competing methods is not currently possible. That being said, many of the pre-processing steps of SpikeMAP (Figure 1) involve methods that are already well-established in the literature and available under different packages. To highlight the contribution of our work more clearly and facilitate the adoption of SpikeMAP, we plan to provide a “modular” portion of SpikeMAP that is specialized in performing E/I sorting and can be added to the pipeline of other packages such as KiloSort. This modularized version of the code will be shared freely along with the more complete version already available.

      Weaknesses:

      (1) The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of Hilgen et al. 2020 (10.1016/j.celrep.2017.02.038). Therefore, the first question is what is the rationale of reinventing the wheel, and not using tools that are doing something very similar (as mentioned by the authors themselves). I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters.

      We agree with the reviewer that there are indeed similarities between our work and the Hilgen et al. paper. However, while the latter employs optogenetics to stimulate neurons on a large-scale array, their technique does not specifically target inhibitory (e.g., PV) neurons as described in our work. We will clarify our paper accordingly.

      This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, with reference to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see other comments after).

      The title of our work will be edited to make it clear that while elements of the pipeline are well-established and available from other packages, we are the first to extend this pipeline to E/I sorting on large-scale arrays.

      (2) Regarding the putative location of the spikes, it has been shown that the center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias for finding positions within the boundaries of the electrodes, while some other methods, such as monopolar triangulation or grid-based convolution, might have better performances. Can the authors comment on the choice of the Center of Mass as a unique way to triangulate the sources?

      We agree with the reviewer and will point out limits of the center-of-mass algorithm based on the article of Scopin et al (2024). Further, we will augment the existing code library to include monopolar triangulation or grid-based convolution as options available to end-users.
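
      For illustration, the center-of-mass estimate discussed in this point can be sketched as an amplitude-weighted mean of electrode positions. This is a minimal sketch, not the SpikeMAP implementation: the function name, the 2 x 2 electrode patch, and the amplitude values are all hypothetical. Because the weights are non-negative, the estimate can never fall outside the area spanned by the electrodes, which is the bias the reviewer describes.

```python
def center_of_mass(positions, amplitudes):
    """Amplitude-weighted mean of electrode positions.

    positions  -- list of (x, y) electrode coordinates (any length unit)
    amplitudes -- peak spike amplitude on each electrode (sign is ignored)
    """
    weights = [abs(a) for a in amplitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("all amplitudes are zero; position is undefined")
    x = sum(w * p[0] for w, p in zip(weights, positions)) / total
    y = sum(w * p[1] for w, p in zip(weights, positions)) / total
    return x, y


# Hypothetical 2 x 2 patch of electrodes with unit spacing (assumed values).
electrodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
peak_amplitudes = [-120.0, -40.0, -40.0, 0.0]
estimate = center_of_mass(electrodes, peak_amplitudes)
```

      A source lying outside the patch would still be placed inside it by this estimator, which is one reason alternatives such as monopolar triangulation are worth offering as options.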

      (3) Still in Figure 1, I am not sure I really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What is special about the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      We will clarify these points. Specifically, the value of 90kHz was chosen because it provided a reasonable temporal characterization of spikes; this value, however, can be adjusted within the software based on user preference.
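
      As a rough illustration of what upsampling from 18 kHz to 90 kHz involves (a factor of 5), the sketch below fits a natural cubic spline through the recorded samples and evaluates it on a five-times finer grid. It is an assumption-laden sketch rather than the SpikeMAP code: the natural boundary condition, the function names, and the example waveform are hypothetical choices made for illustration.

```python
def second_derivatives(y):
    """Second derivatives of a natural cubic spline on a unit-spaced grid."""
    n = len(y) - 1
    m = [0.0] * (n + 1)          # natural spline: zero curvature at both ends
    if n < 2:
        return m
    # Interior equations: m[i-1] + 4*m[i] + m[i+1] = 6*(y[i-1] - 2*y[i] + y[i+1])
    rhs = [6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1]) for i in range(1, n)]
    size = n - 1
    c_prime = [0.0] * size
    d_prime = [0.0] * size
    c_prime[0] = 1.0 / 4.0
    d_prime[0] = rhs[0] / 4.0
    for i in range(1, size):     # forward sweep of the Thomas algorithm
        denom = 4.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (rhs[i] - d_prime[i - 1]) / denom
    interior = [0.0] * size
    interior[-1] = d_prime[-1]
    for i in range(size - 2, -1, -1):   # back substitution
        interior[i] = d_prime[i] - c_prime[i] * interior[i + 1]
    m[1:n] = interior
    return m


def upsample(y, factor):
    """Evaluate the spline at `factor` points per original sample interval."""
    m = second_derivatives(y)
    n = len(y) - 1
    out = []
    for k in range(n * factor + 1):
        i = min(k // factor, n - 1)
        t = (k - i * factor) / factor   # position inside interval i, 0..1
        a = 1.0 - t
        value = ((m[i] * a ** 3 + m[i + 1] * t ** 3) / 6.0
                 + (y[i] - m[i] / 6.0) * a
                 + (y[i + 1] - m[i + 1] / 6.0) * t)
        out.append(value)
    return out


# Hypothetical spike snippet sampled at 18 kHz (arbitrary amplitude units).
snippet = [0.0, -5.0, -60.0, -20.0, 15.0, 5.0, 0.0]
fine = upsample(snippet, 5)             # 5 x 18 kHz = 90 kHz
```

      The interpolated trace passes through every original sample, so the upsampling adds temporal resolution for features such as spike width without changing the recorded values.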

      (4) Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii.

      We will re-check Fig. 2B, which seems to have a rendering error, likely due to conversion from its original format.

      In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once, and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower-dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      Here, the reviewer is suggesting that it may be better to perform PCA on several channels at once, since a spike can appear on several channels at the same time. To address this concern, a small routine will be written that allows users to choose how many nearby channels are included in the PCA.
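
      One way such a routine could look is sketched below: it selects the channels within a user-chosen radius of the peak channel on a rectangular grid and concatenates their snippets into a single feature vector that could then be passed to PCA. The grid size, the function names, and the dictionary layout of the waveforms are assumptions made for this sketch, not details of the SpikeMAP code or of the recording device.

```python
def neighbour_channels(center, n_rows, n_cols, radius):
    """Grid channels within `radius` rows/columns of `center`, clipped at the edges."""
    row0, col0 = center
    rows = range(max(0, row0 - radius), min(n_rows, row0 + radius + 1))
    cols = range(max(0, col0 - radius), min(n_cols, col0 + radius + 1))
    return [(r, c) for r in rows for c in cols]


def stacked_waveform(waveforms, channels):
    """Concatenate one spike's per-channel snippets into a single feature vector.

    waveforms -- dict mapping (row, col) to a list of samples for this spike
    """
    features = []
    for channel in channels:
        features.extend(waveforms[channel])
    return features


# Hypothetical 64 x 64 grid; radius 0 reproduces single-channel PCA input.
single = neighbour_channels((5, 5), 64, 64, 0)
local = neighbour_channels((5, 5), 64, 64, 1)
```

      With radius 0 the feature vector reduces to the single peak channel, so the current behaviour would remain available as a special case.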

      (5) About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. This means that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting, because, since it is a blind source separation technique, one cannot pre-determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is ok, there is no guarantee that one cannot find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      It is true that k=2 is a pre-determined choice in our software. In practice, we found that k>2 leads to poorly defined clusters. However, we will ensure that this parameter can be adjusted in the software. Furthermore, if the user chooses not to pre-define this value, we will provide the option to use a Calinski-Harabasz criterion to select k.
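
      The selection of k by the Calinski-Harabasz criterion can be sketched as follows: run k-means for several candidate values of k and keep the one with the largest ratio of between-cluster to within-cluster dispersion. This is a minimal, dependency-free sketch under stated assumptions; the farthest-point initialisation, the candidate range, and all function names are hypothetical and are not taken from SpikeMAP.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels


def calinski_harabasz(points, labels):
    """(between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k))."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or k >= n:
        return 0.0
    overall = mean(points)
    between = 0.0
    within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        mu = mean(members)
        between += len(members) * sq_dist(mu, overall)
        within += sum(sq_dist(p, mu) for p in members)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points, candidates=(2, 3, 4, 5)):
    scores = {k: calinski_harabasz(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores
```

      In use, `choose_k` would be called on the PCA-projected waveforms of one electrode, with a fixed k=2 remaining available as an explicit user setting.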

      (6) I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is it really what should be expected? Based on the properties of the extracellular media, shouldn't we expect a power law for the decay of the amplitude? This is strange that up to 100um away from the soma, the max amplitude only dropped from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot that for all neurons recorded, in a normed manner V/max(V) as function of distances, to see what the curve looks like.

      We share the reviewer’s concern and will add results that include a population of neurons to assess the robustness of this phenomenon.

      (7) In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What are the quality criteria that are used to keep these cells? Did the authors exclude some cells from the analysis, and if yes, what are the quality criteria that are used to keep cells? If no criteria are used (because none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      We applied stringent criteria to exclude cells, and we will revise the main text to be clear about these criteria, which include a minimum spike rate and the use of LDA to separate out PCA clusters. For the cells that were retained, we will include SNR estimates.
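
      To make the kind of unit-level checks under discussion concrete, the sketch below computes a mean firing rate, the fraction of inter-spike intervals violating a refractory period, and a simple signal-to-noise ratio. These are common conventions chosen for illustration only: the 2 ms refractory window, the SNR definition (peak absolute amplitude over noise standard deviation), and the example numbers are assumptions, not the criteria or thresholds used in the manuscript.

```python
import statistics


def firing_rate_hz(spike_times_s, duration_s):
    """Mean firing rate over the recording, in spikes per second."""
    return len(spike_times_s) / duration_s


def refractory_violation_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of inter-spike intervals shorter than the refractory period."""
    times = sorted(spike_times_s)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    if not intervals:
        return 0.0
    return sum(1 for isi in intervals if isi < refractory_s) / len(intervals)


def waveform_snr(mean_waveform, noise_samples):
    """Peak absolute amplitude of the mean waveform over the noise standard deviation."""
    noise_sd = statistics.pstdev(noise_samples)
    return max(abs(v) for v in mean_waveform) / noise_sd


# Hypothetical unit: four spikes in a 2 s recording (assumed values).
spike_times = [0.0, 0.001, 0.5, 1.0]
rate = firing_rate_hz(spike_times, 2.0)
violations = refractory_violation_fraction(spike_times)
snr = waveform_snr([0.0, -50.0, 10.0], [-5.0, 5.0, -5.0, 5.0])
```

      Reporting these quantities per retained unit would let readers judge how clean the units are, independently of the clustering step.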

      (8) Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than Excitatory ones, whilst they should be in theory.       

      We will include a comparison of firing rates for E and I neurons. It is possible that I cells are located at the border of the MEA due to the site of injections of the viral vector, and not because of an anatomical clustering of I cells per se. We will clarify the text accordingly.

      (9) For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].

      As mentioned previously, Kilosort and related approaches do not address the problem of E/I identification (see Table 1). However, they do have pre-processing steps in common with SpikeMAP. We will add some specific comparison points – for instance, the use of k-means and PCA (which is more common across packages) and the use of cubic spline interpolation (which is less common). Further, we will provide a stand-alone E/I sorting module that can be added to the pipeline of other packages, so that users can use this functionality without having to migrate their entire analysis.

      (10) Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      We apologize for this issue. It seems there was a rendering problem when converting the figure from its original format. We will address this issue in the revised version of the manuscript.

      (11) I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data? I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      We will mention how many flashes/animals/slices were employed in the GT data and provide open access to these data.

      (12) While there is no doubt that GT data as the ones recorded here by the authors are the most interesting data from a validation point of view, the pretty low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rate patterns for excitatory and inhibitory cells, and thus, the authors could test how good they are in discriminating the two subtypes.

      We thank the reviewer for the suggestion that SpikeMAP could be tested on artificially generated spike trains and will add the citation of the two papers mentioned. We hope future efforts will employ SpikeMAP on both synthetic and experimental data to explore the neural dynamics of E and I neurons in healthy and pathological circuits of the brain.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Concerning the grounding in experimental phenomenology, it would be beneficial to identify specific experiments to strengthen the model. In particular, what evidence supports reversible beta cell inactivation? This could potentially be tested in mice, for instance, by using an inducible beta cell reporter, treating the animals with high glucose levels, and then measuring the phenotype of the marked cells. Such experiments, if they exist, would make the motivation for the model more compelling.

      There is some direct evidence of reversible beta cell inactivation in rodent / in vitro models. We had already mentioned this in the discussion, but we have added some text emphasizing / clarifying the role of this evidence (lines 359–362).

      Others have also argued that some analyses of insulin treatment in conventional T2D, which has a stronger effect in patients with higher glucose before treatment, provides indirect evidence of reversal of glucotoxicity. We have also mentioned this in the revised paper (lines 284–285).

      For quantitative experiments, the authors should be more specific about the features of beta cell dysfunction in KPD. Does the dysfunction manifest in fasting glucose, glycemic responses, or both? Is there a ”pre-KPD” condition? What is known about the disease’s timescale?

      The answers to some of these questions are not entirely clear—patients present with very high glucose, and thus must be treated immediately. Due to a lack of antecedent data it is not entirely clear what the pre-KPD condition is, but there is some evidence that KPD is at least not preceded by diabetes symptoms. This point is already noted in the introduction of the paper and Table 1. However, we have added a small note clarifying that this does not rule out mild hyperglycemia, as in prediabetes (and indeed, as our model might predict) (lines 76–77). Similarly, due to the necessity of immediate insulin treatment, it is not clear from existing data whether the disorder manifests more strongly in fasting glucose or glucose response, although it is likely in both. (We might infer this since continuous insulin treatment does not produce fasting hypoglycemia, and the complete lack of insulin response to glucose shortly after presentation should produce a strong effect in glycemic response.) We believe our existing description of KPD lists all of the relevant timescales, however we have also slightly clarified this description in response to the first referee’s comments (lines 66–73, 83)

      The authors should also consider whether their model could apply to other conditions besides KPD. For example, the phenomenology seems similar to the ”honeymoon” phase of T1D. Making a strong case for the model in this scenario would be fascinating.

      This is an excellent idea, which had not occurred to us. We have briefly discussed this possibility in the revision (lines 281–291), but plan to analyze it in more detail in a future manuscript.

      Reviewer #1 (Recommendations for the author):

      Whenever simulation results are presented, parameter values should be specified right there in the figure captions.

      We have added the values of glucotoxicity parameters to the caption of Figure 2. In other figures, we have explicitly mentioned which panel of Figure 2 the parameters are taken from. Description of the non-glucotoxicity parameters is a bit cumbersome (there are a lot of them, but our model of fast dynamics is slightly different from Topp et al. so it does not suffice to simply say we took their parameters) so we have referred the reader to the Materials and Methods for those.

      I was confused by the language in Figure 4. Could the authors clarify whether they argue that: (1) the observed KPD behaviour is the result of the system switching from one stable state to another when perturbed with high glucose intake? (2) the observed KPD behaviour is the result of one of the steady states disappearing with high glucose intake?

      What we mean to say is that during a period of high sugar intake or exogeneous insulin treatment, one of the fixed points is temporarily removed—it is still a fixed point of the “normal” dynamics, but not a fixed point of the dynamics with the external condition added. Since when glucose (insulin) intake is high enough, only the low (high)-β fixed point is present, under one of these conditions the dynamics flow toward that fixed point. When the external influx of glucose/insulin is turned off, both fixed points are present again—but if the dynamics have moved sufficiently far during the external forcing, the fixed point they end up in will have switched from one fixed point to the other. We have edited the text to make this clearer (lines 153–185). Do note, however, that in response to both referee’s comments (see below), Figures 3 and 4 have been replaced with more illuminating ones. This specific point is now addressed by the new Figure 3.
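
      The mechanism described here can be illustrated with a generic toy system that is not the model of the paper. The sketch assumes a one-dimensional bistable equation dx/dt = x - x^3 + u, where x is only a stand-in for the slow variable and u for the external forcing; the equation, the parameter values, and the function names are all hypothetical. Without forcing there are stable fixed points at x = -1 and x = +1. A sufficiently strong forcing removes one of them, and if it lasts long enough the state ends up at the other fixed point after the forcing is switched off.

```python
def drift(x, u):
    """Toy bistable dynamics: stable fixed points at -1 and +1 when u == 0."""
    return x - x ** 3 + u


def simulate(x0, forcing, t_force, t_relax, dt=0.01):
    """Euler integration: forcing is applied for t_force, then removed for t_relax."""
    x = x0
    for _ in range(int(round(t_force / dt))):
        x += dt * drift(x, forcing)
    for _ in range(int(round(t_relax / dt))):
        x += dt * drift(x, 0.0)
    return x


# Long forcing: the lower fixed point is absent for long enough to switch states.
switched = simulate(x0=-1.0, forcing=1.0, t_force=10.0, t_relax=20.0)
# Brief forcing: the state never leaves the original basin and relaxes back.
unchanged = simulate(x0=-1.0, forcing=1.0, t_force=0.2, t_relax=20.0)
# Opposite forcing drives the switch back, as with the reverse treatment.
reverted = simulate(x0=1.0, forcing=-1.0, t_force=10.0, t_relax=20.0)
```

      Both fixed points exist again once the forcing is removed; which one the system occupies depends only on how far it moved while the forcing was on.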

      The adaptation of the prefactor ’c’ was confusing to me. I think I understood it in the end, but it sounded like, ”here’s a complication, but we don’t explain it because it doesn’t really matter”. I think the authors can explain this better (or potentially leave out the complication with ’c’ altogether?).

      Indeed, the existence of an adaptation mechanism is important for our overall picture of diabetes pathogenesis, but not for many of our analyses, which assume prediabetes. Nonetheless, we agree that the current explanation of its role is confusing because of its vagueness. We have elaborated the explanation of the type of dynamics we assume for c, adding an equation for its dynamics to the “Model” section of the Materials and methods, explained in lines 456–465. We have also amended Figure 1 to note this compensation.

      I expect the main impact of this work will be to get clinical practitioners and biomedical researchers interested in the intermediate timescale dynamics of β-cells and take seriously the possibility that reversible inactive states might exist. But this impact will only be achieved when the results are clearly and easily understandable by an audience that is not familiar with mathematical modelling. I personally found it difficult to understand what I was supposed to see in the figures at first glance. Yes, the subtle points are indeed explained in the figure captions, but it might be advantageous to make the points visually so clear that a caption is barely needed. For example, when claiming that a change in parameters leads to bistability, why not plot the steady state values as a function of that parameter instead of showing curves from which one has to infer a steady state?

      I would advise the authors to reconsider their visual presentation by, e.g., presenting the figures to clinical practitioners or biomedical researchers with just a caption title to test whether such an audience can decipher the point of the figure! This is of course merely a personal suggestion that the authors may decide to ignore. I am making this suggestion only because I believe in the quality of this work and that improving the clarity of the figures and the ease with which one can understand the main points would potentially lead to a much larger impact on the presented results.

      This is a very good point. We have made several changes. Firstly, we have added smaller panels showing the dynamics of β to Figure 2; previously, the reader had to infer what was happening to β from G(t). Secondly, we have completely replaced the two figures showing dβ/dt, and requiring the reader to infer the fixed points of β, with bifurcation diagrams that simply show the fixed points of G and β. The new figures show through bifurcation diagrams how there are multiple fixed points in KPD, how glucose or insulin treatment force the switching of fixed points, and how the presence of bistability depends on the rate of glucotoxicity. (These new figures are Fig. 3–5 in the revised manuscript.)

      Could the authors explicitly point out what could be learned from their work for the clinic? At the moment treatment consists of giving insulin to patients. If I understand correctly, nothing about the current treatment would change if the model is correct. Is there maybe something more subtle that could be relevant to devising an optimal treatment for KPD patients?

      This is another very good point. We have added a new figure (Fig. 7) in our results section showing how this model, or one like it, can be analyzed to suggest an insulin treatment schedule (once parameters for an individual patient can be measured), and added some discussion of this point (lines 224–240) as well as lifestyle changes our model might suggest for KPD patients to the discussion (lines 413–425).

      Similarly, could the authors explicitly point out how their model could be experimentally tested? For example, are the functions f(G) and g(G) experimentally accessible? Related to that, presumably the shape of those functions matters to reproduce the observed behaviour. Could the authors comment on that / analyze how reproducing the observed behaviour puts constraints on the shape of the used functions and chosen parameter values?

      g(G) has not been carefully measured in cellular data; however, it could be in more quantitative versions of existing experiments. Further, our model indeed requires some general features for the forms of f(G) and g(G) to produce KPD-like phenomena. We have added some comment on this to the discussion section of the revised manuscript (lines 367–372).

      Could the authors explicitly spell out which parameters they think differ between individual KPD patients, and which parameters differ between KPD patients and ’regular’ type 2 diabetics?

      In general we expect all parameters to vary both among KPD patients and between KPD and “conventional” T2D. The primary parameter determining whether KPD or conventional T2D is seen, however, is the ratio kIN/kRE. We have elaborated on both these points in the revised manuscript (lines 186–192, 250–257).

      I was confused about the timescale of remission. At one point the authors write “KPD patients can often achieve partial remission: after a few weeks or months of treatment with insulin” but later the authors state that “the duration of the remission varies from 6 months to 10 years”.

      The former timescale is the typical time needed to achieve remission. After remission is reached, however, it may or may not last: patients may experience a relapse, where their condition worsens and they again require insulin. We have edited the text to clarify this distinction (lines 66–73).

      When the authors talk about intermediate timescales in the main text could they specify an actual unit of time, such as days, weeks, or months as it would relate to the rate constants in their model for those transitions?

      We have done so (lines 86–87, figure 1 caption, figure 2 caption). Getting KPD-like behavior requires (at high glucose) the deactivation process to be somewhat faster than the reactivation process, so the relevant scales are between weeks (reactivation) and days (deactivation at high G).
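
      The separation of timescales described here can be illustrated with a two-state sketch. It assumes, purely for illustration, first-order switching between active and inactive cells at a fixed glucose level, dβ/dt = k_re (β_tot - β) - k_deact β, where k_deact stands for the glucose-dependent deactivation rate at that glucose level. This functional form and the numerical rates below (reactivation over two weeks, deactivation over two days) are assumptions and are not taken from the manuscript.

```python
def active_fraction(k_re, k_deact):
    """Steady-state active fraction beta / beta_tot for first-order switching."""
    return k_re / (k_re + k_deact)


def relaxation_time(k_re, k_deact):
    """Time constant with which the active fraction approaches its steady state."""
    return 1.0 / (k_re + k_deact)


K_RE = 1.0 / 14.0           # assumed reactivation rate, per day (about two weeks)
K_DEACT_HIGH_G = 1.0 / 2.0  # assumed deactivation rate at high glucose, per day

# High glucose: most cells inactivate, and they do so within days.
fraction_high = active_fraction(K_RE, K_DEACT_HIGH_G)
tau_high = relaxation_time(K_RE, K_DEACT_HIGH_G)

# Glucose controlled (deactivation negligible): recovery takes weeks.
fraction_low = active_fraction(K_RE, 0.0)
tau_low = relaxation_time(K_RE, 0.0)
```

      Under these assumed rates the active fraction drops to one eighth within about two days at high glucose, while recovery after glucose is controlled proceeds on the two-week reactivation timescale.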

      The authors state ”Our simple model of β-cell adaptation also neglects the known hyperglycemia-induced leftward shift in the insulin secretion curve (f(G) in Eq. (2))”. This seems an important consideration. Could the authors comment on why they did not model this shift, and/or explicitly discuss how including it is expected to change the model dynamics?

      We agree that this process seems potentially relevant, as it seems to happen on a relatively fast timescale compared to glucose-induced β-cell death. It is, however, not so well characterized quantitatively that including it is a simple matter of putting in known values—we would be making assumptions that would complicate the interpretation of our results.

      It is clear that this effect will need to be considered when quantitatively modelling real patient data. However, it is also straightforward to argue that this effect by itself cannot produce KPD-like symptoms, and will only tend to reduce the rate of glucotoxicity necessary to produce bistability. We have added a discussion of this in the revisions (lines 307–315). We have also, in general, expanded the discussion of the effects that each neglected detail we have mentioned is expected to have (lines 292–315).

      The authors end with a statement that their results may “contribute to explanation of other observations that involve rapid onset or remission of diabetes-like phenomena, such as during pregnancy or for patients on very low calorie diets.” Could the authors spell out exactly how their model potentially relates to these phenomena?

      Our thinking is that, even when another direct cause, such as loss of insulin resistance, is implicated in reversal of diabetes, some portion of the effect may be explained by reversal of glucotoxicity. This is indeed at this point just a hypothesis, but we have expanded on it briefly in the revision. (Lines 281–291.)

      Minor typos:

      In Figure 2.D the last zero of 200 on the axis was cut off.

      Line 359 - there is a missing word ”in the analysis”.

      We have fixed these typos, thanks.

      Reviewer #2 (Recommendations for the author):

      The manuscript could be significantly improved in two key areas: the presentation of the analysis, and the relation with experimental phenomenology.

      Regarding the analysis presentation, the figures could be substantially enhanced with minimal effort from the authors. At present, they are sparse, lack legends, and offer only basic analysis. The authors should consider presenting, for example, a bifurcation diagram for beta cell mass and fasting glucose levels as a function of kIN, and how insulin sensitivity and average meal intake modulate this relationship. The goal should be to present clear, testable predictions in an intuitive manner. Currently, the specific testable predictions of the model are unclear.

      The response to this question is copied from the responses to related questions from the first referee.

      This is a very good point. We have made several changes. Firstly, we have added smaller panels showing the dynamics of β to Figure 2; previously, the reader had to infer what was happening to β from G(t). Secondly, we have completely replaced the two figures showing dβ/dt, and requiring the reader to infer the fixed points of β, with bifurcation diagrams that simply show the fixed points of G and β. The new figures show through bifurcation diagrams how there are multiple fixed points in KPD, how glucose or insulin treatment force the switching of fixed points, and how the presence of bistability depends on the rate of glucotoxicity. We have also supplemented our phase diagram that shows the effects of SI and the total beta cell population with bifurcation diagrams showing β as SI and βTOT are varied. (These new figures are Fig. 3–5 in the present manuscript.) Finally, we have added another figure analyzing the model’s predictions for the optimal insulin treatment and the resulting time needed to achieve remission (Fig. 7).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Firstly, we would like to thank the reviewers for their time and effort in critiquing this paper. The reviewers considered our study significant and also offered valuable suggestions to improve our manuscript, mainly the comparison of mRNA and eRNA for predicting subtype specificity and prognosis, and the integration with independent validation datasets. Our preliminary analyses showed that our classified mRNAs predict subtypes better, which is not surprising, as these subtypes were initially discovered using mRNA differences. Hence, we employed a novel approach of associating these classified mRNAs and eRNAs by distance and found that 71% of classified eRNAs are associated with classified mRNAs. We also propose to integrate the datasets with PEGS (Briggs et al 2021) to achieve better mRNA-eRNA association, and with Perturb-seq validated regions to achieve functional validation of the eRNA loci. We believe that these integrative analyses will improve the novelty and power of our findings, as this is a unique approach that is employed in a patient sample-based, high-resolution eRNA atlas for the first time. We have addressed most of the other major and minor comments of the reviewers and have provided the preliminary revised manuscript.

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary:

      This study assesses eRNA activity as a classifier of different subtypes of breast cancer and as a prognosis tool. The authors take advantage of previously published RNA-seq data from human breast cancer samples and assess it more deeply, considering the cancer subtype of the patient. They then apply two machine learning approaches to find which eRNAs can classify the different breast cancer subtypes. While they do not find any eRNA that helps distinguish ductal vs. lobular breast cancers, their approach helps identify eRNAs that distinguish luminal A, B, basal and Her2+ cancers. They also use motif enrichment analysis and ChIP-seq datasets to characterize the eRNA regions further. Through this analysis, they observe that those eRNAs where ER binds strongest are associated with a poor patient prognosis.

      Major comments:

      Part of the rationale for this study is the previous observation that eRNAs are less associated with the prognosis of breast cancer patients in comparison to mRNAs and they claim that the high heterogeneity between breast cancer subtypes would mask the importance of eRNAs. In this study, the authors solely focus on eRNAs as a classification of breast cancer subtypes and prognostic tool and do not answer whether eRNAs or mRNAs are a better predictor of cancer subtypes and of prognosis. Since the answer and the tools are already in their hands, it would be important to also see a comparative analysis where they assess which of the two (mRNAs or eRNAs) is a better predictor.

      Response: We thank the reviewer for this valid point about comparing prognostic eRNAs and mRNAs. Our study does not imply that eRNA markers are better than mRNAs in predicting subtype specificity and/or prognosis; our motivation for working with eRNAs is that, once subtyped, they can be used to define relevant transcriptional regulators and prognosis. As the molecular subtypes in breast cancers were established using gene expression datasets, mRNAs would perform better as predictors of subtypes and/or prognosis. However, identifying regulatory networks with emphasis on transcription factor binding motif analyses is not achievable using mRNA datasets. Analysing the active enhancer regions with eRNA transcription will provide a high-resolution landscape of TF and epigenetic networks. These sorts of analyses usually require ATAC-seq or H3K27ac datasets, but these assays need fresh frozen tissue material and laborious experimental designs compared to RNA-seq datasets. Furthermore, eRNA-transcribing enhancers represent highly active enhancers, while ATAC and H3K27ac datasets can identify all enhancers, which can be inactive or poised, but captured due to the dynamic nature of enhancers. We demonstrate that traditional RNA-seq datasets mapped on active enhancer regions showing eRNA transcription would be sufficient to identify the highly active TF network and gene-enhancer regulatory frameworks in a subtype-specific manner, hence emphasising the potential of eRNA studies.

      Hence, the scope of our study is not to establish which RNA can predict subtype and survival, but to demonstrate the potential of studying eRNAs in patient samples using traditional RNA-seq assays. This study would help epigenetics researchers understand how enhancer transcription can be associated with gene regulation through deregulated transcription factor networks in patients. The above section has been included in the discussion in the revised manuscript.

      As the comparative analyses suggested by the reviewer will substantiate the potential of eRNAs as cancer prognostic markers, we applied the same machine learning methodology to the published TCGA mRNA-seq datasets, identified subtype-specific and prognostic mRNAs, and compared eRNAs with mRNAs. As expected, mRNAs do perform better than eRNAs in associating with subtype specificity: we identified more subtype-specific mRNAs, with better statistical metrics. The results show clear separation across subtypes (Basal, Her2, LumA/B) as well as between Ductal and Lobular.

      We believe that eRNAs and mRNAs are complementary rather than competing predictors of subtype-specific survival. To address this in the revised manuscript, and based on the suggestion from reviewer 3, we performed an initial selection of the eRNAs located within 50 kb of their corresponding subtype-specific mRNAs, which can be integrated with the above analyses. In our preliminary analysis, around 71% of eRNAs are associated with subtype-specific mRNAs, and this method also yielded a visible separation of ductal and lobular subtypes.
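
      The 50 kb proximity selection described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `link_ernas_to_genes`, the choice to measure distance from the eRNA midpoint to the gene TSS, and all identifiers and coordinates are assumptions made for the example.

```python
from collections import defaultdict

def link_ernas_to_genes(ernas, genes, max_distance=50_000):
    """Map each eRNA to the genes within max_distance bp on the same chromosome.

    ernas: iterable of (erna_id, chrom, midpoint)   # hypothetical input layout
    genes: iterable of (gene_id, chrom, tss)        # hypothetical input layout
    Distance is |midpoint - tss|; this definition is an assumption.
    """
    genes_by_chrom = defaultdict(list)
    for gene_id, chrom, tss in genes:
        genes_by_chrom[chrom].append((gene_id, tss))

    links = {}
    for erna_id, chrom, mid in ernas:
        nearby = [g for g, tss in genes_by_chrom[chrom] if abs(mid - tss) <= max_distance]
        if nearby:
            links[erna_id] = nearby
    return links

# Toy coordinates, invented purely for illustration.
toy_ernas = [("e1", "chr1", 100_000), ("e2", "chr1", 400_000), ("e3", "chr2", 100_000)]
toy_genes = [("GENE_A", "chr1", 130_000), ("GENE_B", "chr2", 500_000)]
toy_links = link_ernas_to_genes(toy_ernas, toy_genes)
```

      In this toy input only `e1` lies within 50 kb of a gene, so it is the only eRNA retained; the fraction of retained eRNAs is the kind of quantity reported as "around 71%" in the response.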

      Furthermore, we integrated our enhancer RNAs with the key enhancer regions that show a significant impact on gene transcription in single-cell CRISPRi screen (Perturb-seq) datasets derived from ATAC-matched H3K27ac datasets and verified in one ER+ and one ER- breast cancer cell line (Wang et al., Genome Biology 2025, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03474-0). Our initial analyses identified at least 29 regions from the Perturb-seq datasets overlapping with 72 eRNAs of subtype classification and 5 eRNAs of Her2 survival.

      For the revised manuscript, we will perform the mRNA-eRNA association in more detail and include the data. We will also employ our well-established tool for associating mRNAs and noncoding elements, Peak set Enrichment in Gene Sets (PEGS; Briggs et al., F1000Research, 2021, https://f1000research.com/articles/10-570/v2). We hypothesise that this will improve the power of the classification models used in the study and will also provide, for the first time, a gene-enhancer RNA interaction landscape in patient samples. Furthermore, we will integrate the activity of these eRNA-mRNA pairs with chromatin accessibility and enhancer activity using ATAC-seq and H3K27ac ChIP-seq datasets to establish more robust active regulatory networks in patient samples. We will also perform motif analyses on the published ATAC-seq peaks (from TCGA-BRCA patient samples; Corces et al., 2018) close to the eRNA loci to identify the TF networks with better precision than in our original work, potentially uncovering novel and relevant subtype-specific TFs. Furthermore, as an experimental functional validation of our classified eRNAs, we will investigate the regulatory effect of the 29 regions overlapping with Perturb-seq. Hence, our revised manuscript will potentially provide, for the first time, a comprehensive validated list of enhancer RNA regions that are highly active and actively transcribing and that define subtype- and survival-specific regulatory networks in breast cancer patients.

      The authors run the umaps of Fig. 1C only taking the predictor eRNAs. It is then somewhat expected to observe a separation. Coming from a single-cell omics field, what I would suggest is to take the eRNA loci and compute a umap with the highly variable regions, perform clustering on it and assess how the cancer subtypes are structured within the data. This would give a first overview of how much segregation and structure one can have with this data. Having a first step of data exploration would also strengthen the paper. If the authors have tried it, could the authors comment on it?

      Response: We thank the reviewer for sharing their experience from single-cell omics analysis. In our case, following an scRNA-seq-like pipeline is not appropriate, because our study focuses on identifying markers for already annotated subtypes. We aim to assess the quality of the identified markers (quantified by the statistics reported for random forest classification), and the data are well separated in PCA using only PC1 and PC2. We showed the UMAP (computed on PC1 and PC2) for better visualization in the original manuscript, and we have included the PCA plots in the revised manuscript.

      'neither measures could classify any distinct eRNAs for invasive ductal vs lobular cancer samples' S1B. Just by eye, I can see a potential enrichment of ductal on the left and on the right while lobular stays in the center. This suggests to me that, while perhaps each eRNA alone does not have the power to classify the lobular vs ductal subtype, perhaps there is a difference - which could result from a cooperative model of eRNA influence - that would need further exploration. Would a PCA also show enrichments of ductal vs. lobular in specific parts of the plot? It may be worth exploring the PC loadings to see which eRNAs could play an influence. In this regard, a more unbiased visual examination, as suggested in my previous point, could help clarify whether there could be an association of certain eRNAs that cannot be captured by ML.

      Response: The breast cancer subtypes (Basal, Her2, LumA/B) show clear differences in mRNA expression. Given the fixed subtype annotations in the patient datasets, we applied our methodology to the mRNA datasets, and the results showed clear separation across subtypes (Basal, Her2, LumA/B) as well as between Ductal and Lobular. In addition, 70% of subtype-specific eRNAs are located next to an mRNA, which supports that we detected proper eRNA markers. Furthermore, Random Forest is a standard and powerful non-linear classifier for this type of classification question. We therefore hypothesized that the information needed to distinguish Ductal from Lobular is not present in the eRNA dataset used. Using information gain with the standard cutoff of 0.05, we detected only 38 subtype-specific mRNAs with classifying power across ductal-lobular, and only one eRNA-associated gene. To explore further, we used a lower information gain cutoff (0.01) and then kept only the eRNAs located near classified mRNAs (within 50 kb). In this way, we detected 96 eRNA candidates linked to 8 classified mRNAs. These 96 eRNAs could, to some extent, classify ductal vs lobular (PCA plots attached above). This observation suggests that, with a more comprehensive eRNA dataset, we could detect better eRNA markers and cover more (probably all) mRNA markers. Hence, the cooperative model of eRNA influence suggested by the reviewer cannot be established with the current data, although random forest is an efficient tool for capturing such cooperation if it exists. Besides, as we demonstrate in this paper, eRNA is a dataset complementary to mRNA that can assist in the identification of regulatory networks.
      For the revision, we will provide more detailed eRNA-mRNA associations by integrating PEGS and Perturb-seq validated regions, for both subtype classification and survival, and will motivate similar future studies of ductal vs lobular in the discussion.

      "we employed machine learning approaches on 302,951 eRNA loci identified from RNA-seq datasets from 1,095 breast cancer patient samples from previous studies" - the previous studies from which the authors take the data [11,12] highlight the presence of ~60K enhancers in the human genome and they use less than that in their analysis. Could the authors please clarify the differences in numbers with previous studies and give a reasoning?

      Response: The ~300K enhancers are derived from ENCODE H3K27ac datasets and represent all active enhancer regions marked by H3K27ac (Hnisz et al., 2013). This is the highest-resolution map of eRNA loci presented to date. In Chen et al. 2020, 1,531 superenhancers representing 30K eRNA loci were used for exploratory analysis, and the findings were generalised back to the 300K set. The 65K enhancer loci cover tissue-specific enhancers initially identified from FANTOM CAGE datasets, and this subset provides only limited regions of eRNA expression. Hence, our analyses on ~300K eRNA loci provide unbiased information on subtype specificity and gene-TF regulatory networks. The differences have been highlighted in the methods and results of the revised manuscript.

      Also, from the methods section, they discard many patient samples due to low QC, so, from what I understand, the number of samples analyzed in the end is 975 and not 1,095.

      Response: We thank the reviewer for pointing this out and we have updated the numbers in the revised manuscript.

      Minor comments:

      Can the authors please state the parameters of the umap in methods? Although it could be intrinsic to the dataset, data points are grouped in a way that makes me think that the granularity is too forced. Could the authors please show how the umap would behave with more lenient parameters? Or even with PCA?

      Response: We used the ‘umap’ function from the umap package in R (with default parameters) on PC1 and PC2 only; hence, the granularity is not forced. As suggested by the reviewer, we have now added PCA plots to the main figures (Fig. 1E) and moved all the UMAP plots to the Supplementary figures (Fig. S1B) in the revised manuscript.

      'Majority of the basal' -> The majority of the basal.

      Response: We thank the reviewers for noticing the typo and we corrected this in the revised manuscript.

      Significance

      This is a paper relevant in the cancer field, particularly for breast cancer research. The significance of the paper lies in digging into the breast cancer samples, taking the different existing subtypes into account to assess the contribution of eRNAs as a classifier and as a prognostic tool. The data is already available but it has not been studied to this degree of detail. It highlights the importance of characterizing cancer samples in more depth, considering its intrinsic heterogeneity, as averaging across different subtypes would mask biology. My expertise lies in gene regulation and single-cell omics. My contribution will therefore be more focused on the analysis and extraction of biological information. The extent of its specific relevance in cancer research falls beyond my expertise.

      Response: We appreciate the reviewer for understanding our efforts to bring out the importance of subtyping and to explore the association of eRNA in breast cancer transcriptional gene regulatory networks.

      Reviewer #2

      Evidence, reproducibility and clarity

      Summary

      Enhancer RNAs (eRNAs) are early indicators of transcription factor (TF) activity and can identify distinct molecular subtypes and pathological outcomes in breast cancer. In this study, Patel et al. analysed 302,951 polyadenylated eRNA loci from 1,095 breast cancer patients using RNA-seq data, applying machine learning (ML) to classify eRNAs associated with specific molecular subtypes and survival. They discovered subtype-specific eRNAs that implicate both established and novel regulatory pathways and TFs, as well as prognostic eRNAs -specifically, LumA and HER2-survival- that distinguish favorable from poor survival outcomes. Overall, this ML-based approach illustrates how eRNAs reveal the molecular grammar and pathological implications underlying breast cancer heterogeneity.

      Major comments

      1. The authors define 302,951 eRNA loci based on RNA-seq data, yet it is widely known that many enhancers reside in proximity to promoters or within intronic regions (examples presented in Fig. 3B and S3). Consequently, it seems likely that reads mapped to these regions might not truly represent eRNA signals but include mRNA contamination. Could the authors clarify how they ensured that the identified eRNAs were not confounded by mRNA reads? What fraction of these enhancer loci is promoter proximal or intronic? How does H3K4me3, a well-established and standardized active promoter histone mark, behave on these loci? The reviewer considers it important to confirm that the identified eRNAs are indeed of enhancer origin rather than promoter transcripts.

      Response: For this study, we used published pan-cancer atlas work (Chen et al. 2018 and 2020), in which abundant RNA signals on intronic and intergenic regions are included and promoter-based signals are excluded. These studies exploit the large sample size to identify eRNAs, and the possibility of mRNA reads falling on introns in thousands of patient samples is very low. This concern is addressed in the Introduction of these studies as follows: “because eRNA reads associated with real enhancer activity recurrently accumulate, whereas background transcription noise tends to occur stochastically. The large number of RNA-seq reads obtained would compensate for the statistical power compromised by the low eRNA expression level typically observed in a single sample.” We included a clarification of this concern in the discussion. Furthermore, as per the reviewer’s suggestion, we examined the distribution of the eRNA loci across the genome and found that the majority of eRNA regions are located in introns and intergenic regions. This figure has been included as Supplementary Fig. S6A.

      2. In Fig. 1B, the F measure (0.540) of the Basal subtype using the Logmc method contradicts its extremely high precision (1.000) and sensitivity (0.890). The authors need to clarify the exact formula or method used to compute F1 and the discrepancy in the reported metrics for this subtype and perhaps other subtypes as well.

      Response: We apologise for the mistake in this section and thank the reviewer for pointing it out. We have included the formula for each statistical metric in the methods section of the manuscript. The F-measure was reported incorrectly, which led to the confusion here. The figure has been corrected to show an F-measure of 0.94 in the revised manuscript.
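
      As a quick consistency check on the corrected value, the sketch below assumes the F-measure is the standard F1 score, i.e. the harmonic mean of precision and sensitivity. The manuscript's own formula is not reproduced in this response, so this definition and the function name `f_measure` are assumptions.

```python
def f_measure(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall), i.e. the standard F1 score."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)

# Precision and sensitivity quoted by the reviewer for the Basal subtype (Logmc).
basal_f1 = f_measure(1.000, 0.890)
print(round(basal_f1, 2))
```

      Under this definition, a precision of 1.000 and a sensitivity of 0.890 cannot yield 0.540; they give roughly 0.94, in line with the corrected figure.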

      3. As shown in Fig. 4C, S4B, and most, if not all, tracks of Fig. S3, ER binding regions are not annotated as eRNA loci. It seems, in this reviewer's opinion, very unlikely that this is because they generally lack eRNA expression, but rather they do not express polyadenylated eRNA (typically 1D eRNA), which is captured in this dataset. The reviewer posits that these enhancers produce more transient, non-polyadenylated 2D eRNA. It has been widely documented in prior studies that ER-bound enhancers exhibit bimodal eRNA expression patterns [e.g., Li, W. et al. Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature 498, 516-520 (2013)]. Could the authors address this opinion and elaborate on how the restriction to polyadenylated transcripts might underrepresent enhancers regulated by ER and other TFs and whether this bias impacts the overall findings?

      Response: We appreciate the reviewer’s suggestion to address the caveats of using polyadenylated eRNAs to identify ER binding patterns. The TCGA eRNA atlas, which contains only polyadenylated eRNAs, does have this limitation; however, there are currently no data on bidirectional transcripts from breast cancer patient samples. The tools for profiling these RNAs are not yet robust enough to be applied to frozen cancer tissue samples, which are extremely limited in size and availability. By using polyadenylated eRNA-seq datasets, we might lose accuracy not only for ER binding patterns but also for other transcription factors that activate or associate with bimodal expression around enhancers. However, our integrative analysis of stable polyadenylated eRNA loci can still identify the most relevant TF networks of each subtype.

      Furthermore, we validated this finding by analysing our own KAS-seq datasets from the MCF7 cell line, which capture actively transcribing bidirectional enhancers. Independently, in the preliminarily revised manuscript we also incorporated ATAC-seq, H3K27ac ChIP-seq, CAGE and GRO-seq data into the gene profiles in Fig. S3 to relate the eRNA regions identified in polyadenylated RNA datasets to ER binding sites in patients and to published bidirectional transcripts. We observed that all the ER binding sites are accompanied by open and active enhancer marks with bidirectional transcription (either GRO- or CAGE-positive), but they are not at the exact location of the eRNA regions. Subtype-specific eRNA regions close to genes such as MLPH and XBP1 have both actively bidirectionally transcribing ER-bound sites at a distance (around 1.5 kb) from the subtype-specific eRNA loci and bidirectionally transcribing ER-unbound sites. However, these distal ER binding sites are close to regions from the list of 300K eRNA loci that were simply not identified as subtype-specific. Hence, although ER may not occupy all subtype-specific eRNA loci, our subtype-specific eRNA sites are representative of bidirectional transcription.

      Upon the suggestion from the reviewer, we discussed the potential of identifying TF networks by analysing the 1D eRNAs, in the revised manuscript.

      4. Despite the unsatisfactory performance of the ML approach in classifying Her2 subtypes, the hierarchical clustering performed in Fig. 2A and S2A appears to show a reasonable separation of Her2 subtypes, visible as a clustered green band. Could the authors quantitatively assess how effective this clustering is and compare it with the ML outcome? (OPTIONAL)

      Response: We acknowledge this interpretation from the reviewer. Using both measures, our ML platform can identify markers for the Her2 subtype, but some of the statistical metrics are poor. As the heatmaps were generated from these identified Her2 markers, a separate analysis of this cluster would not be very informative. The poor metrics for Her2 classification have already been explained, partly by the low number of Her2+ patients in the cohort.

      5. In Fig. 4 and S4, the authors reported to have enriched binding or motif of TFs, e.g., FOXA1, AP-2, and E2A, specifically at enhancer loci with low eRNA level, which conflicts with their established roles as transcriptional activators. The reviewer asks for an address as to why these factors would be associated with basal low-eRNA regions and whether any additional data might clarify their functional role in these contexts.

      Response: The authors appreciate the reviewer’s concern, but we would like to clarify that eRNAs which are less expressed in basal subtype are classified as basal low. These regions show high expression in luminal patients. Hence, there is a strong overlap of basal low and luminal high regions. FOXA1 and AP2 factors are strongly established coactivators in luminal ER+ transcriptional signaling, hence they are associated with basal low eRNA regions. We clarified this in the discussion and provided more literature evidence in the revised manuscript to demonstrate the strong role of FOXA1 and AP2 factors in ER+ luminal breast cancer transcriptional response.

      6. Regarding Fig. 4B, the authors state that "ER binding occupies only the strongest ssDNA and GRO-seq-positive sites". Firstly, the GRO-seq data quality is poor with indiscernible peaks. This may be insufficient for a qualified representation of nascent eRNA expression. More importantly, it appears each heatmap is ranked independently, so top loci for ssDNA are not necessarily top loci for GRO-seq, ER, Pol-II, or H3K27ac. The reviewer requests clarification on how the authors plot these heatmaps and questions whether the statement is supported by the analysis as presented.

      Response: We acknowledge the reviewer’s concern and, based on their suggestion, we used another set of GRO-seq datasets, more deeply sequenced and published by the same lab. The average plot from these new datasets shows a better profile. We also apologize for not providing enough detail on how we generated the heatmaps in Fig. 4B. The heatmaps were made separately for each profile so that each auto-scales to its own intensity range, but the regions are ordered by KAS-seq intensity and this order is kept the same across profiles. Hence, the top ssDNA loci are not necessarily the top loci for GRO-seq, ER, H3K27ac and polymerase, but they also show similarly high intensity in those profiles, so the signals are correlated. We also removed regions that fall in hg38 blacklisted regions and regions that were over-sequenced due to amplifications and showed aberrant signals. We provide the new heatmaps and profile plots in the revised manuscript with different clusters of KAS-seq intensity. We also updated the methods section to clarify how these heatmaps were made.
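
      The shared row ordering described in this response can be illustrated with a small sketch. It only shows the idea of sorting every track by one reference signal; the function name `order_by_reference`, the track names used as dictionary keys, and the toy intensities are assumptions, and the actual heatmap tool and its options are not stated here.

```python
def order_by_reference(signals, reference="KAS-seq"):
    """Reorder every signal track by descending intensity of the reference track.

    signals: dict mapping track name -> list of per-region intensities,
             all lists indexed by the same regions (hypothetical layout).
    """
    ref = signals[reference]
    order = sorted(range(len(ref)), key=lambda i: ref[i], reverse=True)
    return {name: [values[i] for i in order] for name, values in signals.items()}

# Toy per-region intensities, invented purely for illustration.
toy_signals = {
    "KAS-seq": [2.0, 9.0, 5.0],
    "GRO-seq": [0.3, 1.1, 0.2],
    "ER": [1.0, 4.0, 3.0],
}
ordered = order_by_reference(toy_signals)
```

      With a single ordering, row i refers to the same region in every heatmap, so tracks can be compared row by row. The non-reference tracks need not be monotonic under this ordering (the toy GRO-seq track is not), which is why the claim is one of correlation rather than identical ranking.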

      7. In Fig. S4B and the third plot of 4C, the averaged histogram of ER binding appears in multiple sharp peaks with drastic asymmetric positioning around the enhancer centre, which is highly atypical of most published ER ChIP-seq profiles. Could the authors discuss possible "spatial syntax" or directional patterns of ER binding in relation to eRNA loci and cite any literature showing a similar pattern? Further evidence is required to substantiate these observations, as they are remarkably unique.

      Response: We agree with the reviewer’s point about the asymmetric ER peaks on the luminal-specific eRNA regions. Because of the nature of average profile plots and the very low number of regions explored here, the profiles look asymmetrical and differ from the published literature. Heatmaps lose resolution when made on a very low number of regions. The focus of this analysis is to highlight that, on these subtype-specific regions, ER does not bind at the centre of the eRNA loci, in contrast to published findings from in vitro studies, but further away. We do not have solid evidence to demonstrate directional patterns of ER binding from these data. To avoid confusion, we removed these average plots, focused on the existing single-gene profiles in Fig. S3 and discussed our interpretations in detail.

      Minor comments

      1. When introducing eRNAs, the reviewer recommends mentioning that 1) eRNA levels correlate with enhancer activity and 2) eRNA expression precedes target gene transcription, thus reflecting upstream regulatory events. Relevant references include: Arner, E. et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science 347, 1010-1014 (2015); Carullo, N. V. N. et al. Enhancer RNAs predict enhancer-gene regulatory links and are critical for enhancer function in neuronal systems. Nucleic Acids Res. 48, 9550-9570 (2020); Kaikkonen, Minna U. et al. Remodeling of the Enhancer Landscape during Macrophage Activation Is Coupled to Enhancer Transcription. Mol. Cell 51, 310-325 (2013).

      Response: These are great recommendations from the reviewer, and we included the suggested publications in the Introduction section of the revised manuscript.

      2. H3K27ac is used initially to define these regulatory loci, and like eRNAs, H3K27ac also varies among patients. Which H3K27ac dataset(s) were used initially, and could this approach potentially overlook patient-specific enhancers? (OPTIONAL)

      Response: This is a totally valid point from the reviewer. The idea of this project is to define common subtype-specific enhancers which can be regulatory and prognostic, hence can be developed further as biomarkers providing benefit for more patients in the future. Hence, investigating the common enhancers which are activated in multiple normal and cancer cell lines defined by ENCODE is more valid than patient-specific enhancers whose activity might be influenced by specific genetic alterations. There is very limited availability of H3K27ac ChIP-seq datasets from cancer patients to explore the patient-specific enhancers, and our analyses were totally based on the published work, hence not possible to fully address this concern. The source of the H3K27ac ENCODE datasets (from 86 human cell lines and tissue samples) is clarified in the revised manuscript.

      3. In addition to the overall metrics displayed in Fig. 2B, could the authors provide precision and sensitivity values for LumA and LumB separately under the Logmc method, given the observation in Fig. 2E that LumA and LumB are not well separated in the UMAP projection?

      Response: The authors appreciate the suggestion from the reviewer. We have included the metrics separately for LumA and LumB in the revised manuscript in Fig. S1D.

      4. Could the author elaborate, in the discussion section, on why there is a substantial difference in ML performance depending on whether InfoGain or Logmc is used?

      Response: We have included the following text in the discussion to explain the differences between these two measures.

      “InfoGain measure work with the approach of binarization with k-means (k=2). It has the potential to capture both strongly expressed eRNAs which are differential between subtypes as well as low expressed sparser on and off eRNAs. In the first case, although eRNA is highly expressed in all patients, the higher expression mode becomes 1 and the lower expressed mode become 0. However, in case of low expression, more on and off expression, recentered logmc would not generate a striking high value. Furthermore, binarization is also a strong process to perform better clustering and classification, as distinguishing between data points gets better and clearer. “
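
      The binarization step described in the quoted text can be sketched as below. This is a simplified illustration that assumes a plain 1-D two-means split followed by the textbook entropy-based information gain; the authors' actual implementation, software and cutoffs are not shown in this response, and the function names and toy values are hypothetical.

```python
import math
from collections import Counter

def binarize_two_means(values, iters=50):
    """Split 1-D expression values into a low (0) and a high (1) mode with k-means, k=2."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0] * len(values)  # no variation, nothing to split
    for _ in range(iters):
        # Assign each value to the nearer of the two centroids.
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        low_vals = [v for v, b in zip(values, labels) if b == 0]
        high_vals = [v for v, b in zip(values, labels) if b == 1]
        new_lo = sum(low_vals) / len(low_vals)
        new_hi = sum(high_vals) / len(high_vals)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(expression, subtypes):
    """Reduction in subtype entropy after splitting patients by the binarized eRNA."""
    bits = binarize_two_means(expression)
    gain = entropy(subtypes)
    for b in (0, 1):
        subset = [s for s, bit in zip(subtypes, bits) if bit == b]
        if subset:
            gain -= len(subset) / len(subtypes) * entropy(subset)
    return gain

# Toy values: an eRNA that is "on" only in one class, and one unrelated to class.
informative = information_gain([0.1, 0.2, 0.15, 5.0, 5.2, 4.9], ["Basal"] * 3 + ["Other"] * 3)
uninformative = information_gain([0.1, 5.0, 0.2, 5.1], ["Basal", "Basal", "Other", "Other"])
```

      Because the gain is computed on the 0/1 modes rather than on magnitudes, an eRNA that switches on and off at low expression can score as highly as a strongly expressed one, which matches the behaviour the quoted passage contrasts with recentered logmc.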

      5. How does the expression pattern of Basal high, Basal low, Her2, and Lum eRNA clusters behave differentially in Basal, Her2, and LumA/B subtypes? Are Basal high eRNAs downregulated in Her2 or Lum subtypes, and vice versa? Since many downstream analyses rely on these eRNA clusters, it is suggested to include a heatmap and/or boxplot that displays how each eRNA category is expressed in each subtype to confirm that these definitions are consistent.

      Response: We thank the reviewer for this suggestion and apologise for not clarifying the expression of eRNAs in other subtypes. Indeed, Basal-high eRNAs are expressed at low levels in LumA and LumB, and Basal-low eRNAs are expressed at higher levels in LumA and LumB. Her2 subtype-specific eRNAs tend to show expression intermediate between Basal and Lum, as can be seen in the UMAP and PCA. In other words, the Basal-high eRNAs are the Lum-low eRNAs, and the Basal-low markers are the Lum-high markers. As suggested by the reviewer, we provide heatmaps of the expression of each subtype-specific eRNA set across patients of the other subtypes in Fig. S2F-K.

      Referee cross-commenting

      I share Reviewer #1's opinion that the manuscript should assess whether mRNA or eRNA is the stronger predictor of breast cancer subtypes and clinical outcomes. It will greatly improve the novelty if eRNA is shown to be a better indicator for cancer characterization.

      Also, I strongly concur with Reviewer #3 that the current informatics approach is superficial and that several conclusions are contentious. The authors need to resolve the inconsistencies in their ML statistics and the potentially misleading interpretations of the ChIPseq and motif enrichment results.

      It is further recommended that, building on Reviewer #3's comment, the study integrate eRNA signatures with their proximal genes to address 1) whether genes located near these enhancers are differentially expressed-and correlated with enhancer activity-across cancer subtypes, and 2) whether it provides insights into understanding the enhancer-gene regulatory architecture in a subtype-specific context.

      Response: We thank reviewer 2 for cross-commenting on the suggestions of reviewers 1 and 3. These are indeed interesting points to cover and will increase the novelty of the study. Based on these suggestions, and as discussed earlier for reviewer 1’s comments, we will compare mRNAs and eRNAs as predictors of cancer subtypes and prognosis and explore the association of genes and eRNAs in cis. Our preliminary analyses show a strong subtype-specific association of eRNAs and mRNAs and a visible separation of subtypes for which markers were harder to classify using eRNAs alone. Hence, we will further improve these analyses and the manuscript in the final revision, as discussed above.

      Significance

      General Assessment

      This study provides insights into the potential use of eRNA to classify breast cancer subtypes and refine prognostic markers. A strength is the integration of large-scale RNA-seq data with machine learning to identify eRNA signatures in biologically-meaningful patient samples, revealing both established and novel TF networks. The study also discovered eRNA clusters that correlate with the survival of patients, thus providing strong clinical implications. However, the ML approach yields several inconsistencies-for instance, unsatisfactory classification results for the Her2 subtype as well as the confused statistical metrics in the results. Furthermore, the ML model struggles to differentiate more nuanced molecular classes (e.g., LumA vs. LumB) and higher-level histological subtypes (e.g., lobular vs. ductal), thus limiting its power to dissect more delicate pathological and molecular mechanisms. Another limitation worth noting of this ML approach is the exclusive use of only polyadenylated eRNAs via RNA-seq, which excludes perhaps the more prominent 2D eRNA expressed in regulatory enhancers. Moreover, certain datasets appear to be of suboptimal quality, leading to assertions that would benefit from additional supporting evidence. Altogether, while the study offers a promising angle on eRNA-based tumor stratification, more robust experimental validations are needed to resolve inconsistencies and clarify the mechanistic underpinnings.

      Advance

      Conceptually, the study highlights the potential for eRNA-based signatures to capture regulatory variation beyond classical markers. However, the utility of these signatures is constrained by the focus on polyadenylated transcripts alone, likely underrepresenting key enhancer regions, and certain evidence presented in this study is not substantial enough to support some statements. While the work adds an important dimension to the understanding of enhancer biology in breast cancer, the resulting insights are partly hampered by limitations in data coverage and quality.

      Audience

      The primary audience includes cancer epigenetics, functional genomics, and bioinformatics researchers who are interested in leveraging eRNAs as biomarkers and dissecting complex regulatory networks in breast cancer. Clinically oriented scientists focusing on molecular diagnostics may also find relevance in the authors' approach to stratify subtypes and outcomes. The research is most relevant to a specialized audience within basic and translational cancer genomics, as well as computational biology groups interested in eRNA analysis.

      Field of Expertise

      I evaluate this manuscript as a researcher specializing in cancer epigenetics, functional genomics, and NGS-based data analysis. Parts of the manuscript touching on clinical outcome measures may require additional review from practicing oncologists.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This study aims to classify prognostic and subtype-specific eRNAs in breast cancer, highlighting their potential as biomarkers.
      Data was analysed using existing machine learning algorithms.
      Data analysis is superficial and it is hard to understand the key significant findings.

      This is an important topic and a highly relevant approach to identifying RNA-based biomarkers.
      They analyse published RNAseq datasets by focusing on molecular subtype-specific eRNAs, enhancing clinical relevance and thereby addressing the heterogeneity of the cancer type (strength of the study).

      Weaknesses include: Most of the findings are purely correlation-based and also based on a reanalysis of published datasets; it would benefit from experimental validation to support their findings. Differential expression analysis of large datasets likely yields some differences in the transcriptome. How significant are these changes?
      Does the expression of eRNAs affect the expression of genes in cis? Although this analysis would provide some associated gene expression differences, it can also provide some insights into subtype-specific differences in gene expression programs.
      If the authors find experimental validations are not feasible, I recommend validating the eRNA signature in an independent dataset.

      Response: We acknowledge the weaknesses the reviewer noted regarding the correlation-based analyses of published datasets. Although we reanalysed the TCGA eRNA atlas datasets, these are the highest-resolution maps of eRNA expression in cancer patient samples ever published, and our study is the first to establish a subtype-specific classification of eRNAs. We believe that the eRNAs are biologically relevant, as they are strongly associated with the subtype-specific pathways and epigenetic regulators. As suggested by the reviewers, we will explore the association of mRNAs and eRNAs in cis to establish further significance and relevance of the eRNAs we identified (discussed earlier in reviewer 1 comments).

      We would like to focus on studying the functional relevance of eRNAs as a separate project. In vitro studies to establish the knockdown of eRNAs are not straightforward due to the toxicity and non-specific targeting of the locked nucleic acid approach or Cas13-based RNA targeting. siRNA-based approaches do not target nuclear eRNAs effectively, even though they have been widely used by other labs to target eRNAs. Substantial optimisation is therefore needed to establish functional validation of our eRNAs, which places it outside the scope and time frame of this study/revision. To provide validation and significance using independent datasets, we will explore the association of these factors with the expression of subtype-specific eRNAs further in our final revised manuscript using the tools explained above for reviewer 1 (PEGS and Perturb-seq integration). Integration of our classified eRNAs with the published Perturb-seq validated regions from ER+ and ER- breast cancer cell lines will provide the functional validation of patient-associated classified enhancer/eRNAs. Hence, our study would be the first to demonstrate the validated gene-enhancer regulatory networks from breast cancer patient datasets.

      Furthermore, we included the single gene visualisation profiles of independent datasets of ER ChIP-seq from different patients (Ross-Innes et al., 2012), ATAC-seq from TCGA patients (Corces et al., 2018), H3K27ac ChIP-seq datasets from cell lines (Theodorou et al., 2013 and Hickey et al., 2021) and GRO-seq and CAGE data published in MCF7 cells close to the eRNA regions and discussed their overlap with the eRNA regions in the revised manuscript. In the final revision, we will perform further detailed integration of all these profiles. Overall, our study will provide the integratory analysis of various independent epigenetic and functional profiles to validate our classified subtype and survival-specific eRNA regions.

      Here are major points; addressing these points in the revised version is important.

      From Figure 1B, what eRNAs were identified for LumB using log2MC?

      Response: The authors acknowledge the lack of analyses on LumB eRNAs in the original version of the manuscript. In the final revised manuscript after associating with mRNAs, we will provide the heatmaps, pathway analyses and other functional annotations for LumB specific eRNAs.

      Page 8 However, sensitivity and F-measure .... It would help to include the metrics for the number of patients in each subtype. The ratio of eRNAs/number of cases in each subtype would inform if the number of eRNAs is an outcome of no. of cases or subgroup-specific.

      Response: This is a great suggestion from the reviewer, and we included the number of patients for each subtype in the table in Fig. 1D. We observed that the basal patients are low in number, but we identified more basal eRNAs. Hence, the number of eRNAs identified in a subtype-specific manner is not correlated with the number of patients in the cohort.

      Page 9 "Altogether, both measurements classify eRNAs efficiently based on subtypes, InfoGain allowed us to distinguish further samples based on high and low expression of eRNAs for basal subtype and performed better in statistical metrics" Based on statistical metrics, both models seem to be performing similarly except for Her2.

      Response: We apologise for this wrong interpretation. We corrected this in the revised manuscript at page 9.

      In Fig. 1B, the F-measure metrics are wrong for basal LogMC, as it is 0.94 rather than 0.54, which could lead to a misinterpretation of the model.

      Response: We apologise for the mistake in this figure, and we included the corrected heatmap in the revised manuscript.
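For reference, the F-measure discussed in this exchange is the harmonic mean of precision and sensitivity (recall). The sketch below is illustrative only: the precision and recall values are hypothetical and are not taken from Fig. 1B, and `f_measure` is a name introduced here.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the usual F1 (harmonic mean)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical values, chosen only to illustrate the calculation.
example = f_measure(precision=0.8, recall=0.6)
print(f"F1 = {example:.3f}")
```

Because F1 always lies between the smaller and the larger of precision and recall, a quick consistency check is possible: if both inputs are high, a low reported F-measure points to a transcription error rather than a model weakness.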

      Many genome browser figures, including Figure S3. TFBS is not at the same site as eRNAs detected. Is there CAGE data to show that binding these TFs at these sites leads to the expression of eRNAs? That will give direct evidence that the eRNAs are transcribed due to these TFs

      Response: This is a great suggestion from the reviewer. We incorporated ATAC-seq, H3K27ac ChIP-seq, CAGE and GRO-seq data on the gene profiles in Fig. S3 to validate the activity of these ER binding sites in the preliminarily revised manuscript. We observed that all the ER binding sites are accompanied by open and active enhancer marks with bidirectional transcription (either GRO- or CAGE positive) but they are not on the exact location of eRNA regions (250-1000 bps away from the centre of ER binding site). Subtype-specific eRNA regions close to genes like MLPH and XBP1 possess active bidirectional transcribing ER binding sites far away from subtype-specific eRNA loci and also ER unbound sites. However, these distal ER binding sites are close to the regions from the list of 300K eRNA loci and they were simply not identified as subtype-specific regions.

      Page 10, There were 30 Her2-specific eRNA regions.... Do the same enhancers also regulate these genes as those from which eRNAs are transcribed? Is it cis-effect, or could these affect the trans-regulating of other genes?

      Response: We acknowledge the reviewer's concern; however, this is hard to validate, as functional experiments to explore the 3D interactions of enhancers and gene promoters are not robust enough to be performed in patient samples and cannot be performed within the revision time frame. In the final revised manuscript, we will explore the association of enhancers and promoters of ERBB2 using PEGS, as discussed above, and available Hi-C datasets in Her2+ cell lines (HCC1954, GSE167150, Kim et al., 2022 https://pubmed.ncbi.nlm.nih.gov/35513575/ )

      Minor comments:

      Page 8 "InfoGain measure..." Fig. S2A also shows high and low expressed eRNAs for the basal group

      Response: We apologise for the lack of clarity here. The InfoGain measure identifies both high and low expressed eRNAs in all patients, showing a similar pattern of regulation among patients. However, logMC-derived eRNAs are highly expressed in most patients. The logMC measure did not identify low expressed eRNAs as strongly as InfoGain did. The text in the results has been edited in the revised manuscript to improve clarity on this point.

      Page 11, Our analyses also identified the role of another..... The statement is misleading as it is the enrichment of these TFs with the eRNAs

      Response: We included the word “enrichment” to clarify this statement.

      Page 13, "Around 90% of eRNAs are bidirectional and non-polyadenylated [53]. TCGA expression datasets are based on RNA-seq assays, which capture only non-polyadenylated RNAs. Thus, analysing the expression of eRNAs on mRNA-seq datasets might not be adequate". It is very confusing, please check

      Response: We apologise for the mistake, and this has been corrected in the revised manuscript.

      Reviewer #3 (Significance (Required)):

      This is an important topic and a highly relevant approach to identifying RNA-based biomarkers.
      They analyse published RNAseq datasets by focusing on molecular subtype-specific eRNAs, enhancing clinical relevance and thereby addressing the heterogeneity of the cancer type (strength of the study).

    1. In both differentiated instruction and arts integration, the classroom’s physical environment is flexible. In arts integration, furniture is moved to allow for movement, theatrical or dance improvisations, or for various groupings. Students carry out routines for efficiently and quietly setting and re-setting furniture. Teachers organize materials and establish efficient routines for distribution and clean-up. The classroom reflects a student-centered focus with interesting displays documenting students’ creative process and the products they have created.

      I agree with this because it is important for students to have a place that allows them to focus on their creative sides. As an example, I was a theatre student through high school, and one specific class comes to mind. In Drama 4, we did a process called Libby Appel. Within this process, we were given a clean slate to work from. Our whole classroom space was cleared out, and as students we had control. Not only was it great for self-expression, but it also taught us to show our creativity with a clean slate. We as students were given instructions but also control of our successes in the process. The reason I bring this up is that it is important for educators to teach students to use their own creativity. I think that is when students learn the most.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes the role of PRDM16 in modulating BMP response during choroid plexus (ChP) development. The authors combine PRDM16 knockout mice and cultured PRDM16 KO primary neural stem cells (NSCs) to determine the interactions between BMP signaling and PRDM16 in ChP differentiation.

      They show PRDM16 KO affects ChP development in vivo and BMP4 response in vitro. They determine genes regulated by BMP and PRDM16 by ChIP-seq or CUT&TAG for PRDM16, pSMAD1/5/8, and SMAD4. They then measure gene activity in primary NSCs through H3K4me3 and find more genes are co-repressed than co-activated by BMP signaling and PRDM16. They focus on the 31 genes found to be co-repressed by BMP and PRDM16. Wnt7b is in this set and the authors then provide evidence that PRDM16 and BMP signaling together repress Wnt activity in the developing choroid plexus.

      Strengths:

      Understanding context-dependent responses to cell signals during development is an important problem. The authors use a powerful combination of in vivo and in vitro systems to dissect how PRDM16 may modulate BMP response in early brain development.

      We thank the reviewer for the thoughtful summary and positive feedback. We appreciate the recognition of our integrative in vivo and in vitro approach. We're glad the reviewer found our findings on context-dependent gene regulation and developmental signaling valuable.

      Main weaknesses of the experimental setup:

      (1) Because the authors state that primary NSCs cultured in vitro lose endogenous Prdm16 expression, they drive expression by a constitutive promoter. However, this means the expression levels are very different from endogenous levels (as explicitly shown in Supplementary Figure 2B) and the effect of many transcription factors is strongly dose-dependent, likely creating differences between the PRDM16-dependent transcriptional response in the in vitro system and in vivo.

      We acknowledge that our in vitro experiments may not ideally replicate the in vivo situation, a common limitation of such experiments. Our primary aim was to explore the molecular relationship between PRDM16 and BMP signaling in gene regulation. Such molecular investigations are challenging to conduct using in vivo tissues. In vitro NSCs treated with BMP4 have been used as a model to investigate NSC proliferation and quiescence in previous studies (e.g., Helena Mira, 2010; Marlen Knobloch, 2017). Crucially, to ensure the relevance of our in vitro findings to the in vivo context, we confirmed that cultured cells could indeed be induced into quiescence by BMP4, and this induction necessitated the presence of PRDM16. Furthermore, upon identifying target genes co-regulated by PRDM16 and SMADs, we validated PRDM16's regulatory role on a subset of these genes in the developing Choroid Plexus (ChP) (Fig. 7 and Suppl. Fig. 7-8). Only by combining evidence from both in vitro and in vivo experiments could we confidently conclude that PRDM16 serves as an essential co-factor for BMP signaling in restricting NSC proliferation.

      (2) It seems that the authors compare Prdm16_KO cells to Prdm16 WT cells overexpressing flag_Prdm16. Aside from the possible expression of endogenous Prdm16, other cell differences may have arisen between these cell lines. A properly controlled experiment would compare Prdm16_KO ctrl (possibly infected with a control vector without Prdm16) to Prdm16_KO_E (i.e. the Prdm16_KO cells with and without Prdm16 overexpression.)

      We agree that Prdm16 KO cells carrying the Prdm16-expressing vector would be a good comparison with those with KO_vector. However, despite more than 10 attempts with various optimization conditions, we were unable to establish a viable cell line after infecting Prdm16 KO cells with the Prdm16-expressing vector. The overall survival rate for primary NSCs after viral infection is low, and we observed that KO cells were particularly sensitive to infection treatment when the viral vector was large (the Prdm16 ORF is more than 3kb).

      As an alternative to assess vector effects, we instead included two other control cell lines, wt and KO cells infected with the 3xNLS_Flag-tag viral vector, and presented the results in supplementary Fig 2. When we compared the responses of the four lines (wt, KO, wt infected with the Flag vector, KO infected with the Flag vector) to the addition and removal of BMP4, we confirmed that the viral infection itself has no significant impact on the responses of these cells to these treatments regarding changes in cell proliferation and Ttr induction.

      Given that wt cells and the KO cells, with or without viral backbone infection behave quite similarly in terms of cell proliferation, we speculate that even if we were successful in obtaining a cell line with Prdm16-expressing vector in the KO cells, it may not exhibit substantial differences compared to wt cells infected with Prdm16-expressing vector.

      Other experimental weaknesses that make the evidence less convincing:

      (1) The authors show in Figure 2E that Ttr is not upregulated by BMP4 in PRDM16_KO NSCs. Does this appear inconsistent with the presence of Ttr expression in the PRDM16_KO brain in Figure1C?

      The reviewer’s point is that there was no significant increase in Ttr expression in Prdm16_KO cells after BMP4 treatment (Fig. 2E), yet residual Ttr mRNA signal remained in the Prdm16 mutant ChP (Fig. 1C). We think the difference lies in the measurable level of Ttr expression induced by BMP4 in NSC culture versus that in the ChP. This is based on our immunostaining experiment, in which we tried to detect Ttr using a Ttr antibody. This antibody could not detect the Ttr protein in BMP4-treated Prdm16_expressing NSCs but clearly showed Ttr signal in the wt ChP. This means that although Ttr expression can be significantly increased by BMP4 in vitro to a level measurable by RT-qPCR, its absolute quantity even in the Prdm16_expressing condition is much lower than that in vivo. Our results in Fig 1C and Fig 2E, as well as Fig 7B, all consistently showed that Prdm16 depletion significantly reduced Ttr expression in vitro and in vivo.

      (2) Figure 3: The authors use H3K4me3 to measure gene activity. This is however, very indirect, with bulk RNA-seq providing the most direct readout and polymerase binding (ChIP-seq) another more direct readout. Transcription can be regulated without expected changes in histone methylation, see e.g. papers from Josh Brickman. They verify their H3K4me3 predictions with qPCR for a select number of genes, all related to the kinetochore, but it is not clear why these genes were picked, and one could worry whether these are representative.

      H3K4me3 has been widely used as an indicator of active transcription and is a mark for cell identity genes. It has also been demonstrated that H3K4me3 has a direct function in regulating transcription at the step of RNA Pol II pause release. As stated in the text, there are advantages and disadvantages of using H3K4me3 compared to using RNA-seq. RNA-seq profiles all gene products, which are affected by transcription and RNA stability and turnover. In contrast, H3K4me3 levels at gene promoters reflect transcriptional activity. In our case, we aimed to identify differential gene expression between proliferation and quiescence states. The transition between these two states is fast and dynamic. RNA-seq may not be able to identify functionally relevant genes and is more likely to produce false positive and false negative results. Therefore, we chose H3K4me3 profiling.

      We agree that transcription may change without histone methylation changes. This may cause an under-estimation of the number of changed genes between the conditions. 

      We validated 7 out of 31 genes (Wnt7b, Id3, Mybl2, Spc24, Spc25, Ndc80 and Nuf2). We chose these genes based on two criteria: 1) their function is implicated in cell proliferation and cell-cycle regulation based on gene ontology analysis; 2) their gene products are detectable in the developing ChP based on the scRNA-seq data. Three of these genes (Wnt7b, Id3, Mybl2) are not related to the kinetochore. We now clarify this description in the revised text.

      (3) Line 256: The overlap of 31 genes between 184 BMP-repressed genes and 240 PRDM16-repressed genes seems quite small.

      This result indicates that, in addition to co-repressing cell-cycle genes, BMP and PRDM16 have independent functions. For example, it was reported that BMP regulates neuronal and astrocyte differentiation (Katada, S. 2021), while our previous work demonstrated that Prdm16 controls temporal identity of NSCs (He, L. 2021).

      (4) The Wnt7b H3K4me3 track in Fig. 3G is not discussed in the text but it shows H3K4me3 high in _KO and low in _E regardless of BMP4. This seems to contradict the heatmap of H3K4me3 in Figure 3E which shows H3K4me3 high in _E no BMP4 and low in _E BMP4 while omitting _KO no BMP4. Meanwhile CDKN1A, the other gene shown in 3G, is missing from 3E.

      The track in Fig 3G shows the absolute signal of H3K4me3 after mapping the sequencing reads to the genome and normalizing them to library size. Comparing the signal in Prdm16_E with BMP4 to that in Prdm16_E without BMP4, the one with BMP4 has a lower peak. The same trend can be seen for the pair of Prdm16_KO cells with or without BMP4. The heatmap in Fig. 3E shows the relative level of H3K4me3 in three conditions. The Prdm16_E cells with BMP4 have the lowest level, while the other two conditions (Prdm16_KO with BMP4 and Prdm16_E without BMP4) display higher levels. These two graphs show a consistent trend of H3K4me3 changes at the Wnt7b promoter across these conditions. Figure 3E only includes genes that are co-repressed by PRDM16 and BMP. CDKN1A’s H3K4me3 signals are consistent between the conditions, and thus it is not a PRDM16- or BMP-regulated gene. We use it as a negative control.
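The distinction drawn above, between an absolute library-size-normalized track and a relative heatmap value, can be sketched as follows. This is a minimal illustration under stated assumptions: counts-per-million scaling and per-gene z-scoring are common choices, but the response does not specify the exact method used, and all read counts, library sizes, and function names below are hypothetical.

```python
from statistics import mean, pstdev


def counts_per_million(count: int, library_size: int) -> float:
    """Scale a raw read count to reads per million mapped reads."""
    return count * 1_000_000 / library_size


def row_zscores(values: list[float]) -> list[float]:
    """Center and scale one gene's values across conditions (relative level)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Hypothetical promoter read counts and library sizes for three conditions.
samples = {
    "Prdm16_E_noBMP4": (900, 30_000_000),
    "Prdm16_E_BMP4": (300, 25_000_000),
    "Prdm16_KO_BMP4": (1200, 40_000_000),
}

# Absolute signal, comparable across libraries of different depth.
cpm = {name: counts_per_million(c, n) for name, (c, n) in samples.items()}

# Relative signal for one gene across conditions, as a heatmap row might show.
z = dict(zip(cpm, row_zscores(list(cpm.values()))))

for name in samples:
    print(f"{name}: CPM={cpm[name]:.1f}, z={z[name]:+.2f}")
```

A track displays values like `cpm`, whereas a row-scaled heatmap displays values like `z`; the two can look different while still ranking the conditions in the same order.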

      (5) The authors use PRDM16 CUT&TAG on dissected dorsal midline tissues to determine if their 31 identified PRDM16-BMP4 co-repressed genes are regulated directly by PRDM16 in vivo. By manual inspection, they find that "most" of these show a PRDM16 peak. How many is most? If using the same parameters for determining peaks, how many genes in an appropriately chosen negative control set of genes would show peaks? Can the authors rigorously establish the statistical significance of this observation? And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.

      In our text, we indicated the genes containing PRDM16 binding peaks in the figures and described them as “Text in black in Fig. 6A and Supplementary Fig. 5A”. We will add the precise number “25 of these genes” in the main text to clarify it. We used the 184-31 = 153 BMP-only repressed genes (excluding the PRDM16-BMP4 co-repressed genes) as a negative control set. By computationally determining the nearest TSS to each PRDM16 peak, we identified 24/31 co-repressed genes and 84/153 BMP-only-repressed genes containing PRDM16 peaks in the E12.5 ChP data. Fisher’s exact test comparing the proportions yields P = 0.015.
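The comparison of proportions described above can be sketched with a standard-library implementation of Fisher's exact test on the 2x2 table built from the stated counts (24 of 31 versus 84 of 153). This is an illustration, not the authors' code: the function names are introduced here, and because the response does not state whether the test was one- or two-sided, both values are computed without asserting which one corresponds to the reported P-value.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X = k) when drawing `draws` items from `total`, of which `successes` are hits."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def fisher_exact(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (one-sided P for enrichment in row 1, two-sided P).
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lo = max(0, row1 + col1 - total)
    hi = min(row1, col1)
    probs = {k: hypergeom_pmf(k, total, col1, row1) for k in range(lo, hi + 1)}
    p_obs = probs[a]
    one_sided = sum(p for k, p in probs.items() if k >= a)
    # Two-sided: sum all tables at least as unlikely as the observed one.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return one_sided, min(1.0, two_sided)


# Rows: co-repressed genes, BMP-only-repressed genes.
# Columns: with a PRDM16 peak, without a PRDM16 peak.
p_one, p_two = fisher_exact(24, 31 - 24, 84, 153 - 84)
print(f"one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")
```

The same function can be reused with a different control gene set to check how sensitive the enrichment is to the choice of background.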

      We are confused by the second part of the comment: “And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.” If the reviewer meant why we didn’t sequence the material from sequential ChIP or validate more target genes, the reason is the limited material. Sequential ChIP requires a large quantity of antibody and yields little material, barely sufficient for a few qPCR reactions after the second round of IP. This yield was far below the minimum required for library construction. The PRDM16 antibody was a gift, and the quantity we had was very limited. We made considerable efforts to optimize all available commercial antibodies for ChIP and Cut&Tag, but none of them worked in these assays.

      (6) In comparing RNA in situ between WT and PRDM16 KO in Figure 7, the authors state they use the Wnt2b signal to identify the border between CH and neocortex. However, the Wnt2b signal is shown in grey and it is impossible for this reviewer to see clear Wnt2b expression or where the boundaries are in Figure 7A. The authors also do not show where they placed the boundaries in their analysis. Furthermore, Figure 7B only shows insets for one of the regions being compared making it difficult to see differences from the other region. Finally, the authors do not show an example of their spot segmentation to judge whether their spot counting is reliable. Overall, this makes it difficult to judge whether the quantification in Figure 7C can be trusted.

      In the revised manuscript, we have included an individual channel for Wnt2b and marked the boundaries. We also provide full-view images and examples of spot segmentation in the new Supplementary Figure 8.

      (7) The correlation between mKi67 and Axin2 in Figure 7 is interesting but does not convincingly show that Wnt downstream of PRDM16 and BMP is responsible for the increased proliferation in PRDM16 mutants.

      We agree that this result (the correlation between mKi67 and Axin2) alone only suggests that Wnt signaling is related to the proliferation defect in the Prdm16 mutant, and does not necessarily mean that Wnt is downstream of PRDM16 and BMP. Our conclusion is backed up by two additional lines of evidence: the Cut&Tag data, in which PRDM16 binds to regulatory regions of Wnt7b and Wnt3a, and the finding that BMP and PRDM16 co-repress Wnt7b in vitro.

      An ideal result would be that down-regulating Wnt signaling in the Prdm16 mutant rescues the Prdm16 mutant phenotype. Such an experiment is technically challenging. Wnt plays diverse and essential roles in NSC regulation, and one would need a cell-type- and stage-specific tool to down-regulate Wnt in the Prdm16 mutant background. Moreover, Wnt genes are not the only targets regulated by PRDM16 in these cells, and down-regulating Wnt may not be sufficient to rescue the phenotype.

      Weaknesses of the presentation:

      Overall, the manuscript is not easy to read. This can cause confusion.

      We have revised the text to improve clarity.

      Reviewer #1 (Recommendations for the authors):

      (1) Overall, the manuscript is not easy to read. Here are some causes of confusion for which the presentation could be cleaned up:

      We are grateful for the reviewer’s suggestion. In the revised manuscript, we have made efforts to improve the clarity of the text.

      (a) Part of the first section is confusing in that some statements seem contradictory, in particular:

      "there is no overall patterning defect of ChP and CH in the Prdm16 mutant" (line 125)

      "Prdm16 depletion disrupted the transition from neural progenitors into ChP epithelia" (line 144)

      It would be helpful if the authors could reformulate this more clearly.

      We modified the text to clarify that while the BMP-patterned domain is not affected, the transition of NSCs into ChP epithelial cells is compromised in the Prdm16 mutant.

      (b) Flag_PRDM16, PRDM16_expressing, PRDM16_E, PRDM16 OE all seem to refer to the same PRDM16 overexpressing cells, which is very confusing. The authors should use consistent naming. Moreover, it would be good if they renamed these all to PRDM16_OE to indicate expression is not endogenous but driven by a constitutive promoter.

      We appreciate the comment and agree that the use of multiple terms to refer to the same PRDM16-overexpressing condition was confusing. Our original intention in using Prdm16_E was to distinguish cells expressing PRDM16 from the two other groups: wild-type cells and Prdm16_KO cells, which both lack PRDM16 protein expression. However, we acknowledge that Prdm16_E could be misinterpreted as indicating expression from the endogenous Prdm16 promoter. To avoid this confusion and ensure consistency, we have now standardized the terminology and refer to this condition as Prdm16_OE, indicating Flag-tagged PRDM16 expression driven by a constitutive promoter.

      (c) Line 179 states "generated a cell line by infecting Prdm16_KO cells with the same viral vector, expressing 3xNSL_Flag". Do the authors mean 3xNLS_Flag_Prdm16, so these are the Prdm16_KO_E cells by the notation suggested above? Or is this a control vector with Flag only? The following paragraph refers to Supplementary Figure 2C-F where the same construct is called KO_CDH, suggesting this was an empty CDH vector, without Flag, or Prdm16. This is confusing.

      We appreciate the reviewer’s careful reading and helpful comment. We acknowledge the confusion caused by the inconsistent terminology. To clarify: in line 179, we intended to describe an attempt to generate a Prdm16_KO cell line expressing 3xNLS_Flag_Prdm16, not a control vector with Flag only. However, despite repeated attempts, we were unable to establish this line due to low viral efficiency and the vulnerability of Prdm16_KO cells to infection with the large construct. Therefore, these cells were not included in the subsequent analyses.

      The term KO_CDH refers to Prdm16_KO cells infected with the empty CDH control vector, which lacks both Flag and Prdm16. This is the line used in the experiments shown in Supplementary Fig. 2C–F. We have revised the text throughout the manuscript to ensure consistent use of terminology and to avoid this confusion.

      (2) The introductory statements on lines 53-54 could use more references.

      Thanks for the suggestion. We have now included more references.

      (3) It would be helpful if all structures described in the introduction and first section were annotated in Figure 1, or otherwise, if a cartoon were included. For example, the cortical hem, and fourth ventricle.

      Thanks for the suggestion. We have now indicated the structures, ChP, CH and the fourth ventricle, in the images in Figure 1 and Supplementary Figure 1.

      (4) In line 115, "as previously shown.." - to keep the paper self-contained a figure illustrating the genetics of the KO allele would be helpful.

      Thanks for the suggestion. We have now included an illustration of the Prdm16 cGT allele in Figure 1B.

      (5) In Figure 1D a costain for a ChP marker would be helpful because it is hard to identify morphologically in the Prdm16 KO.

      We apologize for the lack of clarity. The KO allele contains a β-geo reporter driven by the endogenous Prdm16 promoter. The samples were co-stained for EdU, β-Gal and DAPI. To distinguish the ChP domain from the CH, we used the presence of β-Gal as a marker. We indicated this in the figure legend, and have now also clarified it in the revised text.

      (6) The details in Figure 1E are hard to see, a zoomed-in inset would help.

      A zoomed-in inset is now included in the figure.

      (7) Supplementary Figure 2A does not convincingly show that PRDM16 protein is undetectable since endogenous expression may be very low compared to the overexpression PRDM16_E cells so if the contrast is scaled together it could appear black like the KO.

      We appreciate the reviewer’s point and have carefully considered this concern. We concluded that PRDM16 protein is effectively undetectable in cultured wild-type NSCs based on direct comparison with brain tissue. Both cultured NSCs and brain sections were processed under similar immunostaining and imaging conditions. While PRDM16 showed robust and specific nuclear localization in embryonic brain sections (Fig. 1B and Supplementary Fig. 1A), only a small subset of cultured NSCs exhibited PRDM16 signal, primarily in the cytoplasm (middle panel of Fig. 2A). This stark contrast supports our conclusion that endogenous PRDM16 protein is either absent or significantly downregulated in vitro. Because of this limitation, we turned to over-expressing Prdm16 in NSC culture using a constitutive promoter. 

      (9) Line 182 "Following the washout step" - no such step had been described, maybe replace by "After washout of BMP".

      Yes, we have revised the text.

      (8) Line 214: "indicating a modest level" - what defines modest? Compared to what? Why is a few thousand moderate rather than low? Does it go to zero with inhibitors for pathways?

      Here, a modest level means a level lower than that after adding BMP4. To clarify this, we revised the description to “indicating endogenous levels of …”

      (9) The way qPCR data are displayed makes it difficult to appreciate the magnitude of changes, e.g. in Supplementary Figure 2B where a gap is introduced on the scale. Displaying log fold change / relative CT values would be more informative.

      We used a segmented Y-axis in Supplementary Figure 2B because the Prdm16 overexpression samples exhibited much higher expression levels than the other conditions. In response to this suggestion, we explored alternative ways to present the result, including plotting log-transformed values and log fold changes. However, these methods did not make the differences clearer; in fact, log scaling made the magnitude of change less apparent. To address this, we now present the overexpression samples in a separate graph, thereby eliminating the need for a broken Y-axis and improving the overall readability of the data.

      (10) Writing out "3 days" instead of 3D in Figure 2A would improve clarity. It would be good if the used time interval is repeated in other figures throughout the paper so it is still clear the comparison is between 0 and 3 days.

      We have changed “3D” to “3 days”. All BMP4 treatments in this study were 3 days.

      (11) Line 290: "we found that over 50% of SMAD4 and pSMAD1/5/8 binding peaks were consistent in Prdm16_E and Prdm16_KO cells, indicating that deletion of Prdm16 does not affect the general genomic binding ability of these proteins" - this only makes sense to state with appropriate controls because 50% seems like a big difference, what is the sample to sample variability for the same condition? Moreover, the next paragraph seems to contradict this, ending with "This result suggests that SMAD binding to these sites depends on PRDM16". The authors should probably clarify the writing.

      We appreciate the reviewer’s comment and agree that clarification was needed. Our point was that SMAD4 and pSMAD1/5/8 retain the ability to bind DNA broadly in the Prdm16 KO cells, with more than half of the original binding sites still occupied. This suggests that deletion of Prdm16 does not globally impair SMAD genomic binding. However, our primary interest lies in the subset of sites that show differential SMAD binding between wt and Prdm16 KO conditions, as these are likely to be PRDM16-dependent.

      In the following paragraph, we focused specifically on describing SMAD and PRDM16 co-bound sites. At these loci, SMAD4 and pSMAD1/5/8 showed reduced enrichment in the absence of PRDM16, suggesting PRDM16 facilitates SMAD binding at these particular regions. We have revised the text in the manuscript to more clearly distinguish between global SMAD binding and PRDM16-dependent sites.
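
      The "over 50% of peaks were consistent" statement, and the reviewer's question about sample-to-sample variability, both come down to the fraction of peaks in one set that overlap a peak in another. A minimal sketch of that calculation follows. The peak coordinates and the function name `fraction_shared` are hypothetical, and the simple any-overlap criterion is an assumption; the manuscript may define shared peaks differently.

```python
def fraction_shared(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b.

    Each peak is a (chromosome, start, end) tuple with half-open coordinates.
    """
    def overlaps(p, q):
        # Same chromosome and intersecting intervals.
        return p[0] == q[0] and p[1] < q[2] and q[1] < p[2]

    if not peaks_a:
        return 0.0
    shared = sum(any(overlaps(p, q) for q in peaks_b) for p in peaks_a)
    return shared / len(peaks_a)


# Hypothetical peak sets for two conditions.
wt = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150), ("chr3", 10, 90)]
ko = [("chr1", 150, 250), ("chr2", 100, 120), ("chr3", 200, 300)]

print(fraction_shared(wt, ko))  # share of wt peaks retained in the KO set
```

      Applying the same function to two replicates of one condition would give the baseline overlap the reviewer asks for, against which a between-condition value could be judged.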

      (12) Much more convincing than ChIP-qPCR for c-FOS for two loci in Figures 5F-G would be a global analysis of c-FOS ChIP-seq data.

      We agree that a global c-FOS ChIP-seq analysis would provide a more comprehensive view of c-FOS binding patterns. However, the primary focus of this study is the interaction between BMP signaling and PRDM16. The enrichment of AP-1 motifs at ectopic SMAD4 binding sites was an unexpected finding, which we validated using c-FOS ChIP-qPCR at selected loci. While a genome-wide analysis would be valuable, it falls beyond the current scope. We agree that future studies exploring the interplay among SMAD4/pSMAD, PRDM16, and AP-1 will be important and informative.

      (13) Figure 6A is hard to read. A heatmap would make it much easier to see differences in expression. Furthermore, if the point is to see the difference between ChP and CH, why not combine the different subclusters belonging to those structures? Finally, why are there 28 genes total when it is said the authors are evaluating a list of 31 genes and also displaying 6 genes that are not expressed (so the difference isn't that unexpressed genes are omitted)?

      For the scRNA-seq data, we chose violin plots because they display both gene expression levels and the number of cells that express each gene. However, we agree that the labels in Figure 6A were too small and difficult to read. We have revised the figure by increasing the font size and moving genes with low expression to Supplementary Figure 5A. Figure 6A now includes the 17 more highly expressed genes together with three markers, and Supplementary Figure 5A contains the 13 lowly expressed genes. One gene, Mrtfb, is missing from the scRNA-seq data and is thus not included. We have revised the description of the result in the main text and figure legends.

      Reviewer #2 (Public review):

      Summary:

      This article investigates the role of PRDM16 in regulating cell proliferation and differentiation during choroid plexus (ChP) development in mice. The study finds that PRDM16 acts as a corepressor in the BMP signaling pathway, which is crucial for ChP formation.

      The key findings of the study are:

      (1) PRDM16 promotes cell cycle exit in neural epithelial cells at the ChP primordium.

      (2) PRDM16 and BMP signaling work together to induce neural stem cell (NSC) quiescence in vitro.

      (3) BMP signaling and PRDM16 cooperatively repress proliferation genes.

      (4) PRDM16 assists genomic binding of SMAD4 and pSMAD1/5/8.

      (5) Genes co-regulated by SMADs and PRDM16 in NSCs are repressed in the developing ChP.

      (6) PRDM16 represses Wnt7b and Wnt activity in the developing ChP.

      (7) Levels of Wnt activity correlate with cell proliferation in the developing ChP and CH.

      In summary, this study identifies PRDM16 as a key regulator of the balance between BMP and Wnt signaling during ChP development. PRDM16 facilitates the repressive function of BMP signaling on cell proliferation while simultaneously suppressing Wnt signaling. This interplay between signaling pathways and PRDM16 is essential for the proper specification and differentiation of ChP epithelial cells. This study provides new insights into the molecular mechanisms governing ChP development and may have implications for understanding the pathogenesis of ChP tumors and other related diseases.

      Strengths:

      (1) Combining in vitro and in vivo experiments to provide a comprehensive understanding of PRDM16 function in ChP development.

      (2) Uses of a variety of techniques, including immunostaining, RNA in situ hybridization, RT-qPCR, CUT&Tag, ChIP-seq, and SCRINSHOT.

      (3) Identifying a novel role for PRDM16 in regulating the balance between BMP and Wnt signaling.

      (4) Providing a mechanistic explanation for how PRDM16 enhances the repressive function of BMP signaling. The identification of SMAD palindromic motifs as preferred binding sites for the SMAD/PRDM16 complex suggests a specific mechanism for PRDM16-mediated gene repression.

      (5) Highlighting the potential clinical relevance of PRDM16 in the context of ChP tumors and other related diseases. By demonstrating the crucial role of PRDM16 in controlling ChP development, the study suggests that dysregulation of PRDM16 may contribute to the pathogenesis of these conditions.

      We thank the reviewer for the thorough and thoughtful summary of our study. We’re glad the key findings and significance of our work were clearly conveyed, particularly regarding the role of PRDM16 in coordinating BMP and Wnt signaling during ChP development. We also appreciate the recognition of our integrated approach and the potential implications for understanding ChP-related diseases.

      Weaknesses:

      (1) Limited investigation of the mechanism controlling PRDM16 protein stability and nuclear localization in vivo. The study observed that PRDM16 protein became nearly undetectable in NSCs cultured in vitro, despite high mRNA levels. While the authors speculate that post-translational modifications might regulate PRDM16 in NSCs similar to brown adipocytes, further investigation is needed to confirm this and understand the precise mechanism controlling PRDM16 protein levels in vivo.

      While the mechanisms controlling PRDM16 protein stability and nuclear localization in the developing brain are interesting, the scope of this paper is to reveal the function of PRDM16 in the choroid plexus and its interaction with BMP signaling. We will be happy to pursue this direction in our next study.

      (2) Reliance on overexpression of PRDM16 in NSC cultures. To study PRDM16 function in vitro, the authors used a lentiviral construct to constitutively express PRDM16 in NSCs. While this approach allowed them to overcome the issue of low PRDM16 protein levels in vitro, it is important to consider that overexpressing PRDM16 may not fully recapitulate its physiological role in regulating gene expression and cell behavior.

      As stated above, we acknowledge that findings from cultured NSCs may not directly apply to ChP cells in vivo. We are cautious with our statements. The cell culture work aimed to identify potential mechanisms by which PRDM16 and SMADs interact to regulate gene expression, and to identify target genes co-regulated by these factors. We expect that not all targets from cell culture are regulated by PRDM16 and SMADs in the ChP, so we validated expression changes of several target genes in the developing ChP and have now included the new data in Fig. 7 and Supplementary Fig. 7. Out of the 31 genes identified from cultured cells, four cell cycle regulators including Wnt7b, Id3, Spc24/25/nuf2 and Mybl2, showed de-repression in Prdm16 mutant ChP. These genes can be relevant downstream genes in the ChP, and other target genes may be cortical NSC-specific or less dependent on Prdm16 in vivo.

      (3) Lack of direct evidence for AP1 as the co-factor responsible for SMAD relocation in the absence of PRDM16. While the study identified the AP1 motif as enriched in SMAD binding sites in Prdm16 knockout cells, they only provided ChIP-qPCR validation for c-FOS binding at two specific loci (Wnt7b and Id3). Further investigation is needed to confirm the direct interaction between AP1 and SMAD proteins in the absence of PRDM16 and to rule out other potential co-factors.

      We agree that the enrichment of the AP1 motif at the PRDM16 and SMAD co-binding regions in Prdm16 KO cells can only indirectly suggest AP1 as a co-factor for SMAD relocation. That is why we used ChIP-qPCR to examine the presence of c-FOS at these sites. Although we only validated two targets, the result confirms that c-FOS binds to these sites only in the Prdm16 KO cells and not in Prdm16-expressing cells, suggesting that AP1 is a co-factor. Our results cannot rule out the presence of other co-factors.

      Reviewer #2 (Recommendations for the authors):

      Minor typo: [7, page 3] "sicne" should be "since".

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised parts of the text to improve clarity.

      Reviewer #3 (Public review):

      Summary:

      Bone morphogenetic protein (BMP) signaling instructs multiple processes during development including cell proliferation and differentiation. The authors set out to understand the role of PRDM16 in these various functions of BMP signaling. They find that PRDM16 and BMP co-operate to repress stem cell proliferation by regulating the genomic distribution of BMP pathway transcription factors. They additionally show that PRDM16 impacts choroid plexus epithelial cell specification. The authors provide evidence for a regulatory circuit (constituting of BMP, PRDM16, and Wnt) that influences stem cell proliferation/differentiation.

      Strengths:

      I find the topics studied by the authors in this study of general interest to the field, the experiments well-controlled and the analysis in the paper sound.

      We thank the reviewer for their positive feedback and thoughtful summary. We appreciate the recognition of our efforts to define the role of PRDM16 in BMP signaling and stem cell regulation, as well as the soundness of our experimental design and analysis.

      Weaknesses:

      I have no major scientific concerns. I have some minor recommendations that will help improve the paper (regarding the discussion).

      We have revised the discussion according to the suggestions.

      Reviewer #3 (Recommendations for the authors):

      Specific minor recommendations:

      Page 18. Line 526: In a footnote, the authors point out a recent report which in parallel was investigating the link between PRDM16 and SMAD4. There is substantial non-overlap between these two papers. To aid the reader, I would encourage the authors to discuss that paper in the discussion section of the manuscript itself, highlighting any similarities/differences in the topic/results.

      Thanks for the suggestion. We have now included the comparison in the discussion. One conclusion is consistent between our study and this publication: PRDM16 functions as a co-repressor of SMAD4. However, the mechanisms are different. Our data suggest a model in which PRDM16 facilitates SMAD4/pSMAD binding to repress proliferation genes under high BMP conditions. The other report, in contrast, suggests that SMAD4 steadily binds to the Prdm16 promoter and switches regulatory functions depending on the co-factors. Together with PRDM16, SMAD4 represses gene expression, while with SMAD3, in response to high levels of TGF-β1, it activates gene expression. These differences could be due to different signaling pathways (BMP versus TGF-β) and contexts (NSCs versus pancreatic cancers), among other factors.

      Page 3. Line 65: typo 'since'

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised the text to improve clarity.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosus have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether miRNA, proteins, triglycerides, glycogen levels in trophic eggs is what suppresses queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldobler 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite as was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context for the study. We now provide more information on the life history of ants and explain under what situations queens and workers may produce trophic eggs. We also mention that some ants, such as Crematogaster smithi, have a special caste of “large workers” that are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Gobin and Ito (2000), in which the authors show that trophic eggs may play an important role in food distribution within the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We now state at the very beginning of the Material and Method section that the queens were collected in populations known not to have dependent-lineage (genetic caste-determining) mechanisms of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not with the other Pogonomyrmex species. For some genera the species were also not listed in alphabetic order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Comments on introduction:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosus specifically. This is important because some colonies of Pogonomyrmex rugosus have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that it would be easier to have the information on whether queens and workers produce trophic eggs on the branches of the tree. However, placing the information on the branches would suggest that the “trait” evolved on that part of the tree. As we do not know exactly when worker or queen production of trophic eggs evolved, we prefer to keep the figure as it is.

      Finally, we have also removed the “none” column from the figure as suggested by the reviewer and discussed in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii: Masuko 2003).

      Comments on materials and methods:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs because under natural conditions 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar to that under natural conditions.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens was isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, it is possible to reliably determine the proportion of viable and trophic eggs only by isolating queens. Importantly, the main aim of these experiments was not to precisely determine the proportion of viable and trophic eggs, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes, this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up, and as soon as we disturb the nest, a worker takes them all and runs away with them). Moreover, it is possible that workers preferentially eat one type of egg, which would have required removing eggs as soon as queens laid them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to say what effect the disturbance had on the number and type of eggs laid. But again, our aim was not to precisely determine these values but to determine whether there was an effect of hibernation on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced into these colonies, but all first instar larvae were introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Trophic eggs that would have been left over from the source colony should have been eaten within the first few days after creating the recipient colonies. However, even if some trophic eggs would have remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggest that the mechanism is in fact a difference in the abundance of trophic eggs laid by queens. There is currently no information on when exactly caste determination occurs during development.

      Comments on results:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid”

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information in the main text that while viable eggs showed embryonic development at 25 and 65 hours (Fig. 2B, C), there was no such development for trophic eggs (Fig. 2E, F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      The survival is given for the larvae without considering caste. We have changed the X axis of panel B and reworded the Figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and more easy to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egglaying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No, the data were not shown, and we do not have sufficiently good data to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that 4 colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2,1,2,1, see supplementary Table 2). So, it is likely that it is just by chance that these colonies produced only gynes.

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did that because there are many different miRNA species, as reflected by the fact that there is not just one size peak but several. This is why we looked at the size distribution.

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we no longer have colonies in the laboratory, so this is not possible.

      Comments on discussion:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two mentioned studies there were other factors that co-varied with variation in the quantity of trophic eggs. We mentioned that and suggested that it would be useful to conduct experimental manipulation of the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We made some suggestions for experiments that should be performed in the future.

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph to make suggestions of future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Duilio M. Potenza et al. explores the role of Arginase II in cardiac aging, majorly using whole-body arg-ii knock-out mice. In this work, the authors have found that Arg-II exerts non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. The authors have used arg II KO mice and an in vitro culture system to study the role of Arg II. The authors have also reported the cell-autonomous effect of Arg-II through mitochondrial ROS in fibroblasts that contribute to cardiac aging. These findings are sufficiently novel in cardiac aging and provide interesting insights. While the phenotypic data seems strong, the mechanistic details are unclear. How Arg II regulates the IL-1b and modulates cardiac aging is still being determined. The authors still need to determine whether Arg II in fibroblasts and endothelial contributes to cardiac fibrosis and cell death. This study also lacks a comprehensive understanding of the pathways modulated by Arg II to regulate cardiac aging.

      We sincerely appreciate the valuable feedback provided by the reviewer. It's gratifying to hear that our work provided novel information on the role of arginase-II in cardiac aging which is a complex process involving various cell types and mechanisms. We have devoted considerable effort by performing new experiments to address the reviewer's comments and to delineate more detailed mechanisms of Arg-II in cardiac aging. Please, see below our specific answers to each point of the reviewers.

      Strengths:

      This study provides interesting information on the role of Arg II in cardiac aging.

      The phenotypic data in the arg II KO mice is convincing, and the authors have assessed most of the aging-related changes.

      The data is supported by an in vitro cell culture system.

      We appreciate this reviewer’s positive assessment on the strength of our study.

      Weaknesses:

      The manuscript needs more mechanistic details on how Arg II regulates IL-1b and modulates cardiac aging.

      We made a great effort and performed new experiments in a human monocyte cell line (THP1), in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. We found that in human THP1 monocytes, in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of the IL-1b precursor are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> cells as compared to the THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production, as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). Moreover, in the mouse bone-marrow-derived macrophages, LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in the BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Altogether, these results suggest that iNOS slightly reduces Arg-II expression. Arg-II and iNOS can be upregulated by LPS independently. Both Arg-II and iNOS are required for IL-1b production upon LPS stimulation, as illustrated in Suppl. Fig. 6G. For detailed results and discussion, please see the answers to comment point 2 or point 6 raised by this reviewer.

      The authors used whole-body KO mice, and the role of macrophages in cardiac aging is not studied in this model. A macrophage-specific arg II Ko would be a better model.

      We fully agree with this comment of the reviewer. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop a macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion in aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in the specific cell types would be the next step to elucidate the cell-specific functions of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      Experiments need to validate the deficiency of Arg II in cardiomyocytes.

      As pointed out by this reviewer in comment point 10, Arg-II was previously reported to be expressed in cardiomyocytes isolated from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, were not included in that study to rule out possible background signals. We made a great effort to investigate whether Arg-II is present in cardiomyocytes from different species, including mice, rats and humans, and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows antibody specificity and background noise to be validated beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang et al., 2021). To exclude possible species-specific differences in the expression of Arg-II in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in the rat myocardial infarction tissues, Arg-II was not found in cardiomyocytes but in endocardium cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein levels, no Arg-II signals could be detected, while in fibroblasts from the same animals, elevated Arg-II levels under hypoxia are demonstrated (Fig. 5B). Furthermore, even RT-qPCR could not detect arg-ii mRNA in cardiomyocytes but did detect it in non-cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed at negligible levels, in cardiomyocytes but is expressed in non-cardiomyocytes. These new experiments with rat heart are included in the method section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. The legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      The authors have never investigated the possibility of NO involvement in this mice model.

      As mentioned above, we made a great effort and performed new experiments in a human monocyte cell line (THP1), in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. The results show that Arg-II and iNOS can be upregulated by LPS independently of each other and that iNOS slightly reduces Arg-II expression. However, both Arg-II and iNOS are required for IL-1b production upon LPS stimulation. For detailed results and discussion, please see the answers to comment point 2 or point 6 raised by this reviewer.

      A co-culture system would be appropriate to understand the non-cell-autonomous functions of macrophages.

      We appreciate the suggestion by this reviewer regarding the co-culture system to test the non-cell-autonomous role of Arg-II. We think that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell-autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media released from macrophages. A co-culture system could be considered if the released factor in the conditioned medium were not stable. This is, however, not the case. Therefore, we are confident that our experimental model with conditioned medium is sufficient to demonstrate a paracrine effect of cell-cell interaction (please also see the answer to comment point 16).

      The Myocardial infarction data shown in the mice model may not be directly linked to cardiac aging.

      As we have introduced and discussed in the manuscript, aging is a predominant risk factor for cardiovascular disease (CVD). Studies in experimental animal models and in humans provide evidence that the aging heart is more vulnerable to stressors such as ischemia/reperfusion injury and myocardial infarction than the heart of young individuals. Even in the heart of apparently healthy individuals of old age, chronic inflammation, cardiomyocyte senescence, cell apoptosis, interstitial/perivascular tissue fibrosis, endothelial dysfunction and endothelial-mesenchymal transition (EndMT), and cardiac dysfunction with either preserved or reduced ejection fraction are observed. Our study aims to investigate the role of Arg-II in the cardiac aging phenotype and in age-associated cardiac vulnerability to stressors. Therefore, cardiac functional changes and myocardial infarction in response to ischemia/reperfusion injury are suitable surrogate parameters for this purpose.

      Reviewer #2 (Public Review):

      Summary:

      The results from this study demonstrated a cell-specific role of mitochondrial enzyme arginase-II (Arg-II) in heart aging and revealed a non-cell-autonomous effect of Arg-II on cardiomyocytes, fibroblasts, and endothelial cells through the crosstalk with macrophages via inflammatory factors, such as by IL-1b, as well as a cell-autonomous effect of Arg-II through mtROS in fibroblasts contributing to cardiac aging phenotype. These findings highlight the significance of non-cardiomyocytes in the heart and bring new insights into the understanding of pathologies of cardiac aging. It also provides new evidence for the development of therapeutic strategies, such as targeting the ArgII activation in macrophages.

      We're grateful for the reviewer's positive feedback, acknowledging the significant findings of our study on the role of arginase-II (Arg-II) in cardiac aging. We appreciate this reviewer’s insight into the therapeutic potential of targeting Arg-II activation in macrophages and are excited about the implications for future interventions in age-related cardiac pathologies. Thank you for recognizing the importance of our work in advancing our understanding of cardiac aging and potential therapeutic strategies.

      Strengths:

      This study targets an important clinical challenge, and the results are interesting and innovative. The experimental design is rigorous, the results are solid, and the representation is clear. The conclusion is logical and justified.

      We thank this reviewer for the positive comment.

      Weaknesses:

      The discussion could be extended a little bit to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have several critical concerns, specifically about the mechanism of how Arg-II plays a role in cardiac aging.

      My major concerns are:

      (1) The authors have shown non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. A macrophage-specific Arg-II knock-out mouse model is a suitable and necessary control to establish claims.

      We fully agree with this comment of the reviewer. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop a macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion in aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in the specific cell types would be the next step to elucidate the cell-specific functions of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      (2) This study suggests that Arg-II exerts its effect through IL-1b in cardiac ageing. However, all experiments performed to demonstrate the link between ArgII and IL-1β are correlative at best. The underlying molecular mechanism, including transcription factors involved in the regulation of IL-1β by arg-ii, has not been demonstrated.

      We sincerely appreciate this reviewer’s comment on the aspect! To make it clear, a causal role of Arg-II in promoting IL-1β production in macrophages is evidenced by the experimental results showing that old arg-ii<sup>-/-</sup> mouse heart has lower IL-1β levels than the age-matched wt mouse heart (Fig. 6A to 6D). We further showed that the cellular IL-1β protein levels and release are reduced in old arg-ii<sup>-/-</sup> mouse splenic macrophages as compared to the wt cells (Fig. 7A, 7C, and 7D). This result is further confirmed with the mouse macrophage cell line RAW264.7 (Suppl. Fig. 5A and suppl. Fig. 5C), in which we demonstrate that silencing arg-ii reduces IL-1β levels stimulated with LPS.

      In response to this reviewer’s comment (see comment point 6), we made a further effort to investigate the possible involvement of iNOS in Arg-II-regulated IL-1β production in macrophages stimulated with LPS. We performed new experiments in a human monocyte cell line (THP1), in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology.

      Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. We found that in human THP1 monocytes, in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of IL-1b are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> cells as compared to the THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production, as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). The results suggest that Arg-II promotes IL-1b production independently of iNOS. Moreover, the role of iNOS in IL-1b production was also studied in mouse bone-marrow-derived macrophages in which the inos gene is ablated. The results demonstrate that LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in the BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Since arginase and iNOS share the same metabolic substrate, L-arginine, <sup>inos-/-</sup> would be expected to increase IL-1b production. This is, however, not the case. A strong inhibition of IL-1β production in <sup>inos-/-</sup> macrophages is observed. These results imply that iNOS promotes IL-1β production independently of Arg-II and that the inhibitory effect of inos deficiency on IL-1β is dominant and able to counteract Arg-II’s stimulating effect on IL-1β production. Hence, our results demonstrate that Arg-II promotes IL-1β production in macrophages independently of iNOS. Altogether, these results suggest that iNOS slightly reduces Arg-II expression. Arg-II and iNOS can be upregulated by LPS independently. Both Arg-II and iNOS are required for IL-1b production upon LPS stimulation (this concept is illustrated in Suppl. Fig. 6G). The new results are described on page 8, the last paragraph, and page 9, the 1st paragraph, and are presented in Suppl. Fig. 6. The legend to Suppl. Fig. 6 is described in the file “Supplementary figure legend-R”. The related experimental methods are updated on page 23, the last two paragraphs, and page 26, the last paragraph. The results are discussed on page 14, the last paragraph, and page 15, the first two paragraphs.

      (3) Figure 2: The authors have not validated the whole-body Arg-II knock-out mice for arg-ii ablation.

      Thanks for pointing out this missing information! We have added the information regarding genotyping of the mice in the method section on page 20, first paragraph. Moreover, Fig. 5C also confirms the genotyping of the non-cardiomyocyte cells isolated from wt and arg-ii<sup>-/-</sup> animals.

      (4) It is unclear why the authors have chosen to focus on IL-1β specifically, among other pro-inflammatory cytokines that were also downregulated in Arg-II-/- mice as demonstrated in Fig. 2A-D.

      We appreciate the reviewer's question, which provides an opportunity to delve deeper into our findings. In our investigation, we observed that aging is accompanied by elevated levels of various proinflammatory markers. Intriguingly, our data revealed that tnf-α remained unaffected by the ablation of arg-ii during aging in the heart tissues, while Il-1β showed a significant reduction in arg-ii<sup>-/-</sup> animals compared to age-matched wild-type (wt) mice (Fig. 2). Mcp1 is, however, a chemoattractant for macrophages, and F4-80 serves as a pan marker for macrophages. Moreover, our previous studies demonstrate a relationship between Arg-II and IL-1β in vascular disease and obesity and in age-associated renal and pulmonary fibrosis. Finally, IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease, as shown by CANTOS trials. Therefore, we have focused on IL-1β in this study. We have now explained and strengthened this aspect in the manuscript on page 7, the last two lines, and page 8, the 1st paragraph, as follows:

      “Taking into account that our previous studies demonstrated a relationship of Arg-II and IL-1β in vascular disease and obesity (Ming et al., 2012) and in age-associated organ fibrosis such as renal and pulmonary fibrosis (Huang et al., 2021; Zhu et al., 2023), and IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease as shown by CANTOS trials (Ridker et al., 2017), we therefore focused on the role of IL-1β in crosstalk between macrophages and cardiac cells such as cardiomyocytes, fibroblasts and endothelial cells”.

      (5) Although macrophages are shown to be involved in cardiac ageing in the arg-ii mouse model, the authors have not estimated macrophage infiltration and expression of inflammatory or senescence markers in the hearts of these mice.

      Thank you very much for raising this important point! Taking the reviewer's comments into account, we have performed new experiments, i.e., multiple immunofluorescent staining, to analyze the infiltrated (CCR2<sup>+</sup>/F4-80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4-80<sup>+</sup>) macrophage populations and to investigate to what extent these populations in the aging heart are affected by arg-ii ablation. The results show an age-associated increase in the numbers of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in the age-matched arg-ii<sup>-/-</sup> animals (Fig. 2G). This result is in accordance with the result of f4/80 gene expression shown in Fig. 2A, demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages, as characterized by LYVE1<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2E and 2H), are predominant in the aging heart as compared to the infiltrated CCR2<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4-80<sup>+</sup> and CCR2<sup>+</sup>/F4-80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      Moreover, the age-associated accumulation of senescent cells, as demonstrated by p16<sup>ink4</sup>-positive cells, is significantly reduced in arg-ii<sup>-/-</sup> animals. This new result is incorporated in Fig. 1 as Fig. 1G and 1H and described and discussed on page 5, the 2nd paragraph, and page 14, the second-to-last sentence of the 1st paragraph. The method of p16<sup>ink4</sup> staining is included in the method section on page 22, the 1st paragraph, line 7. The legend to Fig. 1 is revised accordingly.

      (6) Previously, Arg-II has been reported to serve a crucial role in ageing associated with reduced contractile function in rat hearts by regulating Nitric Oxide Synthase (PMID: 22160208). Elevated NO and superoxide have been shown to play crucial roles in the etiology of cardiovascular diseases (PMID: 24180388). Therefore, it is important to assess whether Nitric Oxide (NO) is involved in the aging-related phenotype in this mouse model.

      Following the reviewer's suggestion, we conducted new experiments to investigate the role of nitric oxide (NO) in the context of the effect of Arg-II-induced IL-1b production in macrophages. We have addressed this question in the response to the comment point 2.

      (7) Based on the results demonstrated in the study, ablation of Arg-II can be expected to cause a reduction in inflammation-associated phenotypes throughout the body at the multi-organ level. The observed improved cardiac phenotype could be an outcome of whole-body Arg-II ablation. It would be fruitful to develop a cardiac-specific Arg-II knockout mouse model to establish the role of Arg-II in the heart, independent of other organ systems.

      We agree with the comment of the reviewer on this point. Unfortunately, as explained above (see point 1), it is currently not possible for us to perform the requested experiments, owing to the lack of a cardiac-specific arg-ii knockout mouse model. Moreover, such an approach is complicated by the absence of Arg-II in cardiomyocytes and the expression of Arg-II in multiple cell types, including endothelial cells, fibroblasts and macrophages of different origins (resident and monocyte-derived infiltrating cells). It is thus difficult to generate a cardiac-specific gene knockout mouse. Future work should investigate the roles of cell-specific Arg-II in cardiac aging by generating cell-specific arg-ii<sup>-/-</sup> mice. We very much appreciate this important aspect and have discussed the issue on page 19, lines 2 to 6.

      (8) Contrary to the findings in this paper, Arg-II has previously been reported to be essential for IL-10-mediated downregulation of pro-inflammatory cytokines, including IL-1β (PMID: 33674584).

      Thank you very much for mentioning this study! We have now thoroughly discussed the controversy as follows on page 15, the last paragraph, and page 16, the 1st paragraph:

      “It is of note that a study reported that Arg-II is required for IL-10 mediated-inhibition of IL-1b in mouse BMDM upon LPS stimulation (Dowling et al., 2021), which suggests an anti-inflammatory function of Arg-II. The results of our present study, however, demonstrate that LPS enhances Arg-II and IL-1b levels in macrophages and knockout or silencing Arg-II reduces IL-1b production and release, demonstrating a pro-inflammatory effect of Arg-II. Our findings are supported by the study from another group, which shows decreased pro-inflammatory cytokine production including IL-6 and IL-1b in arg-ii<sup>-/-</sup> BMDM most likely through suppression of NFkB pathway, since arg-ii<sup>-/-</sup> BMDM reveals decreased activation of NFkB and IL-1b levels upon LPS stimulation (Uchida et al., 2023). Most importantly, our previous study also showed that re-introducing arg-ii gene back to the arg-ii<sup>-/-</sup> macrophages markedly enhances LPS-stimulated pro-inflammatory cytokine production (Ming et al., 2012), providing further evidence for a pro-inflammatory role of arg-ii under LPS stimulation. In support of this conclusion, chronic inflammatory diseases such as atherosclerosis and type 2 diabetes (Ming et al., 2012), inflammaging in lung (Zhu et al., 2023), kidney (Huang et al., 2021) and pancreas (Xiong, Yepuri, Necetin, et al., 2017) of aged animals or acute organ injury such as acute ischemic/reperfusion or cisplatin-induced renal injury are reduced in the arg-ii<sup>-/-</sup> mice (Uchida et al., 2023). The discrepant findings between these studies and that with IL-10 may implicate dichotomous functions of Arg-II in macrophages, depending on the experimental context or conditions. Nevertheless, our results strongly implicate a pro-inflammatory role of Arg-II in macrophages in the inflammaging in aging heart”.

      (9) The authors have only performed immunofluorescence-based experiments to show fibrotic and apoptotic phenotypes throughout this study. To verify these findings, we suggest that they additionally perform RT-PCR or western blotting analysis for fibrotic markers and apoptotic markers.

      The fibrotic aspect was analyzed not only by microscopy but also by a quantitative biochemical assay, namely hydroxyproline content assessment. Hydroxyproline is a major component of collagen and is largely restricted to collagen. Therefore, the measurement of hydroxyproline levels can be used as an indicator of collagen content, as previously investigated in the lung (Zhu et al., 2023). We have also measured collagen gene expression by RT-qPCR, as suggested by the reviewer, and found an age-related decline of collagen mRNA expression levels in both wt and arg-ii<sup>-/-</sup> mice, suggesting that the age-associated cardiac fibrosis and its prevention in arg-ii<sup>-/-</sup> mice are due to alterations of translational and/or post-translational regulation, including collagen synthesis and/or degradation. The results are in accordance with those reported by other studies published in the literature. We have pointed out this aspect on page 5, the 2nd paragraph:

      “The increased cardiac fibrosis in aging is however, associated with decreased mRNA levels of collagen-Ia (col-Ia) and collagen-IIIa (col-IIIa), the major isoforms of pre-collagen in the heart (Suppl. Fig. 2A and 2B), which is a well-known phenomenon in cardiac fibrotic remodelling (Besse et al., 1994; Horn et al., 2016). The results demonstrate that age-associated cardiac fibrosis and prevention in arg-ii<sup>-/-</sup> mice is due to alterations of translational and/or post-translational regulations including collagen synthesis and/or degradation”.

      The results are presented in Suppl. Fig. 2, legend to Suppl. Fig. 2 is included in the file “Suppl. figure legend_R”. Suppl. table 2 for primers is revised accordingly.

      We did not use additional markers to perform apoptosis assays with whole heart, since Fig. 3 provides good evidence that aging is associated with an increase in apoptotic cells in the heart and that this increase is significantly reduced in the arg-ii<sup>-/-</sup> mice. The reduction of TUNEL-positive (apoptotic) cells in aged arg-ii<sup>-/-</sup> mice is mainly due to a decrease in apoptotic cardiomyocytes. With histological analysis, the apoptotic cell types can be well analysed. Moreover, biochemical assays for apoptosis, such as caspase-3 cleavage in whole heart tissues, cannot distinguish apoptotic cell types and may not be sensitive enough for the aging heart, owing to the relatively low numbers of apoptotic cells in the aging heart as compared to the myocardial infarct model.

      (10) Figure 4: arg-ii has previously been reported to be expressed in rat cardiomyocytes (PMID: 16537391). We strongly suggest the authors verify the expression of Arg-II via immunostaining in isolated cardiomyocytes (using published protocols), and by using multiple different cardiomyocyte-specific markers for colocalization studies to prove the lack of arg-ii expression beyond a reasonable doubt.

      As pointed out by this reviewer, Arg-II was previously reported to be expressed in cardiomyocytes isolated from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, were not included in that study to exclude possible background signals. We made a great effort to investigate whether Arg-II is present in cardiomyocytes from different species including mice, rats and humans, and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate the antibody specificity and exclude background noise beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in Arg-II expression in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in rat myocardial infarction tissue, Arg-II was not found in cardiomyocytes but in endocardial cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein levels, no Arg-II signal could be detected, while in fibroblasts from the same animals, elevated Arg-II levels under hypoxia are demonstrated (Fig. 5B). Furthermore, RT-qPCR could not detect arg-ii mRNA in cardiomyocytes but did in non-cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed only at negligible levels, in cardiomyocytes but is expressed in non-cardiomyocytes. These new experiments with rat heart are included in the method section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. Legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      (11) Figure 6G: It may be worthwhile to supplement arg-ii<sup>-/-</sup> old cells with IL-1beta to see if there is an increase in TUNEL-positive cells.

      IL-1b is a well-known pro-inflammatory cytokine that causes apoptosis in various cell types including cardiomyocytes (Shen Y., et al., Tex Heart Inst J. 2015;42:109–116. doi: 10.14503/THIJ-14-4254; Liu Z. et al., Cardiovasc Diabetol 2015;14,125. doi: 10.1186/s12933-015-0288-y; Li. Z., et al., Sci Adv 2020;6:eaay0589. doi: 10.1126/sciadv.aay0589). We very much appreciate this reviewer's interesting idea of investigating the apoptotic responses of cardiomyocytes from arg-ii<sup>-/-</sup> mice to IL-1b. We agree that it is possible that cardiomyocytes from wt and arg-ii<sup>-/-</sup> mice react differently to IL-1b, although the cardiomyocytes do not express Arg-II as demonstrated in our present study. If this is true, it must be due to non-cell autonomous effects of a different aging microenvironment in the heart or to epigenetic modulation of the myocytes. We find this a very interesting aspect that requires further extensive investigation. Since our current study focuses on the effect of wt and arg-ii<sup>-/-</sup> macrophages on cardiomyocytes and non-cardiomyocytes, we prefer not to include this suggested aspect in our manuscript and would like to explore it in a follow-up study.

      (12) Figures 4-9: It would be interesting to see if the effect of ArgII in cardiac ageing is gender-specific. It is recommended to include experimental data with male mice in addition to the results demonstrated in female mice.

      As pointed out in the manuscript, we have focused on female mice, because the age-associated increase in arg-ii expression is more pronounced in females than in males (Fig. 1A). As suggested by this reviewer, we performed additional experiments investigating the effects of arg-ii deficiency in male mice during aging, focusing on pathophysiological outcomes of ischemia/reperfusion (I/R) injury in ex vivo experiments. The ex vivo functional experiments with the Langendorff system were performed in aged male mice (see Suppl. Fig. 9). Following I/R injury, wt male mice display reduced left ventricular developed pressure (LVDP), as well as reduced inotropic and lusitropic states (expressed as dP/dt max and dP/dt min, respectively). As previously reported (Murphy et al., 2007), we also found that old male mice are more prone to I/R injury than age-matched female animals. Specifically, 15 minutes of ischemia is enough to significantly affect left ventricular contractile function in the male mice (Suppl. Fig. 9). In contrast, age-matched old female mice are relatively resistant to I/R injury, and at least 20 min of ischemia is necessary to induce a significant impairment of contractile function (Fig. 10). Similar to females, the post-I/R recovery of cardiac function is also significantly improved in the male arg-ii<sup>-/-</sup> mice as compared to age-matched wt animals. In addition to functional recovery, the myocardial infarction revealed by triphenyl tetrazolium chloride (TTC) staining upon I/R injury is significantly reduced in the age-matched male arg-ii<sup>-/-</sup> animals (Suppl. Fig. 9C and 9D). Altogether, these results reveal a role for Arg-II in heart function impairment during aging in both sexes, with a higher vulnerability to stress in males. These new results are presented in Suppl. Fig. 9 and described on page 10, the last paragraph, and page 11. The results are discussed on page 18, the 2nd paragraph, as follows:

      “The fact that aged females have higher Arg-II but are more resistant to I/R injury seems contradictory to the detrimental effect of Arg-II in I/R injury. It is presumable that cardiac vulnerability to injuries stressors depends on multiple factors/mechanisms in aging. Other factors/mechanisms associated with sex may prevail and determine the higher sensitivity of male heart to I/R injury, which requires further investigation. Nevertheless, the results of our study show that Arg-II plays a role in cardiac I/R injury also in males”.

      The information on the experimental methods in the male animals is included on page 20, the last paragraph and page 21, the 1st paragraph. Legend to Suppl. Fig. 9 is included in the file “Suppl. figure legend_R”.

      (13) Figure 6G: cardiomyocytes from wild-type mice, when treated with macrophages, show 0% TUNEL-positive cells. Since it is unlikely to obtain no TUNEL staining in a cell population, there may be an experimental or analytical error.

      This panel is now Fig. 7F and 7G. The absence of TUNEL-positive cells is due to our specific experimental procedure. After tissue digestion, cardiomyocytes were plated on laminin-coated dishes. Laminin promotes the adhesion of surviving cells. Following plating, we conducted a thorough washing process to remove damaged and partially adherent cells. This step ensures that only well-shaped, viable, and strongly adherent cells remain as bioassay cells. These “healthy” cells are then selected for the experiments. The apoptotic cells are removed by the washing, which explains the high viability of the bioassay cells. We have added this detailed information in the method section on page 24, the 2nd paragraph.

      (14) Figure 7J: Please assess whether arg-ii depletion also affects the mtROS phenotype.

      Following the suggestion of this reviewer, we performed new experiments which show that human cardiac fibroblasts (HCFs) exposed to hypoxia (1% O<sub>2</sub>, 48 hours), a known physiological trigger of Arg-II up-regulation, exhibit increased mtROS generation, which involves Arg-II (new Fig. 8M to 8P). We found that the Arg-II protein level and mtROS (assessed by mitoSOX staining) were both enhanced, accompanied by increased levels of HIF1α (Fig 8M). Moreover, mito-TEMPO pre-incubation reduces mtROS, confirming the mitochondrial origin of the ROS. Silencing of arg-ii with rAd-mediated shRNA significantly reduces mtROS levels, demonstrating a role of Arg-II in the production of mitochondrial ROS in cardiac fibroblasts (Fig 8M to 8P). We have included these results on page 9, the last paragraph, and discussed them on page 17, the 1st paragraph. The related method is described on page 26, the 2nd paragraph. Legend to Fig. 8 is updated on page 32.

      (15) Figure 8A-E: The authors have treated human-origin endothelial cells with mice-origin macrophage-conditioned media. It would be more suitable to treat the endothelial cells with human-origin macrophage-conditioned media.

      We acknowledge the concern regarding the use of mouse-origin macrophage-conditioned media on human-origin endothelial cells. It should be noted that biological cross-reactivity of cytokines from one species on cells from a different species has been reported in the literature. It was observed that there is quite a strict threshold of 60% amino acid identity, above which cytokines tend to cross-react, and that statistically cytokines tend to cross-react more often as their % amino acid identity increases (Scheerlinck JPY. Functional and structural comparison of cytokines in different species. Vet Immunol Immunopathol. 1999; 72:39-44. https://doi.org/10.1016/S0165-2427(99)00115-4). Taking IL-1b as an example, the 17.5 kDa mature mouse and human IL-1b share 92% aa sequence identity, suggesting a high cross-reactivity. Indeed, human IL-1b has shown biological cross-reactivity in mouse cells (Ledesma E., et al. Interleukin-1 beta (IL-1β) induces tumor necrosis factor alpha (TNF-α) expression on mouse myeloid multipotent cell line 32D cl3 and inhibits their proliferation. Cytokine. 2004; 26:66-72. https://doi.org/10.1016/j.cyto.2003.12.009). Moreover, our results also support the reported cross-reactivity between human and mouse IL-1b. The CM from mouse macrophages indeed showed biological function in human endothelial cells. The observed effects of the conditioned media from aged wild-type macrophages on endothelial cells were specifically mediated through IL-1β. This conclusion is supported by our data showing that the upregulation induced by the conditioned media was significantly reduced by the addition of an IL-1β receptor blocker.
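
      To make the identity argument above concrete, the following is a minimal sketch of how percent amino acid identity could be computed from a pre-aligned sequence pair and compared with the 60% threshold cited above. The function names and the short toy sequences are hypothetical and are not real IL-1b sequences; the denominator (all aligned columns) is one common convention among several, so the numbers are illustrative only.

```python
# Illustrative sketch only: toy sequences, not real cytokine sequences.
CROSS_REACTIVITY_THRESHOLD = 60.0  # % identity threshold cited in the response


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns ('-' marks a gap).

    Columns where both sequences carry a gap are ignored; a residue
    paired with a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def likely_cross_reactive(aligned_a: str, aligned_b: str,
                          threshold: float = CROSS_REACTIVITY_THRESHOLD) -> bool:
    """True when identity exceeds the threshold (a heuristic, not a guarantee)."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Hypothetical 10-residue toy alignments.
similar_pair = ("MKTAYIAKQR", "MKTAYIVKQR")    # 9 of 10 positions identical
divergent_pair = ("MKTAYIAKQR", "MRSGWLVEHK")  # 1 of 10 positions identical

similar_identity = percent_identity(*similar_pair)
divergent_identity = percent_identity(*divergent_pair)
print(similar_identity, likely_cross_reactive(*similar_pair))
print(divergent_identity, likely_cross_reactive(*divergent_pair))
```

      Under this convention, the 92% identity reported for mature mouse and human IL-1b lies well above the threshold, which is the point made in the response.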

      (16) The co-culture system would be more interesting to test the non-cell autonomous role of Arg II.

      We appreciate the suggestion by this reviewer regarding the co-culture system to test the non-cell autonomous role of Arg-II. We believe that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media. A co-culture system could be considered if the released factor in the conditioned medium were not stable. This is, however, not the case. We are therefore confident that our experimental model with conditioned medium is sufficient to demonstrate a paracrine effect of cell-cell interaction.

      Reviewer #2 (Recommendations For The Authors):

      Some minor comments may be considered to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19, the last 6 lines.

      (1) The current study showed strong evidence demonstrating the key role of cardiac macrophages in pathologies of cardiac aging, particularly, the macrophages (MФ) from the circulating blood (hematogenous). It is known that the heart is among the minority of organs in which substantial numbers of yolk-sac MФ persist in adulthood and play a crucial role in maintaining cardiac function. Thus, the adult mammalian heart contains two separate and discrete cardiac MФ subgroups, i.e., the resident MФs originated from yolk sac-derived progenitors and the hematogenous MФs recruited from circulating blood monocytes. These two subtypes of MФs may play distinctive roles in the aging heart and the response to cardiac injury. The author could extend the discussion on the possibility of the resident MФs in aging hearts, which could be further investigated in the future.

      We appreciate the suggestion and agree that it provides valuable insight into the study. Taking the comments of Reviewer 1 into account, we have performed new experiments, i.e., co-immunostaining to analyze the infiltrated (CCR2<sup>+</sup>/F4/80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4/80<sup>+</sup>) macrophage populations and to investigate the extent to which Arg-II affects infiltrated and resident macrophage populations in the aging heart. We found that, in line with the gene expression of f4/80, immunofluorescence staining reveals an age-associated increase in the numbers of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in the age-matched arg-ii<sup>-/-</sup> animals (Fig. 2E, F, G), demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages, characterized as LYVE1<sup>+</sup>/F4/80<sup>+</sup> cells (Fig. 2E and 2H), are predominant in the aging heart as compared to the infiltrated CCR2<sup>+</sup>/F4/80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4/80<sup>+</sup> and CCR2<sup>+</sup>/F4/80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      (2) It would be beneficial to the readers if the author could provide some explanation about why ArgII could not be detected in VSMCs in the mouse heart and the species difference between humans and mice. In addition, the author may provide an assumption on the possibility that there may also be a cross-talk between macrophages and VSMCs in the aging heart. A little bit more explanation in the Discussion will be helpful.

      We acknowledge and appreciate the suggestion and have discussed these points on page 19 as follows:

      “In this context, another interesting aspect is the cross-talk between macrophages and vascular SMC in the aging heart. In our present study, we could not detect Arg-II in vascular SMC of mouse heart but in that of human heart. This could be due to the difference in species-specific Arg-II expression in the heart or related to the disease conditions in human heart which is harvested from patients with cardiovascular diseases. Indeed, in the apoe<sup>-/-</sup> mouse atherosclerosis model, aortic SMCs do express Arg-II (Xiong et al., 2013). It is interesting to note that rodents hardly develop atherosclerosis as compared to humans. Whether this could be partly contributed by the different expression of Arg-II in vascular SMC between rodents and humans requires further investigation. In our present study, the aspect of the cross-talk between macrophages and vascular SMC is not studied. Since the crosstalk between macrophages and vascular SMC has been implicated in the context of atherogenesis as reviewed (Gong et al., 2025), further work shall investigate whether Arg-II expressing macrophages could interact with vascular SMC in the coronary arteries in the heart and contribute to the development of coronary artery disease and/or vascular remodelling and the underlying mechanisms”.

      (3) Please clarify the arrows in Figure 9C that indicate the infarct area in each splicing section from one heart.

      The arrows in Figure 9C (now Fig. 10C) are indeed utilized to indicate the sections displaying the infarcted area within each splicing section from one heart. We have explained the arrow in the figure legend (now Fig. 10 and also new Suppl. Fig. 9).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Response to Reviewers and Revision Plan

      We thank all three reviewers for their thoughtful and constructive comments. We are pleased that the reviewers found our work to be "very interesting," "well written," with "high quality" data that is "convincing" and will be "of broad interest for the community of axon guidance, circuit formation and brain development." We particularly appreciate the recognition that our study provides "novel functions for Cas family genes in forebrain axon organization" and uses "state-of-the art mouse genetics" with "quantitative and statistical rigor." Below are our detailed responses to each reviewer's comments, including extensive additional experiments and analyses that we will perform to significantly strengthen the manuscript.

      Reviewer #1

      We thank this reviewer for recognizing that our experiments are "carefully done and quantified" with "clear and striking" phenotypes that "support most of the conclusions in the manuscript." We appreciate their acknowledgment that this work will be "of interest to developmental neurobiologists and the axon guidance and adhesion fields."

      Major Comments:

      __ Authors clearly show that misplaced TCA axons are coordinate with cortical layer defects, with misplaced tbr1+ neurons, in EMX-Cre cas and integrin knockouts, suggesting these axons are following misplaced cells. These results are described as 100% coordinate, but since there is no figure of quantification, authors need to clarify how many embryos were examined for each genotype, as this was not described in results or legends.__

      We apologize for this oversight and will provide detailed quantification of this important finding. We examined a total of 11 Emx1Cre;TcKO embryos with 13 controls, and 14 Emx1Cre;Itgb1 embryos with 13 littermate controls at two developmental stages (E16.5 and P0) to quantify the coordination between misplaced Tbr1+ neurons and cortical bundle formation. This quantification will be presented in the main text and figure legend.

      Here's a more detailed breakdown of those numbers: For Emx1Cre;TcKO knockouts, we examined 7 controls and 5 mutants at P0, and 6 controls and 6 mutant embryos at E16.5. For the Emx1Cre;Itgb1 knockouts, we examined 5 controls and 6 mutant neonates at P0, and 8 controls and 8 mutant embryos at E16.5.
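
      As a quick consistency check, the per-stage counts above can be summed and compared with the totals given in the preceding paragraph. The short sketch below does only that arithmetic; the dictionary layout and names are illustrative choices and are not part of the study's analysis.

```python
# Per-stage animal counts as stated in the response: (controls, mutants).
counts = {
    "Emx1Cre;TcKO": {"P0": (7, 5), "E16.5": (6, 6)},
    "Emx1Cre;Itgb1": {"P0": (5, 6), "E16.5": (8, 8)},
}


def totals(by_stage):
    """Sum (controls, mutants) across developmental stages."""
    controls = sum(c for c, _ in by_stage.values())
    mutants = sum(m for _, m in by_stage.values())
    return controls, mutants


summary = {line: totals(by_stage) for line, by_stage in counts.items()}
for line, (controls, mutants) in summary.items():
    print(f"{line}: {controls} controls, {mutants} mutants")
```

      The sums match the stated totals: 13 controls and 11 mutants for Emx1Cre;TcKO, and 13 controls and 14 mutants for Emx1Cre;Itgb1.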

      __ Are the neurons not misplaced in Nex cre cas or integrin knockouts? One would think presumably not, but then what are the tbr1+ cell migration defect caused by? I struggle with the semantics of non-neuronal autonomous role of cas in cortex, since tbr1+ neurons are misplaced, and this is what the axons are mistargeting too. So yes, potentially cas or b1 is not needed in those neurons, but those misplaced neurons are presumably driving the phenotype.__

      We agree that this important point requires better explanation. You are absolutely correct that Tbr1+ neurons are not misplaced in NexCre;TcKO mutants (Wong et al., 2023), which is precisely why these animals do not exhibit cortical bundle formation. In addition to our previously published data showing normal location of Tbr1+ neurons in those mutants, we can also provide similar analysis at E16.5 and P0 as a supplemental figure. The model we propose is that Cas genes are required in radial glial cells for proper positioning of deep layer cortical neurons. These correctly positioned neurons, in turn, provide appropriate guidance cues for TCA projections. Hence, our model is that while the role of Cas genes is non-neuronal-autonomous (acting in radial glia rather than in the neurons themselves), the mispositioned Tbr1+ neurons in Emx1Cre;TcKO mutants drive the TCA misprojection phenotype. We will clarify this mechanism in the discussion and provide a new graphical model as a supplemental figure to facilitate conceptualization of our conclusions.

      __ Authors need to clarify in the discussion that they can't rule out the cas not also needed in tca neurons, Since neither emx or nex cre would hit those cells.__

      We will add the following clarification to the discussion: The analysis of cortical bundle formation in Emx1Cre;TcKO revealed a comparable phenotype to that observed in NestinCre;TcKO, strongly suggesting a cortical-autonomous role for Cas genes in CB formation. "However, we cannot formally exclude a thalamus-autonomous role for Itgb1 or Cas genes in TCA pathfinding, as we did not ablate these genes exclusively in the thalamus. Future studies using thalamus-specific Cre drivers would be needed to definitively address this question."

      __ Could authors add boxes in zoomed out brain images to denote zoom regions. And potentially a schematic demonstrating placement of DiI for lipophilic tracing experiments.__

      We will add boxes to denote zoom regions where possible throughout the manuscript. For some high magnification panels, we selected the best representative images, which don't necessarily correspond to specific regions of the lower magnification panels, but we will note this in the figure legends. We will also add a schematic diagram to a supplemental figure illustrating DiI placement for all lipophilic tracing experiments.

      Reviewer #2

      We thank this reviewer for describing our study as "very interesting," "well written," with data that are "of high quality" and findings that are "convincing." We appreciate their recognition that we used "state-of-the art mouse genetics" and that our work will be "of broad interest for the community of axon guidance, circuit formation and brain development."

      Major Comments:

      __ Immunofluorescence labeling for other β-integrin family members to examine expression in AC axons may provide insights into why β1-integrin deficiency does not replicate the Cas TcKO phenotype.__

      This is an excellent suggestion that we will address experimentally. We will perform RNAscope analysis for integrin β5, β6, and β8 in developing piriform and S1 cortex at E14.5, E16.5, and E18.5, as these are the only other β-integrins expressed during cortical development. We anticipate that this analysis may reveal expression of alternative β-integrins in the neurons that extend axons along the developing anterior commissure, which would provide a potential explanation for why β1-integrin deficiency does not replicate the AC phenotype observed in Cas TcKO animals. These new data will be presented as part of a new figure.

      __ Is there any evidence that β1-integrin in developing cortical axons is colocalized with Cas proteins (in vivo or in vitro)?__

      We have tested multiple antibodies for p130Cas and CasL without success in cortical tissue. However, we will test two new integrin β1 antibodies and a new p130Cas antibody. While direct colocalization may be challenging due to species restrictions and tissue-specific antibody performance, we will attempt to show regional co-expression in consecutive sections. If the integrin antibodies work, we will present data as a supplemental figure demonstrating that p130Cas (using our BAC-EGFP reporter) and β1-integrin show overlapping expression patterns in developing cortical white matter tracts and neurons, supporting their potential functional interaction. In the end, while we will try to address this critique, we will be limited by the reagents that are available to us.

      Minor Comments:

      __ How long do the Cas TcKO with the various cre driver survive?__

      We have not systematically quantified survival beyond 6 months, but surprisingly, survival up to 6 months of age appears normal for all genotypes examined. This information will be included in the Methods section.

      Reviewer #3

      We thank this reviewer for acknowledging that our "main claims and conclusions are solidly supported by the data" with "good overall data quality" and "high quantitative and statistical rigor." We appreciate their recognition that we "uncover novel functions for Cas family genes in forebrain axon organization" and that our "overall reporting and discussion of findings is data-driven and refrains from excessive speculation."

      Addressing Concerns About Novelty and Impact:

      We respectfully disagree with the characterization of our findings as "somewhat incremental." While we acknowledge that similar axonal defects have been described in other lamination mutants, our study makes several novel and significant contributions:

      First demonstration of Cas family requirement in forebrain axon tract development: This is the first study to establish roles for Cas proteins in axon guidance, representing a completely new function for these well-studied signaling molecules.

      Novel β1-integrin-independent role for Cas proteins: Our finding that AC defects occur in Cas mutants but not β1-integrin mutants reveals a previously unknown signaling pathway and challenges the assumption that Cas proteins always function downstream of β1-integrin.

      Mechanistic insights into cortical-TCA interactions: While the general principle that cortical lamination affects TCA projections has been established, our work provides the first demonstration of how specific adhesion signaling molecules (Cas proteins) control this process through radial glial function.

      Cell-type specific requirements: Our systematic analysis using multiple Cre drivers provides unprecedented detail about where and when Cas proteins function during brain development, revealing both neuronal-autonomous (AC) and non-neuronal autonomous (TCA) roles.

      As Reviewer #2 noted, "The main advancement is a more nuanced understanding of where and when these molecules function during brain development and insights into the origin of the defects observed." This represents significant mechanistic progress in understanding forebrain circuit assembly.

      Specific Comments:

      Suggestion about cell autonomy testing: We appreciate the optional suggestion to test strict cell autonomy using sparse deletion approaches. While this would indeed be interesting, it would represent a substantial undertaking beyond the scope of the current study. However, we believe our current data using NexCre (which hits early postmitotic neurons) versus NestinCre (CNS-wide deletion) and Emx1Cre (cortical progenitors) provides supportive evidence for neuronal autonomy of the AC phenotype, as mentioned by the reviewer.

      In vitro axon guidance assays: This is an excellent suggestion for future molecular studies. In the discussion we identify specific candidate guidance molecules (e.g. Ephrins) that would be prime targets for such experiments.

      Cross-Reviewer Comments:

      We appreciate Reviewer #3's agreement with the other reviewers' suggestions and will address the quantification of neuronal mispositioning/axon bundle correlation as requested by Reviewer #1.

      Additional Improvements:

      Beyond addressing the specific reviewer comments, we will make several additional improvements to strengthen the manuscript:

      Enhanced statistical analysis: All quantifications will include appropriate statistical tests with clearly stated n values and multiple litters represented.

      Expanded discussion: We will better contextualize our findings within the broader axon guidance literature and discuss future directions (e.g. TCAs).

      New data: Additional controls, expression analysis, and quantifications will strengthen our conclusions.

      We believe these revisions, particularly the new experimental data addressing integrin family expression and the detailed quantification of phenotype coordination, will significantly strengthen our conclusions and demonstrate the novelty and impact of our findings. We hope the reviewers will find these improvements satisfactory and agree that our work makes important contributions to understanding axon guidance mechanisms in the developing forebrain.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this beautiful paper the authors examined the role and function of NR2F2 in testis development and more specifically in fetal Leydig cell (FLC) development. It is well known by now that FLC develop from interstitial steroidogenic progenitors at around E12.5 and are crucial for testosterone and INSL3 production during embryonic development, which in turn shapes the internal and external genitalia of the male. Indeed, lack of testosterone or INSL3 is known to cause DSD as well as undescended testis, also termed cryptorchidism. The authors first characterized the expression pattern of the NR2F2 protein during testis development and then used two cKO systems of NR2F2, namely the Wt1-creERT2 and the Nr5a1-cre, to explore the phenotype of loss of NR2F2. They found in both cases that mice present with undescended testis and a major reduction in FLC numbers. They show that NR2F2 has no effect on the amount and expression of the progenitor cells but that in its absence there are fewer FLC and they are immature.

      The effect of NR2F2 is cell autonomous and does not seem to affect other signalling pathways implicated in Leydig cell development, such as the DHH, PDGFRA and NOTCH pathways.

      Overall, this paper is excellent, very well written, fluent and clear. The data is well presented, and all the controls and statistics are in place. I think this paper will be of great interest to the field and paves the way for several interesting follow-up studies as stated in the discussion.

      Reviewer #2 (Public review):

      The major conclusion of the manuscript is expressed in the title: "NR2F2 is required in the embryonic testis for Fetal Leydig Cell development" and also at the end of the introduction and all along the result part. All the authors' assertions are supported by very clear and statistically validated results from ISH, IHC, precise cell counting and gene expression levels by qPCR. The authors used two different conditional Nr2f2 gene ablation systems that demonstrate the same effects at the FLC level. They also showed that the haplo-insufficiency of Wt1 in the first system (knock-in Wt1-cre-ERT2) aggravated the situation in FLC differentiation by disturbing the differentiation of Sertoli cells and their secretion of pro-FLC factors, which had a confounding effect and encouraged them to use the second system. This demonstrates the great rigor with which the authors interpreted the results. In conclusion, all authors' claims and conclusions are justified by their high-quality results.

      Recommendations for the authors:

      We thank the reviewers for their comments which have improved and strengthened our manuscript. Please see our responses to specific comments below in blue.

      Reviewer #1 (Recommendations for the authors):

      I have several small comments:

      (1) There has recently been a preprint from the Yao lab about the role of NR2F2 in steroidogenic cells (https://www.biorxiv.org/content/10.1101/2024.09.16.613312v1). They performed cKO of NR2F2 using the Wt1creERT2 and found similar results. You should present and discuss this paper in light of your results.

      Estermann et al. report a very similar phenotype of FLC hypoplasia in an independent mouse model of Nr2f2 conditional mutation. We have now referred to this article in the discussion of our manuscript as suggested.

      (2) In the introduction I think it is important to mention that the steroidogenic progenitors are derived from Wnt5a positive cells (https://pubmed.ncbi.nlm.nih.gov/35705036/).

      We have mentioned this point in the introduction as suggested.

      (3) In both models you show a decrease in the number of FLC (60% or 40%) and yet they both present with undescended testis. It is important to discuss the fact that there is no need for a complete ablation of testosterone and INSL3 in order to get cryptorchidism.

      We have mentioned this point in the discussion as suggested.

      The fact that you get only partial reduction in FLC is likely due to redundancy with additional factors, possibly the ARX like you stated in the discussion and it will be interesting to explore that in the future but is beyond the scope of the current paper.

      We agree with the reviewer, this question could be addressed by analyzing Arx,Nr2f2 double mutants.

      (4) On page 8, line 11, you mention data not shown; I am not sure whether this is allowed in the journal.

      The data is now shown in Figure S5A as suggested.

      (5) In Figure 2- it will be good if you add a schematic model of the mouse strains used as well as the experimental and control mice next to the Tam scheme. Similar scheme should be in figure 3 for Nr5a1-cre.

      We have modified Figures 2 and 3 as suggested.

      (6) There is a clear and pronounced effect on testis cord number and size. It would be good if you could quantify testis cord numbers/diameter in the mutants, even if you do not follow the effect on Sertoli cells in detail.

      We have quantified testis cords numbers and area in E14.5 Control and Wt1<sup>CreERT2/+</sup>; Nr2f2<sup>flox/flox</sup> testes. This data is now shown in Figure S2M.

      (7) It will be good to present the undescended testis in the Wt1-cre model in figure 2 and not in the supp figure

      The data is now shown in Figure 2H-I as suggested.

      (8) Please add labelling of the testis, kidney, bladder, vas deferens in figure 3 N+O and in the Wt1-cre model

      We have added the labels in Figures 2 and 3 as suggested.

      (9) In figure 5 which present both models- it will be good to use the scheme I suggested before to highlight which results refer to which ko model.

      We have modified Figure 5 as suggested.

      Reviewer #2 (Recommendations for the authors):  

      The work presented in this manuscript gave me food for thought. I have always been intrigued by the fact that of the large number of interstitial cells in the testis, a minority differentiate into mature androgen-producing Leydig cells. In other words, how is the number of functional steroidogenic cells defined from a large pool of progenitor cells (ARX and NR2F2 positive ones)? This may have a link with the levels of androgens produced (a kind of feedback control) or the effectiveness of these androgens on the target tissues (i.e.: as spermatogenesis efficiency in adults). In addition, there must be specific signals (probably linked to gonadotropins) that induce the recruitment of Leydig cells from the progenitor pool. Perhaps the genetic models generated in this study could help to address these questions. I leave it to the authors to judge.

      We agree with the reviewer. How NR2F2 (and other factors) integrate extrinsic cues to regulate the recruitment of a subset of interstitial steroidogenic progenitors along the Leydig cell differentiation pathway is a fascinating question beyond the scope of this work.

      In addition to this reflection, I propose a few minor modifications likely to improve the quality of the manuscript:

      (1) Page 3, line 3: I suggest replacing "growth" with "differentiation".

      We have modified the text as suggested.

      (2) Page 3, line 4: the "scrotum" is missing in the parenthesis. Please add it before "and penis".

      We have modified the text as suggested.

      (3) Page 5, lines 21-24: kidney hypoplasia is also evident in Fig S2H (stated in the figure legend). It could also be mentioned in this sentence, and it implies "...that NR2F2 function is required for testicular and kidney development."

      We have modified the text as suggested.

      (4) Page 5, lines 28-30. In addition to the reduction in the number of HSD3B-positive cells, HSD3B staining seems clearly fainter in mutant FLC (Fig 2M) compared to adrenal cells on the same section or FLC in control gonads. This fits well with other results on the level of steroidogenic enzymes (Fig 2O) and those presented thereafter (Fig S4 I-J and Fig 5). Perhaps the authors could mention this fact.

      We have modified the text as suggested in the results section “NR2F2 is required for FLC maturation” (Page 8).

      (5) Page 5, lines 31-34: testicular descent is highly sensitive to INSL3 in the mouse (by contrast with other species where androgens seem to be more critical). I was wondering if you could check a better phenotypic marker for the absence (or reduction) of androgens, such as the differentiation of epididymides by HE staining or the anogenital distance at birth.

      We have measured the anogenital distance at P0 and P1 as suggested and have included the corresponding graph in Fig. S3P.

      (6) Page 8, lines 21-22: "HSD3B positive FLC were smaller and more elongated". It is clear in Fig 5F but not evident in Fig 5D. Could the authors propose another image?

      We have modified Figure 5 as suggested and provide now another example of HSD3B positive FLCs in a Nr5a1Cre; Nr2f2<sup>flox/flox</sup> mutant gonad (Fig. 5D) and the corresponding control littermate (Fig. 5C).

      (7) Page 14, line 12: "(arrow in I)" should be "(arrow in H)"

      We have modified the text as suggested. Please note that ACTA2 expression is now shown in Figure S2 G-H.

      (8) Page 15, line 6: "Arrows indicate NR5A1 positive FLC". There is no arrow in Fig 4 C,D, only a kind of scale bar on the enlargement shown in C.

      We have modified Figure 4 as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D-grid world, in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from a jellyfish in order to avoid a pain stimulus. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions.

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Lines 290-302):

      “When it comes to our experiments, both the simulation and VR experiment models are related and derived from the same theoretical framework maintaining an algebraic mapping. They differ only in task-specific adaptations, i.e., in action sets and in temporal difference learning rules - multi-step decisions in the grid world vs. Rescorla-Wagner rule for single-step decisions in the VR task. This is also true for Dayan et al. [2006] who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. A further minor difference between the simulation and VR experiment models is the use of a baseline bias in the human experiment's RL and the RLDDM model, where we also model reaction times with drift rates, which is not a behaviour often simulated in the grid world simulations. As mentioned previously, we use the grid world tasks for didactic purposes, similar to Dayan et al. [2006] and common to test-beds for algorithms in reinforcement learning [Sutton et al., 1998]. The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling. 

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Line 303-313):

      “In our simulation experiments, we assume the coexistence of the Pavlovian fear system and the instrumental system to demonstrate the emergent safety-efficiency trade-off from their interaction. It is possible that similar behaviours could be modelled using an instrumental system alone, with higher punishment sensitivity; therefore we do not argue for the necessity of the Pavlovian fear system here. Instead, the Pavlovian fear system itself could be a potential biologically plausible implementation of punishment sensitivity. Unlike punishment sensitivity (scaling of the punishments), which has not been robustly mapped to neural substrates in fMRI studies, the neural substrates for the Pavlovian fear system are well known (e.g., the limbic loop and amygdala, further see Supplementary Fig. 16). Additionally, the Pavlovian fear system provides a separate punishment memory that cannot be erased by greater rewards, as in [Elfwing and Seymour, 2017, Wang et al., 2018]. This fundamental point can be observed in our simple T-maze simulations, where the Pavlovian fear system encourages avoidance behaviour and the agent chooses the smaller reward instead of the greater reward.”

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases. 

      Thanks to the reviewer’s comments, we have now added a paragraph in our Discussion section (Line 290-302) explaining the similarity of our models and their integrated interpretation. We hope this addresses the reviewer’s concerns.

      Strengths: 

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models. 

      Weaknesses: 

      I find the conclusions misleading, as they are not supported by the data. 

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated. 

      Thanks to the reviewer’s comments, we have now added a paragraph in our Discussion section (Line 290-302 onwards) explaining the similarity of our models and their integrated interpretation. We hope this addresses the reviewer’s concerns.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task? 

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality).

      Thanks to the reviewer’s comments, we have now mentioned this point in Lines 299-302.

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone. 

      If the authors are convinced that their model can - then data from naturalistic approach-avoidance VR tasks is publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making. 

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here. 

      We hope our additions to the Discussion section, from Line 290 to Line 313, address the reviewer’s concerns.

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018). 

      We have now added a line discussing this. (Line 356-358)

      “Future work could also use a formal account of uncertainty which could fit the fear-conditioned skin-conductance response better than Pearce-Hall associability [Tzovara et al., 2018].”

      Reviewer #2 (Public review): 

      Summary: 

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.). 

      Strengths: 

      (1) Simplicity of the model which can at the same time model rather complex environments. 

      (2) Introduction of a flexible omega parameter. 

      (3) Direct application to a rather advanced VR task. 

      (4) The paper is extremely well written. It was a joy to read. 

      Weaknesses: 

      Almost none! In very few cases, the explanations could be a bit better. 

      Thank you, we have added further explanations in the Discussion section. We have further improved the writing in the abstract, introduction and Methods sections, taking into account recommendations from reviewers #2 and #3.

      Reviewer #2 (Recommendations for the authors): 

      (1) Why is there no flexible omega in Figures 3B and 3C? Did I miss this? 

      Thank you. We have now added additional text to explain our motivation in Experiment 2, which only varies the fixed omega and omits the flexible omega (Lines 136-140).

      “In this set of results, we wish to qualitatively tease apart the role of a Pavlovian bias in shaping and sculpting the instrumental value and also provide more insight into the resulting safety-efficiency trade-off. Having shown the benefits of a flexible ω in the previous section, here we only vary the fixed ω to illustrate the effect of a constant bias and are not concerned with the flexible bias in this experiment.”

      We encourage the reader to consider this akin to an additional study that explains how a Pavlovian bias to withdraw can play a role in avoiding punishments similar to that of punishment sensitivity. This is particularly important as we do have neural correlates for Pavlovian biases but so far lack a clear neural correlate for punishment sensitivity, as mentioned in our new additions to the Discussion section (Lines 303-313).

      (2) The introduction of the flexible omega and the PAL agent in the results is a bit sudden. Some more details are needed to understand this during the first read of this passage. 

      We thank reviewer #2 for bringing this to our notice. We have attempted to refine our passage by including sentences like - 

      “The standard (rational) reinforcement learning system is modelled as the instrumental learning system. The additional Pavlovian fear system biases the withdrawal actions to aid in safe exploration, in line with our hypothesis.”

      “Both systems learn using a basic temporal difference updating rule (or in instances, its special case, the Rescorla-Wagner rule)”

      “We implement the flexible ω using Pearce-Hall associability (see equation 15 in Methods). The Pearce-Hall associability maintains a running average of absolute temporal difference errors (δ) as per equation 14. This acts as a crude but easy-to-compute metric for outcome uncertainty which gates the influence of the Pavlovian fear system, in line with our hypothesis. This implies that the higher the outcome uncertainty, as is the case in early exploration, the more cautious our agent will be, resulting in safer exploration.”
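
      The gating idea quoted above can be illustrated with a minimal sketch. It is not the authors' implementation: equations 14 and 15 are not reproduced in this response, so the exponential running average, the smoothing rate `eta`, the scaling constant `kappa`, and the clipping of ω to [0, 1] are all assumptions made for illustration only.

```python
def update_associability(assoc, td_error, eta=0.3):
    """Running average of absolute TD errors (Pearce-Hall style).

    `eta` is an assumed smoothing rate, not a value from the paper.
    """
    return (1.0 - eta) * assoc + eta * abs(td_error)


def flexible_omega(assoc, kappa=1.0):
    """Map associability to a Pavlovian weight in [0, 1].

    Linear scaling by `kappa` followed by clipping is an assumed form.
    """
    return max(0.0, min(1.0, kappa * assoc))


# Hypothetical TD errors: large early in exploration, small once outcomes
# become predictable.
td_errors = [1.0, -1.0, 0.8, 0.1, 0.05, 0.0, 0.0]

assoc = 0.0
omegas = []
for delta in td_errors:
    assoc = update_associability(assoc, delta)
    omegas.append(flexible_omega(assoc))
```

      Under these assumptions ω rises while prediction errors are large and decays as they shrink, so the Pavlovian influence is strongest during early, uncertain exploration.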

      (3) In my view, the possibility of modeling moving predators is extremely interesting. I would include Figure 8D and the corresponding explanation in the main text. 

      We thank the reviewer for finding our simulation on moving predators extremely interesting. Unfortunately, since our instrumental system is not model-based, and in particular does not explicitly model the predator dynamics, our simulation might not be a very accurate representation of real moving predator environments. As pointed out by Reviewer #1, several systems other than Pavlovian fear responses are perhaps necessary for safe behaviour in such environments, and we hope to address these in future studies. Thanks again for taking an interest in our simulations.

      (4) The VR experiment should be mentioned more clearly in the abstract and the introduction. It should be mentioned a bit more clearly why VR was helpful and why the authors did not use a simple bird's eye grid world task. 

      I cannot assess the RLDDM and I did not check the code. 

      Thank you, we have now mentioned the VR experiment more clearly in the abstract and the introduction. We also now further mention that the VR experiment “builds upon previous Go-No Go studies studying Pavlovian-Instrumental transfer (Guitart-Masip et al, 2012; Cavanagh et al, 2013). The virtual-reality approach confers a greater ecological validity and the immersive nature may contribute to better fear conditioning, making it easier to distinguish the aversive components.”

      A bird’s eye grid world may not invoke a strong withdrawal response, as seen in these immersive approach-withdrawal tasks where we can clearly distinguish a Pavlovian fear-based withdrawal response. We did include immersive VR maze results in the supplementary materials, but future work is needed to isolate the different systems at play in such a complex behaviour.

      Reviewer #3 (Public review): 

      Summary: 

      This paper aims to address the problem of exploring potentially rewarding environments that contain danger, based on the assumption that an independent Pavlovian fear learning system can help guide an agent during exploratory behaviour such that it avoids severe danger. This is important given that otherwise later gains seem to outweigh early threats, and agents may end up putting themselves in danger when it is advisable not to do so.

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards. 

      Strengths: 

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study. 

      Weaknesses: 

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse. 

      Thank you, we have now attempted to clarify these points in the Discussion section by adding the following text (Lines 313-321):

      “We next discuss the plausibility of pre-training to select the hardwired actions. In the human experiment, the withdrawal action is straightforwardly biased, as noted, while in the grid world, we assume a hardwired encoding of withdrawal actions for each state/grid. This innate encoding of withdrawal actions could be represented in the dPAG [Kim et al., 2013]. We implement this bias using pre-training, which we assume would be a product of evolution. Alternatively, this could be interpreted as deriving from an appropriate value initialization where the gradient over initialized values determines the action bias. Such aversive value initialization, driving avoidance of novel and threatening stimuli, has been observed in the tail of the striatum in mice, which is hypothesised to function as a Pavlovian fear/threat learning system [Menegas et al., 2018].”

      Reviewer #3 (Recommendations for the authors): 

      I have relatively little to suggest, as in my view the paper is robust, thorough, and creative, and does enough to support the primary argument being made at the most fundamental level. My suggestions for improvement are as follows: 

      (1) Some aspects of the model are potentially unrealistic (as described in the public review), and the paper may benefit from some discussion of these issues or attempts to make the model more realistic - i.e., to what extent is this plausible in explaining more complex avoidance behaviour? Primarily, the fact that pre-training is required to identify actions subject to Pavlovian bias seems unlikely to be effective in real-world situations - is there a better way to achieve this in cases where there isn't necessarily an instinctual Pavlovian response? 

      Thank you, we agree that the advantage of Pavlovian bias is restricted to the bias/instinctual Pavlovian response conferred by evolution. Future work is needed to model more complex avoidance behaviour such as escapes. We hope to have made this more clear with our edits to the Discussion (Lines 299-302) in our response to Reviewer #1’s comments, specifically:

      “The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”

      (2) The description of the model in the method can be a little hard to follow and would benefit from further explanation of certain parameters. In general, it would be good to ensure that all terms mentioned in equations are described clearly in the text (for example, in Equation1 it isn't clear what k refers to). 

      Thank you, we have now added further information on all of the parameters in Equation 1 and overall improved the writing of the Methods section, for instance by using a time subscript to reduce confusion when introducing the parameters. We use the standard notation of the Sutton and Barto textbook. k refers to the timesteps into the future, and is now explained better in the Methods section.

      (3) Another point of clarification in Equation 1 - does the policy account for the Pavlovian influence or is this purely instrumental? 

      Thank you, Equation 1 is purely instrumental. We have now specifically mentioned this. The Pavlovian influence follows later. They are combined into propensities for action as per equations 11-13.

      (4) I was curious whether similar outcomes could be achieved by more complex instrumental models without the need for Pavlovian influences. For example, could different risk-sensitive decision rules (e.g., conditional value at risk) that rely only on the instrumental system afford safe behaviour without the need for an additional Pavlovian system? 

      Thank you for your comment. Yes, CVaR can achieve safe exploration/cautious behaviour in choices similar to Pavlovian avoidance learning. But we think both differ in the following ways:

      (1) CVaR provides the correct solution to the wrong problem (objective that only maximises the lower tail of the distribution of outcomes)

      (2) Pavlovian bias provides the wrong solution to the right problem (normative objective, but a Pavlovian bias which may be a vestige of evolution)

      Here we use the “wrong problem, wrong solution, wrong environment” categorisation terminology from Huys et al. 2015.

      Huys, Q. J., Guitart-Masip, M., Dolan, R. J., & Dayan, P. (2015). Decision-theoretic psychiatry. Clinical Psychological Science, 3(3), 400-421.

      Secondly, we find an effect of Pavlovian bias on reaction times: a slowing down of approach responses and faster withdrawal responses. We do not think this is best explained by a CVaR-type model, and it is a direction for future work. We think such model-based methods are slower to compute, whereas the Pavlovian withdrawal bias is a quicker response.

      We have now included this in brief in Lines 280-288.
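
      To make the contrast with CVaR concrete, the following minimal sketch compares a mean-based objective with a lower-tail CVaR objective on two hypothetical options. The outcome lists and the tail level `alpha` are invented for illustration and are not taken from the study; the empirical estimator used here (mean of the worst `alpha` fraction of samples) is one common convention.

```python
import math
from statistics import fmean


def cvar(outcomes, alpha=0.1):
    """Empirical lower-tail CVaR: mean of the worst alpha-fraction of outcomes."""
    ordered = sorted(outcomes)
    k = max(1, math.ceil(alpha * len(ordered)))
    return fmean(ordered[:k])


# Hypothetical outcome samples for two actions (illustrative only).
safe = [1.0] * 10                # small, certain reward
risky = [3.0] * 9 + [-10.0]      # larger reward with a rare severe punishment

mean_prefers_risky = fmean(risky) > fmean(safe)
cvar_prefers_safe = cvar(safe) > cvar(risky)
```

      In this toy case the expected-value objective favours the risky option while the CVaR objective favours the safe one, which is the sense in which CVaR changes the problem being solved rather than adding a separate bias to the solution.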

      (5) Figure 5 would benefit from a clearer caption as it is not necessarily clear from the current one that the left panels refer to choices and the right panels to reaction times. 

      Thank you, we have improved the caption for Fig. 5.

      (6) It would be good to include some indication of the quality of the model fits for the human behavioural study (i.e., diagnostics such as R-hat) to ensure that differences in model fit between models are not due to convergence issues with different models. This would be especially helpful for the RLDDM models as these can be difficult to fit successfully.

      Thank you, we observed that all Rhat values were strictly less than 1.05 (most parameters were less than 1.01 and generally close to 1), indicating that the models converged. We have now added this line to the results (Line 246-248).
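
      For readers unfamiliar with the R-hat diagnostic requested in point (6) above, a minimal sketch of the classic Gelman-Rubin statistic follows. This is an assumption-laden illustration: the authors' fitting software may report a split or rank-normalised variant, and the simulated chains below are synthetic, not the study's posterior samples.

```python
import math
import random
from statistics import fmean, variance


def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains."""
    n = len(chains[0])
    chain_means = [fmean(c) for c in chains]
    within = fmean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)             # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n    # pooled variance estimate
    return math.sqrt(var_hat / within)


rng = random.Random(0)

# Four synthetic chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]

# Four synthetic chains stuck around different values (not converged).
stuck = [[rng.gauss(mu, 1.0) for _ in range(500)] for mu in (0.0, 0.0, 3.0, 3.0)]

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
```

      Values close to 1 indicate that the chains agree with each other, whereas chains exploring different regions inflate the between-chain term and push R-hat well above thresholds such as 1.05.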

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In Sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that the Otu (1-339 aa) and Sakura (1-49 aa) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Strengths:

      To my knowledge, this is the first characterization of the role of the CG14545 gene. Each experiment seems to be well-designed and adequately controlled.

      Weaknesses:

      However, the conclusions from each experiment are somewhat separate, and the functional relationships between Sakura's functions are not well established. In other words, although the loss of Sakura in the germline causes pleiotropic effects, the cause-and-effect relationships between the individual defects remain unclear.

      Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (and named it Sakura) as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germline stem cells (GSCs).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in Sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through the modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. Given Sakura's role in pMad expression, it would be insightful to investigate whether overexpression of Mad or pMad could mitigate these phenotypic defects (UAS-Mad line is available at Bloomington Drosophila Stock Center).

      As the reviewer suggested, we tested whether overexpression of Mad could rescue or mitigate the phenotypic defects caused by loss of sakura, using nos-Gal4-VP16 > UASp-Mad-GFP in the sakura<sup>null</sup> background. As shown in Fig S11, we did not observe any mitigation of defects.

      We also tested whether expressing a constitutively active form of Tkv could mitigate the defects, using UAS-Dcr2, NGT-Gal4 > UASp-tkv.Q235D in the sakura<sup>RNAi</sup> background. As shown in Fig S12, we did not observe any mitigation of defects by this approach either.

      A major concern is the overstated role of Sakura in regulating Orb. The data does not reveal mislocalized Orb; rather, a mislocalized oocyte and cytoskeletal breakdown, which may be secondary consequences of defects in oocyte polarity and structure rather than direct misregulation of Orb. The conclusion that Sakura is necessary for Orb localization is not supported by the data. Orb still localizes to the oocyte until about stage 6. In the later stage, it looks like the cytoskeleton is broken down and the oocyte is not positioned properly, however, there is still Orb localization in the ~8-stage egg chamber in the oocyte. This phenotype points towards a defect in the transport of Orb and possibly all other factors that need to localize to the oocyte due to cytoskeletal breakdown, not Orb regulation directly. While this result is very interesting it needs further evaluation on the underlying mechanism. For example, the decrease in E-cadherin levels leads to a similar phenotype and Bam is known to regulate E-cadherin expression. Is Bam expressed in these later knockdowns?

      We examined Bam and DE-Cadherin expression in later RNAi knockdowns driven by TOsk-Gal4. As shown in Fig S9, Bam was not expressed in these later knockdowns compared with controls. DE-Cadherin staining suggested a disorganized structure in late-stage egg chambers.

      We agree that we overstated the role of Sakura in regulating Orb in the initial manuscript. We changed the text to avoid this overstatement.

      The manuscript would benefit from a more balanced interpretation of the data concerning Sakura's role in Orb regulation. Furthermore, a more expanded discussion on Sakura's potential role in pMad regulation is needed. For example, since Otu and Bam are involved in translational regulation, do the authors think that Mad is not translated and therefore it is the reason for less pMad? Currently the discussion presents just a summary of the results and not an extension of possible interpretation discussed in context of present literature.

      We changed the text to avoid overstating a role of Sakura in regulating Orb localization.

      Based on our newly added results showing that transgenic overexpression of Mad could not rescue or mitigate the phenotypic defects of the sakura<sup>null</sup> mutant (Fig S11), we do not think that reduced translation of Mad is the reason for the lower pMad levels.

      Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field. However, there are some weaknesses and I would recommend that they address the comments in the Recommendations for the authors section below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General Comments:

      (1) The gene nomenclature: As mentioned in the text, Sakura means cherry blossom and is one of the national flowers of Japan. I am not sure whether the phenotype of the CG14545 mutant is related to Sakura or not. I would like to suggest the authors reconsider the naming.

      The striking phenotype of the sakura mutant is tumorous and germless ovarioles. The tumorous phenotype, with many round fusomes in the germarium visualized by anti-Hts staining, looks to us like cherry blossoms in bloom. The germless phenotype reminds us of cherry blossoms falling, especially considering that the ratio of the tumorous phenotype decreases and that of the germless phenotype increases with fly age. Furthermore, “Sakura” symbolizes birth and renewal in Japanese culture (the last author of this manuscript is Japanese). Our findings indicate that the gene sakura is involved in regulating the renewal and differentiation of GSCs (which leads to birth). These are the reasons for the naming, which we would like to keep.

      (2) In many of the microscopic photographs in the figures, especially for the merged confocal images, the resolution looks low, and the images appear blurred, making it difficult to judge the authors' claims. Also, the AlphaFold structure in Figure 10A requires higher contrast images. The magnification of the images is often inadequate (e.g. Figures 3A, 3B, 5E, 7A, etc). The authors should take high-magnification images separately for the germarium and several different stages of the egg chambers and lay out the figures.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      Specific Comments

      (1) How Sakura can cooperate with Otu remains unanswered. Sakura does not regulate deubiquitinase activity in vitro. Both sakura and otu appear to be involved in the Dpp-Smad signaling pathway and in the spatial control of Bam expression in the germarium, whereas Otu has been reported to act in concert with Bam to deubiquitinate and stabilize Cyc A for proper cystoblast differentiation. Therefore, it is plausible that the stabilization of Cyc A in the Sakura mutant is an indirect consequence of Bam misexpression and independent of the Sakura-Otu interaction. The authors may need to provide much deeper insight into the mechanism by which Sakura plays roles in these seemingly separable steps to orchestrate germ cell maintenance and differentiation during early oogenesis.

      Yes, it is possible that the stabilization of CycA in the sakura mutant is an indirect consequence of Bam misexpression and independent of the Sakura-Otu interaction. To test the significance and role of the Sakura-Otu interaction, we have attempted to identify Sakura point mutants that lose interaction with Otu. If such point mutants were obtained, we planned to test whether their transgene expression could rescue the phenotypes of the sakura mutant as the wild-type transgene did. However, after designing and testing the interaction of over 30 point mutants with Otu, we have not yet obtained such a mutant version of Sakura. We will continue these efforts, but this is beyond the scope of the current study. We hope to address this important point in future studies.

      (2) Figure 3A and Figure 4: The authors show that piRNA production is abolished in Sakura KO ovaries. It is known that piRNA amplification (the ping-pong cycle) occurs in the Vasa-positive perinuclear nuage in nurse cells. Is the nuage normally formed in the absence of Sakura? The authors provide high-magnification images in the germarium expressing Vas-GFP. How does Sakura, and possibly Otu, contribute to piRNA production? Are the defects a direct or indirect consequence of the loss of Sakura?

      We provided higher magnification images of germaria expressing Vasa-EGFP in the sakura mutant background (Fig 3A and 3B). Nuage formation does not seem to be dysregulated in the sakura mutant. Currently, we do not know whether the piRNA defects are a direct or indirect consequence of the loss of Sakura. This question cannot be answered easily. We hope to address this in future studies.

      (3) Figure 7 and Figure 12: The authors showed that Dpp-Smad signaling was abolished in Sakura KO germline cells. The same defects were also observed in otu mutant ovaries (Figure 12B). How does the Sakura-Otu axis contribute to the Dpp-Smad pathway in the germline?

      As we mentioned in the response to comment (1), we attempted to test the significance and role of the Sakura-Otu interaction, including in the Dpp-Smad pathway in the germline, but we have not yet been able to obtain loss-of-interaction mutant(s) of Sakura. We hope to address this in future studies.

      (4) Figure 9 and Fig 10: The authors raised antibodies against both Sakura and Otu, but their specificities were not provided. For Western blot data, the authors should provide whole gel images as source data files. Also, the authors argue that the Otu band they observed corresponds to the 98-kDa isoform (lines 302-304). The molecular weight on the Western blot alone would be insufficient to support this argument.

      When we submitted the initial manuscript, we also submitted original, uncropped, and unmodified whole Western blot images for all gel images to the eLife journal, as requested. We did the same for this revised submission. We believe eLife makes all those files available to readers for download.

      In the newly added Fig S13B, we used very young (2-5 hour) ovaries and 3-7 day ovaries. The 2-5 hour ovaries contain mostly pre-differentiated germ cells. Older ovaries (3-7 days in our case) contain all 14 stages of oogenesis, and later stages predominate in whole-ovary lysates.

      As reported in previous literature (Sass et al. 1995), we detected a higher abundance of the 104 kDa Otu isoform than the 98 kDa isoform in 2-5 hour ovaries, and predominantly the 98 kDa isoform in 3-7 day ovaries (Fig S13B). These results confirm that the major Otu isoform detected in our Western blots, all of which use older ovaries except for the 2-5 hour ovaries in Fig S13B, is the 98 kDa isoform.

      (5) Otu has been reported to regulate ovo and Sxl in the female germline. Is Sakura involved in their regulation?

      We examined the sxl alternative splicing pattern in sakura mutant ovaries. As shown in Fig S6, we detected the male-specific isoform of sxl RNA and a reduced level of the female-specific sxl isoform in sakura mutant ovaries. Thus, Sakura seems to be involved in sxl splicing in the female germline, although further studies will be needed to understand whether Sakura has a direct or indirect role here.

      (6) Lines 443-447: The GSC loss phenotype in piwi mutant ovaries is thought to occur in a somatic cell-autonomous manner: both piwi-mutant germline clones and germline-specific piwi knockdown do not show the GSC-loss phenotype. In contrast, the authors provide compelling evidence that Sakura functions in the germline. Therefore, the Piwi-mediated GSC maintenance pathway is likely to be independent of the Sakura-Otu axis.

      We changed the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      Overall, this is a cleanly written manuscript, with some sentences/sections that are confusing the way they are constructed (i.e. Line 37-38, 334, section on Flp/FRT experiments).

      We rewrote those sections to avoid confusion.

      Comment for all merged image data: the quality of the merged images is very poor - the individual channels are better but should also be reprocessed for more resolved image data sets. Also, it would be helpful to have boundaries drawn in an individual panel to identify the regions of the germarium, as cartooned in Figure S1A (which should be brought into Figure 1). F-actin or Vsg staining would have helped throughout the manuscript to enhance the visualization of described phenotypes.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      We outlined the germarium in Fig 1E.

      We brought the former FigS1 into Fig 1A.

      We provided Phalloidin (F-Actin) staining images in Fig S7.

      All p-values seem off. I recommend running the data through the student t-test again.

      We used Student's t-test to calculate p-values and confirmed that they are correct. We don’t understand why the reviewer thinks all p-values seem off.

      In the original manuscript, as mentioned in each figure legend, we used an asterisk (*) to indicate p-value <0.05, without distinguishing whether it was <0.001, <0.01, or <0.05.

      Perhaps Reviewer 2 is suggesting that we use ***, **, and * to indicate p-values of <0.001, <0.01, and <0.05, respectively? If so, we have now followed this suggestion.
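      The three-tier asterisk convention described above can be written out as a small helper. This is only an illustrative sketch: the function name `significance_stars` and the `"ns"` label for non-significant values are assumptions, not taken from the manuscript.

```python
def significance_stars(p_value):
    """Map a p-value to the asterisk notation: *** <0.001, ** <0.01, * <0.05."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # assumed label for p >= 0.05 (hypothetical, not from the source)
```

      The thresholds are checked from the strictest to the most lenient, so a value such as 0.0005 is labelled `***` rather than `*`.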

      Figure 1

      (1) Within the text, C is mentioned before A.

      We updated the text, which now mentions Fig 1A before Fig 1C.

      (2) B should be the supplemental figure.

      We moved the former Fig 1B to Supplemental Figure 1.

      (3) C - How were the different egg chamber stages selected in the WB? Naming them 'oocytes' is deceiving. Recommend labeling them as 'egg chambers', since an oocyte is claimed to be just the one-cell of that cyst.

      We changed the labeling to egg chambers.

      (4) Is the antibody not detecting Sakura in IF? There is no mention of this anywhere in the manuscript.

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have a Sakura-EGFP fly strain (which fully rescues sakura<sup>null</sup> phenotypes) to examine Sakura expression and localization without such non-specific signal issues, we relied on Sakura-EGFP rather than anti-Sakura antibodies for IF.

      (5) Expand on the reliance of the sakura-EGFP fly line. Does this overexpression cause any phenotypes?

      sakura-EGFP does not cause any phenotypes in the background of sakura[+/+] and sakura[+/-].

      (6) Line 95 "as shown below" is not clear that it's referencing panel D.

      We now referenced Fig 1D.

      (7) Re: Figures 1 E and F. There is no mention of Hts or Vasa proteins in the text. "Sakura-EGFP was not expressed in somatic cells such as terminal filament, cap cells, escort cells, or follicle cells (Figure 1E). In the egg chamber, Sakura-EGFP was detected in the cytoplasm of nurse cells and was enriched in developing oocytes (Figure 1F)". Outline these areas or label these structures/sites in the images. The color of Merge labels is confusing as the blue is not easily seen.

      We mentioned Hts and Vasa in the text. We labeled the structures/sites in the images and updated the color labeling.

      Figure 2

      (1) Entire figure is not essential to be a main figure, but rather supplemental.

      We don’t agree with the reviewer. The female fertility assay, in which the sakura null mutant exhibits a strikingly strong phenotype that was completely rescued by our Sakura-EGFP transgene, provides very important data, and we would like to present it in a main figure.

      (2) 2A- one star (*) significance does not seem correct for the presented values between 0 and 100+.

      In the original manuscript, as mentioned in each figure legend, we used an asterisk (*) to indicate p-value <0.05, without distinguishing whether it was <0.001, <0.01, or <0.05.

      Perhaps Reviewer 2 is suggesting that we use ***, **, and * to indicate p-values of <0.001, <0.01, and <0.05, respectively? If so, we have now followed this suggestion.

      (3) 2C images are extremely low quality. Should be presented as bigger panels.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images. We also presented as bigger panels.

      Figure 3

      (1) "We observed that some sakura<sup>null</sup> /null ovarioles were devoid of germ cells ("germless"), while others retained germ cells (Fig 3A)" What is described is hard to see. Must have a zoomed-in panel.

      We provided zoomed-in panels in Fig 3B.

      (2) C - The control doesn't seem to match. Must zoom in.

      We provided matched control and also zoomed in.

      (3) For clarity, separate the tumorous and germless images.

      In the new image, only one tumorous and one germless ovariole are shown, with clear labeling and outlines, for clarity.

      (4) Use arrows to help clearly indicate the changes that occur. As they are presented, they are difficult to see.

      We updated all the panels to enhance clarity.

      (5) Line 158 seems like a strong statement since it could be indirect.

      We softened the statement.

      Figure 4

      (1) Line 188-189 - Conclusion is an overstatement.

      We softened the statement.

      (2) Is the piRNA reduction due to a change in transcription? Or a direct effect by Sakura?

      We do not know the answers to these questions. We hope to address these in future studies.

      Figure 5

      (1) D - It might make more sense if this graph showed % instead of the numbers.

      We did not understand the reviewer’s point. We think using numbers, not %, makes more sense.

      (2) Line 213 - explain why RNAi 2 was chosen when RNAi 1 looks stronger.

      The fly stock of RNAi line 2 is much healthier than that of RNAi line 1 (without being driven by Gal4) for unknown reasons. We were concerned that RNAi line 1 might carry an unwanted genetic background, so we chose RNAi line 2 to avoid this issue.

      (3) In Line 218 there's an extra parenthesis after the PGC acronym.

      We corrected the error.

      (4) TOsk-Gal4 fly is not in the Methods section.

      We mentioned TOsk-Gal4 in the Methods.

      Figure 6:

      (1) The FLP-FRT section must be rewritten.

      We rewrote the FLP-FRT section.

      (2) A - include statistics.

      We included statistics using the chi-square test.
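      As a rough illustration of what a chi-square test on phenotype counts involves, the sketch below computes Pearson's chi-square statistic for a 2x2 table using only the Python standard library. It is not the authors' analysis: the function name `chi_square_2x2` is hypothetical, no continuity correction is applied, and any counts passed to it here are made-up placeholders rather than data from Fig 6A.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no Yates correction) for a 2x2 table of counts.

    `table` is [[a, b], [c, d]], e.g. rows = genotypes, columns = phenotype
    present / absent. Returns (statistic, p_value) for 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper tail of the chi-square distribution
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

      For example, `chi_square_2x2([[30, 10], [10, 30]])` could be used to compare two hypothetical genotypes scored for the presence or absence of a phenotype. For tables with small expected counts, Fisher's exact test is usually preferred.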

      (3) B - is not recalled in the Results text.

      We now refer to Fig 6B in the text.

      (4) Line 232 references Figure 3, but not a specific panel.

      We now refer to Fig 3A, 3C, 3D, and 3E in the text.

      Figure 7/8 - can go to Supplemental.

      We moved Fig 8 to supplemental. However, we think Fig 7 data is important and therefore we would like to present them as a main figure.

      (1) There should be CycA expression in the control during the first 4 divisions.

      Yes, CycA expression is observed in the control during the first 4 divisions, although it is much weaker than in the sakura<sup>null</sup> clone.

      (2) Helpful to add the dotted lines to delineate (A) as well.

      We added a dotted outline for germarium in Fig 7A.

      (3) Line 263 CycA is miswritten as CyA.

      We corrected the typo.

      Figure 9

      (1) Otu antibody control?

      We validated Otu antibody in newly added Fig 10C and Fig S13A.

      (2) Which Sakura-EGFP line was used? sakura het. or null background? This isn't mentioned in the text, nor legend.

      We used Sakura-EGFP in the background of sakura[+/+]. We added this information in the methods and figure legend.

      (3) C - Why the switch to S2 cells? Not able to use the Otu antibody in the IP of ovaries?

      We can use the Otu antibody in the IP of ovaries. However, in anti-Sakura Western after anti-Otu IP, antibody light chain bands of the Otu antibodies overlap with the Sakura band. Therefore, we switched to S2 cells to avoid this issue by using an epitope tag.

      Figure 10

      (1) A- The resolution of images of the ribbon protein structure is poor.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      (2) A table summarizing the interactions between domains would help bring clarity to the data presented.

      We added a table summarizing the fragment interaction results.

      (3) Some images would be nice here to show that the truncations no longer colocalize.

      We did not understand the reviewer’s point. In our study, even for the full-length proteins, we have not shown any colocalization of Sakura and Otu in S2 cells or in ovaries, except that both are enriched in developing oocytes in egg chambers.

      Figure 12

      (1) A - control and RNAi lines do not match.

      We provided matched images.

      (2) In general, since for Sakura, only its binding to Otu was identified and since they phenocopy each other, doesn't most of the characterization of Sakura just look at Otu phenotypes? Does Sakura knockdown affect Otu localization or expression level (and vice versa)?

      We tested this by Western (Fig S15) and IF (Fig 12). Sakura knockdown did not decrease Otu protein level, and Otu knockdown did not decrease Sakura protein level (Fig S15). In sakura<sup>null</sup> clone, Otu level was not notably affected (Fig 12). In sakura<sup>null</sup> clone, Otu lost its localization to the posterior position within egg chambers.

      Figure S6

      (1) It is Luciferase, not Lucifarase.

      We corrected the typo.

      Reviewer #3 (Recommendations for the authors):

      (1) It is interesting that germless and tumorous phenotypes coexist in the same population of flies. Additional consideration of these essentially opposite phenotypes would significantly strengthen the study. For example, do they co-exist within the same fly and are the tumorous ovarioles present in newly eclosed flies or do they develop with age? The data in Figure 8 show that bam knockdown partially suppresses the germless phenotype. What effect does it have on the tumorous phenotype? Is transposon expression involved in either phenotype? Do Sakura mutant germline stem cell clones overgrow relative to wild-type cells in the same ovariole? Does sakura RNAi driven by NGT-Gal4 only cause germless ovaries or does it also cause tumorous phenotypes? What happens if the knockdown of Sakura is restricted to adulthood with a Gal80ts? It may not be necessary to answer all of these questions, but more insight into how these two phenotypes can be caused by loss of sakura would be helpful.

      We performed new experiments to answer these questions.

      do they co-exist within the same fly and are the tumorous ovarioles present in newly eclosed flies or do they develop with age?

      Tumorous and germless ovarioles coexist in the same fly (in the same ovary). Tumorous ovarioles are present in very young (0-1 day old) flies, including newly eclosed flies (Fig S5). The ratio of germless ovarioles increases and that of tumorous ovarioles decreases with age (Fig S5).

      The data in Figure 8 show that bam knockdown partially suppresses the germless phenotype. What effect does it have on the tumorous phenotype?

      bam knockdown effect on tumorous phenotype is shown in Fig S10. bam knockdown increased the ratio of tumorous ovarioles and the number of GSC-like cells.

      Is transposon expression involved in either phenotype?

      Since our transposon-piRNA reporter uses the germline-specific nos promoter, it is expressed only in germline cells, so we cannot examine transposon expression in germless ovarioles.

      Do Sakura mutant germline stem cell clones overgrow relative to wild-type cells in the same ovariole?

      Yes, Sakura mutant GSC clones overgrow. Please compare Fig 6C and Fig S8.

      Does sakura RNAi driven by NGT-Gal4 only cause germless ovaries or does it also cause tumorous phenotypes?

      Fig S10 and Fig S12 show the ovariole phenotypes of sakura RNAi driven by NGT-Gal4. It causes both germless and tumorous phenotypes.

      What happens if the knockdown of Sakura is restricted to adulthood with a Gal80ts?

      Our mosaic clone was induced at the adult stage, so we already have data of adulthood-specific loss of function. Gal80ts does not work well with nos-Gal4.

      (2) The idea that the excessive bam expression in tumorous ovaries is due to a failure of bam repression by dpp signaling is not well-supported by the data. Dpp signaling is activated in a very narrow region immediately adjacent to the niche but the images in Figure 7A show bam expression in cells that are very far away from the niche. Thus, it seems more likely to be due to a failure to turn bam expression off at the 16-cell stage than to a failure to keep it off in the niche region. To determine whether bam repression in the niche region is impaired, it would be important to examine cells adjacent to the niche directly at a higher magnification than is shown in Figure 7A.

      We provided higher magnification images of cells adjacent to the niche in new Fig 7A.

      We found that cells adjacent to the niche also express Bam-GFP.

      That said, we agree with the reviewer. A failure to turn bam expression off at the 16-cell stage may be an additional or even a main cause of bam misexpression in sakura mutant. We added this in the Discussion.

      (3) In addition, several minor comments should be addressed:

      a. Does anti-Sakura work for immunofluorescence?

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have a Sakura-EGFP fly strain to examine Sakura expression and localization without such non-specific signal issues, we relied on Sakura-EGFP rather than anti-Sakura antibodies.

      b. Please provide insets to show the phenotypes indicated by the different color stars in Figure 3C more clearly.

      We provided new, higher-magnification images to show the phenotypes more clearly.

      c. Please indicate the frequency of the expression patterns shown in Figure 4D (do all ovarioles in each genotype show those patterns or is there variable penetrance?).

      We indicated the frequency.

      d. An image showing TOskGal4 driving a fluorophore should be provided so that readers can see which cells express Gal4 with this driver combination.

      This has already been shown in ElMaghraby et al., GENETICS, 2022, 220(1), iyab179, so we did not repeat the same experiment.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mallimadugula et al. combined Molecular Dynamics (MD) simulations, thiol-labeling experiments, and RNA-binding assays to study and compare the RNA-binding behavior of the Interferon Inhibitory Domain (IID) from Viral Protein 35 (VP35) of Zaire ebolavirus, Reston ebolavirus, and Marburg marburgvirus. Although the structures and sequences of these viruses are similar, the authors suggest that differences in RNA binding stem from variations in their intrinsic dynamics, particularly the opening of a cryptic pocket. More precisely, the dynamics of this pocket may influence whether the IID binds to RNA blunt ends or the RNA backbone.

      Overall, the authors present important findings to reveal how the intrinsic dynamics of proteins can influence their binding to molecules and, hence, their functions. They have used extensive biased simulations to characterize the opening of a pocket which was not clearly seen in experimental results - at least when the proteins were in their unbound forms. Biochemical assays further validated theoretical results and linked them to RNA binding modes. Thus, with the combination of biochemical assays and state-of-the-art Molecular Dynamics simulations, these results are clearly compelling.

      Strengths:

      The use of extensive Adaptive Sampling combined with biochemical assays clearly points to the opening of the Interferon Inhibitory Domain (IID) as a factor for RNA binding. This type of approach is especially useful to assess how protein dynamics can affect its function.

      Weaknesses:

      Although a connection between the cryptic pocket dynamics and RNA binding mode is proposed, the precise molecular mechanism linking pocket opening to RNA binding still remains unclear.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to determine whether a cryptic pocket in the VP35 protein of Zaire ebolavirus has a functional role in RNA binding and, by extension, in immune evasion. They sought to address whether this pocket could be an effective therapeutic target resistant to evolutionary evasion by studying its role in dsRNA binding among different filovirus VP35 homologs. Through simulations and experiments, they demonstrated that cryptic pocket dynamics modulate the RNA binding modes, directly influencing how VP35 variants block RIG-I and MDA5-mediated immune responses.

      The authors successfully achieved their aim, showing that the cryptic pocket is not a random structural feature but rather an allosteric regulator of dsRNA binding. Their results not only explain functional differences in VP35 homologs despite their structural similarity but also suggest that targeting this cryptic pocket may offer a viable strategy for drug development with reduced risk of resistance.

      This work represents a significant advance in the field of viral immunoevasion and therapeutic targeting of traditionally "undruggable" protein features. By demonstrating the functional relevance of cryptic pockets, the study challenges long-standing assumptions and provides a compelling basis for exploring new drug discovery strategies targeting these previously overlooked regions.

      Strengths:

      The combination of molecular simulations and experimental approaches is a major strength, enabling the authors to connect structural dynamics with functional outcomes. The use of homologous VP35 proteins from different filoviruses strengthens the study's generality, and the incorporation of point mutations adds mechanistic depth. Furthermore, the ability to reconcile functional differences that could not be explained by crystal structures alone highlights the utility of dynamic studies in uncovering hidden allosteric features.

      Weaknesses:

      While the methodology is robust, certain limitations should be acknowledged. For example, the study would benefit from a more detailed quantitative analysis of how specific mutations impact RNA binding and cryptic pocket dynamics, as this could provide greater mechanistic insight. This study would also benefit from providing a clear rationale for the selection of the amber03 force field and considering the inclusion of volume-based approaches for pocket analysis. Such revisions will strengthen the robustness and impact of the study.

      Reviewer #3 (Public review):

      Summary:

      The authors suggest a mechanism that explains the preference of viral protein 35 (VP35) homologs to bind the backbone of double-stranded RNA versus blunt ends. These preferences have a biological impact in terms of the ability of different viruses to escape the immune response of the host.

      The proposed mechanism involves the existence of a cryptic pocket, where VP35 binds the blunt ends of dsRNA when the cryptic pocket is closed and preferentially binds the RNA double-stranded backbone when the pocket is open.

      The authors performed MD simulations, thiol labelling experiments, fluorescence polarization assays, and point mutations to support their hypothesis.

      Strengths:

      This is a genuinely interesting scientific question, which is approached through multiple complementary experiments as well as extensive MD simulations. Moreover, structural biology studies focused on RNA-protein interactions are particularly rare, highlighting the importance of further research in this area.

      Weaknesses:

      - The sequence similarity between Ebola Reston and Ebola Zaire (94% similarity) explains their similar behaviour in simulations and experimental assays. Marburg is instead a more distant homolog (~80% similarity relative to Ebola Zaire). This difference in sequence and structure can explain the propensities, without the need to invoke the existence of a cryptic pocket.

      - No real evidence for the presence of a cryptic pocket is presented, only a distance probability distribution between two residues obtained from extensive MD simulations. It would be interesting to characterise the modelled RNA-protein interface in more detail.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Before assessing the overall quality and significance of this work, this reviewer needs to specify the context of this review. This reviewer's expertise lies in biased and unbiased molecular dynamics simulations and structural biology. Hence, while this reviewer can overall understand the results for thiol-labeling and RNA-binding assays, this review will not assess the quality of these biochemical assays and will mainly focus on the modelling results.

      Overall, the authors present important findings to reveal how the intrinsic dynamics of proteins can influence their binding to molecules and, hence, their functions. They have used extensive biased simulations to characterize the opening of a pocket which was not clearly seen in experimental results - at least when the proteins were in their unbound forms. Biochemical assays further validated theoretical results and linked them to RNA binding modes. Thus, with the combination of biochemical assays and state-of-the-art Molecular Dynamics simulations, these results are clearly compelling.

      Beyond the clear qualities of this work, I would like to mention a few points that may help to better contextualize and rationalize the results presented here.

      - First, both the introduction and discussion sections seem relatively condensed. Extending them to, for example, better describe the methodological context and discuss the methodological limitations and potential future developments related to biased simulations may help the reader get a better idea of the significance of this work.

      - The authors presented 3 homologs in this study: IIDs of Reston, Zaire, and Marburg viruses. While Zaire and Reston are relatively similar in terms of sequence (Figure S1), the sequences clearly differ between Marburg and the two other viruses. Can the authors indicate a similarity/identity score for each sequence alignment and extend Figure S1 to properly compare the Marburg sequence with Reston and Zaire? Can they also discuss how these differences may impact the comparison of the three IIDs? This may also help the reader to understand why the authors sometimes compare the three viruses and sometimes focus only on comparing Zaire and Reston.

      We would like to thank the reviewer for raising this point, and we agree that additional details about the sequence comparison provide more context for the choices of substitutions we made. Therefore, we have updated Fig S1 to include a detailed pairwise comparison of all the IID sequences, including the percentage sequence similarity and identity. We have also added the following sentences to the results section where we first introduced the substitutions between Zaire and Reston IIDs:

      “While the sequence of Marburg IID differs significantly from Reston and Zaire IIDs with a sequence identity of 42% and 45% respectively (Fig S1), the sequences of Reston and Zaire IID are 88% identical and 94% similar. Particularly, substitutions between these homologs are all distal to the RNA-binding interfaces and all the residues known to make contacts with dsRNA from structural studies are identical. Therefore, we reasoned that comparing these two homologs would help us identify minimal substitutions that control pocket opening probability and allow us to study its effect on dsRNA binding with minimal perturbation of other factors.”
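      The distinction between percent identity and percent similarity quoted above can be illustrated with a minimal sketch. It is not the authors' procedure: the residue groups in `SIMILAR_GROUPS`, the function names, and the toy aligned fragments are all assumptions made for illustration (real tools typically decide similarity from a substitution matrix such as BLOSUM62, and conventions for the denominator differ between programs).

```python
# Illustrative grouping of residues treated as "similar" (an assumption;
# alignment tools normally derive this from a substitution matrix).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"),
                  set("ST"), set("NQ"), set("AG")]


def is_similar(a, b):
    """True if two residues are identical or share an illustrative group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_and_similarity(aln_a, aln_b, gap="-"):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Only columns without a gap in either sequence are counted, which is one
    of several conventions in use.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper())
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    n_identical = sum(a == b for a, b in columns)
    n_similar = sum(is_similar(a, b) for a, b in columns)
    return (100.0 * n_identical / len(columns),
            100.0 * n_similar / len(columns))


# Toy aligned fragments, not real VP35 sequence.
toy_a = "MKVLDE-RST"
toy_b = "MKILDEARTT"
ident, sim = identity_and_similarity(toy_a, toy_b)
print(f"identity = {ident:.1f}%, similarity = {sim:.1f}%")
```

      Because every identical column is also counted as similar, similarity is always at least as large as identity, which matches the ordering of the 88% and 94% values reported for Reston and Zaire.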

      - In this work, the authors mention the cryptic pocket but only illustrate its opening by using a simple distance between residues (Figure 2) and the SASA of one cysteine (Figure 3). In previous work by the authors (Cruz et al., Nature Communications, 2022), they better characterized the residues involved in RNA binding and in forming the cryptic pocket. Would it therefore be possible to better describe this cryptic pocket (residues involved, volume, etc.) and better explain how, structurally speaking, it can affect the RNA binding mode (blunt ends vs backbone)?

      We thank the reviewer for pointing out the need for clarification on the residues involved in RNA binding and pocket opening and the mechanism linking them. We have performed the CARDS analysis on Reston and Marburg IID simulations as we had done on Zaire IID simulations in Cruz et al, 2022. The results are shown in Fig S3 and discussed in the main text in the first results section.

      - As a counter-example, the authors used C315 for SASA calculation and thiol labeling (Figure 3). This cysteine is mainly buried, as seen from the SASA for Reston and Marburg and from thiol labelling (Figure 3 E, G, H). Would it be possible to also get thiol labeling rates for cysteine 264 in Reston and its equivalents, to see a case where the residue is solvent exposed?

      We have shown the SASA for C264 from the simulations in Fig S4 and the thiol labeling rates for all 4 cysteines in Reston IID in Fig S6. Comparing these rates to the rates of all 4 cysteines obtained for Zaire IID (Fig 4 in Cruz et al., 2022), we observe that the rates for C264, which is expected to be exposed, are significantly faster than those of C315, which is largely buried in all variants.

      - I strongly support the authors' decision to share their data by depositing them in an OSF repository. These data helped this reviewer to assess some of the results produced by the authors and to better understand the dynamics of their respective systems. I have just a few comments that need to be addressed regarding these data:

        - While there are data for WT Reston and Marburg, there are no data for Zaire. Is this because these data correspond to the previous work (Cruz et al. 2022) (in which case it would be good to make this clear in the main text), or is it an omission?

        - There is no center.xtc file in the Marburg-MSM directory.

        - There is no protmasses.pdb in the Reston-MSM directory.

      - In general, if possible, it would be good to use the same name for each type of file presented in each directory to help a potential user understand a bit more how to use these data.

      - If possible, adding a bit more of metadata and explanations on the OSF webpage would be very beneficial to help find these data. To help in this direction, the authors may have a look to the guidelines presented at the end of this article: https://elifesciences.org/articles/90061

      We thank the reviewer for pointing out the omissions from the OSF repository. We have added the missing files and followed a uniform naming convention. We have also added documentation in the metadata section of the OSF repository to help others use the data.  

      Indeed, the simulation data used for Zaire IID is available on the OSF repository corresponding to Cruz et al. 2022 at https://osf.io/5pg2a. We have also clarified this in the data availability section of the main text.  

      Minor point:

      In Figure 2, there is a slight bump for the 225-295 distance around 1 nm for Reston. Can the authors comment on it? As these results are based on long AS, do the authors think this population, even if very small, is significant?

      Comparing the probability distributions obtained from bootstrapping the frames used to calculate the MSM equilibrium probabilities (Revised Fig1), we observe that the bump in the Reston IID distribution persists in all bootstraps, indicating that it might indeed be significant. This is also consistent with our observation that cysteine 296 does get fully labeled in our thiol labeling experiments, albeit significantly more slowly than in the other homologs.

      Reviewer #2 (Recommendations for the authors):

      I recommend that the authors implement moderate revisions prior to the publication of this research article, addressing the identified weaknesses (see below).

      The authors should provide a rationale for their selection of the amber03 force field (Duan et al., JCTC 24, 1999-2012, 2003) for molecular dynamics simulations, particularly given the availability of more recent and optimized versions of the AMBER force fields. These newer force fields may offer improved parameterization for biomolecular systems, potentially enhancing the accuracy and reliability of the simulation results.

      We chose the Amber03 force field because it has performed well in much of our past work, including the original prediction of the cryptic pocket that we study in this manuscript. The results presented in this manuscript also demonstrate the predictive power of Amber03.

      Additionally, while the authors utilized solvent-accessible surface area (SASA) for cryptic pocket analysis, volume-based approaches may be more suitable for this purpose. Several studies (e.g., Sztain et al. J. Chem. Inf. Model. 2021, 61, 7, 3495-3501) have demonstrated the utility of volume analysis in identifying and characterizing cryptic pockets. The authors could consider incorporating such methodologies to provide a more comprehensive assessment of pocket dynamics.

      The authors propose that the cryptic pocket is not merely a random structural feature but functions as an allosteric regulator of dsRNA binding. To further substantiate this claim, an in-depth analysis of this allosteric effect using for instance network analysis could significantly enhance the study. Such an approach could identify key residues and interaction networks within the protein that mediate the allosteric regulation. This type of mechanistic insight would not only provide a stronger theoretical framework but also offer valuable information for the rational design of therapeutic interventions targeting the cryptic pocket.  

      We thank the reviewer for pointing out the need for clarification on the molecular mechanism linking the opening of the cryptic pocket to RNA binding. We have performed the CARDS analysis on Reston and Marburg IID simulations as was done on Zaire IID simulations in Cruz et al, 2022. The results are shown in Fig S3 and discussed in the main text in the first results section. Briefly, we do find a community (blue) comprising the pocket residues in Reston and Marburg IIDs as we did in Zaire. Similarly, we find that many of the RNA binding residues fall into the orange and green communities as in Zaire. However, there are differences in exactly which residues are clustered into which of these two communities. There are also differences in how strongly connected these communities are in the three homologs. Therefore, while we can conclude that pocket residues likely have varying influence on the RNA binding residues in the homologs, it is hard to say exactly what that variation is from this analysis alone.  

      Reviewer #3 (Recommendations for the authors):

      - MD simulations: All simulations were initialised from the 3 crystal structures, is that correct? In all cases, dsRNA was not included in the simulations, right? Were crystallographic Mg ions in the vicinity of the binding site included? These are known to influence structural dynamics to a large extent.

      All simulations were indeed initialized using only protein atoms from the crystal structures 3FKE, 4GHL, and 3L2A. Therefore, crystallographic Mg ions were not included in the simulations. However, we do agree with the reviewer and think that the effect of parameters such as salt concentration, specifically Mg ions which are known to be important for the stability of dsRNA, on the pocket opening equilibrium merits detailed study in future work.

      - Figure 2: Would it be possible to perform e.g. a block error analysis and show the statistical errors of the distributions?

      We agree that showing the statistical variation in the MSM equilibrium probabilities is important for comparing the different distributions. Therefore, we have updated Figs 2 and 5 to show the distributions obtained from MSMs constructed using 100 and 10 random samples of the data respectively to indicate the extent of the statistical variability in the MSM construction.  
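      The idea of resampling the data to show statistical variability in a distribution can be sketched as follows. This is a deliberately simplified illustration, not the authors' MSM-based procedure: it bootstraps whole trajectories and builds a plain histogram of a distance, with no MSM reweighting. The function names, the synthetic toy data, and the bin edges are all assumptions.

```python
import random
import statistics


def histogram(values, edges):
    """Normalized counts of values falling in [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def bootstrap_histograms(trajectories, edges, n_boot=100, seed=0):
    """Resample trajectories with replacement; return (mean, stdev) per bin."""
    rng = random.Random(seed)
    per_bin = [[] for _ in range(len(edges) - 1)]
    for _ in range(n_boot):
        # Draw as many trajectories as in the original set, with replacement.
        sample = [rng.choice(trajectories) for _ in trajectories]
        pooled = [d for traj in sample for d in traj]
        for i, p in enumerate(histogram(pooled, edges)):
            per_bin[i].append(p)
    return [(statistics.mean(b), statistics.stdev(b)) for b in per_bin]


# Synthetic toy data (nm): most trajectories sample a "closed" distance,
# a few sample an "open" one. These numbers are invented for illustration.
_rng = random.Random(1)
toy_trajectories = [[_rng.gauss(0.6, 0.05) for _ in range(200)] for _ in range(8)]
toy_trajectories += [[_rng.gauss(1.0, 0.05) for _ in range(200)] for _ in range(2)]
edges = [0.4 + 0.1 * i for i in range(10)]

for (lo, hi), (mean, sd) in zip(zip(edges, edges[1:]),
                                bootstrap_histograms(toy_trajectories, edges)):
    print(f"{lo:.1f}-{hi:.1f} nm: {mean:.3f} +/- {sd:.3f}")
```

      A minor population that keeps appearing across resamples, with a spread that is small relative to its mean, is less likely to be a sampling artifact, which is the kind of reasoning applied to the small bump in the Reston distribution.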

      - More detailed structural biology experiments (such as NMR or HDX-MS) could potentially shed more light on the differential behaviour of the three different homologs, providing more evidence for the presence of the cryptic pocket.

      We agree that NMR and HDX-MS are powerful means to study dynamics and are actively exploring these approaches for our future work.

    1. Language was an important invention

      Belief in the same types of religions and customs, so that people we haven't met are still recognizable when we run into them. It took thousands of years, with the invention of art and language. Before you were talking, you were making art.

      Hi, my name is Alan Kay, and I'd like to apologize for having a bit of laryngitis just on the day of this shoot. I've been asked to talk about inventing the future. Of course, we mostly think of inventing in the realm of technology, but I think most people watching this will have been struck by the fact that living in the 21st century in the United States is a vastly different experience than living a hundred thousand years ago anywhere in the world. As far as we know, the brains that we have are roughly the same as those brains that belonged to the very same species we are. They mostly lived in small groups of people, hunting and gathering, falling in love, telling stories to each other, fighting other people, taking revenge, caring for the young, and gradually building up a culture that they taught to the next generation in their tribe. The first great invention of human beings, or of evolution, was this idea of culture, and it came from a slightly earlier invention of evolution, which was language. This is a language that is, in just a few important ways, different from that of our primate ancestors, and that was enough to be able to deal with sequences of things and portrayals of things, and being able to make up things, which we know other animals can do, but also to be able to tell our made-up things to other people and get them to believe in it. That started to allow us to aggregate together in groups larger than about a hundred people, which is what we can deal with face to face. So this notion of culture, beliefs in the same kinds of religions, beliefs in the same kinds of customs, is something that can spread, so that people we've never met are still recognizable when we finally do run into them. So we can think about that as the first great inventing of the future for humanity. It took many tens of thousands of years, with the invention of art, which may have always been with us, in between, before we started taking stock of the world around us and starting

    1. Author response:

      The following is the authors’ response to the current reviews.

      We wanted to clarify Reviewer #1’s latest comment in the last round of review, “Furthermore, the referee appreciates that the authors have echoed the concern regarding the limited statistical robustness of the observed scrambling events.” We appreciate the follow-up information provided by Reviewer #1 that their comment is specifically about the low-count alternative pathway events that we observe at the dimer interface, and not the statistics of the manuscript overall, as they believe that “the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations (Reviewer #1)”. We agree with the Reviewer and acknowledge that overall our coarse-grained study represents the most comprehensive single manuscript on the entire TMEM16 family to date.


      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty when it comes to our observations of scrambling in the groove of open Ca2+-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the as-yet unanswered question of how scrambling by TMEM16s occurs in the absence of Ca2+, offers new observations for the field. In our study we specifically address to what extent the induced membrane deformation, which has been theorized to help lipids cross the bilayer, especially in the absence of Ca2+, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca2+ and in the presence of PIP2) which feature structural rearrangements of TM6 that may be indicative of an open state (PDB 6P48) and had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca2+-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and our own. We think there is, however, value in comparing the scrambling competence of these TMEM16 structures in a consistent manner in a single study, to reduce inconsistencies that may be introduced by the different simulation methods, parameters, and environmental variables (such as lipid composition) used in other published works on single family members. The consistency across our simulations and the high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and relies most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca2+-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca2+-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca2+-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can mean large energetic discrepancies in describing scrambling (larger than a 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion of how the choice of force field and simulation parameters influences the results, and how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling occurs over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it is hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively. An interesting observation is the asymmetry observed in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors, in order to preserve the fold, have applied a strong (500 kJ mol⁻¹ nm⁻²) elastic network. However, I am wondering how the ENM applies across the dimer and whether any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How might this have biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism so that it occurs mainly across a single monomer? Answering this question would have far-reaching implications for better describing the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that has never been done before. We do acknowledge that direct comparison between simulated and experimental scrambling rates is complicated and is best interpreted qualitatively. In line with other reports (e.g., Li et al, PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these increased dynamics due to the smoother energy landscape are a well-known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.
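
      To put the quoted 2-3 orders of magnitude in energetic terms, the following is a minimal sketch that converts a rate speed-up into an effective barrier change. It assumes simple Arrhenius-like kinetics, k ∝ exp(-ΔG/kT), which is an illustrative assumption and not a model used in the study; the function name is hypothetical.

```python
import math


def barrier_shift_in_kT(speedup: float) -> float:
    """Effective barrier reduction, in units of kT, implied by a rate speed-up.

    Assumes Arrhenius-like kinetics, k ~ exp(-dG / kT), so that
    k_fast / k_slow = exp(ddG / kT) and ddG / kT = ln(speedup).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return math.log(speedup)


# Speed-ups of 2 and 3 orders of magnitude, as quoted for CGMD vs. experiment.
for orders in (2, 3):
    shift = barrier_shift_in_kT(10.0 ** orders)
    print(f"10^{orders}-fold faster -> barrier lower by about {shift:.1f} kT")
```

      Under this assumption, each order of magnitude in rate corresponds to ln(10) ≈ 2.3 kT of effective barrier height.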

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and the authors plan to further chase these questions, as this work sets a strong protocol for this study. Contextualizing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes, because it has (1) negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al, PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good, standardized protocol and solid baseline for future efforts looking into the additional effects of membrane composition, tension, and curvature that could all affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and understanding global/local motions of the protein with respect to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially membrane thickness less compared to fungal scramblases offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      The weakness of the work is to fully reconcile with experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, but this part is also challenging using coarse-grain molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      Answer: It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is to directly observe the transient scrambling events at a spatial and temporal resolution that is currently unattainable for experiments. The current experimental evidence for the precise mechanism of Ca2+-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      While we agree with what the reviewer may be hinting at regarding limitations of coarse-grained MD simulations, we believe that our study holds much more merit than this comment suggests. We have provided something that has yet to be done in the field: a comprehensive study that directly compares the scrambling rates of multiple TMEM16 family members in different conformations using identical simulation conditions. Our work clearly shows that a sufficiently dilated groove is the major structural feature that enables robust scrambling for all TMEM16 scramblase members with solved structures. While all TMEM16s cause significant distortion and thinning of the membrane, we assert that the extreme thinning observed around open grooves is significantly enhanced by the lipid scrambling itself as the two leaflets merge through lipid exchange. We saw no evidence that membrane thinning/distortion alone, in the absence of an open groove, could support scrambling at the rates observed under activating conditions or even the low rates observed in Ca2+-independent scrambling. Moreover, our handful of observations of scrambling events outside of the groove, which have not yet been reported in any study, open an exciting new direction for studying alternative scrambling mechanisms. That said, we are currently following up on many of the observations reported here, such as scrambling events outside the groove, the kinetics of scrambling, and the possibility that lipids line the groove of non-scramblers like TMEM16A. This is being done experimentally with our collaborators through site-directed mutagenesis and with all-atom MD in our lab. Unfortunately, it is well beyond the scope of the current study to include all of this in the current paper.

      Reviewer #2 (Recommendations for the authors):

      Major comments and questions:

      (1) Line 214 and Figure 1- Figure Supplement 1: why have you only compared the final frame of the trajectory to the cryo-EM structure? Even if these comparisons are qualitative, they should be representative of the entire trajectory, not a single frame.

      We thank the reviewer for this suggestion and replaced the single-frame snapshots in Figure 1-figure supplement 1 with ensemble-averaged headgroup densities. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

      (2) Lines 228-231: You comment 'Residues in this site on nhTMEM16 and TMEMF also seem to play a role in scrambling but the mechanism by which they do so is unclear.' This is something you could attempt to quantify in the simulations by calculating the correlation between scrambling and protein-membrane interactions/contacts in this site. Can you speculate on a mechanism that might be a contributing factor?

      We probed the correlation between these residues and scrambling lipids, as suggested by the reviewer, and interestingly not all scrambling lipids interact with these residues. Yet there is strong lipid density in this vicinity (see insets in Figure 1 and Figure 4-figure supplement 2). These observations lead us to suspect these residues impact scrambling indirectly through influencing the conformation of the protein or flexibility and shape of the membrane. This interpretation fits with mutagenesis studies highlighting a role for these residues in scrambling (see refs 59, 62, and 67). Specifically, Falzone et al. 2022 (ref 59) suggested that they may thin the membrane near the groove, but this has not been tested via structure determination and a detailed model of how they impact scrambling is missing. We could address this question with in silico mutations; however, CG simulation is not an appropriate method to study large scale protein dynamics, and AA simulations are likely best, but beyond the scope of this paper.

      (3) Lines 240-245 and Figure 1B: This section discusses the coupling between membrane distortions and the sinusoidal curve around the protein, however, Figure 1B only shows snapshots of the membrane distortions. Is it possible to understand how these two collective variables are correlated quantitatively (as opposed to the current qualitative analysis)?

      We believe that it may be possible to quantitatively capture these two key features of the membrane, as we did previously with nhTMEM16 using our continuum elasticity-based model of the membrane (Bethel and Grabe 2016). Our model agreed with all atom MD surfaces to within ~1 Å, hence showing good quantitative agreement throughout the entire membrane. However, we doubt that we could distill the essence of our model down to a simple functional relationship between the sinusoidal wave and pinching, which we think the reviewer is asking. Rather, we believe that the large-scale sinusoidal distortion (collective variable 1) and pinching/distortion (collective variable 2) near the groove arise from the interplay of the specific protein surface chemistry for each protein (patterning of polar and non-polar residues) and the membrane. This is why we chose to simply report the distinct patterns that the family members impose on the surrounding membrane, which we think is fascinating. Specifically, Fig. 1B shows that different TMEM16 family members distort the membrane in different ways. Most notably, fungal TMEM16s feature a more pronounced sinusoidal deformation, whereas the mammalian members primarily produce local pinching. Then, in Fig. 3A we show that the thinning at the groove happens in all structures and is more pronounced in open, scrambling-competent conformations. In other words, proteins can show very strong thinning (e.g. TMEM16K, 5OC9) even though the membrane generally remains flat.

      (4) Lines 257-258: Authors comment that TMEM16A lacks scramblase activity yet can achieve a fully lipid-lined groove (note the typo - should be lipid-lined, not lipid-line). Is a fully lipid-lined groove a prerequisite for scramblase activity? Are lipid-lined grooves the only requirement for scramblase activity? Could the authors clarify exactly what the prerequisite for scramblase activity is to avoid any confusion; this will be useful for later descriptions (i.e. line 295) where scrambling competence is again referred to. Additionally, the associated figure panel (Figure 1D) shows a snapshot of this finding but lacks any statistical quantifications - is a fully lipid-lined groove a single event? Perhaps the additional analyses, such as the groove-lipid contacts, may be useful here.

      The definition of lipid scrambling is that a lipid fully transitions from one membrane leaflet to the other. While a single lipid could transition through the groove on its own, it is well documented in both atomistic and CG MD simulations, that lipid scrambling typically happens through a lipid-lined groove, as shown in Fig. 1A-B. The lipids tend to form strong choline-to-phosphate interactions with nearest neighbors that make this energetically favorable. That said, lipid-lined grooves are not sufficient for robust scrambling, which is what we show in Fig. 1D where the non-scrambler TMEM16A did in fact feature a lipid-lined groove. As suggested, we performed contact analysis and found that residue K645 on TM6 in the middle of the groove contacts lipids in 9.2% of the simulation frames.

      To get a better understanding of how populated the TM4-TM6 pathway is with lipids across all simulated structures, we determined for every simulation frame how many headgroup beads resided in the groove. This indicates that the ion-conductive state of TMEM16A (5OYB*, Fig. 1D) only had 1 lipid in the pathway, on average, meaning that the configuration shown in Fig. 1D is indeed exceptional. As a reference, our strongest scrambler, nhTMEM16 (4WIS), had an average of 2.8 lipids in the groove. We added a table containing the means and standard deviations that resulted from this analysis as Figure 1-Table supplement 1.

      (5) Lines 295-298 : The scrambling rates of the Ca²⁺-bound and Ca²⁺-free structures fall within overlapping error margins, it becomes difficult to definitively state that Ca²⁺ binding significantly enhances scrambling activity. This undermines the claim that the Ca²⁺-bound structure is the strongest scrambler. The authors should conduct statistical analyses to determine if the difference between the two conditions is statistically significant.

      In contrast to the reviewer’s comment, we do not claim that Ca2+-binding itself enhances lipid scrambling. Instead, what we show is that WT structures that are solved in an open conformation (all of which are Ca2+-bound, except 6QM6) are robust scramblers. For nhTMEM16, we did not observe any scrambling events for the closed-groove proteins, making further statistical analysis redundant.

      (6) The authors claim that the scrambling rates derived from their MD simulations are in "excellent agreement" with experimental findings (lines 294-295), despite significant discrepancy between simulated and experimentally measured rates. For example, the simulated rate of 24.4 ± 5.2 events/µs for the open, Ca²⁺-bound fungal nhTMEM16 (PDB ID 4WIS) corresponds to approximately 24 million events per second, which is vastly higher than experimental rates. Experimental studies have reported scrambling rate constants of ~0.003 s⁻¹ for TMEM16 family members in the absence of Ca²⁺, measured under physiological conditions (https://doi.org/10.1038/s41467-019-11753-1 ). Even with Ca²⁺ activation, scrambling rates remain several orders of magnitude lower than the rates observed in simulations. Moreover, this highlights a larger problem: lipid scrambling rates occur over timescales that are not captured by these simulations. While the authors allude to these discrepancies (lines 605-606), they should be emphasised in the text, as opposed to the table caption. These should also be reconducted to differences between the membrane compositions of different studies.

      We agree with the spirit of the reviewer’s comment, and because of that, we were very careful not to claim that we reproduce experimental scrambling rates, just that the trends (scrambling-competent, or not) are correct. On lines 294-295, we actually said that the scrambling rates in our simulations excellently agree with “the presumed scrambling competence of each experimental structure”, which is true. 

      As explained extensively in the discussion section of our paper (and by many others), direct comparison between MD (e.g., Martini 3, but also atomistic force fields) dynamics and experimental measurements is challenging. The primary goal of our paper is to quantify and compare the scrambling capacity of different TMEM16 family members and different states, within a CGMD context.

      That said, we agree with the reviewer that we may have missed rare or long-timescale events (as is the case in any MD experiment) and added this point to the discussion.
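
      For readers who want to reproduce the unit conversion quoted in comment (6), the following is a minimal sketch. The two input numbers are taken from the comment above; the function names are hypothetical. Note that a per-protein event count from a simulation and a bulk rate constant from an assay are not like-for-like quantities, so the resulting gap is only a rough indication of scale.

```python
import math

MICROSECONDS_PER_SECOND = 1e6


def per_microsecond_to_per_second(rate_per_us: float) -> float:
    """Convert a rate in events/µs to events/s."""
    return rate_per_us * MICROSECONDS_PER_SECOND


def orders_of_magnitude_apart(rate_a: float, rate_b: float) -> float:
    """Base-10 logarithm of the ratio between two positive rates."""
    if rate_a <= 0 or rate_b <= 0:
        raise ValueError("rates must be positive")
    return math.log10(rate_a / rate_b)


# Values quoted in comment (6): simulated nhTMEM16 (4WIS) rate and the
# experimental Ca2+-free rate constant cited by the reviewer.
simulated_per_s = per_microsecond_to_per_second(24.4)
experimental_per_s = 0.003

gap = orders_of_magnitude_apart(simulated_per_s, experimental_per_s)
print(f"simulated rate: {simulated_per_s:.3g} events/s")
print(f"gap to quoted experimental value: about {gap:.1f} orders of magnitude")
```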

      (7) To address these discrepancies, the authors should: i) emphasize that simulated rates serve as qualitative indicators of scrambling competence rather than absolute values comparable to experimental findings and ii) discuss potential reasons for the divergence, such as simulation timescale limitations or lipid bilayer compositions that may favor scrambling and force field inaccuracies.

      Please see our answer to question 6. Within the context of our CGMD survey, we confidently call our results quantitative. However, we agree with the reviewer that comparison with experimental scrambling rates is qualitative and should be interpreted with caution. To reflect this, we rewrote the first sentence of the relevant paragraph in the discussion section.

      (8) Line 310: Can the authors provide a rationale as to why one monomer has a wider groove than the other? Perhaps a contact analysis could be useful. See the comment above about ENM.

      The simulation of Ca2+-bound TMEM16K was initiated from an asymmetric X-ray structure in which chain B features a more dilated groove than chain A (PDB 5OC9). The backbones of TM4 and TM6 in the closed groove (A) are close enough together to be directly interconnected by the elastic network. In contrast, TM4 and TM6 in the more dilated subunit (B) are not restricted by the elastic network and, as a consequence, display some “breathing” behavior (Fig. 3B and Fig. 3-Suppl. 6A), giving rise to a ~4x higher scrambling rate. We explicitly added the word “cryo-EM” and the PDB ID to the sentence to emphasize that the asymmetry stems from the original experimental structure.

      When answering this question, we also corrected a chain identifier in Fig. 2-Suppl. 3A that was mislabeled as ‘chain A’ in the original manuscript; it is actually ‘chain B’.

      (9) Line 312: Authors speculate that increased groove width likely accounts for increased scrambling rates. For statistical significance, authors should attempt to correlate scrambling rates and groove width over the simulation period.

      The Reviewer is referring to our description of the scrambling rates we measured for TMEM16K, where we noted that the groove with the highest scrambling rate is, on average, wider than the groove of the opposite subunit, which stays below 6 Å. We do not suggest that the correlation between scrambling and groove width is continuous, as the Reviewer may have interpreted from our original submission. Rather, we think it is a binary outcome: lipids cannot easily enter narrow grooves (< 6 Å), so scrambling can only occur once this threshold is reached, at which point it occurs at a near-constant rate. We showed this for 4 different family members in the original Fig. 3B, where scrambling events (black dots) were much more likely during, or right after, groove dilation to distances > 6 Å.

      (10) Line 359: Authors have plotted the minimum distance between residues TM4 and TM6 in Fig. 3A/B, claiming that a wide groove is required for scrambling. Upon closer examination, it is clear that several of these distributions overlap, reducing the statistical significance of these claims. Statistical tests (i.e. KS-tests) should be performed to determine whether the differences in distributions are significant.

      The Reviewer appears to be asking for a statistical test between the six distance distributions represented by the data in Fig. 3A for the scrambling competent structures (6QP6*, 8B8J, 6QM6, 7RXG, 4WIS, 5OC9), and we think this is being asked because it is believed that we are making a claim that the greater the distance, the greater the scrambling rate. If we have interpreted this comment correctly, we are not making this claim. Rather, we are simply stating that we only observe robust scrambling when the groove width regularly separates beyond 6 Å. The full distance distributions can now be found in Figure 3-figure supplement 6B, and we agree there is significant overlap between some of these distributions. However, the distinguishing characteristic of the 6 distributions from scrambling competent proteins is that they all access large distances, while the others do not. Notably, TMEM16F proteins (6QP6*, 8B8J) are below the 6 Å threshold on average, but they have wide standard deviations and spend well over ¼ of their time in the permissive regime (the upper error bar in the whisker plots in Fig. 3A is the 75% boundary).
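
      The threshold argument above can be illustrated with a minimal sketch that computes how often a groove exceeds 6 Å even when its mean width is below that value. The width series is made-up illustrative data, not simulation output, and the function names are hypothetical.

```python
from statistics import mean, quantiles


def permissive_fraction(widths, threshold=6.0):
    """Fraction of frames whose minimal TM4/TM6 distance exceeds the threshold (Å)."""
    if not widths:
        raise ValueError("widths must not be empty")
    return sum(1 for w in widths if w > threshold) / len(widths)


def upper_quartile(widths):
    """75th percentile of the width distribution (upper box boundary)."""
    return quantiles(widths, n=4)[2]


# Hypothetical per-frame groove widths in Å (illustrative only).
widths = [4.8, 5.1, 5.4, 5.6, 5.9, 6.3, 6.8, 7.2]

print(f"mean width:          {mean(widths):.2f} Å")
print(f"upper quartile:      {upper_quartile(widths):.2f} Å")
print(f"permissive fraction: {permissive_fraction(widths):.3f}")
```

      In this toy series the mean lies below 6 Å while the upper quartile lies above it, which is the pattern described for the TMEM16F structures.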

      (11) Line 363-364: The authors state that all TMEM16 structures thin the membrane. Could the authors include a description of how membrane thinning is calculated, for instance, is the entire membrane considered, or is thinning calculated on a membrane patch close to the protein? Do membrane patches closer to the transmembrane protein increase or decrease thickness due to hydrophobic packing interactions? The latter question is of particular concern since Martini3 has been shown to induce local thinning of the membrane close to transmembrane helices, yielding thicknesses 2-3 Å thinner than those reported experimentally (https://doi.org/10.1016/j.cplett.2023.140436). This could be an important consideration in the authors' comparison to the bulk membrane thickness (line 364). Finally, how is the 'bulk membrane thickness' measured (i.e., from the CG simulations, from AA simulations, or from experiments)?

      Regarding the calculation of thinning and bulk membrane thickness, as described in the Methods section “Quantification of membrane deformations”, the minimal membrane thickness, or thinning, is defined as the shortest distance between any two points from the interpolated upper and lower leaflet surfaces constructed using the glycerol beads (GL1 and GL2). Bulk membrane thickness is calculated by taking the vertical distance between the averaged glycerol surfaces at the membrane edge.
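
      The two definitions above can be sketched as follows. This is a minimal illustration that assumes the leaflet surfaces have already been interpolated into lists of (x, y, z) points; it is not the analysis code used in the study, and the function names and toy coordinates are hypothetical.

```python
import math
from itertools import product


def minimal_thickness(upper_surface, lower_surface):
    """Shortest Euclidean distance between any upper-leaflet surface point
    and any lower-leaflet surface point (brute force over all pairs)."""
    return min(math.dist(p, q) for p, q in product(upper_surface, lower_surface))


def bulk_thickness(upper_edge_z, lower_edge_z):
    """Vertical distance between the average upper and lower surface heights,
    evaluated on points taken at the membrane edge."""
    upper_mean = sum(upper_edge_z) / len(upper_edge_z)
    lower_mean = sum(lower_edge_z) / len(lower_edge_z)
    return upper_mean - lower_mean


# Toy surfaces in Å: flat far from the protein, pinched near x = 10.
upper = [(0.0, 0.0, 20.0), (10.0, 0.0, 12.0)]
lower = [(0.0, 0.0, -20.0), (10.0, 0.0, -3.0)]

print(minimal_thickness(upper, lower))
print(bulk_thickness([20.0, 20.0], [-20.0, -20.0]))
```

      The brute-force pair search is quadratic in the number of surface points, which is acceptable for a sketch; a spatial index would be the natural choice for dense grids.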

      The concern of localized membrane deformation due to force field artifacts is well-founded. However, the sinusoidal deformations shown here are much greater than 2-3 Å Martini3 imperfections, and they extend for up to 10 Å radially away from the protein into the bulk membrane (see Figure 3-figure supplement 1-5 for more of a description). Most importantly, the sinusoidal wave patterns set up by the proteins are very similar to those described in the previous continuum calculation and all-atom MD for nhTMEM16 (https://www.pnas.org/doi/full/10.1073/pnas.1607574113).

      (12) Line 374: The authors state a 'positive correlation' between membrane thinning/groove opening and scrambling rates. To support this claim, the authors should report the correlation coefficients.

      We have removed any discussion concerning correlations between the magnitude of the scrambling rate and the degree of membrane thinning/groove opening. Rather we simply state that opening beyond a threshold distance is required for robust scrambling, as shown in our analysis in Fig. 3A.

      Concerning the relation between thinning and scrambling: Instantaneous membrane thinning is poorly defined (because it is governed by fluctuations of single lipids), and therefore difficult to correlate with the timing of individual scrambling events in a meaningful way.  Moreover, as we state later in that same section, “we argue that the extremely thin membranes are likely correlated with groove opening, rather than being an independent contributing factor to lipid scrambling”.

      (13) Line 396: It is stated that TMEM16A is not a scramblase but the simulating scrambling activity is not zero. How can you be sure that you are monitoring the correct collective variable if you are getting a false positive with respect to experiments?

      We only observe 2 scrambling events in 10 μs, which is a very small rate compared to the scrambling-competent states. In a previous large Martini CG simulation survey that inspired our protocol (Li et al, PNAS 2024), the authors employed a 1 event/μs cut-off to distinguish scramblers from non-scramblers. Hence, they would have called TMEM16A a non-scrambler as well. We expect that false positives in this context might be an artifact of the CG force field, or it could be that TMEM16A can scramble but too slowly to be experimentally detected. Regarding the collective variable for lipid flipping, it is correct, and we know that this lipid actually flipped.
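
      The cut-off argument above amounts to a simple rate comparison, sketched below. The code is unit-agnostic: the simulation length and the cut-off only need to share the same time unit. The function names are hypothetical, and the use of "greater than or equal to" at the cut-off is an assumption.

```python
def scrambling_rate(n_events: int, duration: float) -> float:
    """Scrambling events per unit of simulated time."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return n_events / duration


def is_scrambler(rate: float, cutoff: float = 1.0) -> bool:
    """Classify as a scrambler when the rate reaches the cut-off
    (cut-off expressed in the same time unit as the rate)."""
    return rate >= cutoff


# 2 events over a simulation of length 10, against a cut-off of 1 event per unit time.
tmem16a_rate = scrambling_rate(2, 10.0)
print(tmem16a_rate, is_scrambler(tmem16a_rate))

# For comparison, the rate reported for the open nhTMEM16 structure (4WIS).
print(is_scrambler(24.4))
```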

      (14) Line 402: Distance distributions for the electrostatic interactions between E633 and K645 should be included in the manuscript. This is also the case for the interactions between E843-K850 (lines 491-492).

      Our description of interactions between lipid headgroups and E633 and K645 in TMEM16A (5OYB*) is based on qualitative observations of the MD trajectory, and we highlight an example of this interaction in Figure 3-video 4. The video clearly shows that the lipid headgroups in the center of the groove orient themselves such that the phosphate bead (red) rests just above K645 (blue) and at other times the choline bead (blue) rests just below E633 (red). We do not think an additional plot with the distance distributions between lipids and these residues will add to our understanding of how lipids interact with residues in the TMEM16A pore.

      We made a similar qualitative observation for the interaction between the POPC choline and E843 and between the POPC phosphate and K850 while watching the AAMD simulation trajectory of TMEM16F (PDB ID 6QP6). Given that this was a single observation, and the same interaction does not appear in the CG simulation of the same structure (see simulation snapshots in Figure 4-figure supplement 5), we do not think additional analysis would add significantly to our understanding of which residues may stabilize lipids in the dimer interface.

      (15) Lines 450-451: 'As the groove opens, water is exposed to the membrane core and lipid headgroups insert themselves into the water-filled groove to bridge the leaflets.' Is this a qualitative observation? Could the authors report the correlation between groove dilation and the number of water permeation events?

      Yes, this is qualitative, and it sketches the order of events during scrambling, and we revised the main text starting at line 450 to indicate this. As illustrated by the density isosurfaces in Appendix 1-Figure 2A, the amount of water found in the closed versus open grooves is striking – there is a significant flood of water that connects the upper and lower solutions upon groove opening. Moreover, Appendix 1-Figure 2B shows much greater water permeation for open structures (4WIS, 7RXG, 5OC9, 8B8J, …) compared to closed structures (6QMB, 6QMA, 8B8Q, and many of the non-labeled data in the figure that all have closed grooves and near 0 water permeation). A notable exception is TMEM16A (7ZK3*8), which has water permeation but a closed groove and little-to-no lipid scrambling.

      Minor Comments:

      (1) Inconsistent use of '10' and 'ten' throughout.

      We would like to kindly point out that we did not find examples of inconsistent use.

      (2) Line 32: 'TM6 along with 3, 4 and 5...' should be 'TM6 along with TM3, TM4 and TM5...'. Same in line 142. Naming should stay consistent.

      Changes are reflected in the updated manuscript.

      (3) Line 141: do you mean traverse (i.e. to travel across)? Or transverse (i.e. to extend across the membrane)?

      This is a typo. We meant “traverse”. Thanks for pointing it out.

      (4) Line 142: 'greasy' should be 'strongly hydrophobic'.

      Changes are reflected in the updated manuscript.

      (5) Line 143-144: "credit card mechanism" requires quotation marks.

      Changes are reflected in the updated manuscript.

      (6) Line 144: state if Nectria haematococca is mammalian or fungal, this is not obvious for all readers.

      Changes are reflected in the updated manuscript.

      (7) Line 147-148: Is TMEM16A/TMEM16K fungal or mammalian? What was the residue before the mutation and which residue is mutated? Perhaps the nomenclature should read as TMEM16X10Y where X=the residue prior to the mutation, 10 is a placeholder for the residue number that is mutated and Y=the new residue following mutation.

      “TMEM16” is the protein family. “A” denotes the specific homolog rather than residue.  

      (8) Lines 157-158: same as 10, it is unclear if these are fungal or mammalian.

      Clarifications added.

      (9) Line 184: "...CGMD simulation" should be "...CGMD simulations".

      Changes made.

      (10) Line 191-192: It would help to create a table of all of the mutants (including if they are mammalian or fungal) summarizing the salt concentrations, lipid and detergent environments, the presence of modulators/activators, etc.

      We added this information to Appendix 1-Table 1 in the supplemental information. We did not specify NaCl concentrations, because all experimental procedures used standard physiological values for this (100-150 mM).

      (11) Line 210: inconsistencies with 'CG' and 'coarse-grain'.

      Changes made.

      (12) Figure 1 caption: '...totaling ~2μs (B)...' is missing the fullstop after 2μs.

      Changes made.

      (13) Figure 1B: it may be useful to label where the Ca2+ ion binds or include a schematic.

      We updated Fig. 1A to illustrate where Ca2+ binds.

      (14) Line 311: Are these mean distances? The authors should add standard deviations.

      Yes, they are. We added the standard deviations to the text.

      (15) Line 321-322: Perhaps a schematic in Figure 2 would be useful to visualize the structural features described here.

      We would kindly refer interested readers to reference [60].

      (16) Line 377: '...are likely a correlate of groove opening...' should read as: '...are likely correlated to groove opening...'.

      Thank you for pointing it out. Changes made.

      (17) Line 398: the '...empirically determined 6Å threshold for scrambling.' Was this determined from the simulations or from experiments? What does "empirically" mean here? Please state this.

      This value was determined from the simulations. Based on our analysis of the correlation between scrambling rate and groove dilation, we found that the minimal TM4/6 distance of 6 Å can distinguish between the high and low activity scramblers. The exact numerical value is somewhat arbitrary as there is a range of values around 6 Å that serve to distinguish scramblers from non-scramblers.

      (18) Figure 4: This figure should be labelled as A, B, C and D, with the figure caption updated accordingly.

      We updated Figure 4 and its caption.

      Reviewer #3 (Recommendations for Authors):

      The authors must do additional simulations to further validate their claim with different lipids and further substantiate dimer interface independent of Ca2+ ions.

      Thank you for the suggestion. We completely agree that studying scrambling in a diverse lipid environment is an exciting area to explore. We are actively working on a project built on a similar idea. We decided not to include that study here because the additional discussion involved would be excessive for the current manuscript. We do, however, look forward to publishing those findings in a separate manuscript in the near future. Regarding Ca2+-independent scrambling, we are planning mutagenesis studies with our experimental collaborator that target the residues we identified along the dimer interface.

      Since calcium ions are critical for the stability of these structures, authors should show that they were placed throughout the simulations consistently.

      As stated in the Methods section “Coarse-grained system preparation and simulation detail”, all Ca2+ ions were manually placed into the coarse-grained structure at the start of the simulation, at the positions they occupy in the experimental structure, and were harmonically bonded to adjacent acidic residues for the duration of the simulation. We have also added a label to Fig. 1A to indicate where the two Ca2+ ions are located.

      The comparison with experimental structures should be consistent with complete simulation, and not the last structure of the trajectory. Depending on the conformational variability, this might be misleading.

      We agree and updated Fig. 1-supplement figure 1 accordingly. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

    1. Author response:

      Reviewer #1 (Public Review):

      In this manuscript, Tran et al. investigate the interaction between BICC1 and ADPKD genes in renal cystogenesis. Using biochemical approaches, they reveal a physical association between Bicc1 and PC1 or PC2 and identify the motifs in each protein required for binding. Through genetic analyses, they demonstrate that Bicc1 inactivation synergizes with Pkd1 or Pkd2 inactivation to exacerbate PKD-associated phenotypes in Xenopus embryos and potentially in mouse models. Furthermore, by analyzing a large cohort of PKD patients, the authors identify compound BICC1 variants alongside PKD1 or PKD2 variants in trans, as well as homozygous BICC1 variants in patients with early-onset and severe disease presentation. They also show that these BICC1 variants repress PC2 expression in cultured cells.

      Overall, the concept that BICC1 variants modify PKD severity is plausible, the data are robust, and the conclusions are largely supported. However, several aspects of the study require clarification and discussion:

      (1) The authors devote significant effort to characterizing the physical interaction between Bicc1 and Pkd2. However, the study does not examine or discuss how this interaction relates to Bicc1's well-established role in posttranscriptional regulation of Pkd2 mRNA stability and translation efficiency.

      The reviewer is correct that the present study has not addressed the downstream consequences of this interaction, considering that Bicc1 is a posttranscriptional regulator of Pkd2 (and potentially Pkd1). We think that the Bicc1/Pkd1/Pkd2 complex retains Bicc1 in the cytoplasm and thus restricts its participation in posttranscriptional regulation. As we do not yet have experimental data to support this model, we have not included it in the manuscript. We will, however, update the discussion to elaborate on the potential mechanism of the Bicc1/Pkd1/Pkd2 complex.

      (2) Bicc1 inactivation appears to downregulate Pkd1 expression, yet it remains unclear whether Bicc1 regulates Pkd1 through direct interaction or by antagonizing miR-17, as observed in Pkd2 regulation. This should be further examined or discussed.

      This is a very interesting comment. The group of Vishal Patel published that PKD1 is regulated by a miR-17 binding site in its 3’UTR (PMID: 35965273). We have not, however, evaluated whether BICC1 participates in this regulation. A definitive answer would require us to use some of the mice described in the above reference, which is beyond the scope of this manuscript. We will, however, revise the discussion to elaborate on this potential mechanism.

      (3) The evidence supporting Bicc1 and ADPKD gene cooperativity, particularly with Pkd1, in mouse models is not entirely convincing, likely due to substantial variability and the aggressive nature of Bpk/Bpk mice. Increasing the number of animals or using a milder Bicc1 strain, such as jcpk heterozygotes, could help substantiate the genetic interaction.

      We initially performed the analysis using the Bicc1 complete knockout that we previously reported (PMID 20215348), focusing on compound heterozygotes. Yet, as with the Pkd1/Pkd2 compound heterozygotes (PMID 12140187), no cyst development was observed by the time we sacrificed the mice at P21. Our strain is similar to the above-mentioned jcpk, which is characterized by a short, abnormal transcript thought to result in a null allele (PMID: 12682776). We thank the reviewer for pointing us to the reference showing that heterozygous mice develop glomerular cysts as adults (PMID: 7723240). This is an interesting idea that we will investigate. In general, we agree with the reviewer that a better understanding of the contribution of Bicc1 to the adult PKD phenotype will be critical. To this end, we are currently generating a floxed allele of Bicc1 that will allow us to address the cooperativity in the adult kidney when, for example, crossed to the Pkd1<sup>RC/RC</sup> mice. Unfortunately, these experiments are beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Tran and colleagues report evidence supporting the expected yet undemonstrated interaction between the Pkd1 and Pkd2 gene products Pc1 and Pc2 and the Bicc1 protein in vitro, in mice, and collaterally, in Xenopus and HEK293T cells. The authors go on to convincingly identify two large and non-overlapping regions of the Bicc1 protein important for each interaction and to perform gene dosage experiments in mice that suggest that Bicc1 loss of function may compound with Pkd1 and Pkd2 decreased function, resulting in PKD-like renal phenotypes of different severity. These results led to examining a cohort of very early onset PKD patients to find three instances of co-existing mutations in PKD1 (or PKD2) and BICC1. Finally, preliminary transcriptomics of edited lines gave variable and subtle differences that align with the theme that Bicc1 may contribute to the PKD defects, yet are mechanistically inconclusive.

      These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed.

      As mentioned above, the study was designed to explore whether there is an interaction between BICC1 and PKD1/PKD2 and whether this interaction is functionally important. How this translates into clinical relevance will require additional studies (and we have addressed this in the discussion of the manuscript).

      The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been.

      We respectfully disagree with the reviewer on the latter interpretation. The study was performed with rigor. We have carefully assessed the critiques raised by the reviewer. Most of the criticisms raised by the reviewer will be easily addressed in the revised version of the manuscript. Yet, none of the critiques raised by the reviewer seems to directly impact the overall interpretation of the data.

      Reviewer #3 (Public Review):

      Summary:

      This study investigates the role of BICC1 in the regulation of PKD1 and PKD2 and its impact on cystogenesis in ADPKD. By utilizing co-IP and functional assays, the authors demonstrate physical, functional, and regulatory interactions between these three proteins.

      Strengths:

      (1) The scientific principles and methodology adopted in this study are excellent, logical, and reveal important insights into the molecular basis of cystogenesis.

      (2) The functional studies in animal models provide tantalizing data that may lead to a further understanding and may consequently lead to the ultimate goal of finding a molecular therapy for this incurable condition.

      (3) In describing the patients from the Arab cohort, the authors have provided excellent human data for further investigation in large ADPKD cohorts. Even though there was no patient material available, such as HUREC, the authors have studied the effects of BICC1 mutations and demonstrated its functional importance in a Xenopus model.

      Weaknesses:

      This is a well-conducted study and could have been even more impactful if primary patient material was available to the authors. A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      This is an excellent suggestion. We agree with the reviewer that it would have been interesting to analyze HUREC material from the affected patients. Unfortunately, besides DNA and the phenotypic analysis described in the manuscript, neither human tissue nor primary patient-derived cells were collected before the two patients with the BICC1 p.Ser240Pro mutation passed away. To address this missing link, we have, as a first pass, generated HEK293T cells carrying the BICC1 p.Ser240Pro variant. While these are admittedly not kidney epithelial cells, they do show a reduced level of PC2 expression. These data are shown in the manuscript. We have not yet addressed how this relates to its crosstalk with miR-17.

      Conclusion:

      The authors achieve their aims. The results reliably demonstrate the physical and functional interaction between BICC1 and PKD1/PKD2 genes and their products.

      The impact is hopefully going to be manifold:

      (1) Progressing the understanding of the regulation of the expression of PKD1/PKD2 genes.

      (2) Role of BICC1 in mir/PKD1/2 complex should be the next step in the quest for a modifiable therapeutic target.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      In their manuscript de las Mercedes Carro et al investigated the role of Ago proteins during spermatogenesis by producing a triple knockout of Ago 1, 3 and 4. They first describe the pattern of expression of each protein and of Ago2 during the differentiation of male germ cells, then they describe the spermatogenesis phenotype of triple knockout males, study gene deregulation by scRNA seq and identify novel interacting proteins by co-IP mass spectrometry, in particular BRG1/SMARCA4, a chromatin remodeling factor and ATF2 a transcription factor. The main message is that Ago3 and 4 are involved in the regulation of XY gene silencing during meiosis, and also in the control of autosomal gene expression during meiosis. Overall the manuscript is well written, the topic, very interesting and the experiments, well-executed. However, there are some parts of the methodology and data interpretation that are unclear (see below).

      Major comments

      1= Please clarify how the triple KO was obtained, and if it is constitutive or specific to the male germline. In the result section a Cre (which cre?) is mentioned but it is not mentioned in the M&M. On Figure S1, a MICER VECTOR is shown instead of a deletion, but nothing is explained in the text nor legend. Could the authors provide more details in the results section as well as in the M&M ? This is essential to fully interpret the results obtained for this KO line, and to compare its phenotype to other lines (such as lines 184-9 Comparison of triple KO phenotype with that of Ago4 KO). Also, if it is a constitutive KO, the authors should mention if they observed other phenotypes in triple KO mice since AGO proteins are not only expressed in the male germline.

      Response: We apologize for omitting this vital information. We have now incorporated a more detailed description of how the Ago413 mutant was created in the results and M&M sections (line 120 and 686 respectively).

      As mentioned in the manuscript, Ago4, Ago1 and Ago3 are widely expressed in mammalian somatic tissues. Mutations or deletions of these genes do not disrupt development; however, there is limited research on the impact of these mutations in mammalian models in vivo. In humans, mutations in the Ago1 and Ago3 genes are associated with neurological disorders, autism and intellectual disability (Tokita et al. 2015, doi: 10.1038/ejhg.2014.202; Sakaguchi et al. 2019, doi: 10.1016/j.ejmg.2018.09.004; Schalk et al. 2021, doi: 10.1136/jmedgenet-2021-107751). In the mouse, simultaneous global deletion of Ago1 and Ago3 was shown to increase susceptibility to influenza virus through impaired inflammatory responses (Van Stry et al. 2012, doi: 10.1128/jvi.05303-11). Studies performed in female Ago413 mutants (the same mutant line used herein) have shown that knockout mice present postnatal growth retardation with elevated circulating leukocytes (Guidi et al. 2023, doi: 10.1016/j.celrep.2023.113515). Other studies of a double conditional knockout of Ago1 and Ago3 in the skin associated the loss of these Argonautes with decreased weight of the offspring and severe skin morphogenesis defects (Wang et al. 2012, doi: 10.1101/gad.182758.111). In our study, we did not observe major somatic or overt behavioral phenotypes, and we did not observe statistical differences in the body weights of null males compared to WT, as shown in the figure below.

      2= The paragraph corresponding to G2/M analysis is unclear to me. Why was this analysis performed? What does the heatmap show in Figure S4? What is G2/M score? (Fig 2D). Lines 219-220, do the authors mean that Pachytene cells are in a cell phase equivalent to G2/M? All this paragraph and associated figures require more explanation to clarify the method and interpretation.

      Response: We have modified the methods to include more information about how the cell cycle scores used in Figures 2D and S4 were calculated, and we will add more information regarding the interpretation of these figures.

      3= I have concerns regarding Fig2G: to be convincing the analysis needs to be performed on several replicates, and it is essential to compare tubules of the same stage, which does not seem to be the case. Besides, co (immunofluorescent) staining with markers of different cell types should be shown to demonstrate the earlier expression of some markers and their colocalization with markers of the earlier stages.

      Response: We agree with the Reviewer. New images with staged tubules will be added to the analysis of Figure 2G.

      4= one important question that I think the authors should discuss regarding their scRNAseq: clusters are defined using well characterized markers. But Ago triple KO appears to alter the timing of expression of genes... could this deregulation affect the interpretation of scRNAseq clusters and results?

      Response: We thank the reviewer for this suggestion and agree that including this information is important. We expect that, at most, this dysregulation slightly affects the edges of these clusters. Given that the marker genes used to define cell types in these data are consistently expressed between the knockout and wildtype mice (see Figure S4A), we do not think that the cells in these clusters have different identities, just dysregulated expression programs. We have added the relevant sentence to the discussion, and will include additional supplemental figure panels to document this point more comprehensively.

      5= XY gene deregulation is mentioned throughout the result section but only X chromosome genes seem to have been investigated.... Even though the gene content of the Y is highly repetitive, it would be very interesting to show the level of expression of Y single copy and Y multicopy genes in a figure 3 panel.

      Response: We agree with the reviewer that including analysis of Y-linked genes is important. We will add a supplemental figure which includes the Y:Autosome ratio and differential expression analysis.
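      As a rough illustration of what a Y:Autosome ratio can look like in practice, the sketch below divides the mean expression of Y-linked genes by the mean expression of autosomal genes for one cell or cluster. It is an assumption-laden example, not the authors' pipeline: the function name, the input layout, and the expression values are hypothetical, and the response does not state how the ratio will actually be computed.

```python
def chrom_to_autosome_ratio(expression, gene_chrom, chrom="Y"):
    """Mean expression on `chrom` divided by mean autosomal expression.

    expression: gene name -> expression value for one cell or cluster.
    gene_chrom: gene name -> chromosome label ("1".."19", "X", "Y").
    Assumption: autosomes are the chromosomes with purely numeric labels.
    """
    target = [v for g, v in expression.items() if gene_chrom.get(g) == chrom]
    autosomal = [
        v for g, v in expression.items() if gene_chrom.get(g, "").isdigit()
    ]
    if not target or not autosomal:
        raise ValueError("need at least one target and one autosomal gene")
    autosomal_mean = sum(autosomal) / len(autosomal)
    if autosomal_mean == 0:
        raise ValueError("autosomal mean expression is zero")
    return (sum(target) / len(target)) / autosomal_mean


# Hypothetical expression values, for illustration only.
expression = {"Ddx3y": 2.0, "Eif2s3y": 4.0, "Actb": 6.0, "Gapdh": 6.0}
gene_chrom = {"Ddx3y": "Y", "Eif2s3y": "Y", "Actb": "5", "Gapdh": "6"}
ratio = chrom_to_autosome_ratio(expression, gene_chrom)
```

      Passing `chrom="X"` gives the corresponding X:Autosome ratio from the same inputs.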

      6= Can the authors elaborate on the observation that X gene upregulation is visible in the KO before MSCI; that is in lept/zygotene clusters (and in spermatogonia, if the difference visible in 3A is significant?)

      Response: We do see that X gene expression is upregulated before pachynema. Previous scRNA-seq studies that examined MSCI have found that silencing of genes on the X and Y chromosomes starts before the cell clusters defined as pachynema, though silencing is not complete until pachynema. We have clarified this point in the manuscript.

      7 = miRNA analysis: could the authors indicate if X encoded miRNA were identified and found deregulated? Because Ago4 has been shown to lead to a downregulation of miRNA, among which many X encoded. It is therefore puzzling to see that the triple KO does not recapitulate this observation. Were the analyses performed differently in the present study and in Ago4 KO study?

      Response: The analysis identifying downregulation of miRNA in the original Ago4 mutant study was conducted relative to total small RNA expression. Among the altered miRNA families in the Ago4 mutants, we demonstrated both upregulation and downregulation of miRNA. We agree that confirming a similar global downregulation of miRNA counts compared to other small RNAs is important. Therefore, in a revised manuscript, we will add this information to the miRNA analysis section, especially highlighting the X chromosome-associated miRNAs, as well as whether the ratios between other small RNA classes change.

      8 = The last results paragraph would also benefit from some additional information. It is not clear why the authors focused on enhancers and did not investigate promoters (or maybe they were but it's unclear). Which regions (size and location from TSS) were investigated for motif enrichment analyses? To what correspond the "transcriptional regulatory regions previously identified using dREG" mentioned in the M&M? I understand it's based on a previous article, but more info in the present manuscript would be useful.

      Response: We thank the reviewer for this suggestion. The regions that were used for motif enrichment will be included as supplementary information in the fully revised manuscript. We have also clarified in the methods that these transcriptional regulatory regions were obtained from a previous analysis of ChRO-seq data downloaded from GEO. These data were run through the dREG pipeline, which identifies regions predicted to contain transcription start sites, including promoters and enhancers.

      Minor comments

      1) In the introduction: The sentence "Ago1 is not expressed in the germline from the spermatogonia stage onwards allowing us to use this model to study the roles of Ago4 and Ago3 in spermatogenesis." is misleading because Ago1 is expressed at least in spermatogonia; It would be more precise to write "after spermatogonia stage" and rephrase the sentence. Otherwise it is surprising to see AGO1 protein in testis lysate and it is not in line with the scRNA seq shown in figure 2.

      Response: We agree with the Reviewer's suggestion and have edited the sentence on line 100. This sentence now reads "Ago1 is not expressed in the germline after the spermatogonia stage allowing us to use this model to study the roles of Ago4 and Ago3 in spermatogenesis".

      2) Could the authors specify whether AGO proteins are expressed in other tissues? In somatic testicular cells?

      Response: Expression patterns of mammalian AGOs have been described by qPCR in somatic and testicular tissues of the mouse by Gonzales-Gonzales et al (2008). They found that Ago2 is expressed in all the somatic tissues analyzed (brain, spleen, heart, muscle and lung) as well as the testis, with the highest expression in brain and the lowest in heart. Ago1 is highly expressed in spleen compared to all the other tissues analyzed, while Ago3 and Ago4 showed the highest expression in testis and brain. Within the somatic tissues of the testis, the four Argonautes are expressed in Sertoli cells; however, Ago1, 3 and 4 expression is very low compared to Ago2, with the latter showing a 10-fold higher transcript level. We have included a sentence with this information in the introduction in line 89.

      3) Pattern of expression: How do the authors explain that AGO3 disappears at the diplotene stage and reappears in spermatids?

      Response: Single cell RNAseq data in the germline show reduced Ago3 transcript levels from the pachytene stage onwards, suggesting minimal, if any, new transcription in round spermatids. We hypothesize that the AGO3 protein present at the round spermatid stage is cytoplasmic, presumably coming from the pool of AGO3 in the chromatoid body, a cytoplasmic structure with functional association with the nucleus in round spermatids (Kotaja et al, 2003 doi: 10.1073/pnas.05093331).

      4) It would be useful to show the timing of expression of AGO 1 to 4 throughout spermatogenesis in the first paragraph of the article. Maybe the authors could present data from fig2B earlier?

      Response: We understand the Reviewer's concern; however, given that Ago expression throughout spermatogenesis was obtained from scRNA seq, we consider that these data should be presented after introducing the Ago413 knockout and the scRNA seq experiment. As Ago1-4 expression in the mouse male germline was also described in an earlier manuscript by Gonzales-Gonzales et al, and our data align with this report, we included a sentence about these previous findings in the earlier results section.

      5) Line 190: please modify the sentence "reveal no differences in cellular architecture of the seminiferous tubules when compared to wild-type males" to " reveal no gross differences..." since even without quantification of the different cell types it is visible that KO seminiferous tubules are different from WT tubules.

      Response: We agree with the reviewer, and we modified line 190 (now 173) as suggested. Grossly, seminiferous tubules from Ago413 null males contain the same cell types as wild type tubules, including spermatozoa. However, our studies show that the number and quality of germ cells are compromised in knockouts, as shown by sperm counts and TUNEL staining.

      6) TUNEL analysis: please stage the tubules to determine the stage(s) at which apoptosis is the most predominant.

      Response: We have complied with the reviewer's suggestion. Figure 1G now shows staged seminiferous tubules, and we have replaced the wild type image with one in which the staged tubules match the knockout image.

      7) Figure S4B does not show an increase of cells at Pachytene stage but at Lepto/zygotene stage (as well as an increase of spermatogonia). Please comment on this discrepancy with results shown in Fig2.

      Response: Figures 2 and S4 show the distribution of cells in different substages of spermatogenesis and prophase I measured with very different methods: a cytological approach using chromosome spreads versus a transcriptomic approach that involves clustering of cells. We attribute the differences in cell type distribution to differences in the sensitivity of the methods to identify each cell type, and therefore to detect differences in the number of cells in each group. Moreover, our scRNA-seq data group the leptotene and zygotene stages together, while the cytological approach allows for separation of these two sub-stages. Importantly, both results show that Ago413 spermatocytes are progressing more slowly from pachynema into diplonema and/or are dying after pachynema, as stated in line 194 in our manuscript.

      8) Fig5H and 5I are not mentioned in the result section. Also, it would be useful to label them with "all chromosomes" and "XY" to differentiate them easily

      Response: We apologize for the omission and have now cited Figures 5H and 5I in the manuscript (line 453). We have added the suggested labels.

      9) Line 530 "data provide further evidence for a functional association between AGO-dependent small RNAs and heterochromatin formation, maintenance and/or silencing." Please rephrase, the present article does not really show that AGO nuclear role depends on small RNAs.

      Response: We agree with the reviewer that these data do not directly show a dependence on small RNAs. As the localization of AGO proteins to the pericentric heterochromatin that we identified coincides with the localization of DICER shown previously by Yadav and collaborators (2020, doi: 10.1093/nar/gkaa460), we do believe that our data further implicate small RNAs in the silencing of heterochromatin. Yadav et al show that DICER localizes to pericentromeric heterochromatin and processes major satellite transcripts into small RNAs in mouse spermatocytes, and that cKO germ cells have reduced localization of SUV39H2 and H3K9me3 to the pericentromeric heterochromatin. Given the colocalization of both the small RNA producing machinery and AGOs at pericentromeric heterochromatin, the AGOs may bind these small RNAs. The statement in line 530 refers to how our results provide evidence that other RNAi machinery, likely including small RNAs, is involved in the silencing of pericentromeric heterochromatin investigated by Yadav et al.

      To clarify this point, we have modified the text accordingly.

      10) Line 1256: replace "cite here " by appropriate reference

      Response: The reference was added to line 1256.

      11) Please use SMARCA4 instead of BRG1 name as it is its official name.

      Response: We have replaced BRG1 with SMARCA4 in the text and figures.

      Figures:

      Figure 1: Are the pictures shown for Ago3-tagged and floxed from the same stages ? The leptotene stage in 1A looks like a zygotene, while some pachytene/diplotene stage pictures do not look alike.

      Response: New representative images have been added to Figure 1 so that the same substages are matched across the figure.

      Figure 1D, please label the Y scale properly (testis weight related to body weight)

      Response: We have fixed this.

      FigS1: Please comment on the presence of non-specific bands in the figure legend

      Response: We have added a sentence to the Figure S1 legend.

      Fig 2E and F, please indicate on the figure (in addition to its legend), what are the X and Y axes respectively to facilitate its reading.

      Response: X and Y axes are now labelled in Figure 2E and F.

      2F: please use an easier abbreviation for Spermatocyte than Sp (which could mean spermatogonia, sperm, etc.) such as Scyte I ? (same comment for Fig 3C)

      Response: The abbreviation for spermatocyte was changed from Sp to Scyte I in Figures 2 and 3.

      Overall, for all figures showing GSEA analyses, could the authors explain what a High positive NES and a High negative NES mean in the results section?

      Response: Thank you for this suggestion. We have added this information where the GSEA score of the cell markers is initially introduced.

      Significance

      Ago proteins are known for their roles in post transcriptional gene regulation via small RNA mediated cleavage of mRNA, which takes place in the cytoplasm. Some Ago proteins have been shown to be also located in the nucleus suggesting other non-canonical roles. It is the case of Ago4 which has been shown to localize to the transcriptionally silenced sex chromosomes (called sex body) of the spermatocyte nucleus, where it contributes to regulate their silencing (Modzelewski et al 2012). Interestingly, Ago4 knockout leads to Ago3 upregulation, including on the sex body indicating that Ago3 and Ago4 are involved in the same nuclear process. In their manuscript, de las Mercedes Carro et al., investigate the consequences of loss of both Ago3 and Ago4 in the male germline by the production of a triple knockout of Ago1, 3 and 4 in the mouse. With this model, the authors describe the role of Ago3 and Ago4 during spermatogenesis and show that they are involved in sex chromosome gene repression in spermatocytes and in round spermatids, as well as in the control of autosomal meiotic gene expression. Triple KO males have impaired meiosis and spermiogenesis, with fewer and abnormal spermatozoa resulting in reduced fertility. Since Ago1 male germline expression is restricted to pre-meiotic germ cells, it is not expected to contribute to the meiotic and postmeiotic phenotypes observed in the triple KO. The strengths of the study are i) the thorough analyses of mRNA expression at the single cell level, and in purified spermatocytes and spermatids (bulk RNAseq), ii) the identification of novel nuclear partners of AGO3/4 relevant for their described nuclear role: ATF2, which they show to also co-localize with the sex body, and BRG1/SMARCA4, a SWI/SNF chromatin remodeler. The main limitation of the study is the lack of information in the methods regarding the production of the triple KO, as well as some aspects of the transcriptome and motif analyses. It is also surprising to see that the triple KO does not recapitulate the miRNA deregulation observed in Ago4 KO. The characterization of a non-canonical role of AGO3/4 in male germ cells will certainly influence researchers of the field, and also interest a broader audience studying Argonaute proteins and gene regulation at transcriptional and posttranscriptional levels.

      Reviewer #2

      Evidence, reproducibility and clarity

      In the manuscript titled "Argonaute proteins regulate the timing of the spermatogenic transcriptional program" by Carro et al., the authors present their findings on how Argonaute proteins regulate spermatogenic development. They utilize a mouse model featuring a deletion of the gene cluster on chromosome 4 that contains Ago1, Ago3, and Ago4 to investigate the cumulative roles of AGO3 and AGO4 in spermatogenic cells. The authors characterize the distribution of AGO proteins and their effects on key meiotic milestones such as synapsis, recombination, meiotic transcriptional regulation, and meiotic sex chromosome inactivation (MSCI). They analyze stage-specific transcriptomes in spermatogenic cells using single-cell and bulk RNA sequencing and determine the interactome of AGO3 and AGO4 through mass spectrometry to examine how AGO proteins may regulate gene expression in these cells during meiotic and post-meiotic development. The authors conclude that both AGO3 and AGO4 are essential for regulating the overall gene expression program in spermatogenic cells and specifically modulate MSCI to repress sex-linked genes in pachytene spermatocytes, which may be partially mediated by the proper distribution of DNA damage repair factors. Additionally, AGO3 is suggested to interact with the chromatin remodeler SWI/SNF factor BRG1, facilitating its removal from the sex-chromatin to enable the repression of sex-linked genes during MSCI.

      Major Comments: 1. The study utilized a triple knockout mouse model to determine the effect of AGO3 on spermatogenesis, following up on their previous report about the role of AGO4 in spermatogenesis, which resulted from an upregulation of AGO3 in Ago4-/- spermatocytes. However, the results are more difficult to interpret and ascertain the role of AGO3 in these cells, given the absence of any observable phenotype from Ago3 interruption. AGO4 regulates sex body formation, meiotic sex chromosome inactivation (MSCI), and miRNA production in spermatocytes, all of which were noted in the absence of both AGO3 and AGO4, with only an increased incidence of cells containing abnormal RNAPII at the sex chromosomes. It will be necessary to characterize how AGO3 regulates spermatogenic development, including meiotic progression and the regulation of the meiotic transcriptome, and compare these findings with the current observations to determine if the proposed mechanism involving AGO3, BRG1, and possibly AP2 is relevant in this context.

      Response: While we agree with the Reviewer that a single Ago3 knockout would help clarify the distinct roles of AGO3 and AGO4 in spermatogenesis, the time and resources required to generate a new mouse model are substantial. The analysis included in the current manuscript has already taken over seven years, and with the lengthy production of a new single mutant mouse, validation of the new mouse, and then final analysis, we would be looking at another 3-5 years of work. In the current funding climate, and with strong concerns over reducing the use of laboratory mice, we consider this request to be far in excess of what is required to move this important story forward.

      The Ago413-/- mouse model has allowed us to associate a nuclear role of Argonaute proteins with a strong reproductive phenotype in the mouse germline. Given the redundancy between Ago3 and Ago4, it is likely that a single Ago3 knockout would have a mild phenotype just like the Ago4 KO. All this said, we agree with the reviewer that analysis of an Ago3 knockout mouse is a valuable next step, just not within this chapter of the story.

      2. Do Ago413-/- mice recapitulate the early meiotic entry phenotype observed in Ago4-/- mice? If not, could it be possible that AGO3 promotes meiotic entry, given its strong mRNA expression in spermatogonia according to the scRNAseq data (Fig. 2B)?

      Response: Our scRNA-seq data shows strong expression of Ago3 in spermatogonia, as mentioned by the Reviewer. Analysis of cell cycle marker expression also shows that the transcriptomic profile of spermatogonia is altered, with higher levels of transcripts corresponding to the later G2/M stages (Figure 2D). Moreover, Ago413 knockouts present an increase in the number of spermatogonial stem cells (Supplementary Figure S4B). However, this cluster represents a pool of quiescent and mitotically active cells entering meiosis, therefore interpretation of these data might be challenging. While specific experiments could be conducted to answer this question, this is outside of the scope of our manuscript. The manuscript as it stands is already rather large, and a full analysis of meiotic entry dynamics would dilute the core message relating to chromatin regulation in the sex body.

      3. The authors suggested that the removal of BRG1 by AGO3 is necessary during sex body formation and the eventual establishment of MSCI. However, the BAF complex subunit ARID1A has been shown to facilitate MSCI by regulating promoter accessibility. It will be interesting to determine how BRG1 distribution changes across the genome in the absence of AGO proteins and how that correlates with alterations in sex-linked gene expression.

      Response: We agree that changes in BRG1 distribution across the genome would be very interesting to identify. However, in this work we show that BRG1/SMARCA4 protein changes its localization in the sex body very rapidly between early and late pachynema. These two substages are only discernible by immunofluorescence using synaptonemal complex markers, as there are currently no available techniques to enrich for these subfractions. Therefore, studying genome occupancy of BRG1 in these specific substages by techniques such as CUT&Tag is not currently possible. However, we are working on new methods to distinguish these cell populations and hope eventually to use these purification strategies to perform the studies suggested by this reviewer. Alternatively, the hope is that single cell CUT&Tag methods will become more reliable and will enable us to address these questions. Neither of these options is currently available to us. The studies by Menon et al (2024-doi:10.7554/eLife.88024.5) provide strong evidence that ARID1A is needed to reduce promoter accessibility of XY silenced genes in prophase I through modulation of H3.3 distribution. However, this mechanism and our identification of the removal of BRG1 between early and late pachynema are not inconsistent with one another, as either SMARCA4 or SMARCA2 can associate with ARID1A as part of the cBAF complex, and ARID1A is not present in all forms of the BAF complex that contain BRG1. The difference between our results and those seen in Menon et al likely indicates that there are multiple forms of the BAF complex which are differentially regulated during MSCI and play different roles in silencing transcription. Further studies of specific BAF subunits are needed to elucidate how different flavors of the BAF complex act at specific genomic locations and meiotic time points.

      4. The observations presented in this manuscript (Fig. 1D, 2C, 3D, and 4) suggest a haploinsufficiency of the deleted locus in spermatogenic development. How does this compare with the ablation of either Ago3 or Ago4? Please explain.

      Response: Our previous studies in single Ago4 knockouts did not reveal a heterozygous phenotype (Modzelewski et al 2012, doi: 10.1016/j.devcel.2012.07.003, data not shown). Triple Ago413 knockouts show a much stronger fertility phenotype than the single Ago4 knockout. Testis weight is reduced by 30% in Ago413 homozygous null males and by 15% in heterozygous males (Figure 1D), the latter comparable to the 13% reduction previously observed in Ago4-/- males. Sperm counts of Ago413 null and heterozygous males are reduced by 60% and 39% compared to wild type (Figure 1E), respectively, whereas Ago4 null mice have a milder phenotype, with only a 22% reduction in sperm counts. At the MSCI level, homozygous and heterozygous Ago413 mutants show a similar increase, relative to wild type, in pachytene spermatocytes with RNA pol II ingression into the sex body: 35% and 30%, respectively. Ago4 single knockouts show an almost 18% increase in Pol II ingression when compared to wild type. These comparisons are now included in our manuscript in lines 170, 172 and 288. The milder phenotype of the Ago4 knockout, and the haploinsufficiency seen in triple Ago413 knockouts but not in Ago4 single knockouts, are likely a consequence of the overlapping functions of Ago3 and Ago4 in mammals (and/or overexpression of Ago3 in Ago4 knockouts). In the context of their role in RISC, Wang et al (doi: 10.1101/gad.182758.111) studied the effects of single and double conditional knockouts for Ago1 and Ago2 in miRNA-mediated silencing. They discovered that the interaction between miRNAs and AGOs is highly correlated with the abundance of each AGO protein, and only double knockouts presented an observable phenotype.

      Minor Comments: Based on the interactome analysis, it was argued that AGO3 and AGO4 may function separately. Please discuss how AGO3 might compensate for AGO4 (Line 109).

      Response: We hypothesize that the combined function of AGO3 and AGO4 is needed for proper sex chromosome inactivation during meiosis. We base this hypothesis on the facts that (i) both proteins localize to the sex body in pachytene spermatocytes, (ii) loss of Ago4 leads to upregulation of Ago3, and (iii) the MSCI phenotype of Ago413 knockout mice is much stronger than that of the single Ago4 knockout (see above). However, AGO3 and AGO4 might not induce silencing through the same mechanism or pathway. In this work, we observed that their temporal expression in prophase I is different; while AGO3 protein seems to disappear by the diplotene stage, AGO4 is present in the sex body of these cells. Moreover, the proteomic analysis revealed a very low number of common interactors, an observation which could support the idea of AGO3 and AGO4 acting by different (albeit perhaps related) mechanisms to achieve MSCI. It is also possible that common interactors were not identified in our proteomic analysis due to the low abundance of AGO3 and AGO4 in the germ cells, limiting the resolution of the proteomics analysis (note that in order to visualize AGO proteins in WB experiments, at least 60 μg of enriched germ cell lysate must be loaded per lane). Moreover, given the difficulty in obtaining enough isolated pachytene and diplotene spermatocytes to perform immunoprecipitation experiments, we performed IP experiments in whole germ cell lysates, which limits the interpretation of our analysis. If AGO3 and AGO4 protein interactors overlap, then AGO3 would directly substitute for AGO4, leading to silencing in single Ago4 knockouts. However, if AGO3 and AGO4 work together through different, complementary mechanisms, then Ago4 mutant mice likely compensate for the loss of Ago4 by upregulating Ago3 along with specific interactors of the given pathway. We have added a sentence addressing this matter in line 411 of the results section and lines 506 and 513 of the discussion in the revised manuscript.

      In Line 221, it is unclear what is meant by 'cell cycle transcripts'. Does this refer to meiotic transcripts? It is also important to discuss the relevance of the G2/M cell cycle marker genes at later stages of meiotic prophase.

      Response: Thank you for this suggestion. We have changed the relevant text to remove redundancies and include more information. We agree that considering the importance of these genes across meiotic prophase is needed, as cells which are in the dividing stage will already have produced the proteins necessary for division. These cells likely correspond to the diplotene/M cluster cells that have a lower G2/M score, potentially causing the bimodal distribution seen in Figure 2D. We have added a sentence addressing this to the manuscript.

      While identified as a common interactor of both AGO3 and AGO4 in lines 440-445, HNRNPD is not listed among AGO4 interactors in Table S6. Please correct or explain this discrepancy.

      Response: HNRNPD was originally identified as an AGO4 interactor using less strict criteria than those used in our manuscript, where we required consistent enrichment in at least two rounds of IP MS experiments. This reference to HNRNPD was a mistake, given that HNRNPD was only enriched in one of our three replicates. We apologize and have removed the sentence in lines 440-445.

      It is unclear whether wild-type cell lysate or lysate containing FLAG-tagged AGO3 was used for BRG1 immunoprecipitation, and which antibody was used to detect AGO3 in the BRG1 IP sample. A co-IP experiment demonstrating interaction between BRG1 and wild-type AGO3 would be ideal in this context. Furthermore, co-localization by IF would be beneficial to determine the subcellular localization and the cell stages the interaction may be occurring. Additionally, co-IP and Western blot methodologies should be included in the methods section.

      Response: MYC-FLAG tagged AGO3 protein lysates were used for BRG1 co-immunoprecipitation, along with an anti-MYC antibody to detect AGO3. This is now detailed in the Methods section of our revised manuscript (line 1133).

      Regarding BRG1 and AGO3 colocalization by IF, we can confidently show that both AGO3 and BRG1 localize to the sex chromosomes in early pachynema by comparing BRG1/SYCP3 and FLAG-AGO3/SYCP3 stained spreads. We were not able to show colocalization simultaneously on the same cells, given the lack of appropriate antibodies. Our anti-FLAG antibody is raised in mouse, while anti-BRG1 is raised in rabbit; therefore a non-rabbit, non-mouse anti-SYCP3 would be needed to identify prophase I substages, and our lab does not possess such a validated antibody. However, we now have access to a multiplexing kit that allows the use of same-species antibodies for immunofluorescence, and we can perform these experiments for a revised manuscript.

      Response: The methods section now includes a description of co-IP methodologies (line 1132). Western blot methodologies are explained in line 718, under the "Immunoblotting" title.

      In line 599, it is unclear what is meant by 'persistence of sex chromosome de-repression'. Please correct or clarify this.

      Response: This sentence has been changed and reads: "The persistence of sex chromosome gene expression".

      If possible, please add an illustration to summarize the findings together.

      Response: We thank the reviewer for this suggestion, and have now added this summary illustration as Figure 6.

      Significance

      Overall, this study enhances the understanding of gene expression regulation by AGO proteins during spermatogenesis. Several approaches, including functional, histological, and molecular characterization of the triple knockout phenotype, were instrumental in elucidating the role of AGO proteins in MSCI and meiotic as well as postmeiotic gene regulation. The main limitation of the study is that it is challenging to appreciate the role of AGO3 in addition to the previously published role of AGO4 without the inclusion of necessary control groups. Furthermore, the mechanism of action for AGO proteins in meiotic gene regulation was left relatively unexplored. This study presents new findings that will be significant for the research community interested in gene regulation, chromatin biology, and reproductive biology with the above suggestions considered.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors characterize a CRISPR-Cas9 mouse mutant that targets 3 genes that encode AGO family proteins, 2 of which are expressed during spermatogenesis (AGO3 and AGO4) and one that is said not to be expressed, AGO1. This mouse mutant showed that AGO3 and AGO4 both contribute to spermatogenesis success, as the "Ago413" mutation gave rise to an additive reduction in testis weight, due to spermatocyte apoptosis, and a reduction in sperm count. Furthermore, they use insertion mouse mutants for Ago3 and Ago2 that express tagged versions of their corresponding proteins, which they use in combination with pan-AGO antibodies and Ago mutants to show differential expression and localization properties of AGO2, AGO3, and AGO4 (and the absence of AGO1) during spermatogenesis, with a particular focus on meiotic prophase. They perform single-cell RNAseq and intricate analyses to demonstrate a change in the distribution of meiotic stages in Ago413 mutants, and that the overall cell cycle in spermatogonia and spermatocytes is altered. This analysis shows that the mutation leads to an inability to downregulate prior spermatogonia/spermatocyte stage transcripts in a timely manner. On the other hand, later-stage spermatocytes abnormally express spermiogenesis genes. Similar to the previously characterized Ago4 mutant, MSCI is disrupted. The authors also show that AGO3 has different interaction partners compared to AGO4 and focus their final assessment on a novel interaction partner of AGO3, BRG1. They show that this factor, which is involved in chromatin remodeling, is aberrantly localized to the sex body during meiotic prophase and diplonema. As BRG1 is involved in open chromatin, it is proposed that AGO3 restricts BRG1 (and related proteins) from the XY chromosome to ensure MSCI. Overall, this paper is very well constructed with mechanistic insights that make this a very impactful contribution to the research community. Major Comments:

      1. The abstract contains "Ago413-/- mouse" without any explanation of what that is. The abstract needs to be a stand-alone document that does not require any referencing for context.

      Response: We have included a sentence describing Ago413 in line 27

      Figure 2C. - The significance bars are confusing as they appear to overlap strangely.

      Response: We have modified this figure and now present the significance bars on top of the data points.

      On line 235, the authors state that "we first identified the top non-overlapping upregulated genes for Ago413+/+ germ cells in each cluster." Why did the authors not also select down-regulated genes in each cluster to perform a similar analysis?

      Response: Thank you for this question. As our goal was to identify genes that are markers of the transcriptional program in each cell type, we used only uniquely upregulated genes for each cluster. Genes that are downregulated for a cluster may be indicative of the transcription in several other cell types, which is not easily interpretable. For a revised manuscript, we will perform this analysis to determine if there are any specific alterations in these downregulated genes.

      Their Ago413 mutant characterization does a good job of assessing meiotic prophase and spermatozoa. However, their assessment of the stages in between these is lacking (meiotic divisions and spermiogenesis).

      Response: We understand the reviewer's concern, however, it is not usual to study stages between the first meiotic division and spermiogenesis because meiosis II is so rapid and thus we lack tools to dissect it. In general, any defect that impacts meiosis I (and particularly prophase I) leads to cell death during prophase I or at metaphase I due to strictly adhered checkpoints that eradicate defective cells. Thus, the increased TUNEL staining in prophase I indicates to us that defective cells are cleared before exit from meiosis I, and those cells progressing to the spermatid stage are "normal" for meiosis II progression. For these cells that did complete meiosis I and progressed normally through meiosis II, we analyzed their spermiogenic outcome extensively (see section entitled "Post-meiotic spermatids from Ago413-/- males exhibit defective spermiogenesis and poor spermatozoa function"). This section included extensive sperm morphology, sperm motility and sperm fertility through in vitro fertilization assays. That said, we have added a sentence on line 268 to explain the transit through meiosis II.

      The discovery of the interaction between BRG1 and AGO3 is exciting. They should assess BRG1 localization in later sub-stages, including late diplonema and diakinesis.

      Response: BRG1 (SMARCA4) was analyzed throughout prophase I, as shown in Figure 5G, and the quantification of fluorescence intensity included diplonema (5H-I). However, diakinesis was not included here since there was no observable BRG1 signal in these cells. We have explained this in line 459.

      ATF2 should have been assessed in more detail, as was done for BRG1 in Figure 5.

      Response: We agree with the Reviewer; however, staining of chromosome spreads with the anti-ATF2 antibody was not possible in our hands after several attempts and changes in staining conditions. As staining of sections was successful, we showed localization of ATF2 on spermatocytes by co-staining sections with SYCP3 and ATF2.

      Reviewer #3 (Significance (Required)): Overall, this paper is very well constructed with mechanistic insights, as described in my reviewer comments, that make this a very impactful contribution to the research community.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: 

      (1) a large set of behavioral attributes, 

      (2) with inter-individual variability, that are 

      (3) stable over time. 

      A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining the correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      We thank the reviewer for his exceptionally kind assessment of our work!

      Weaknesses: 

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. 

      We have now uploaded a high-resolution PDF to GitHub at https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality/blob/main/S8.pdf, and this is also mentioned in the figure legend for Fig. S8.

      Why were five or so parameters selected from the full set? How were these selected? 

      The five parameters (% of time walked, walking speed, vector strength, angular velocity, and centrophobicity) were selected because they describe key aspects of the investigated behaviors that can be compared directly across assays. Importantly, several parameters we typically use (e.g., Linneweber et al., 2020) cannot be applied under certain conditions, such as darkness or the absence of visual cues. Furthermore, these five parameters encompass three critical aspects of navigation across standard visual behavioral arenas: (1) The “exploration” category is characterized by parameters describing the fly’s activity. (2) Parameters related to “attention” reflect heightened responses to visual cues, but unlike commonly used metrics such as angle or stripe deviations (e.g., Coulomb, 2012; Linneweber et al., 2020), they can also be measured in the absence of visual cues and are therefore suitable for cross-assay comparisons. (3) The parameter “centrophobicity,” used as a potential indicator of anxiety, is conceptually linked to the open-field test in mice, where the ratio of wall-to-open-field activity is frequently calculated as a measurement of anxiety (see for example Carter, Sheh, 2015, chapter 2. https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Admittedly, this view is frequently challenged in mice, but it has a long history, which is why we use it.

      Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset? 

      As noted above, we only included a subset of parameters in our final analysis, as many were unsuitable for comparison across assays while still providing valuable assay-specific information that is important for relating these results to previous publications.

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts, it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency". 

      Thank you for this suggestion. During the preparation of the manuscript, we indeed frequently alternated between the terms “stability” and “consistency,” and decided to go with “stability” as the only descriptor, to keep it simple. We now fully agree with the reviewer’s argument and have replaced “stability” by “consistency” throughout the current version of the manuscript in order to increase clarity and coherence.

      The parameters are considered one by one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability' and analyses of single-parameter variability stability.

      We agree with the reviewer that a multivariate analysis adds clear advantages in terms of statistical power, in addition to our chosen approach. On the one hand, we believe that the simplicity of our initial analysis, both for correlational and mean data, makes it easy for readers to understand and reproduce our data. While preparing the previous version of the manuscript we were skeptical, since more complex analyses often involve numerous choices, which can complicate reproducibility. For instance, a recent study in personality psychology (Paul et al., 2024) highlighted the risks of “forking paths” in statistical analysis, showing that certain choices of statistical methods could even reverse findings, a concern mitigated by our simple, straightforward approach. Still, in preparing this revised version of the manuscript, we accepted the reviewer’s advice and reanalyzed the data using a generalized linear model. This analysis nicely recapitulates our initial findings and is now summarized in a single figure (Fig. 9).

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23 °C and 32 °C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32 °C variance is predictable by the 23 °C variance. Is it fair to say that a 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?

      We agree that this is an important question. Our paper clearly demonstrates that individuality always plays a role in decision-making (and, in this context, any behavioral output can be considered a decision). However, the non-linear relationship between certain situations and the individual’s behavior often reduces the predictive value (or correlation) across contexts, sometimes quite drastically.

      For instance, temperature has a relatively linear effect on certain behavioral parameters, leading to predictable changes across individuals. As a result, correlations across temperature conditions are often similar to those observed across time within the same situation. In contrast, this predictability diminishes when comparing conditions like the presence or absence of visual stimuli, the use of different arenas, or different modalities.

      For this reason, we believe that significance remains the best indicator for describing how measurable individuality persists, even across vastly different situations.
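
      The reviewer's arithmetic in this exchange can be checked with a short sketch. It assumes only the standard identity that the squared Pearson correlation r gives the share of variance explained (R²); the function name and labels are illustrative and not taken from the manuscript or its code.

```python
def variance_explained(r: float) -> float:
    """Return R^2, the share of variance explained, for a Pearson correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r


# The two correlation values quoted by the reviewer.
quoted = [
    ("% time walked, 23 C vs 32 C", 0.263),
    ("vector strength, Buridan vs Y-maze", -0.197),
]

for label, r in quoted:
    r2 = variance_explained(r)
    # 1 - R^2 is the share of variance NOT predicted by the other condition.
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.3f}")
```

      This only restates the reviewer's point numerically: a correlation can be statistically significant while explaining a small share of the variance, and the two quantities answer different questions.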

      The authors describe a dissociation between inter-group differences and interindividual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining the correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?  

      We thank the reviewer for this suggestion, and we have now addressed this point. To account for slope effects, we have now introduced in-group ranks for our linear model computation (see Fig. 9). 
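
      As a rough illustration of the in-group rank idea discussed here, the sketch below ranks each individual within its own condition and then correlates the ranks, which removes group-level shifts in mean and scale. It is not the authors' linear model; the data values and function names are hypothetical, and correlating ranks in this way is equivalent to a Spearman correlation.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical walking speeds of the same five flies in two contexts.
# Context B is shifted and stretched, but the ordering of individuals is kept.
context_a = [1.0, 2.0, 3.0, 4.0, 5.0]
context_b = [11.0, 14.0, 19.0, 26.0, 35.0]

raw_corr = pearson(context_a, context_b)
rank_corr = pearson(average_ranks(context_a), average_ranks(context_b))
```

      In this toy case the rank correlation reflects only whether individuals keep their relative position within each group, which is the quantity the reviewer's question is about.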

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general and with regard to these specific parameters? Is the increased walking speed at higher temperatures necessarily due to an increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      We agree that grouping our parameters into traits like exploration, attention, and anxiety always includes subjective decisions. The classification into these three categories is even considered partially controversial in the mouse-specific literature, which uses the term “anxiety” in similar experiments (see for example Carter, Sheh, 2015, chapter 2. https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Nevertheless, we believe that readers greatly benefit from these categories, since they make it easier to understand (beyond mathematical correlations) which aspects of the flies’ individuality can be considered consistent across situations. Furthermore, these categories serve as a bridge to compare insights from very distinct models.

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      We assume the reviewer is referring to Figure 3a. The detailed experimental protocol can be found in the Materials and Methods section under Setup 2: IndyTrax Multi-Arena Platform. We have now clarified this in the mentioned figure legend.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The reviewer raises an important point about hierarchies within the concept of animal individuality or personality. We agree that this is best addressed by first focusing on single behavioral traits/parameters and then integrating several trait properties into a cohesive concept of animal personality (holistic individuality). To ensure consistency throughout the text, we have now thoroughly reviewed the entire manuscript to clearly distinguish between single-parameter variability stability/consistency and holistic individuality/personality.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify the successful transfer of open hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      We have now uploaded all codes and materials to GitHub and made them available as soon as we received the reviewers’ comments. All files and materials can be accessed at https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality, which is now frequently mentioned throughout the revised manuscript.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      We thank the reviewer again for the extensive and constructive feedback.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths: 

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting it to their own needs.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting and temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low-risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      We agree with the reviewer that the definition of environmental context can differ between fields and that behavioral context is defined differently, particularly in ecology. Nevertheless, we highlight that our alterations of environmental context are highly stereotypic, well-defined, and unbiased by any interpretation: we only modified what we stated in the experimental description, without designing a specific situation that might again be perceived differently by different individuals. For example, comparing a context with a predator and one without might result in a binary response, because one fraction of the tested individuals might perceive the predator in the predator situation while the other fraction does not.

      The analytical framework in terms of statistical methods is lacking. It appears as though the authors used correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data; these models could capture and estimate differences in individual behavior across time and situations simultaneously. Along with this, it's currently unclear whether and how any statistical inference was performed. Right now, it appears as though any results describing how individuality changes across situations are largely descriptive (i.e. a visual comparison of the strengths of the correlation coefficients?).

      The reviewer raises an important point, also raised by reviewer #1. On one hand, we agree with both reviewers that a more aggregated analysis has clear advantages like more statistical power and has the potential to streamline our manuscript, which is why we added such an analysis (see below). On the other hand, we would also like to defend the initial approach we took, since we think that the simplicity of the analysis for both correlational and mean data is easy to understand and reproduce. More complex analyses necessarily include the selection of a specific statistical toolbox by the experimenters and based on these decisions, different analyses become less comparable and more and more complicated to reproduce, unless the entire decision tree is flawlessly documented. For instance, a recent personality psychology paper investigated the relationship between statistical paths within the decision tree (forking analysis) and their results, leading to very surprising results (Paul et al., 2024), since some paths even reversed their findings. Such a variance in conclusions is hardly possible with the rather simplistic and easily reproducible analysis we performed. One of the major strengths of our study is the simple experimental design, allowing for rather simple and easy to understand analyses.

      We nevertheless took the reviewer’s advice very seriously and reanalyzed the data using a generalized linear model, which largely recapitulated the findings of our previously performed “low-tech” analysis in a single figure (Fig. 9).
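
      To illustrate the kind of variance partitioning the reviewers have in mind, here is a minimal sketch. It is not the analysis shown in Fig. 9. It assumes a balanced design (every individual measured the same number of times) and estimates repeatability as the share of total variance attributable to differences between individuals, using the classical one-way ANOVA estimator. The function name and the toy data are hypothetical.

```python
from statistics import mean

def repeatability(measurements):
    """Estimate repeatability (ICC) from a balanced one-way layout.

    measurements: list with one inner list per individual, each holding
    the same number of repeated measurements of a single behavioral parameter.
    """
    n = len(measurements)          # number of individuals
    k = len(measurements[0])       # repeated measurements per individual
    if any(len(m) != k for m in measurements):
        raise ValueError("balanced data required: equal repeats per individual")

    grand_mean = mean(v for m in measurements for v in m)
    ind_means = [mean(m) for m in measurements]

    # Mean squares between and within individuals.
    ms_between = k * sum((im - grand_mean) ** 2 for im in ind_means) / (n - 1)
    ms_within = sum(
        (v - im) ** 2 for m, im in zip(measurements, ind_means) for v in m
    ) / (n * (k - 1))

    # Variance component of the individual intercepts.
    var_between = (ms_between - ms_within) / k
    return var_between / (var_between + ms_within)

# Toy data: three flies, two measurements each, perfectly consistent individuals.
toy = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(repeatability(toy))
```

      A full mixed model would additionally estimate covariance between situations and handle unbalanced data; this sketch only shows the basic between-individual versus within-individual split.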

      Another pretty major weakness is that right now, I can't find any explicit mention of how many flies were used and whether they were re-used across situations. Some sort of overall schematic showing exactly how many measurements were made in which rigs and with which flies would be very beneficial. 

      We apologize for this inconvenience. A detailed overview of male and female sample sizes has been listed in the supplemental boxplots next to the plots (e.g., Fig. S6). Apparently, this was not visible enough. Therefore, we have now also uniformly added the sample sizes to the main figure legends.

      I don't necessarily doubt the robustness of the results and my guess is that the authors' interpretations would remain the same, but a more appropriate modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation.

      As described above, we have now added the suggested analyses. We hope that the reviewer will appreciate the new Fig. 9, which, in our opinion, largely confirms our previous findings using a more appropriate generalized linear modelling framework.

      Reviewer #3 (Public Review): 

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable the individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days. 

      (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested failed to remain stable over the spatially varying environment (arena shape).

      (4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, and different temperatures among others. 

      We thank the reviewer for this extraordinarily kind assessment of our work!

      Recommendations for the authors:  

      Reviewing Editor (Recommendations For The Authors): 

      While appreciating the effort and quality of the work that went into this manuscript, the reviewers identified a few key points that would greatly benefit this work.

      (1) Statistical methods adopted. The dataset produced through this work is large, with multiple conditions and comparisons that can be made to infer parameters that both define and affect the individualistic behaviour of an animal. Hierarchical mixed models would be a more appropriate approach to handle such datasets and infer statistically the influence of different parameters on behaviours. We recommend that the authors take this approach in the analyses of their data.

      (2) Brevity in the text. We urge the authors to take advantage of eLife's flexible template and take care to elaborate on the text in the results section, the methods adopted, the legends, and the guides to the legends embedded in the main text. The findings are likely to be of interest to a broad audience, and the writing currently targets the specialist.

      Reviewer #2 (Recommendations For The Authors): 

      I want to start by saying this seems like a really cool study! It's an impressive amount of work and addressing a pretty basic question that is interesting (at least I think so!)

      We thank the reviewer again for this assessment!

      That said, I would really strongly recommend the authors embrace using mixed/hierarchical models to analyze their data. They're producing some really impressive data and just doing Pearson correlation coefficients across time points and situations is very clunky and actually losing out on a lot of information. The most up-to-date, state-of-the-art are mixed models - these models can handle very complex (or not so complex) random structures which can estimate variance and importantly, covariance, in individual intercepts both over time and across situations. I actually think this could add some really cool insights into the data and allow you to characterize the patterns you're seeing in far more detail. It's datasets exactly like this that are tailor-made for these complex variance partitioning models! 

      As mentioned before, we have now adopted a more appropriate GLM-based data analysis (see above).

      Regardless of which statistical methods you decide to use, please explicitly state in your methods exactly what analyses you did. That is completely lacking now and was a bit frustrating. As such, it's completely unclear whether or how statistical inference was performed. How did you do the behavioral clustering? 

      We apologize that these points were not clearly represented in the previous version of the manuscript. We have now significantly extended the methods section to include a separate paragraph on the statistical methods used, in order to address this critique and hope that the revised version is clear now.

      Also, I could not for the life of me figure out how many flies had been measured. Were they reused across the situation? Or not?

      We reused the same flies across situations whenever possible. However, having one fly experience all assays consecutively was not feasible due to their fragility. Instead, individual flies were exposed to at least 2 of the 3 groups of assays used here: the IndyTrax setup, the Buridan arenas and variants thereof, and the virtual arenas. Hence, we have compared flies across entirely different setups, but the number of times flies can be retested is limited (as otherwise, sample sizes will drop over time, and the flies will have gone through too many experimental alterations). To make this clearer, we have elaborated on this point in the main text, and we added group sample sizes to the figure legends.

      What are these "groups" and "populations" that are referred to in the results (e.g. lines 384, 391, 409)?

      We apologize for using these two terms somewhat interchangeably without proper introduction/distinction. We have now made this clearer at the beginning of the results in the main text, by focusing on the term ‘group’. By ‘group’ we refer to the average of all individuals tested in the same situation. Sample sizes in the figure legends now indicate group/population sizes to make this clearer.

      Some of the rationale for the development of the behavioral rigs would have actually been nice to include in the intro, rather than in the results.

      This rationale is introduced at the beginning of the last paragraph of the introduction. We hope that this now becomes clear in the revised version of the manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      This manuscript would do well to take advantage of eLife's flexible word limit. I sense that it has been written in brevity for a different journal but I would urge the authors to revisit this and unpack the language here - in the text, in the figure legends, in references to the figures within the text. The way it's currently written, though not misleading, will only speak to the super-specialist or the super-invested :). But the findings are nice, and it would be nice to tailor it to a broader audience.

      We appreciate this suggestion. Initially, we were hoping that we had described our results as clearly and briefly as possible. We apologize if that was not always the case. The comments and requests of all three reviewers have now led to a series of additions to both main text and methods, leading to a significantly expanded manuscript. We hope that these additions are helpful for the general, non-expert audience.

    1. Author response:

      The following is the authors’ response to the original reviews

      Overview of changes in the revision

      We thank the reviewers for the very helpful comments and have extensively revised the paper. We provide point-by-point responses below and here briefly highlight the major changes:

      (1) We expanded the discussion of the relevant literature in children and adults.

      (2) We improved the contextualization of our experimental design within previous reinforcement studies in both cognitive and motor domains highlighting the interplay between the two.

      (3) We reorganized the primary and supplementary results to better communicate the findings of the studies.

      (4) The modeling has been significantly revised and extended. We now formally compare 31 noise-based models and one value-based model and this led to a different model from the original being the preferred model. This has to a large extent cleaned up the modeling results. The preferred model is a special case (with no exploration after success) of the model proposed in Therrien et al. (2018). We also provide examples of individual fits of the model, fit all four tasks and show group fits for all, examine fits vs. data for the clamp phases by age, provide measures of relative and absolute goodness of fit, and examine how the optimal level of exploration varies with motor noise.

      Reviewer #1 (Public review):

      Summary:

      Here the authors address how reinforcement-based sensorimotor adaptation changes throughout development. To address this question, they collected many participants in ages that ranged from small children (3 years old) to adulthood (18+ years old). The authors used four experiments to manipulate whether binary and positive reinforcement was provided probabilistically (e.g., 30 or 50%) versus deterministically (e.g., 100%), and continuous (infinite possible locations) versus discrete (binned possible locations) when the probability of reinforcement varied along the span of a large redundant target. The authors found that both movement variability and the extent of adaptation changed with age.

      Thank you for reviewing our work. One note of clarification. This work focuses on reinforcement-based learning throughout development but does not evaluate sensorimotor adaptation. The four tasks presented in this work are completed with veridical trajectory feedback (no perturbation).

      The goal is to understand how children at different ages adjust their movements in response to reward feedback but does not evaluate sensorimotor adaptation. We now explain this distinction on line 35.

      Strengths:

      The major strength of the paper is the number of participants collected (n = 385). The authors also answer their primary question, that reinforcement-based sensorimotor adaptation changes throughout development, which was shown by utilizing established experimental designs and computational modelling.

      Thank you.

      Weaknesses:

      Potential concerns involve inconsistent findings with secondary analyses, current assumptions that impact both interpretation and computational modelling, and a lack of clearly stated hypotheses.

      (1) Multiple regression and Mediation Analyses.

      The challenge with these secondary analyses is that:

      (a) The results are inconsistent between Experiments 1 and 2, and the analysis was not performed for Experiments 3 and 4,

      (b) The authors used a two-stage procedure of using multiple regression to determine what variables to use for the mediation analysis, and

      (c) The authors already have a trial-by-trial model that is arguably more insightful.

      Given this, some suggested changes are to:

      (a) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are consistent.

      (b) Move the regression/mediation analysis to Supplementary, since it is slightly distracting given current inconsistencies and that the trial-by-trial model is arguably more insightful.

      Based on these comments, we have chosen to remove the multiple regression and mediation analyses. We agree that they were distracting and that the trial-by-trial model allows for differentiation of motor noise from exploration variability in the learning block.

      (2) Variability for different phases and model assumptions:

      A nice feature of the experimental design is the use of success and failure clamps. These clamped phases, along with baseline, are useful because they can provide insights into the partitioning of motor and exploratory noise. Based on the assumptions of the model, the success clamp would only reflect variability due to motor noise (excludes variability due to exploratory noise and any variability due to updates in reach aim). Thus, it is reasonable to expect that the success clamps would have lower variability than the failure clamps (which it obviously does in Figure 6), and presumably baseline (which provides success and failure feedback, thus would contain motor noise and likely some exploratory noise).

      However, in Figure 6, one visually observes greater variability during the success clamp (where it is assumed variability only comes from motor noise) compared to baseline, where variability would come from:

      (a) Motor noise.

      (b) Likely some exploratory noise since there were some failures.

      (c) Updates in reach aim.

      Thanks for this comment. It made us realize that some of our terminology was unintentionally misleading. Reaching to discrete targets in the Baseline block was done to a) determine if participants could move successfully to targets that are the same width as the 100% reward zone in the continuous targets and b) determine if there are age dependent changes in movement precision. We now realize that the term Baseline Variability was misleading and should really be called Baseline Precision.

      This is an important distinction that bears on this reviewer's comment. In clamp trials, participants move to continuous targets. In baseline, participants move to discrete targets presented at different locations. Clamp Variability cannot be directly compared to Baseline Precision because they are qualitatively different. Since the target changes on each baseline trial, we would not expect updating of desired reach (the target is the desired reach) and there is therefore no updating of reach based on success or failure. The SD we calculate over baseline trials is the endpoint variability of the reach locations relative to the target centers. In success clamp, there are no targets so the task is qualitatively different.

      We have updated the text to clarify terminology, expand upon our operational definitions, and motivate the distinct role of the baseline block in our task paradigm (line 674).

      Given the comment above, can the authors please:

      (a) Statistically compare movement variability between the baseline, success clamp, and failure clamp phases.

      Given our explanation in the previous point we don't think that comparing baseline to the clamp makes sense as the trials are qualitatively different.

      (b) The authors have examined how their model predicts variability during success clamps and failure clamps, but can they also please show predictions for baseline (similar to that of Cashaback et al., 2019; Supplementary B, which alternatively used a no feedback baseline)?

      Again, we do not think it makes sense to predict the baseline which as we mention above has discrete targets compared to the continuous targets in the learning phase.

      (c) Can the authors show whether participants updated their aim towards their last successful reach during the success clamp? This would be a particularly insightful analysis of model assumptions.

      We have now compared 31 models (see full details in next response) which include the 7 models in Roth et al. (2023). Several of these model variants have updating even after success (with so-called planning noise). We also now fit the model to the data that includes the clamp phases (we can't easily fit to success clamp alone as there are only 10 trials). We find that the preferred model is one that does not include updating after success.

      (d) Different sources of movement variability have been proposed in the literature, as have different related models. One possibility is that the nervous system has knowledge of 'planned (noise)' movement variability that is always present, irrespective of success (van Beers, R.J. (2009). Motor learning is optimally tuned to the properties of motor noise. Neuron, 63(3), 406-417). The authors have used slightly different variations of their model in the past. Roth et al (2023) directly compared several different plausible models with various combinations of motor, planned, and exploratory noise (Roth A, 2023, "Reinforcement-based processes actively regulate motor exploration along redundant solution manifolds." Proceedings of the Royal Society B 290: 20231475: see Supplemental). Their best-fit model seems similar to the one the authors propose here, but the current paper has the added benefit of the success and failure clamps to tease the different potential models apart. In light of the results of a), b), and c), the authors are encouraged to provide a paragraph on how their model relates to the various sources of movement variability and other models proposed in the literature.

      Thank you for this. We realized that the models presented in Roth et al. (2023) as well as in other papers, are all special cases of a more general model. Moreover, in total there are 30 possible variants of the full model so we have now fit all 31 models to our larger datasets and performed model selection (Results and Methods). All the models can be efficiently fit by Kalman smoother to the actual data (rather than to summary statistics which has sometimes been done). For model selection, we fit only the 100 learning trials and chose the preferred model based on BIC on the children's data (Figure 5—figure Supplement 1). After selecting the preferred model we then refit this model to all trials including the clamps so as to obtain the best parameter estimates.
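
      The selection step described above can be sketched as follows. This is a minimal illustration, assuming the standard definition of the Bayesian information criterion, BIC = k ln(n) - 2 ln(L), with k free parameters, n observations, and maximised likelihood L. The function names, model labels, and log-likelihood values are hypothetical and are not taken from the paper's code or results.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def preferred_model(fits, n_obs):
    """Pick the model with the lowest BIC.

    fits maps a model label to (maximised log-likelihood, number of free parameters).
    Returns the winning label and the BIC of every candidate.
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores

# Hypothetical fits to 100 learning trials (made-up numbers, for illustration only).
example_fits = {
    "motor_only": (-160.0, 1),
    "motor_plus_exploration": (-150.0, 2),
    "motor_exploration_learning_rate": (-149.5, 3),
}
best, scores = preferred_model(example_fits, n_obs=100)
print(best)
```

      In this toy case the third model fits slightly better than the second, but the gain does not offset the penalty for its extra parameter, which is the trade-off BIC is designed to capture.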

      The preferred model was the same whether we combined the continuous and discrete probabilistic data or examined each task separately, either for only the children or for the children and adults combined. The preferred model is a special case (no exploration after success) of the one proposed in Therrien et al. (2018) and has exploration variability (after failure) and motor noise with full updating with exploration variability (if any) after success. This model differs from the model in the original submission, which included a partial update of the desired reach after exploration; this was considered the learning rate. The current model suggests a unity learning rate.
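
      One way to read the verbal description of the preferred model is sketched below. This is our own minimal reconstruction for illustration, not the authors' implementation: it assumes Gaussian motor and exploration noise, a one-dimensional reach, and a deterministic reward zone (the actual tasks also include probabilistic reward). All names and parameter values are hypothetical.

```python
import random

def simulate_reaches(n_trials, sigma_motor, sigma_explore, reward_zone,
                     start=0.0, seed=None):
    """Simulate reaches under an 'explore after failure, keep after success' rule.

    reward_zone: (low, high) interval in which a reach is rewarded (assumed deterministic).
    Returns the list of reach endpoints and the list of reward outcomes.
    """
    rng = random.Random(seed)
    aim = start                 # desired reach
    prev_success = True         # assumption: no exploration on the first trial
    reaches, rewards = [], []
    for _ in range(n_trials):
        # Exploration variability is added only after a failure.
        explore = 0.0 if prev_success else rng.gauss(0.0, sigma_explore)
        # Motor noise is always present and is not used for updating.
        motor = rng.gauss(0.0, sigma_motor)
        reach = aim + explore + motor
        success = reward_zone[0] <= reach <= reward_zone[1]
        if success:
            # Unity learning rate: the full exploration (if any) is kept.
            aim += explore
        reaches.append(reach)
        rewards.append(success)
        prev_success = success
    return reaches, rewards

reaches, rewards = simulate_reaches(
    n_trials=100, sigma_motor=1.0, sigma_explore=3.0,
    reward_zone=(5.0, 10.0), start=0.0, seed=1,
)
print(sum(rewards), "rewarded trials out of", len(rewards))
```

      Setting both noise terms to zero makes the simulated reaches stay at the starting aim, which is a quick sanity check that all trial-to-trial change in this sketch comes from the two noise sources.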

      In addition, as suggested by another reviewer, we also fit a value-based model which we adapted from the model described in Giron et al. (2023). This model was not preferred.

      We have added a paragraph to the Discussion highlighting different sources of variability and links to our model comparison.

      (e) line 155. Why would the success clamp be composed of both motor and exploratory noise? Please clarify in the text

      This sentence was written to refer to clamps in general and not just success clamps. However, in the revision this sentence seemed unnecessary so we have removed it.

      (3) Hypotheses:

      The introduction did not have any hypotheses of development and reinforcement, despite the discussion above setting up potential hypotheses. Did the authors have any hypotheses related to why they might expect age to change motor noise, exploratory noise, and learning rates? If so, what would the experimental behaviour look like to confirm these hypotheses? Currently, the manuscript reads more as an exploratory study, which is certainly fine if true, it should just be explicitly stated in the introduction. Note: on line 144, this is a prediction, not a hypothesis. Line 225: this idea could be sharpened. I believe the authors are speaking to the idea of having more explicit knowledge of action-target pairings changing behaviour.

      We have included our hypotheses and predictions at two points in the paper. In the introduction we modified the text to:

      "We hypothesized that children's reinforcement learning abilities would improve with age, and depend on the developmental trajectory of exploration variability, learning rate (how much people adjust their reach after success), and motor noise (here defined as all sources of noise associated with movement, including sensory noise, memory noise, and motor noise). We think that these factors depend on the developmental progression of neural circuits that contribute to reinforcement learning abilities (Raznahan et al., 2014; Nelson et al., 2000; Schultz, 1998)."

      In results we modified the sentence to:

      "We predicted that discrete targets could increase exploration by encouraging children to move to a different target after failure."

      Reviewer #2 (Public review):

      Summary:

      In this study, Hill and colleagues use a novel reinforcement-based motor learning task ("RML"), asking how aspects of RML change over the course of development from toddler years through adolescence. Multiple versions of the RML task were used in different samples, which varied on two dimensions: whether the reward probability of a given hand movement direction was deterministic or probabilistic, and whether the solution space had continuous reach targets or discrete reach targets. Using analyses of both raw behavioral data and model fits, the authors report four main results: First, developmental improvements reflected 3 clear changes, including increases in exploration, an increase in the RL learning rate, and a reduction of intrinsic motor noise. Second, changes to the task that made it discrete and/or deterministic both rescued performance in the youngest age groups, suggesting that observed deficits could be linked to continuous/probabilistic learning settings. Overall, the results shed light on how RML changes throughout human development, and the modeling characterizes the specific learning deficits seen in the youngest ages.

      Strengths:

      (1) This impressive work addresses an understudied subfield of motor control/psychology - the developmental trajectory of motor learning. It is thus timely and will interest many researchers.

      (2) The task, analysis, and modeling methods are very strong. The empirical findings are rather clear and compelling, and the analysis approaches are convincing. Thus, at the empirical level, this study has very few weaknesses.

      (3) The large sample sizes and in-lab replications further reflect the laudable rigor of the study.

      (4) The main and supplemental figures are clear and concise.

      Thank you.

      Weaknesses:

      (1) Framing.

      One weakness of the current paper is the framing, namely w/r/t what can be considered "cognitive" versus "non-cognitive" ("procedural?") here. In the Intro, for example, it is stated that there are specific features of RML tasks that deviate from cognitive tasks. This is of course true in terms of having a continuous choice space and motor noise, but spatially correlated reward functions are not a unique feature of motor learning (see e.g. Giron et al., 2023, NHB). Given the result here that simplifying the spatial memory demands of the task greatly improved learning for the youngest cohort, it is hard to say whether the task is truly getting at a motor learning process or more generic cognitive capacities for spatial learning, working memory, and hypothesis testing. This is not a logical problem with the design, as spatial reasoning and working memory are intrinsically tied to motor learning. However, I think the framing of the study could be revised to focus in on what the authors truly think is motor about the task versus more general psychological mechanisms. Indeed, it may be the case that deficits in motor learning in young children are mostly about cognitive factors, which is still an interesting result!

      Thank you for these comments on the framing of our study. We now clearly acknowledge that all motor tasks have cognitive components (new paragraph at line 65). We also explain why we think our task has features not present in typical cognitive tasks.

      (2) Links to other scholarship.

      If I'm not mistaken, a common observation in studies of the development of reinforcement learning is a decrease in exploration over development (e.g., Nussenbaum and Hartley, 2019; Giron et al., 2023; Schulz et al., 2019); this contrasts with the current results which instead show an increase. It would be nice to see a more direct discussion of previous findings showing decreases in exploration over development, and why the current study deviates from that. It could also be useful for the authors to bring in concepts of different types of exploration (e.g. "directed" vs "random"), in their interpretations and potentially in their modeling.

      We recognize that our results differ from prior work. The optimal exploration pattern differs from task to task. We now discuss that exploration is not one size fits all; its benefits vary depending upon the task. We have added the following paragraphs to the Discussion section:

      "One major finding from this study is that exploration variability increases with age. Some other studies of development have shown that exploration can decrease with age indicating that adults explore less compared to children (Schulz et al., 2019; Meder et al., 2021; Giron et al., 2023). We believe the divergence between our work and these previous findings is largely due to the experimental design of our study and the role of motor noise. In the paradigm used initially by Schulz et al. (2019) and replicated in different age groups by Meder et al. (2021) and Giron et al. (2023), participants push buttons on a two-dimensional grid to reveal continuous-valued rewards that are spatially correlated. Participants are unaware that there is a maximum reward available and therefore children may continue to explore to reduce uncertainty if they have difficulty evaluating whether they have reached a maximum. In our task by contrast, participants are given binary reward and told that there is a region in which reaches will always be rewarded. Motor noise is an additional factor which plays a key role in our reaching task but minimal if any role in the discretized grid task. As we show in simulations of our task, as motor noise goes down (as it is known to do through development) the optimal amount of exploration goes up (see Figure 7—figure Supplement 2 and Appendix 1). Therefore, the behavior of our participants is rational in terms of increasing exploration as motor noise decreases.

      A key result in our study is that exploration in our task reflects sensitivity to failure. Older children make larger adjustments after failure compared to younger children to find the highly rewarded zone more quickly. Dhawale et al. (2017) discuss the different contexts in which a participant may explore versus exploit (i.e., stick at the same position). Exploration is beneficial when reward is low as this indicates that the current solution is no longer ideal, and the participant should search for a better solution. Konrad et al. (2025) have recently shown this behavior in a real-world throwing task where 6 to 12 year old children increased throwing variability after missed trials and minimized variability after successful trials. This has also been shown in a postural motor control task where participants were more variable after non-rewarded trials compared to rewarded trials (Van Mastrigt et al., 2020). In general, these studies suggest that the optimal amount of exploration is dependent on the specifics of the task."

      (3) Modeling.

      First, I may have missed something, but it is unclear to me if the model is actually accounting for the gradient of rewards (e.g., if I get a probabilistic reward moving at 45°, but then don't get one at 40°, I should be more likely to try 50° next then 35°). I couldn't tell from the current equations if this was the case, or if exploration was essentially "unsigned," nor if the multiple-trials-back regression analysis would truly capture signed behavior. If the model is sensitive to the gradient, it would be nice if this was more clear in the Methods. If not, it would be interesting to have a model that does "function approximation" of the task space, and see if that improves the fit or explains developmental changes.

      The model we use (similar to Roth et al. (2023) and Therrien et al. (2016, 2018)) does not model the gradient. Exploration is always zero-mean Gaussian. As suggested by the reviewer, we now also fit a value-based model (described starting at line 810) which we adapted from the model presented in Giron et al. (2023). We show that the exploration and noise-based model is preferred over the value-based model.

      The multiple-trials-back regression was unsigned as the intent was to look at the magnitude and not the direction of the change in movement. We have decided to remove this analysis from the manuscript as it was a source of confusion and secondary analysis that did not add substantially to the findings of these studies.

      Second, I am curious if the current modeling approach could incorporate a kind of "action hysteresis" (aka perseveration), such that regardless of previous outcomes, the same action is biased to be repeated (or, based on parameter settings, avoided).

      In some sense, the learning rate in the model in the original submission is highly related to perseveration. For example, if the learning rate is 0, then there is complete perseveration as you simply repeat the same desired movement. If the rate is 1, there is no perseveration, and values in between reflect different amounts of perseveration. Therefore, it is not easy to separate learning rate from perseveration. Adding perseveration as another parameter would likely make it and the learning rate unidentifiable. However, we now compare 31 models, and those that have a non-unity learning rate are not preferred, suggesting there is little perseveration.
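
      The relationship between learning rate and perseveration described above can be illustrated with a minimal simulation sketch. It is not the authors' fitted model: the function name, the reward zone, the parameter values, and the exact update rule (the aim moves a fraction of the way toward a rewarded reach, with zero-mean Gaussian exploration added only after a failure) are assumptions chosen for illustration.

```python
import random


def simulate_learning(n_trials, is_rewarded, sigma_motor, sigma_explore,
                      learning_rate, start_aim=0.5, seed=0):
    """Simulate reaches on a normalized workspace [0, 1].

    Hypothetical update rule: explore only after a failure, and after a
    success move the aim a fraction `learning_rate` toward that reach.
    """
    rng = random.Random(seed)
    aim = start_aim
    last_rewarded = True  # assumption: no exploration on the first trial
    aims, reaches, rewards = [], [], []
    for _ in range(n_trials):
        # Motor noise is present on every trial.
        motor = rng.gauss(0.0, sigma_motor)
        # Zero-mean (unsigned) Gaussian exploration is added only after a failure.
        explore = 0.0 if last_rewarded else rng.gauss(0.0, sigma_explore)
        # Keep the reach inside the workspace.
        reach = min(1.0, max(0.0, aim + motor + explore))
        rewarded = is_rewarded(reach)
        aims.append(aim)
        reaches.append(reach)
        rewards.append(rewarded)
        if rewarded:
            aim += learning_rate * (reach - aim)
        last_rewarded = rewarded
    return aims, reaches, rewards


def in_zone(reach):
    # Hypothetical deterministic reward zone.
    return 0.7 <= reach <= 0.9


# Learning rate 0: the aim never moves (complete perseveration).
aims_fixed, _, _ = simulate_learning(100, in_zone, 0.03, 0.15, learning_rate=0.0)
# Learning rate 1: the aim jumps to each rewarded reach.
aims_unity, reaches_unity, rewards_unity = simulate_learning(
    100, in_zone, 0.03, 0.15, learning_rate=1.0)
```

      With a learning rate of 0 the aim stays at its starting value whatever the outcomes, which is why a separate perseveration parameter would be hard to tell apart from the learning rate in a model of this kind.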

      (4) Psychological mechanisms. There is a line of work that shows that when children and adults perform RL tasks they use a combination of working memory and trial-by-trial incremental learning processes (e.g., Master et al., 2020; Collins and Frank 2012). Thus, the observed increase in the learning rate over development could in theory reflect improvements in instrumental learning, working memory, or both. Could it be that older participants are better at remembering their recent movements in short-term memory (Hadjiosif et al., 2023; Hillman et al., 2024)?

      We agree that cognitive processes, such as working memory or visuospatial processing, play a role in our task and describe cognitive elements of our task in the introduction (new paragraph at line 65). However, the sensorimotor model we fit to the data does a good job of explaining the variation across age, which suggests that age-dependent cognitive processes probably play a smaller role.

      Reviewer #3 (Public review):

      Summary:

      The study investigates reinforcement learning across the lifespan with a large sample of participants recruited for an online game. It finds that children gradually develop their abilities to learn reward probability, possibly hindered by their immature spatial processing and probabilistic reasoning abilities. Motor noise, reinforcement learning rate, and exploration after a failure all contribute to children's subpar performance.

      Strengths:

      (1) The paradigm is novel because it requires continuous movement to indicate people's choices, as opposed to discrete actions in previous studies.

      (2) A large sample of participants were recruited.

      (3) The model-based analysis provides further insights into the development of reinforcement learning ability.

      Thank you.

      Weaknesses:

      (1) The adequacy of model-based analysis is questionable, given the current presentation and some inconsistency in the results.

      Thank you for raising this concern. We have substantially revised the model from our first submission. We now compare 31 noise-based models and 1 value-based model and fit all of the tasks with the preferred model. We perform model selection using the two tasks with the largest datasets to identify the preferred model. From the preferred model, we found the parameter fits for each individual dataset and simulated the trial by trial behavior allowing comparison between all four tasks. We now show examples of individual fits as well as provide a measure of goodness of fit. The expansion of our modeling approach has resolved inconsistencies and sharpened the conclusions drawn from our model.

      (2) The task should not be labeled as reinforcement motor learning, as it is not about learning a motor skill or adapting to sensorimotor perturbations. It is a classical reinforcement learning paradigm.

      We now make it clear that our reinforcement learning task has both motor and cognitive demands, but does not fall entirely within one of these domains. We use the term motor learning because it captures the fact that participants maximize reward by making different movements, corrupted by motor noise, to unmarked locations on a continuous target zone. When we look at previous publications, it is clear that our task is similar to those that also refer to this as reinforcement motor learning: Cashaback et al. (2019) (reaching task using a robotic arm in adults), Van Mastrigt et al. (2020) (weight shifting task in adults), and Konrad et al. (2025) (real-world throwing task in children). All of these tasks involve trial-by-trial learning through reinforcement to make the movement that is most effective for a given situation. We feel it is important to link our work to these previous studies and prefer to preserve the terminology of reinforcement motor learning.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Thank you for this summary. Rather than repeat the extended text from the responses to the reviewers here, we point the Editor to the appropriate reviewer responses for each issue raised.

      The reviewers and editors have rated the significance of the findings in your manuscript as "Valuable" and the strength of evidence as "Solid" (see eLife evaluation). A consultancy discussion session to integrate the public reviews and recommendations per reviewer (listed below), has resulted in key recommendations for increasing the significance and strength of evidence:

      To increase the Significance of the findings, please consider the following:

      (1) Address and reframe the paper around whether the task is truly getting at a motor learning process or more generic cognitive decision-making capacities such as spatial memory, reward processing, and hypothesis testing.

      We have revised the paper to address the comments on the framing of our work. Please see responses to the public review comments of Reviewers #2 and #3.

      (2) It would be beneficial to specify the differences between traditional reinforcement algorithms (i.e., using softmax functions to explore, and build representations of state-action-reward) and the reinforcement learning models used here (i.e., explore with movement variability, update reach aim towards the last successful action), and compare present findings to previous cognitive reinforcement learning studies in children.

      Please see response to the public review comments of Reviewer #1 in which we explain the expansion of our modeling approach to fit a value-based model as well as 31 other noise-based models. In our response to the public review comments of Reviewer #2, we comment on our expanded discussion of how our findings compare with previous cognitive reinforcement learning studies.

      To move the "Strength of Evidence" to "Convincing", please consider doing the following:

      (1) Address some apparently inconsistent and unrealistic values of motor noise, exploration noise, and learning rate shown for individual participants (e.g., Figure 5b; see comments reviewers 1 and take the following additional steps: plotting r squares for individual participants, discussing whether individual values of the fitted parameters are plausible and whether model parameters in each age group can extrapolate to the two clamp conditions and baselines.

      We have substantially updated our modeling approach. Now that we compare 31 noise-based models, the preferred model does not show any inconsistent or unrealistic values (see response to Reviewer #3). Additionally, we now show example individual fits and provide both relative and absolute goodness of fit (see response to Reviewer #3).

      (2) Relatedly, to further justify if model assumptions are met, it would be valuable to show that the current learning model fits the data better than alternative models presented in the literature by the authors themselves and by others (reviewer 1). This could include alternative development models that formalise the proposed explanations for age-related change: poor spatial memory, reward/outcome processing, and exploration strategies (reviewer 2).

      Please see response to public review comments of Reviewer #1 in which we explain that we have now fit a value-based model as well as 31 other noise-based models providing a comparison of previous models as well as novel models. This led to a slightly different model being preferred over the model in the original submission (updated model has a learning rate of unity). These models span many of the processes previously proposed for such tasks. We feel that 32 models span a reasonable amount of space and do not believe we have the power to include memory issues or heuristic exploration strategies in the model.

      (3) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are more consistent across studies and with the current approach (see comments reviewer 1).

      Please see response to public review comments of Reviewer #1. We chose to focus only on the model based analysis because it allowed us to distinguish between exploration variability and motor noise.

      Please see below for further specific recommendations from each reviewer.

      Reviewer #1 (Recommendations for the author):

      (1) In general, there should be more discussion and contextualization of other binary reinforcement tasks used in the motor literature. For example, work from Jeroen Smeets, Katinka van der Kooij, and Joseph Galea.

      Thank you for this comment. We have edited the Introduction to better contextualize our work within the reinforcement motor learning literature (see line 67 and line 83).

      (2) Line 32. Very minor. This sentence is fine, but perhaps could be slightly improved. “select a location along a continuous and infinite set of possible options (anywhere along the span of the bridge)"

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (3) Line 57. To avoid some confusion in successive sentences: Perhaps, "Both children over 12 and adolescents...".

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (4) Line 80. This is arguably not a mechanistic model, since it is likely not capturing the reward/reinforcement machinery used by the nervous system, such as updating the expected value using reward prediction errors/dopamine. That said, this phenomenological model, and other similar models in the field, do very well to capture behaviour with a very simple set of explore and update rules.

      We use mechanistic in the standard use in modeling, as in Levenstein et al. (2023), for example. The contrast is not with neural modeling, but with normative modeling, in which one develops a model to optimize a function (or descriptive models as to what a system is trying to achieve). In mechanistic modeling one proposes a mechanism and this can be at a state-space level (as in our case) or a neural level (as suggested by the reviewer) but both are considered mechanistic, just at different levels. Quoting Levenstein "... mechanistic models, in which complex processes are summarized in schematic or conceptual structures that represent general properties of components and their interactions, are also commonly used." We now reference the Levenstein paper to clarify what we mean by mechanistic.

      (5) Figure 1. It would be useful to state that the x-axis in Figure 1 is in normalized units, depending on the device.

      Thank you for this comment. We have added a description of the x-axis units to the Fig. 1 caption.

      (6) Were there differences in behaviour for these different devices? e.g., how different was motor noise for the mouse, trackpad, and touchscreen?

      Thank you for this question. We did not find a significant effect of device on learning or precision in the baseline block. We have added these one-way ANOVA results for each task in Supplementary Table 1.

      (7) Line 98. Please state that participants received reinforcement feedback during baseline.

      Thank you for this comment. We have updated the text to specify that participants receive reward feedback during the baseline block.

      (8) Line 99. Did the distance from the last baseline trial influence whether the participant learned or did not learn? For example, would it place them too far from the peak success location such that it impacted learning?

      Thank you for this question. We looked at whether the position of movement on the last baseline block trial was correlated with the first movement position in the learning block. We did not find any correlations between these positions for any of the tasks. Interestingly, we found that the majority of participants move to the center of the workspace on the first trial of the learning block for all tasks (either in the presence of the novel continuous target scene or the presentation of 7 targets all at once). We do not think that the last movement in the baseline block "primed" the participant for the location of the success zone in the learning block. We have added the following sentence to the Results section:

      "Note that the reach location for the first learning trial was not affected by (correlated with) the target position on the last baseline trial (p > 0.3 for both children and adults, separately)."

      (9) The term learning distance could be improved. Perhaps use distance from target.

      Thank you for this comment. We appreciate that learning distance defined with 0 as the best value is counterintuitive. We have changed the language to be "distance from target" as the learning metric.

      (10) Line 188. This equation is correct, but to estimate what the standard deviation by the distribution of changes in reach position is more involved. Not sure if the authors carried out this full procedure, which is described in Cashaback et al., 2019; Supplemental 2.

      There appears to be no Supplemental 2 in the referenced paper so we assume the reviewer is referring to Supplemental B which deals with a shuffling procedure to examine lag-1 correlations.

      In our tasks, we are limited to only 9 trials to analyze in each clamp phase so we do not feel a shuffling analysis is warranted. In these blocks, we are not trying to 'estimate what the standard deviation by the distribution of changes in reach position' but instead are calculating the standard deviation of the reach locations and comparing the model fit (for which the reviewer says the formula is correct) with the data. We are unclear what additional steps the reviewer is suggesting. In our updated model analysis, we fit the data including the clamp phases for better parameter estimation. We use simulations to estimate s.d. in the clamp phase (as we ensure in simulations the data does not fall outside the workspace), so the previous analytic formulas, which are only an approximation, are no longer used.

      (11) Line 197-199. Having done the demo task, it is somewhat surprising that a 3-year-old could understand these instructions (whose comprehension can be very different from even a 5-year old).

      Thank you for raising this concern. We recognize that the younger participants likely have different comprehension levels compared to older participants. However, we believe that the majority of even the youngest participants were able to sufficiently understand the goal of the task to move in a way to get the video clip to play. We intentionally designed the tasks to be simple such that the only instructions the child needed to understand were that the goal was to get the video clip to play as much as possible and the video clip played based on their movement. Though the majority of younger children struggled to learn well on the probabilistic tasks, they were able to learn well on the deterministic tasks where the task instructions were virtually identical with the exception of how many places in the workspace could gain reward. On the continuous probabilistic task, we did have a small number (n = 3) of 3 to 5 year olds who exhibited more mature learning ability which gives us confidence that the younger children were able to understand the task goal.

      (12) Line 497: Can the authors please report the F-score and p-value separately for each of these one-way ANOVA (the device is of particular interest here).

      Thank you for this request. We have added a supplementary table (Supplementary Table 1) with the results of these ANOVAs.

      (13) Past work has discussed how motivation influences learning, which is a function of success rate (van der Kooij, K., in 't Veld, L., & Hennink, T. (2021). Motivation as a function of success frequency. Motivation and Emotion, 45, 759-768.). Can the authors please discuss how that may change throughout development?

      Thank you for this comment. While motivation most probably plays a role in learning, in particular in a game environment, this was out of the scope of the direct focus of this work and not something that our studies were designed to test. We have added the following sentence to the discussion section to address this comment:

      "We also recognize that other processes, such as memory and motivation, could affect performance on these tasks however our study was not designed to test these processes directly and future work would benefit from exploring these other components more explicitly."

      (14) Supplement 6. This analysis is somewhat incomplete because it does not consider success.

      Pekny and colleagues (2015) looked at 3 trials back but considered both success and reward. However, their analysis has issues since successive time points are not i.i.d., and spurious relationships can arise. This issue is brought up by Dhawale (Dhawale, A. K., Miyamoto, Y. R., Smith, M. A., & Ölveczky, B. P. (2019). Adaptive regulation of motor variability. Current Biology, 29(21), 3551-3562.). Perhaps it is best to remove this analysis from the paper.

      Thank you for this comment. We have decided to remove this secondary analysis from the paper as it was a source of confusion and did not add to the understanding and interpretation of our behavioral results.

      Reviewer #2 (Recommendations for the author):

      (1) The path length ratio analyses in the supplemental are interesting but are not mentioned in the main paper. I think it would be helpful to mention these as they are somewhat dramatic effects.

      Thank you for this comment. Path length ratios are defined in the Methods and results are briefly summarized in the Results section with a point to the supplementary figures. We have updated the text to more explicitly report the age related differences in path length ratios.

      (2) The second to last paragraph of the intro could use a sentence motivating the use of the different task features (deterministic/probabilistic and discrete/continuous).

      Thank you for this comment. We have added an additional motivating sentence to the introduction.

      Reviewer #3 (Recommendations for the author):

      The paper labeled the task as one for reinforcement motor learning, which is not quite appropriate in my opinion. Motor learning typically refers to either skill learning or motor adaptation, the former for improving speed-accuracy tradeoffs in a certain (often new) motor skill task and the latter for accommodating some sensorimotor perturbations for an existing motor skill task. The gaming task here is for neither. It is more like a decision-making task with a slight contribution to motor execution, i.e., motor noise. I would recommend the authors label the learning as reinforcement learning instead of reinforcement motor learning.

      Thank you for this comment. As noted in the response to the public review comments, we agree that this task has components of classical reinforcement learning (i.e. responding to a binary reward) but we specifically designed it to require the learning of a movement within a novel game environment. We have added a new paragraph to the introduction where we acknowledge the interplay between cognitive and motor mechanisms while also underscoring the features in our task that we think are not present in typical cognitive tasks.

      My major concern is whether the model adequately captures subjects' behavior and whether we can conclude with confidence from model fitting. Motor noise, exploration noise, and learning rate, which fit individual learning patterns (Figure 5b), show some quite unrealistic values. For example, some subjects have nearly zero motor noise and a 100% learning rate.

      We have now compared 31 models and the preferred model is different from the one in the first submission. The parameter fits of the new model do not saturate in any way and appear reasonable to us. The updates to the model analysis have addressed the concern of previously seen unrealistic values in the prior draft.

      Currently, the paper does not report the fitting quality for individual subjects. It is good to have an exemplary subject's fit shown, too. My guess is that the r-squared would be quite low for this type of data. Still, given that the children's data is noisier, it might be good to use the adult data to show how good the fitting can be (individual fits, r squares, whether the fitted parameters make sense, whether it can extrapolate to the two clamp phases). Indeed, the reliability of model fitting affects how we should view the age effect of these model parameters.

      We now show fits to individual subjects. But since this is a Kalman smoother, it fits the data perfectly by generating its best estimate of motor noise and exploration variability on each trial to fully account for the data, so in that sense R² is always 1 and is not helpful.

      While the BIC analysis with the other model variants provides a relative goodness of fit, it is not straightforward to provide an absolute goodness of fit such as standard R² for a feedforward simulation of the model given the parameters (rather than the output of the Kalman smoother). There are two problems. First, there is no single model output. Each time the model is simulated with the fit parameters it produces a different output (due to motor noise, exploration variability and reward stochasticity). Second, the model is not meant to reproduce the actual motor noise, exploration variability and reward stochasticity of a trial. For example, the model could fit pure Gaussian motor noise across trials (for a poor learner) by accurately fitting the standard deviation of motor noise but would not be expected to actually match each data point so would have a traditional R² of 0.

      To provide an overall goodness of fit we have to reduce the noise component and to do so we examined the traditional R² between the average of all the children's data and the average simulation of the model (from the median of 1000 simulations per participant) so as to reduce the stochastic variation. The results for the continuous probabilistic and discrete probabilistic task are R² of 0.41 and 0.72, respectively.
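
      The group-level goodness of fit described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names and the data layout (one list of per-trial values per participant, and one list of simulated runs per participant) are assumptions.

```python
from statistics import mean, median


def r_squared(observed, predicted):
    """Traditional R^2 = 1 - SS_res / SS_tot."""
    center = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - center) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def per_trial(summary, runs):
    """Apply `summary` across runs, separately for each trial index."""
    return [summary(values) for values in zip(*runs)]


def group_level_r_squared(data_by_participant, sims_by_participant):
    # Median across each participant's simulated runs, trial by trial.
    model_by_participant = [per_trial(median, sims) for sims in sims_by_participant]
    # Average across participants for both the data and the model.
    mean_data = per_trial(mean, data_by_participant)
    mean_model = per_trial(mean, model_by_participant)
    return r_squared(mean_data, mean_model)
```

      Averaging before computing R² removes much of the trial-level noise that the model is not meant to reproduce, which is the reason given above for not reporting a per-participant R².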

      Note that variability in the "success clamp" does not change across ages (Figure 4C) and does not contribute to the learning effect (Figure 4F). However, it is regarded as reflecting motor noise (Figure 5C), which then decreases over age from the model fitting (Figure 5B). How do we reconcile these contradictions? Again, this calls the model fitting into question.

      For the success clamp, we only have 9 trials to calculate variability which limits our power to detect significance with age. In contrast, the model uses all 120 trials to estimate motor noise. There is a downward trend with age in the behavioral data which we now show overlaid on the fits of the model for both probabilistic conditions (Figure 5—figure Supplement 4 and Figure 6—figure Supplement 4). These show a reasonable match and although the variance explained is 16 and 56% (we limit to 9 trials so as to match the fail clamp), the correlations are 0.52 and 0.78, suggesting we have a reasonable relation although there may be other small sources of variability not captured in the model.

      Figure 5C: it appears one bivariate outlier contributes a lot to the overall significant correlation here for the "success clamp".

      Recalculating after removing that point in original Fig 5C was still significant and we feel the plots mentioned in the previous point add useful information to this issue. With the new model this figure has changed.

      It is still a concern that the young children did not understand the instructions. Nine 3-to-8-year-old children (out of 48) were better explained by the noisy-only model than the full model. In contrast, ten of the rest of the participants (out of 98) were better explained by the noisy-only model. It appears that there is a higher percentage of the "young" children who didn't get the instruction than the older ones.

      Thank you for this comment. We did take participant comprehension of the task into consideration during the task design. We specifically designed it so that the instructions were simple and straightforward. The child simply needs to understand the underlying goal to make the video clip play as often as possible and that they must move the penguin to certain positions to get it to play. By having a very simple task goal, we are able to test a naturalistic response to reinforcement in the absence of an explicit strategy in a task suited even for young children.

      We used the updated reinforcement learning model to assess whether an individual's performance is consistent with understanding the task. In the case of a child who does not understand the task, we expect that they simply have motor noise on their reach, and crucially, that they would not explore more after failure, nor update their reach after success. Therefore, we used a likelihood ratio test to examine whether the preferred model was significantly better at explaining each participant's data compared to the model variant which had only motor noise (Model 1). Focusing on only the youngest children (age 3-5), this analysis showed that 43, 59, 65 and 86% of children (out of N = 21, 22, 20 and 21) for the continuous probabilistic, discrete probabilistic, continuous deterministic, and discrete deterministic conditions, respectively, were better fit with the preferred model, indicating non-zero exploration after failure. In the 3-5 year old group for the discrete deterministic condition, 18 out of 21 had performance better fit by the preferred model, suggesting this age group understands the basic task of moving in different directions to find a rewarding location.

      The reduced numbers fit by the preferred model for the other conditions likely reflect differences in the task conditions (continuous and/or probabilistic) rather than a lack of understanding of the goal of the task. We include this analysis as a new subsection at the end of the Results.
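
      As an illustration of the per-participant likelihood ratio test described above, the following is a minimal sketch only. It assumes the two models are nested, that their maximized log-likelihoods are already available, and that the test statistic is compared against a chi-square distribution. The function names, the example log-likelihood values, and the number of extra parameters (3) are hypothetical and are not taken from the manuscript.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Survival function P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum over half-integer gamma values.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= half / (i - 0.5)
        total += math.exp(-half) * term
    return total


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params, alpha=0.05):
    """Compare a reduced (noise-only) model against a full model.

    Returns the test statistic, the p-value, and whether the full model
    is significantly better at level alpha.
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    p_value = chi2_sf(stat, extra_params)
    return stat, p_value, p_value < alpha


# Hypothetical log-likelihoods for one participant (not real data).
stat, p_value, prefers_full = likelihood_ratio_test(
    loglik_reduced=-120.0, loglik_full=-112.5, extra_params=3
)
print(f"statistic={stat:.2f}, p={p_value:.4f}, prefers full model: {prefers_full}")
```

      Running this test once per participant and counting those for whom the full model is preferred gives proportions of the kind quoted above, for example 18 of 21, or 86%.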

      Supplementary Figure 2: the first panel should belong to a 3-year-old not a 5-year-old? How are these panels organized? This is kind of confusing.

      Thank you for this comment. Figure 2—figure Supplement 1 and Figure 2—figure Supplement 2 are arranged with devices in the columns and a sample from each age bin in the rows. For example in Figure 2—figure Supplement 1, column 1, row 1 is a mouse using participant age 3 to 5 years old while column 3, row 2 is a touch screen using participant age 6 to 8 years old. We have edited the labeling on both figures to make the arrangement of the data more clear.

      Line 222: make this a complete sentence.

      This sentence has been edited to a complete sentence.

      Line 331: grammar.

      This sentence has been edited for grammar.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work tried to map the synaptic connectivity between the inputs and outputs of the song premotor nucleus, HVC in zebra finches to understand how sensory (auditory) to motor circuits interact to coordinate song production and learning. The authors optimized the optogenetic technique via AAV to manipulate auditory inputs from a specific auditory area one-by-one and recorded synaptic activity from a neuron with whole-cell recording from slice preparation with identification of the projection area by retrograde neuronal tracing. This thorough and detailed analysis provides compelling evidence of synaptic connections between 4 major auditory inputs (3 forebrain and 1 thalamic region) within three projection neurons in the HVC; all areas give monosynaptic excitatory inputs and polysynaptic inhibitory inputs, but proportions of projection to each projection neuron varied. They also find specific reciprocal connections between mMAN and Av. Taken together the authors provide the map of the synaptic connection between intercortical sensory to motor areas which is suggested to be involved in zebra finch song production and learning.

      Strengths:

      The authors optimized optogenetic tools with eGtACR1 by using AAV, which allows them to manipulate synaptic inputs in a projection-specific manner in zebra finches. They also identify HVC cell types based on projection area. With their technical advance and thorough experiments, they provided a detailed map of synaptic connections.

      Weaknesses:

      As it is the study in brain slice, the functional implication of synaptic connectivity is limited. Especially as all the experiments were done in the adult preparation, there could be a gap in discussing the functions of developmental song learning.

      We thank the reviewer for their appreciation of our work. Although we agree that there can be limitations to brain slice preparations, the approaches used here for synaptic connectivity mapping are well-designed to identify long-range synaptic connectivity patterns. Optogenetic stimulation of axon terminals in brain slices does not require intact axons and works well when axons are cut, allowing identification of all inputs expressing optogenetic channels from afferent regions. Terminal stimulation in slices yields stable post-synaptic responses for hours without rundown, assuring that polysynaptic and monosynaptic connections can be reliably identified in our brain slices. Additionally, conducting similar types of experiments in vivo can run into important limitations. First, the extent of TTX and 4-AP diffusion, which is necessary for identification of long-range monosynaptic connections, can be difficult to verify in vivo, potentially confounding identification of monosynaptic connectivity. Second, conducting whole-cell patch-clamp experiments in vivo, particularly in deeper brain regions, is technically challenging, and would limit the number of cells that can be patched and increase the number of animals needed.

      We agree that there may well be important differences between adult connectivity and connectivity patterns in the juvenile brain. Indeed, learning and experience during development almost certainly shape connectivity patterns and these patterns of connectivity may change incrementally and/or dynamically during development. Ultimately, adult connectivity patterns are the result of changes in the brain that accrue over development. Given that this is the first study mapping long-range connectivity of HVC input-output pathways, we reasoned that the adult connectivity would provide a critical reference allowing future studies to map different stages of juvenile connectivity and the changes in connectivity driven by milestones like forming a tutor song memory, sensorimotor learning, and song crystallization.

      In this revision we worked to better highlight the points raised above and thank the reviewer for their comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes synaptic connectivity in the Songbird cortex's four main classes of sensory neuron afferents onto three known classes of projection neurons of the pre-motor cortical region HVC. HVC is a region associated with the generation of learned bird songs. Investigators here use all male zebra finches to examine the functional anatomy of this region using patch clamp methods combined with optogenetic activation of select neuronal groups.

      Strengths:

      The quality of the recordings is extremely high and the quantity of data is on a very significant scale, this will certainly aid the field.

      Weaknesses:

      The authors could make the figures a little easier to navigate. Most of the figures use actual anatomical images but it would be nice to have this linked with a zebra finch atlas in more of a cartoon format that accompanied each fluro image. Additionally, for the most part, figures showing the labeling lack scale bar values (in um). These should be added not just shown in the legends.

      The authors could make it clear in the abstract that this is all male zebra finches - perhaps this is obvious given the bird song focus, but it should be stated. The number of recordings from each neuron class and the overall number of birds employed should be clearly stated in the methods (this is in the figures, but it should say n=birds or cells as appropriate).

      The authors should consider sharing the actual electrophysiology records as data.

      We thank the reviewer for their assessment of our research and suggestions. We have implemented many of these suggestions and provide details in our response to their specific Recommendations. Additionally, we are organizing our data and will make it publicly available with the version of record.

      Reviewer #3 (Public review):

      Nucleus HVC is critical both for song production as well as learning and arguably, sitting at the top of the song control system, is the most critical node in this circuit receiving a multitude of inputs and sending precisely timed commands that determine the temporal structure of song. The complexity of this structure and its underlying organization seem to become more apparent with each experimental manipulation, and yet our understanding of the underlying circuit organization remains relatively poor. In this study, Trusel and Roberts use classic whole-cell patch clamp techniques in brain slices coupled with optogenetic stimulation of select inputs to provide a careful characterization and quantification of synaptic inputs into HVC. By identifying individual projection neurons using retrograde tracer injections combined with pharmacological manipulations, they classify monosynaptic inputs onto each of the three main classes of glutamatergic projection neurons in HVC (RA-, Area X- and Av-projecting neurons). This study is remarkable in the amount of information that it generates, and the tremendous labor involved for each experiment, from the expression of opsins in each of the target inputs (Uva, NIf, mMAN, and Av), the retrograde labelling of each type of projection neuron, and ultimately the optical stimulation of infected axons while recording from identified projection neurons. Taken together, this study makes an important contribution to increasing our identification, and ultimately understanding, of the basic synaptic elements that make up the circuit organization of HVC, and how external inputs, which we know to be critical for song production and learning, contribute to the intrinsic computations within this critical circuit.

      This study is impressive in its scope, rigorous in its implementation, and thoughtful regarding its limitations. The manuscript is well-written, and I appreciate the clarity with which the authors use our latest understanding of the evolutionary origins of this circuit to place these studies within a larger context and their relevance to the study of vocal control, including human speech. My comments are minor and primarily about legibility, clarification of certain manipulations, and organization of some of the summary figures.

      We thank the reviewer for their thoughtful assessment of our research.

      Recommendations for the authors:

      The following recommendations were considered by all reviewers to be important to incorporate for improving this paper:

      (1) Clarify the site of viral injection and the possibility of labeling other structures a) Show images of viral injection sites.

      We provide a representative image of viral expression for each pathway studied in this manuscript. Please see panel A in Figures 2-3 and 5-6 showing our viral expression in Uva, NIf, mMAN, and Av respectively.  

      b) Include in discussion caveats that the virus may spread beyond the boundaries of structures (e.g. especially injections into NIF could spread into Field L).

      For each HVC afferent nucleus we have now included a sentence describing the possible spread of viral infection in surrounding structures in the Results. We also now expanded the image from the Av section to include NIf, to showcase lack of viral expression in NIf (see Fig. 6A).

      (2) Clarify the logic and precise methods of the TTX and 4-AP experiments

      a) Please see the detailed issue raised by Reviewer 3, Major Point 1 below.

      The TTX and 4AP application is the gold standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 (Petreanu, Mao et al. 2009) and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders, Supiot et al. 2022). We now better describe the logic of this approach in the second paragraph of the Results section and cite the first description of this method from the Svoboda lab and a recent review weighing this method with other optogenetic methods for tracing synaptic connections in the brain.

      (3) Include caveats in discussion

      a) Note that there may be other inputs to HVC that were not examined in this study (e.g. CMM, Field L)

      In our original manuscript we did state “Although a complete description of HVC circuitry will require the examination of other potential inputs (i.e. RA<sub>HVC</sub> PNs, A11 glutamatergic neurons (Roberts, Klein et al. 2008, Ben-Tov, Duarte et al. 2023)) and a characterization of interneuron synaptic connectivity, here we provide a map of the synaptic connections between the 4 best described afferents to HVC and its 3 populations of projection neurons” in the last paragraph of the Discussion. We have now edited this sentence to include the projection from NCM to HVC and cited Louder et al., 2024.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC. Rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017) which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      b) Also note that birds in this study were adults and that some inputs to HVC likely to be important for learning may recede during development (e.g. Louder et al, 2024).

      In the second to last paragraph of the Discussion we now state: While our opsin-assisted circuit mapping provides us with a new level of insight into HVC synaptic circuitry, there are limitations to this research that should be considered. All circuit mapping in this study was carried out in brain slices from adult male zebra finches. Future studies will be needed to examine how this adult connectivity pattern relates to patterns of connectivity in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds.   

      (4) Consider cosmetic changes to figures as suggested by Reviewers 2-3 below.

      We thank the reviewers for their suggestions and have implemented the changes as best we can.

      (5) Address all minor issues raised below.

      Reviewer #1 (Recommendations for the authors):

      I see this study is well designed to answer the author's specific question, mapping synaptic auditory-motor connections within HVC. Their experiments with advanced techniques of projection-specific optogenetic manipulation of synaptic inputs and retrograde identification of projection areas revealed input-output combination selective synaptic mapping.

      As I found this study advanced our knowledge with the compelling dataset, I have only some minor comments here.

      (1) One technical concern is we don't see how much the virus infection was focused on the target area and if we can ignore the effect of synaptic connectivity from surrounding areas. As the amount of virus they injected is large (1.5ul) and target areas are small, we assume the virus might spread to the surrounding area, such as field L which also projects to HVC when targeting Nif. While I think the majority of the projections were from their target areas, it would be better to mention (also the images with larger view areas) the possibilities of projections of surrounding areas.

      We agree with the reviewer's concern about the specificity of viral expression. For this reason, we included sample images of the viral expression in each target area (panel A in Fig. 2,3,5,6). We have now also included a sentence at the beginning of each subsection of our Results to describe how we have ensured interpretability of the results. Uva and mMAN’s surrounding areas are not known to project to HVC. Possible cross-infection is an issue for Av and NIf, and we checked each bird’s injection site to ensure that eGtACR1+ cells were not visible in the unintended HVC-projecting areas.

      As mentioned in our response to the public comment, consistent with Vates (Vates, Broome et al. 1996) we do not see evidence that Field L projects directly to HVC (see Fig. 3G).

      (2) Another technical concern is the damage to axonal projections. While I understand the authors stimulated axonal terminals, axonal projections were assumed to be cut and their ability to release neurotransmitters would be reduced, especially after long-term survival or repeated stimulation. Mentioning whether projection pathways were within their 230um-thick slice (probably depends on input sites) or not and the effect of axonal cut would be helpful.

      We agree that slice electrophysiology has limitations. However, we disagree with the claim of reduced reliability or stability of the evoked response. We and others find that electrical and optogenetic repeated terminal stimulation in slices can yield stable post-synaptic responses for tens of minutes and even hours (Bliss and Gardner-Medwin 1973, Bliss and Lomo 1973, Liu, Kurotani et al. 2004, Pastalkova, Serrano et al. 2006, Xu, Yu et al. 2009, Trusel, Cavaccini et al. 2015, Trusel, Nuno-Perez et al. 2019). Indeed, long-term synaptic plasticity experiments in most preparations and across brain areas rely on such stability of the presynaptic machinery for synaptic release, despite axons being severed from their parent soma. Our assumption is that the vast majority, if not all, connections between axon terminals and their cell body in the afferent regions have been cut in our preparations. Nonetheless, the diversity of outcomes we report (currents returning after TTX+4AP or not, depending on the specific combination of input and HVCPN class) is consistent with the robustness of the synaptic interrogation method.

      (3) While I understand this study focused on 4 major input areas and the authors provide good pictures of synaptic HVC connections from those areas, HVC has been reported to receive auditory inputs from other areas as well (CMM, FieldL, etc.). It is worth mentioning that there are other auditory inputs and would be interesting to discuss coordination with the inputs from other areas.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC. Rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017) which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      (4) The HVC local neuronal connections have been reported to be modified and a recent study revealed the transient auditory inputs into HVC during song learning period. The author discusses the functions of HVC synaptic connections on song learning (also title says synaptic connection for song learning), however, the experiments were done in adults and do not discuss the possibility of different synaptic connection mapping in juveniles in the song learning period. Mentioning the neuronal activities and connectivity changes during song learning is important. Also, it would be helpful for the readers to discuss the potential differences between juveniles/adults if they want to discuss the functions of song learning.

      We now mention in the Discussion that this is an important caveat of our research and that future studies will be needed to examine how these adult connectivity patterns relate to connectivity patterns in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds. Nonetheless, the title and abstract cite song learning because it is important for the broader public to understand that at least some of these afferent brain regions carry an essential role in song learning (Foster and Bottjer 2001, Roberts, Gobes et al. 2012, Roberts, Hisey et al. 2017, Zhao, Garcia-Oscos et al. 2019, Koparkar, Warren et al. 2024).

      Reviewer #2 (Recommendations for the authors):

      The work is very detailed and will be an important resource to those working in the field. The recordings are of a high quality and lots of information is included, such as measures of response kinetics, amplitude, and pharmacological confirmation of excitatory and inhibitory synaptic responses. In general, I feel the quality is extremely high and the quantity of data is on a very significant exhaustive scale that will certainly aid the field. I have come to this conclusion as a non zebra finch person but I feel the connection information shown will be of benefit given its high quality.

      Figure 7 is a nice way of showing the overall organization. Optional suggestion, consider highlighting anything in Figure 7 that results in a new understanding of the song system as compared to previous work on anatomy and function.

      We thank the reviewer for the kind comments about our research. We have highlighted our newly found connection between mMAN and Av. All the connections onto the HVC PNs in Panel B are also newly identified in this study.

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Clarification regarding methods for determining monosynaptic events:

      One of the manipulations that I struggled the most with was those describing the use of TTX + 4AP to isolate monosynaptic events. Not being as familiar with the use of optically based photostimulation of axons to release transmitter locally, I was initially confused by statements such as "we found that oEPSC returned after application of TTX+4AP". This might be clear to someone performing these manipulations, but a bit more clarification would be helpful. Should I assume that an existing monosynaptic EPSC would be masked by co-occurring polysynaptic IPSCs which disappear following application of TTX + 4AP, thereby unmasking the monosynaptic EPSC, thereby causing the EPSC to "return"? A word that I am not sure works. Continuing my confusion with these experiments, I am unsure how this cocktail of drugs is added, if it is even added as a cocktail, which is what I initially assumed. The methods and the results are not so clear if they are added in sequence and why and if traces are recorded after the addition of both drugs or if they are recorded for TTX and then again for TTX + 4AP. Finally, looking at the traces in the experimental figures (e.g. Figures 2F, 3F, 5F, and 6F), it is difficult to see what is being shown, at least for me. First, the authors need to describe better in the results why they stimulate twice in short succession and why they seem to use the response to the second pulse (unless I am mistaken) to measure the monosynaptic event. Second, I was confused by the traces (which are very small) in the presence of TTX. I would have expected to see a response if there was a monosynaptic EPSC but I only seem to see a flat line.

      The confusion that I list above might be due in part to my ignorance, but it is important in these types of papers not to assume too much expertise if you want readers with a less sophisticated understanding of synaptic physiology to understand the data. In other words, a little bit more clarity and hand-holding would be welcome.

      We understand the reviewer’s confusion about the methodology. In voltage clamp, the amplifier injects current through the electrode to hold the membrane voltage at -70 mV, which is near the equilibrium potential for Cl-, and therefore the only synaptic current evoked by light stimulation is due to cation influx, mainly through AMPA receptors (see Fig. 1). Therefore, co-occurring polysynaptic IPSCs would not be visible. We examine those by holding the membrane voltage at +10 mV (see Fig. 1). TTX application suppresses voltage-dependent Na+ channels and therefore stops all neurotransmission. We show the traces upon TTX to show that currents we were recording prior to TTX application were of synaptic origin, and not due to accidental expression of opsin in the patched cell. Also, this ensures that any current visible after 4AP application is due to monosynaptic transmission and not to a failure of TTX application.

      After recording and light stimulation with TTX, we then add 4AP, which is a blocker of presynaptic K+ channels. This prevents the repolarization of the terminals that would occur in response to opsin-mediated local depolarization. 4AP application, therefore, allows local opsin-driven depolarizations to reach the threshold for Ca2+-dependent vesicle docking and release. This procedure selectively reveals or unmasks the monosynaptic currents because any non-monosynaptically connected neuron would still need voltage-dependent Na+ channels to effectively produce indirect neurotransmission onto the patched cell. The TTX and 4AP application is the gold standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders et al., 2022). We now include 2 more sentences near the beginning of the Results to clarify this process and directly point to the Linders review for researchers wanting a deeper explanation of this technique.

      The double stimulation is unrelated to our testing of monosynaptic connections. We originally conducted the experiments by delivering 2 pulses of light separated by 50 ms, a common way to examine the paired-pulse ratio (PPR), a physiological measure which is used to probe synapses for short-term plasticity and release probability. However, through discussions with colleagues we realized that the slow decay time of eGtACR1 may complicate interpretation of the response to the second light pulse. Thus, we elected to not report these results and indicated this in the Methods section: “We calculated the paired-pulse ratio (PPR) as the amplitude of the second peak divided by the amplitude of the first peak elicited by the twin stimuli, however due to slow kinetics of eGtACR1 the results would be difficult to interpret, and therefore we are not currently reporting them.”
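
      For readers unfamiliar with the measure, the paired-pulse ratio defined in the quoted Methods sentence can be sketched as follows. This is an illustrative sketch only, not the authors' analysis code. The function names, the baseline and response window lengths, and the choice to measure each peak against its own pre-stimulus baseline are assumptions, and the trace is synthetic.

```python
def peak_amplitude(times_ms, trace_pa, stim_ms, baseline_ms=5, response_ms=20):
    """Largest absolute deflection from the pre-stimulus baseline.

    baseline_ms and response_ms are assumed window lengths, not values
    taken from the manuscript.
    """
    baseline = [v for t, v in zip(times_ms, trace_pa)
                if stim_ms - baseline_ms <= t < stim_ms]
    response = [v for t, v in zip(times_ms, trace_pa)
                if stim_ms <= t < stim_ms + response_ms]
    if not baseline or not response:
        raise ValueError("stimulus time is too close to the edge of the trace")
    base = sum(baseline) / len(baseline)
    return max(abs(v - base) for v in response)


def paired_pulse_ratio(times_ms, trace_pa, first_stim_ms, second_stim_ms):
    """Amplitude of the second peak divided by the amplitude of the first."""
    first = peak_amplitude(times_ms, trace_pa, first_stim_ms)
    second = peak_amplitude(times_ms, trace_pa, second_stim_ms)
    if first == 0:
        raise ValueError("no response to the first pulse")
    return second / first


# Synthetic trace sampled at 1 kHz with two light pulses 50 ms apart.
times_ms = list(range(200))
trace_pa = [0.0] * 200
trace_pa[55] = -100.0  # response to the first pulse (at 50 ms)
trace_pa[105] = -50.0  # response to the second pulse (at 100 ms)

ppr = paired_pulse_ratio(times_ms, trace_pa, first_stim_ms=50, second_stim_ms=100)
print(f"PPR = {ppr:.2f}")
```

      If the first response has not decayed by the time of the second pulse, the pre-stimulus baseline for the second peak is no longer a resting baseline, and the ratio becomes harder to interpret.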

      (2) Suggestions for improving summary figures:

      Summary Figure 1a: The circuit diagram (schematic to the right of 1a) is OK but I initially found it a bit difficult to interpret. For example, it is not clear why pink RA projecting neurons don't reach as far to the right as X or Av projecting neurons, suggesting that they are not really projection neurons. Also, the big question marks in the intermediate zone are not entirely intuitive. It seems there might be a better way of representing this. It might also be worth stating in the figure legend that the interconnectivity patterns shown in the figure between PNs in HVC are based on specific prior studies.

      We thank the reviewer for the constructive criticism. We have modified the figure to extend the RA projection line and mentioned in the figure legend that connectivity between PNs is based on prior studies.

      Summary Figure 1a: I am not sure I love this figure. There are a few minor issues. First, there are too many browns [Nif/AV and mMAN] which makes it more challenging to clearly disambiguate the different projections. Second, it is unclear why this figure does not represent projections from RA to HVC. My biggest concern with this figure is that it oversimplifies some of the findings. From the figure, one gets the impression that Uva only projects to RA-PNs and that Av only projects to X-PNs even though the authors show connections to other PNs. With the small sample size in this current study for each projection and each PN type, one really cannot rule out that these "minority" projections are not important. I, therefore, suggest that the authors qualitatively represent the strength/probability of connections by weighting with thickness of afferent connections.

      We assume the reviewer is commenting on our summary figure panel 7B. We agree with the referee that this is a simplified representation of our findings. We had indeed indicated in the legend that this was just a “Schematic of the HVC afferent connectivity map resulting from the present work” and that “For conceptualization purposes, afferent connectivity to HVC-PNs is shown only when the rate of monosynaptic connectivity reaches 50% of neurons examined”. We have added a title to highlight that this is but a simplification. We have now adjusted the colors to make the figure easier to follow. Based on the reviewer's critique we searched for a better method for summarizing the complex connectivity patterns described in this research. We settled on a Sankey diagram of connectivity. This is now Figure 7C. In this diagram, we are able to show the proportion of connections from each input pathway onto each class of neuron and whether these connections are poly- or monosynaptic. We find this to be a straightforward way of displaying all of the connectivity patterns identified in our figure 2-3 and 4-5, and we look forward to understanding if the reviewers find this a useful way of illustrating our findings.
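
      The two summaries discussed above, a schematic that keeps only connections at or above a 50% monosynaptic rate and a diagram showing the proportion of each outcome per pathway, can be sketched as follows. This is a hypothetical sketch. The pathway and cell-class labels and the records are placeholders rather than data from the study, and the reading of "neurons examined" as all neurons recorded for a given pair is an assumption.

```python
from collections import Counter, defaultdict

OUTCOMES = ("mono", "poly", "none")


def connection_proportions(records):
    """Proportion of each outcome for every (input, target) pair.

    records: iterable of (input_name, target_class, outcome) tuples,
    with outcome being one of OUTCOMES.
    """
    counts = defaultdict(Counter)
    for source, target, outcome in records:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        counts[(source, target)][outcome] += 1
    return {
        pair: {o: c[o] / sum(c.values()) for o in OUTCOMES}
        for pair, c in counts.items()
    }


def schematic_edges(proportions, threshold=0.5):
    """Pairs whose monosynaptic rate reaches the threshold (schematic view)."""
    return sorted(pair for pair, p in proportions.items() if p["mono"] >= threshold)


# Placeholder records only; these are NOT results from the manuscript.
records = (
    [("input_A", "PN_1", "mono")] * 3
    + [("input_A", "PN_1", "poly")]
    + [("input_A", "PN_2", "mono")]
    + [("input_A", "PN_2", "none")] * 3
)

proportions = connection_proportions(records)
edges = schematic_edges(proportions)
print(proportions)
print(edges)
```

      The full proportions table carries the information a Sankey-style diagram would display, while the thresholded list corresponds to the simplified schematic, which is why minority connections drop out of the latter.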

      Minor points:

      (1) Line 50 - typo - song circuits.

      Thank you for catching this.

      (2) Line 106 - 111 - The findings suggest that 100% of Uva projections onto HVCRA neurons are monosynaptic. However, because the authors only tested 6 neurons their statements that their findings are so different from other studies, should be somewhat tempered since these other studies (e.g. Moll et al.) looked at 251 neurons in HVC and sampling bias could still somewhat explain the difference.

      We observed oEPSCs in 43 of 51 (84.3%) HVC-RA neurons recorded (mean rise time = 2.4 ms) and monosynaptic connections onto 100% of the HVC-RA neurons tested (n = 6). Moll et al. combined electrical stimulation of Uva with two-photon calcium imaging (GCaMP6s) of putative HVC-RA neurons (n = 251 neurons). We should note that these are putative HVC-RA neurons because they were not visually identified using retrograde tracing or using some other molecular handle. They report that only ~16% of HVC-RA neurons showed reliable calcium responses following Uva stimulation. Although the experiments by Moll et al. are technically impressive, calcium imaging is an insensitive technique for measuring post-synaptic responses, particularly subthreshold responses, when compared to whole-cell patch-clamp recordings. This approach cannot identify monosynaptic connections and is likely sensitive only to suprathreshold activity, which likely relies on recruitment of other polysynaptic inputs onto the neurons in HVC. Furthermore, as indicated in the Discussion, our opsin-mediated synaptic interrogation recruits any eGtACR1+ Uva terminal in the slice and therefore has a high likelihood of revealing any existing connections.

      A limitation of whole-cell patch-clamp recordings is that they are a laborious, low-throughput technique. Future experiments using better imaging approaches, like voltage imaging, may be able to weigh in on differences between what we report here using whole-cell patch-clamp recordings from visually identified HVC-RA neurons combined with optogenetic manipulations of Uva terminals and the calcium imaging results reported by Moll. Nonetheless, whole-cell patch-clamp recordings combined with optogenetic manipulations are likely to remain the most sensitive method for identifying synaptic connectivity.
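
      The sample-size point raised in this exchange can be made concrete with a binomial confidence interval on the reported counts (43 of 51 neurons with oEPSCs, 6 of 6 tested neurons with monosynaptic responses). The calculation below is purely illustrative and is not an analysis from the manuscript. The choice of the Wilson score interval and the 95% level are assumptions.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (default about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Counts quoted in the response above.
for label, k, n in [("oEPSC observed", 43, 51), ("monosynaptic", 6, 6)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%}, interval {low:.1%} to {high:.1%}")
```

      With only six neurons tested, the interval around 100% is necessarily wide, which is the statistical content of the reviewer's request to temper the comparison.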

      (3) Figure 2G - the significance of white circles is not clear.

      The figure legend indicates that those highlight and mark the position of “retrogradely labeled HVC-projecting neurons in Uva (cyan, white circles)” to facilitate identification of colocalization with the in-situ markers.

      (4) Line 135 - Cardin et al. (J. Neurophys. 2004) is the first to show that song production does not require Nif.

      We thank the reviewer for pointing this out and we have cited this important study.

      (5) Line 183 - This is a confusing sentence because I initially thought that mMAN-mMANHVC PNs was a category!

      We switched the dash with a colon.

      (6) Figure 4d could use some arrows to identify what is shown. It is assumed that the box represents mMAN. Should it be assumed that Av is not in the plane of this section? If not, this should be stated in the legend. It is also unclear where the anterograde projections are. Is this the dark highway that goes from the box to the dorsal surface? If yes, this should be indicated, but it should also be made clear why the projections go in both the dorsal and the ventral directions.

      The inset, as indicated by the lines around it, is a magnification of the terminal fields in Av. We added an explanation of the inset.

      (7) Discussion. In the introduction, the authors mention projections from RA to HVC but never end up studying them in the current manuscript which seems like a missed opportunity and perhaps even a weakness of the study. In the discussion, it would certainly be good for the authors to at least discuss the possible significance of these projections and perhaps why they decided not to study them.

      We thank the reviewer for the comment. Unfortunately, we couldn’t reliably evoke interpretable currents from RA, and we elected to publish the current version of the paper with these 4 major inputs. Nonetheless, we have indicated in the Introduction and in the Discussion that more inputs (e.g. RA, A11, NCM) remain to be evaluated. 

      (8) Line 622 - Is this reference incomplete?

      We thank the reviewer. We have corrected the reference.

      • Ben-Tov, M., F. Duarte and R. Mooney (2023). "A neural hub for holistic courtship displays." Curr Biol 33(9): 1640-1653 e1645.

      • Bliss, T. V. and A. R. Gardner-Medwin (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the unanaestetized rabbit following stimulation of the perforant path." J Physiol 232(2): 357-374.

      • Bliss, T. V. and T. Lomo (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path." J Physiol 232(2): 331-356.

      • Foster, E. F. and S. W. Bottjer (2001). "Lesions of a telencephalic nucleus in male zebra finches: Influences on vocal behavior in juveniles and adults." J Neurobiol 46(2): 142-165.

      • Koparkar, A., T. L. Warren, J. D. Charlesworth, S. Shin, M. S. Brainard and L. Veit (2024). "Lesions in a songbird vocal circuit increase variability in song syntax." Elife 13.

      • Linders, L. E., L. F. Supiot, W. Du, R. D'Angelo, R. A. H. Adan, D. Riga and F. J. Meye (2022). "Studying Synaptic Connectivity and Strength with Optogenetics and Patch-Clamp Electrophysiology." Int J Mol Sci 23(19).

      • Liu, H. N., T. Kurotani, M. Ren, K. Yamada, Y. Yoshimura and Y. Komatsu (2004). "Presynaptic activity and Ca2+ entry are required for the maintenance of NMDA receptor-independent LTP at visual cortical excitatory synapses." J Neurophysiol 92(2): 1077-1087.

      • Louder, M. I. M., M. Kuroda, D. Taniguchi, J. A. Komorowska-Muller, Y. Morohashi, M. Takahashi, M. Sanchez-Valpuesta, K. Wada, Y. Okada, H. Hioki and Y. Yazaki-Sugiyama (2024). "Transient sensorimotor projections in the developmental song learning period." Cell Rep 43(5): 114196.

      • Pastalkova, E., P. Serrano, D. Pinkhasova, E. Wallace, A. A. Fenton and T. C. Sacktor (2006). "Storage of spatial information by the maintenance mechanism of LTP." Science 313(5790): 1141-1144.

      • Petreanu, L., T. Mao, S. M. Sternson and K. Svoboda (2009). "The subcellular organization of neocortical excitatory connections." Nature 457(7233): 1142-1145.

      • Roberts, T. F., S. M. Gobes, M. Murugan, B. P. Olveczky and R. Mooney (2012). "Motor circuits are required to encode a sensory model for imitative learning." Nat Neurosci 15(10): 1454-1459.

      • Roberts, T. F., E. Hisey, M. Tanaka, M. G. Kearney, G. Chattree, C. F. Yang, N. M. Shah and R. Mooney (2017). "Identification of a motor-to-auditory pathway important for vocal learning." Nat Neurosci 20(7): 978-986.

      • Roberts, T. F., M. E. Klein, M. F. Kubke, J. M. Wild and R. Mooney (2008). "Telencephalic neurons monosynaptically link brainstem and forebrain premotor networks necessary for song." J Neurosci 28(13): 3479-3489.

      • Trusel, M., A. Cavaccini, M. Gritti, B. Greco, P. P. Saintot, C. Nazzaro, M. Cerovic, I. Morella, R. Brambilla and R. Tonini (2015). "Coordinated Regulation of Synaptic Plasticity at Striatopallidal and Striatonigral Neurons Orchestrates Motor Control." Cell Rep 13(7): 1353-1365.

      • Trusel, M., A. Nuno-Perez, S. Lecca, H. Harada, A. L. Lalive, M. Congiu, K. Takemoto, T. Takahashi, F. Ferraguti and M. Mameli (2019). "Punishment-Predictive Cues Guide Avoidance through Potentiation of Hypothalamus-to-Habenula Synapses." Neuron 102(1): 120-127.e124.

      • Vates, G. E., B. M. Broome, C. V. Mello and F. Nottebohm (1996). "Auditory pathways of caudal telencephalon and their relation to the song system of adult male zebra finches." Journal of Comparative Neurology 366(4): 613-642.

      • Xu, T., X. Yu, A. J. Perlik, W. F. Tobin, J. A. Zweig, K. Tennant, T. Jones and Y. Zuo (2009). "Rapid formation and selective stabilization of synapses for enduring motor memories." Nature 462(7275): 915-919.

      • Zhao, W., F. Garcia-Oscos, D. Dinh and T. F. Roberts (2019). "Inception of memories that guide vocal learning in the songbird." Science 366: 83 - 89.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Wang et al., recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded the brain-wide activation pattern to resemble episodic memory processing, but to be dissociated from task-related processing and suggest that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.

      The paper offers an impressively large and highly valuable dataset that provides the opportunity for gaining important new insights into the network substrate involved in SOs, spindles, and their coupling. However, the paper does unfortunately not exploit the full potential of this dataset with the analyses currently provided, and the interpretation of the results is often not backed up by the results presented. I have the following specific comments.

      Thank you for your thoughtful and constructive feedback. We greatly appreciate your recognition of the strengths of our dataset and findings. Below, we respond to each point you raised, with the aim of making our methods and results as transparent and comprehensible as possible. We hope these revisions address your comments and further strengthen our manuscript.

      (1) The introduction is lacking sufficient review of the already existing literature on EEG-fMRI during sleep and the BOLD-correlates of slow oscillations and spindles in particular (Laufs et al., 2007; Schabus et al., 2007; Horovitz et al., 2008; Laufs, 2008; Czisch et al., 2009; Picchioni et al., 2010; Spoormaker et al., 2010; Caporro et al., 2011; Bergmann et al., 2012; Hale et al., 2016; Fogel et al., 2017; Moehlman et al., 2018; Ilhan-Bayrakci et al., 2022). The few studies mentioned are not discussed in terms of the methods used or insights gained.

      We acknowledge the need for a more comprehensive review of prior EEG-fMRI studies investigating the BOLD correlates of slow oscillations and spindles. However, not all of these articles concern sleep SOs or spindles. Several (Hale et al., 2016; Horovitz et al., 2008; Laufs, 2008; Laufs, Walker, & Lund, 2007; Spoormaker et al., 2010) mainly address EEG-fMRI methodology, sleep stages, or brain networks, which are not the focus of our study. Thank you again for your attention to the comprehensiveness of our literature review. We will expand the Introduction to include a more detailed discussion of the existing literature, ensuring that the contributions of previous EEG-fMRI sleep studies are adequately acknowledged.

      Introduction, Page 4 Lines 62-76

      “Investigating these sleep-related neural processes in humans is challenging because it requires tracking transient sleep rhythms while simultaneously assessing their widespread brain activation. Recent advances in simultaneous EEG-fMRI techniques provide a unique opportunity to explore these processes. EEG allows for precise event-based detection of neural signals, while fMRI provides insight into the broader spatial patterns of brain activation and functional connectivity (Horovitz et al., 2008; Huang et al., 2024; Laufs, 2008; Laufs, Walker, & Lund, 2007; Schabus et al., 2007; Spoormaker et al., 2010). Previous EEG-fMRI studies on sleep have focused on classifying sleep stages or examining the neural correlates of specific waves (Bergmann et al., 2012; Caporro et al., 2012; Czisch et al., 2009; Fogel et al., 2017; Hale et al., 2016; Ilhan-Bayrakcı et al., 2022; Moehlman et al., 2019; Picchioni et al., 2011). These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions. Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024) to capture the dynamic interactions between different oscillatory events, e.g., the coupling between SOs and spindles.”

      (2) The paper falls short in discussing the specific insights gained into the neurobiological substrate of the investigated slow oscillations, spindles, and their interactions. The validity of the inverse inference approach ("Open ended cognitive state decoding"), assuming certain cognitive functions to be related to these oscillations because of the brain regions/networks activated in temporal association with these events, is debatable at best. It is also unclear why eventually only episodic memory processing-like brain-wide activation is discussed further, despite the activity of 16 of 50 feature terms from the NeuroSynth v3 dataset were significant (episodic memory, declarative memory, working memory, task representation, language, learning, faces, visuospatial processing, category recognition, cognitive control, reading, cued attention, inhibition, and action).

      Thank you for pointing this out, particularly regarding the use of inverse inference approaches such as “open-ended cognitive state decoding.” Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. We will refocus the main text on direct neurobiological insights gained from our EEG-fMRI analyses, particularly emphasizing the hippocampal-thalamocortical network dynamics underlying SO-spindle coupling, and we will acknowledge the exploratory nature of these findings and highlight their limitations.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) Hippocampal activation during SO-spindles is stated as a main hypothesis of the paper - for good reasons - however, other regions (e.g., several cortical as well as thalamic) would be equally expected given the known origin of both oscillations and the existing sleep-EEG-fMRI literature. However, this focus on the hippocampus contrasts with the focus on investigating the key role of the thalamus instead in the Results section.

      We appreciate your insight regarding the relative emphasis on hippocampal and thalamic activation in our study. We recognize that the manuscript may currently present an inconsistency between our initial hypothesis and the main focus of the results. To address this concern, we will ensure that our Introduction and Discussion sections explicitly discuss both regions, highlighting the complementary roles of the hippocampus (memory processing and reactivation) and the thalamus (spindle generation and cortico-hippocampal coordination) in SO-spindle dynamics.

      Introduction, Page 5 Lines 87-103

      “To address this gap, our study investigates brain-wide activation and functional connectivity patterns associated with SO-spindle coupling, and employs a cognitive state decoding approach (Margulies et al., 2016; Yarkoni et al., 2011)—albeit indirectly—to infer potential cognitive functions. In the current study, we used simultaneous EEG-fMRI recordings during nocturnal naps (detailed sleep staging results are provided in the Methods and Table S1) in 107 participants. Although directly detecting hippocampal ripples using scalp EEG or fMRI is challenging, we expected that hippocampal activation in fMRI would coincide with SO-spindle coupling detected by EEG, given that SOs, spindles, and ripples frequently co-occur during NREM sleep. We also anticipated a critical role of the thalamus, particularly thalamic spindles, in coordinating hippocampal-cortical communication.

      We found significant coupling between SOs and spindles during NREM sleep (N2/3), with spindle peaks occurring slightly before the SO peak. This coupling was associated with increased activation in both the thalamus and hippocampus, with functional connectivity patterns suggesting thalamic coordination of hippocampal-cortical communication. These findings highlight the key role of the thalamus in coordinating hippocampal-cortical interactions during human sleep and provide new insights into the neural mechanisms underlying sleep-dependent brain communication. A deeper understanding of these mechanisms may contribute to future neuromodulation approaches aimed at enhancing sleep-dependent cognitive function and treating sleep-related disorders.”

      Discussion, Page 16-17 Lines 292-307

      “When modeling the timing of these sleep rhythms in the fMRI, we observed hippocampal activation selectively during SO-spindle events. This suggests the possibility of triple coupling (SOs–spindles–ripples), even though our scalp EEG was not sufficiently sensitive to detect hippocampal ripples—key markers of memory replay (Buzsáki, 2015). Recent iEEG evidence indicates that ripples often co-occur with both spindles (Ngo, Fell, & Staresina, 2020) and SOs (Staresina et al., 2015; Staresina et al., 2023). Therefore, the hippocampal involvement during SO-spindle events in our study may reflect memory replay from the hippocampus, propagated via thalamic spindles to distributed cortical regions.

      The thalamus, known to generate spindles (Halassa et al., 2011), plays a key role in producing and coordinating sleep rhythms (Coulon, Budde, & Pape, 2012; Crunelli et al., 2018), while the hippocampus is essential for memory consolidation (Buzsáki, 2015; Diba & Buzsáki, 2007; Singh, Norman, & Schapiro, 2022). The increased hippocampal and thalamic activity, along with strengthened connectivity between these regions and the mPFC during SO-spindle events, underscores a hippocampal-thalamic-neocortical information flow. This aligns with recent findings suggesting the thalamus orchestrates neocortical oscillations during sleep (Schreiner et al., 2022). The thalamus and hippocampus thus appear central to memory consolidation during sleep, guiding information transfer to the neocortex, e.g., mPFC.”

      (4) The study included an impressive number of 107 subjects. It is surprising though that only 31 subjects had to be excluded under these difficult recording conditions, especially since no adaptation night was performed. Since only subjects were excluded who slept less than 10 min (or had excessive head movements) there are likely several datasets included with comparably short durations and only a small number of SOs and spindles and even less combined SO-spindle events. A comprehensive table should be provided (supplement) including for each subject (included and excluded) the duration of included NREM sleep, number of SOs, spindles, and SO+spindle events. Also, some descriptive statistics (mean/SD/range) would be helpful.

      We appreciate your recognition of our sample size and the challenges associated with simultaneous EEG-fMRI sleep recordings. We acknowledge the importance of transparently reporting individual subject data, particularly regarding sleep duration and the number of detected SOs, spindles, and SO-spindle events. To address this, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Tables S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      However, most of the excluded participants were unable to fall asleep or slept too briefly and therefore had essentially no NREM sleep, so NREM sleep duration and the numbers of SOs, spindles, and coupling events could not be counted for them.

      Supplementary Materials, Page 42-54, Table S1-S4

      (5) Was the 20-channel head coil dedicated for EEG-fMRI measurements? How were the electrode cables guided through/out of the head coil? Usually, the 64-channel head coil is used for EEG-fMRI measurements in a Siemens PRISMA 3T scanner, which has a cable duct at the back that allows to guide the cables straight out of the head coil (to minimize MR-related artifacts). The choice for the 20-channel head coil should be motivated. Photos of the recording setup would also be helpful.

      Thank you for your comment regarding our choice of the 20-channel head coil for EEG-fMRI measurements. We acknowledge that the 64-channel head coil is commonly used in Siemens PRISMA 3T scanners; however, the 20-channel coil was selected due to specific practical and technical considerations in our study. In particular, the 20-channel head coil was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil allowed us to maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20 Lines 385-392

      “All MRI data were acquired using a 20-channel head coil on a research-dedicated 3-Tesla Siemens Magnetom Prisma MRI scanner. Earplugs and cushions were provided for noise protection and head motion restriction. We chose the 20-channel head coil because it was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil helped maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.”

      (6) Was the EEG sampling synchronized to the MR scanner (gradient system) clock (the 10 MHz signal; not referring to the volume TTL triggers here)? This is a requirement for stable gradient artifact shape over time and thus accurate gradient noise removal.

      Thank you for raising this important point. We confirm that the EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This synchronization was achieved using the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift. As a result, the gradient artifact waveform remained stable across volumes, allowing for more effective artifact correction during preprocessing. We appreciate your attention to this critical aspect of EEG-fMRI data acquisition.

      We have made this clearer in the revised manuscript. 

      Methods, Page 19-20 Lines 371-383

      “EEG was recorded simultaneously with fMRI data using an MR-compatible EEG amplifier system (BrainAmps MR-Plus, Brain Products, Germany), along with a specialized electrode cap. The recording was done using 64 channels in the international 10/20 system, with the reference channel positioned at FCz. In order to adhere to polysomnography (PSG) recording standards, six electrodes were removed from the EEG cap: one for electrocardiogram (ECG) recording, two for electrooculogram (EOG) recording, and three for electromyogram (EMG) recording. EEG data was recorded at a sample rate of 5000 Hz, the resistance of the reference and ground channels was kept below 10 kΩ, and the resistance of the other channels was kept below 20 kΩ. To synchronize the EEG and fMRI recordings, the BrainVision recording software (BrainProducts, Germany) was utilized to capture triggers from the MRI scanner. The EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This was achieved via the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift.”
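      The link between a stable artifact shape and accurate removal can be illustrated with a toy example of average template subtraction. This is a minimal sketch under stated assumptions, not the correction pipeline used in the study: the epoch length, artifact waveform, noise level, and variable names are all hypothetical, and the data are synthetic. It only shows that when the artifact repeats identically in every epoch, subtracting the epoch-wise mean removes it almost completely.

```python
import math
import random

# Toy sizes (assumptions). At 5000 Hz with a 2 s TR, a real epoch would hold 10,000 samples.
SAMPLES_PER_EPOCH = 50
N_EPOCHS = 40

rng = random.Random(1)

# Identical artifact waveform in every epoch, as clock synchronization is meant to provide.
artifact = [500.0 * math.sin(2 * math.pi * 5 * i / SAMPLES_PER_EPOCH)
            for i in range(SAMPLES_PER_EPOCH)]

# Small synthetic "EEG" component that differs from epoch to epoch.
eeg = [[rng.gauss(0.0, 1.0) for _ in range(SAMPLES_PER_EPOCH)]
       for _ in range(N_EPOCHS)]
recorded = [[a + e for a, e in zip(artifact, epoch)] for epoch in eeg]

# Template = sample-wise mean across epochs; subtracting it removes the repeating part.
template = [sum(epoch[i] for epoch in recorded) / N_EPOCHS
            for i in range(SAMPLES_PER_EPOCH)]
cleaned = [[x - t for x, t in zip(epoch, template)] for epoch in recorded]


def rms(rows):
    """Root-mean-square amplitude over all samples in all epochs."""
    flat = [x for row in rows for x in row]
    return math.sqrt(sum(x * x for x in flat) / len(flat))


rms_before = rms(recorded)
rms_after = rms(cleaned)
print(f"RMS before: {rms_before:.1f}, RMS after: {rms_after:.2f}")
```

      If the artifact drifted relative to the sampling grid from epoch to epoch, the template would be a blurred average and the residual would be correspondingly larger, which is why the reviewer's question about clock synchronization matters.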

      (7) The TR is quite long and the voxel size is quite large in comparison to state-of-the-art EPI sequences. What was the rationale behind choosing a sequence with relatively low temporal and spatial resolution?

      We acknowledge that our chosen TR and voxel size are relatively long and large compared to state-of-the-art EPI sequences. This decision was made to optimize the signal-to-noise ratio (SNR) and reduce susceptibility-related distortions, which are particularly critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. A longer TR allowed us to sample whole-brain activity with sufficient coverage, while a larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures such as the thalamus and hippocampus, which are key regions of interest in our study. We appreciate your concern and hope this clarification provides sufficient rationale for our sequence parameters.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20-21 Lines 398-408

      “Then, the “sleep” session began after the participants were instructed to try and fall asleep. For the functional scans, whole-brain images were acquired using k-space and steady-state T2*-weighted gradient echo-planar imaging (EPI) sequence that is sensitive to the BOLD contrast. This measures local magnetic changes caused by changes in blood oxygenation that accompany neural activity (sequence specification: 33 slices in interleaved ascending order, TR = 2000 ms, TE = 30 ms, voxel size = 3.5 × 3.5 × 4.2 mm³, FA = 90°, matrix = 64 × 64, gap = 0.7 mm). A relatively long TR and larger voxel size were chosen to optimize SNR and reduce susceptibility-related distortions, which are critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. The longer TR allowed whole-brain coverage with sufficient temporal resolution, while the larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures (e.g., the thalamus and hippocampus), which are key regions of interest in this study.”

      (8) The anatomically defined ROIs are quite large. It should be elaborated on how this might reduce sensitivity to sleep rhythm-specific activity within sub-regions, especially for the thalamus, which has distinct nuclei involved in sleep functions.

      We appreciate your insight regarding the use of anatomically defined ROIs and their potential limitations in detecting sleep rhythm-specific activity within sub-regions, particularly in the thalamus. Given the distinct functional roles of thalamic nuclei in sleep processes, we acknowledge that using a single, large thalamic ROI may reduce sensitivity to localized activity patterns. To address this, we will discuss this limitation in the revised manuscript, acknowledging that our approach prioritizes whole-structure effects but may not fully capture nucleus-specific contributions.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (9) The study reports SO & spindle amplitudes & densities, as well as SO+spindle coupling, to be larger during N2/3 sleep compared to N1 and REM sleep, which is trivial but can be seen as a sanity check of the data. However, the amount of SOs and spindles reported for N1 and REM sleep is concerning, as per definition there should be hardly any (if SOs or spindles occur in N1 it becomes by definition N2, and the interval between spindles has to be considerably large in REM to still be scored as such). Thus, on the one hand, the report of these comparisons takes too much space in the main manuscript as it is trivial, but on the other hand, it raises concerns about the validity of the scoring.

      We appreciate your concern regarding the reported presence of SOs and spindles in N1 and REM sleep and its potential implications. Our methods for detecting SOs, spindles, and their coupling were originally designed only for N2 and N3 sleep data, based on the characteristics of these data, and they are widely recognized and used in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because the detection of SOs and spindles is based on percentiles, the method will always detect a certain number of events when applied to other stages (N1 and REM), and how these events differ from those detected in N2/N3 remains unclear. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only as sanity checks.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
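      The point that a percentile-based criterion returns events in any stage can be illustrated with a minimal sketch. The 75th-percentile cutoff, the function names, and the synthetic amplitude distributions below are assumptions chosen for illustration and are not the authors' detection parameters. Because the threshold is computed from the same data it is applied to, the fraction of candidates labeled as events is fixed by the percentile, regardless of how large the amplitudes actually are.

```python
import random


def percentile_threshold(values, percentile):
    """Linearly interpolated percentile of a list of numbers."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * percentile / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def detect_by_percentile(amplitudes, percentile=75.0):
    """Return indices of candidates whose amplitude exceeds the data-driven cutoff."""
    cutoff = percentile_threshold(amplitudes, percentile)
    return [i for i, a in enumerate(amplitudes) if a > cutoff]


rng = random.Random(0)
# Synthetic candidate amplitudes: large in an "N2/N3-like" stage, small in a "REM-like" stage.
n23_like = [rng.gauss(100.0, 20.0) for _ in range(1000)]
rem_like = [rng.gauss(10.0, 2.0) for _ in range(1000)]

n23_events = detect_by_percentile(n23_like)
rem_events = detect_by_percentile(rem_like)
print(len(n23_events), len(rem_events))
```

      Both stages yield the same number of detections even though the underlying amplitudes differ tenfold, which is why event counts from N1 and REM are best read as a sanity check rather than as evidence of genuine SOs or spindles.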

      (10) Why was electrode F3 used to quantify the occurrence of SOs and spindles? Why not a midline frontal electrode like Fz (or a number of frontal electrodes for SOs) and Cz (or a number of centroparietal electrodes) for spindles to be closer to their maximum topography?

      We appreciate your suggestion regarding electrode selection for SO and spindle quantification. Our choice of F3 was primarily based on previous studies (Massimini et al., 2004; Molle et al., 2011), where bilateral frontal electrodes are commonly used for detecting SOs and spindles. Additionally, we considered the impact of MRI-related noise and, after a comprehensive evaluation, determined that F3 provided an optimal balance between signal quality and artifact minimization. We also acknowledge that alternative electrode choices, such as Fz for SOs and Cz for spindles, could provide additional insights into their topographical distributions.

      (11) Functional connectivity (hippocampus -> thalamus -> cortex (mPFC)) is reported to be increased during SO-spindle coupling and interpreted as evidence for coordination of hippocampo-neocortical communication likely by thalamic spindles. However, functional connectivity was only analysed during coupled SO+spindle events, not during isolated SOs or isolated spindles. Without the direct comparison of the connectivity patterns between these three events, it remains unclear whether this is specific for coupled SO+spindle events or rather associated with one or both of the other isolated events. The PPIs need to be conducted for those isolated events as well and compared statistically to the coupled events.

      We appreciate your critical perspective on our functional connectivity analysis and the interpretation of hippocampus-thalamus-cortex (mPFC) interactions during SO-spindle coupling. We acknowledge that, in the original analysis, functional connectivity was only examined during coupled SO-spindle events, without direct comparison to isolated SOs or isolated spindles. To address this concern, we have conducted PPI analyses for all three ROIs (hippocampus, thalamus, mPFC) and all three event types (SO-spindle couplings, isolated SOs, and isolated spindles). Our results indicate that neither isolated SOs nor isolated spindles yielded significant connectivity changes in any of the three ROIs, as none survived correction for multiple comparisons. This suggests that the observed connectivity increase is specific to SO-spindle coupling, rather than being independently driven by either SOs or spindles alone.

      Results, Page 14 Lines 248-255

      “Crucially, the interaction between FC and SO-spindle coupling revealed that only the functional connectivity of hippocampus -> thalamus (ROI analysis, t(106) = 1.86, p = 0.0328) and thalamus -> mPFC (ROI analysis, t(106) = 1.98, p = 0.0251) significantly increased during SO-spindle coupling, with no significant changes in any other pathway (Fig. 4e). We also conducted PPI analyses for the other two events (SOs and spindles), and neither yielded significant connectivity changes in the three ROIs, as none survived whole-brain FWE correction at the cluster level (p < 0.05). Together, these findings suggest that the thalamus, likely via spindles, coordinates hippocampal-cortical communication selectively during SO-spindle coupling, but not during isolated SO or spindle events alone.”

      (12) The limited temporal resolution of fMRI does indeed not allow for easily distinguishing between fMRI activation patterns related to SO-up- vs. SO-down-states. For this, one could try to extract the amplitudes of SO-up- and SO-down-states separately for each SO event and model them as two separate parametric modulators (with the risk of collinearity as they are likely correlated).

      We appreciate your insightful comment regarding the challenge of distinguishing fMRI activation patterns related to SO-up vs. SO-down states due to the limited temporal resolution of fMRI. While our current analysis does not differentiate between these two phases, we acknowledge that separately modeling SO-up and SO-down states using parametric modulators could provide a more refined understanding of their distinct neural correlates. However, as you note, this approach carries the risk of collinearity, and the two amplitudes are indeed highly correlated across all subjects in our data (r = 0.98). Future studies could address this question with high-temporal-resolution techniques. Because implementing this analysis is beyond the scope of the current study, we will acknowledge this limitation in the Discussion section.
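
      To make the collinearity concern concrete, the sketch below computes the correlation between two simulated amplitude series and the resulting variance inflation factor (VIF), which for a model with exactly two regressors equals 1 / (1 - r^2). The simulated amplitudes, their scale, and the function names are hypothetical and are not taken from the study data; only the value r = 0.98 comes from the response above.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_inflation_factor(r):
    """VIF for a model with exactly two regressors correlated at r."""
    return 1.0 / (1.0 - r ** 2)


# Hypothetical absolute amplitudes (microvolts) for 500 simulated SO events.
rng = random.Random(0)
trough_amp = [rng.gauss(30.0, 8.0) for _ in range(500)]
peak_amp = [0.6 * t + rng.gauss(0.0, 1.0) for t in trough_amp]

r_sim = pearson_r(trough_amp, peak_amp)
print(f"simulated r = {r_sim:.3f}, VIF = {variance_inflation_factor(r_sim):.1f}")
print(f"VIF at r = 0.98: {variance_inflation_factor(0.98):.2f}")
```

      At r = 0.98 the VIF is about 25, meaning the variance of each modulator's estimated weight is inflated roughly 25-fold relative to uncorrelated regressors, which is why separate UP- and DOWN-state modulators would be hard to interpret.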

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (13) L327: "It is likely that our findings of diminished DMN activity reflect brain activity during the SO DOWN-state, as this state consistently shows higher amplitude compared to the UP-state within subjects, which is why we modelled the SO trough as its onset in the fMRI analysis." This conclusion is not justified as the fact that SO down-states are larger in amplitude does not mean their impact on the BOLD response is larger.

      We appreciate your concern regarding our interpretation of diminished DMN activity as reflecting the SO DOWN-state. We acknowledge that the original wording was somewhat misleading. Our revised interpretation is that the effect could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. We will make this clear in the Discussion section.

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      (14) Line 77: "In the current study, while directly capturing hippocampal ripples with scalp EEG or fMRI is difficult, we expect to observe hippocampal activation in fMRI whenever SOs-spindles coupling is detected by EEG, if SOs- spindles-ripples triple coupling occurs during human NREM sleep". Not all SO-spindle events are associated with ripples (Staresina et al., 2015), but hippocampal activation may also be expected based on the occurrence of spindles alone (Bergmann et al., 2012).

      We appreciate your clarification regarding the relationship between SO-spindle coupling and hippocampal ripples. We acknowledge that not all SO-spindle events are necessarily accompanied by ripples (Staresina et al., 2015). However, previous research indicates that hippocampal ripples are significantly more likely to occur during SO-spindle coupling events. This suggests that while ripple occurrence is not guaranteed, SO-spindle coupling creates a favorable network state for ripple generation and potential hippocampal activation. To ensure accuracy, we will revise the manuscript to delete this misleading sentence in the Introduction section and acknowledge in the Discussion that we could not directly observe the triple coupling of SOs, spindles, and hippocampal ripples.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      Reviewer #2 (Public review):

      In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex, three key regions involved in memory consolidation and episodic memory processes.

      This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI scanner while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights. While the conclusions seem overall well supported by the data, some aspects with regard to the detection of sleep oscillations need clarification.

      The authors report that coupled SO-spindle events were most frequent during NREM sleep (2.46 ± 0.06 events/min), but they also observed a surprisingly high occurrence of these events during N1 and REM sleep (2.23 ± 0.09 and 2.32 ± 0.09 events/min, respectively), where SO-spindle coupling would not typically be expected. Combined with the relatively modest SO amplitudes reported (~25 µV, whereas >75 µV would be expected when using mastoids as reference electrodes), this raises the possibility that the parameters used for event detection may not have been conservative enough - or that sleep staging was inaccurately performed. This issue could present a significant challenge, as the fMRI findings are largely dependent on the reliability of these detected events.

      Thank you very much for your thorough and encouraging review. We appreciate your recognition of the significance and relevance of our study and dataset, particularly in highlighting how simultaneous EEG-fMRI recordings can provide complementary insights into the temporal dynamics of neural oscillations and their associated spatial activation patterns during sleep. In the sections that follow, we address each of your comments in detail. We have revised the text and conducted additional analyses wherever possible to strengthen our argument and clarify our methodological choices. We believe these revisions improve the clarity and rigor of our work, and we thank you for helping us refine it.

      We appreciate your insightful comments regarding the detection of sleep oscillations. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.
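
      The point that a percentile-based criterion will always return events can be shown with a small simulation. This is a sketch under stated assumptions rather than the detection code used in the study: it assumes the threshold is computed within each stage, uses a hypothetical 75th-percentile cutoff, and draws hypothetical candidate amplitudes from two distributions that differ only in scale.

```python
import math
import random


def percentile_threshold(values, pct):
    """Value at the given percentile, using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def detect_above_percentile(amplitudes, pct=75.0):
    """Keep candidates whose amplitude exceeds the stage-specific percentile."""
    threshold = percentile_threshold(amplitudes, pct)
    events = [a for a in amplitudes if a > threshold]
    return threshold, events


rng = random.Random(42)
# Hypothetical candidate amplitudes: large in N3, small in REM.
n3_candidates = [abs(rng.gauss(0.0, 30.0)) for _ in range(1000)]
rem_candidates = [abs(rng.gauss(0.0, 5.0)) for _ in range(1000)]

n3_thr, n3_events = detect_above_percentile(n3_candidates)
rem_thr, rem_events = detect_above_percentile(rem_candidates)

print("events kept:", len(n3_events), len(rem_events))
print("thresholds :", round(n3_thr, 1), round(rem_thr, 1))
```

      Both stages yield the same number of detections even though the REM threshold is far lower in absolute terms. Under this assumption, event counts in N1 or REM mainly reflect the number of candidates and the percentile chosen, so they cannot by themselves confirm that genuine SOs or spindles occurred in those stages.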

      Regarding the reported SO amplitudes (~25 µV), during preprocessing we applied the Signal Space Projection (SSP) method to more effectively remove MRI gradient artifacts and cardiac pulse noise. While this approach enhances data quality, it also reduces overall signal power, leading to systematically lower reported amplitudes. Despite this, the SOs detected in NREM sleep (especially N2/N3) remain physiologically meaningful and are consistent with previous fMRI studies using similar artifact removal techniques. We appreciate your careful evaluation and valuable suggestions.

      In addition, we will provide comprehensive tables in the supplementary materials, containing descriptive information about sleep-related characteristics (Table S1) as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”

      Supplementary Materials, Page 42-54, Table S1-S4

      Reviewer #3 (Public review):

      Summary:

      Wang et al. examined the brain activity patterns during sleep, especially when locked to canonical sleep rhythms such as SOs, spindles, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the upstate of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. To understand the functional significance of these brain activities, the authors further conducted open-ended cognitive state decoding and found that a variety of cognitive processes may be involved during SO-spindle coupling and during other sleep events. The authors next conducted functional connectivity analyses and found enhanced connectivity between the hippocampus, the thalamus, and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.

      Strengths:

      There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results.

      Weaknesses:

      Despite these strengths and the insights gained, there are weaknesses in the design, the analyses, and inferences.

      Thank you for your detailed and thoughtful review of our manuscript. We are delighted that you recognize our analysis methods, the richness of the neuroimaging and neural oscillation results, and the large sample size. In the following sections, we provide detailed responses to each of your comments. We have revised the text and conducted additional analyses to strengthen our arguments and clarify our methodological choices. We believe these revisions enhance the clarity and rigor of our work, and we sincerely appreciate your thoughtful feedback in helping us refine the manuscript.

      (1) A repeating statement in the manuscript is that brain activity could indicate memory reactivation and thus consolidation. This is indeed a highly relevant question that could be informed by the current data/results. However, an inherent weakness of the design is that there is no memory task before and after sleep. Thus, it is difficult (if not impossible) to make a strong argument linking SO/spindle/coupling-locked brain activity with memory reactivation or consolidation.

      We appreciate your suggestion regarding the lack of a pre- and post-sleep memory task in our study design. We acknowledge that, in the absence of behavioral measures, it is hard to directly link SO-spindle coupling to memory consolidation in an outcome-driven manner. Our interpretation is instead based on the well-established role of these oscillations in memory processes, as demonstrated in previous studies. We sincerely appreciate this feedback and will adjust our Discussion accordingly to reflect a more precise interpretation of our findings.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (2) Relatedly, to understand the functional implications of the sleep rhythm-locked brain activity, the authors employed the "open-ended cognitive state decoding" method. While this method is interesting, it is rather indirect given that there were no behavioral indices in the manuscript. Thus, discussions based on these analyses are speculative at best. Please either tone down the language or find additional evidence to support these claims.

      Moreover, the results from this method are difficult to understand. Figure 3e showed that for all three types of sleep events (SO, spindle, SO-spindle), the same mental states (e.g., working memory, episodic memory, declarative memory) showed opposite directions of activation (left and right panels showed negative and positive activation, respectively). How to interpret these conflicting results? This ambiguity is also reflected by the term used: declarative memory and episodic memories are both indexed in the results. Yet these two processes can be largely overlapped. So which specific memory processes do these brain activity patterns reflect? The Discussion shall discuss these results and the limitations of this method.

      We appreciate your critical assessment of the open-ended cognitive state decoding method and its interpretational challenges. Given the concerns about the indirectness of this approach, we decided to move the related content and results from Figure 3 in the main text to Supplementary Figure 7.

      Due to the complexity of memory-related processes, we acknowledge that distinguishing between episodic and declarative memory based solely on this approach is not straightforward. We will revise the Supplementary Materials to explicitly discuss these limitations and clarify that our findings do not isolate specific cognitive processes but rather suggest general associations with memory-related networks.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) The coupling strength is somehow inconsistent with prior results (Hahn et al., 2020, eLife, Helfrich et al., 2018, Neuron). Specifically, Helfrich et al. showed that among young adults, the spindle is coupled to the peak of the SO. Here, the authors reported that the spindles were coupled to down-to-up transitions of SO and before the SO peak. It is possible that participants' age may influence the coupling (see Helfrich et al., 2018). Please discuss the findings in the context of previous research on SO-spindle coupling.

      We appreciate your concern regarding the temporal characteristics of SO-spindle coupling. We acknowledge that the SO-spindle coupling phase results in our study are not identical to those reported by Hahn et al. (2020) and Helfrich et al. (2018). However, these differences may arise from slight variations in event detection parameters, which can influence the precise phase estimation of coupling. Notably, Hahn et al. (2020) also reported slight discrepancies in their group-level coupling phase results, highlighting that methodological differences can contribute to variability across studies. Furthermore, our findings are consistent with those of Schreiner et al. (2021), further supporting the robustness of our observations.

      That said, we acknowledge that our original description of SO-spindle coupling as occurring at the "transition from the lower state to the upper state" was not entirely precise. The -π/2 phase represents the true transition point, while our observed coupling phase is actually closer to the SO peak than to the transition. We will revise this statement in the manuscript to ensure clarity and accuracy in describing the coupling phase.
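
      The distinction between "at the transition" and "just before the peak" can be checked numerically with circular statistics. The sketch below assumes the phase convention implied above (SO peak at 0, down-to-up transition at -π/2, trough at ±π) and uses simulated spindle phases centered on a hypothetical preferred phase of -0.3 rad; none of the numbers are taken from the study.

```python
import cmath
import math
import random


def circular_mean(phases):
    """Return (mean direction in radians, resultant vector length)."""
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return cmath.phase(resultant), abs(resultant)


def angular_distance(a, b):
    """Smallest absolute angle between two phases, in [0, pi]."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))


# Hypothetical spindle-peak phases clustered slightly before the SO peak.
rng = random.Random(1)
preferred = -0.3  # radians; an assumption made for this illustration
phases = [
    rng.vonmisesvariate(preferred % (2 * math.pi), 4.0) for _ in range(2000)
]

mean_phase, strength = circular_mean(phases)
to_peak = angular_distance(mean_phase, 0.0)
to_transition = angular_distance(mean_phase, -math.pi / 2)

print(f"mean phase = {mean_phase:.2f} rad, resultant length = {strength:.2f}")
print(f"distance to SO peak = {to_peak:.2f}, to transition = {to_transition:.2f}")
```

      A negative mean phase whose distance to 0 is smaller than its distance to -π/2 corresponds to spindles peaking shortly before the SO peak rather than at the down-to-up transition.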

      Discussion, Page 16 Lines 283-291

      “Our data provide insights into the neurobiological underpinnings of these sleep rhythms. SOs, originating mainly in neocortical areas such as the mPFC, alternate between DOWN- and UP-states. The thalamus generates sleep spindles, which in turn couple with SOs. Our finding that spindle peaks consistently occurred slightly before the UP-state peak of SOs (in 83 out of 107 participants), concurs with prior studies, including Schreiner et al. (2021). Yet it differs from some results suggesting spindles might peak right at the SO UP-state (Hahn et al., 2020; Helfrich et al., 2018). Such discrepancies could arise from differences in detection algorithms, participant age (Helfrich et al., 2018), or subtle variations in cortical-thalamic timing. Nonetheless, these results underscore the importance of coordinated SO-spindle interplay in supporting sleep-dependent processes.”

      (4) The discussion is rather superficial with only two pages, without delving into many important arguments regarding the possible functional significance of these results. For example, the author wrote, "This internal processing contrasts with the brain patterns associated with external tasks, such as working memory." Without any references to working memory, and without delineating why WM is considered as an external task even working memory operations can be internal. Similarly, for the interesting results on SO and reduced DMN activity, the authors wrote "The DMN is typically active during wakeful rest and is associated with self-referential processes like mind-wandering, daydreaming, and task representation (Yeshurun, Nguyen, & Hasson, 2021). Its reduced activity during SOs may signal a shift towards endogenous processes such as memory consolidation." This argument is flawed. DMN is active during self-referential processing and mind-wandering, i.e., when the brain shifts from external stimuli processing to internal mental processing. During sleep, endogenous memory reactivation and consolidation are also part of the internal mental processing given the lack of external environmental stimulation. So why during SO or during memory consolidation, the DMN activity would be reduced? Were there differences in DMN activity between SO and SO-spindle coupling events?

      We appreciate your concerns regarding the brevity of the discussion and the need for clearer theoretical arguments. We will expand this section to provide more in-depth interpretations of our findings in the context of prior literature. Regarding working memory (WM), we acknowledge that our phrasing was ambiguous. We will modify this statement in the Discussion section.

      For the SO-related reduction in DMN activity, we recognize the need for a more precise explanation. This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of the DMN during SO events is related to a shift from internal cognition to external responses, given that it occurs during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state.

      To address your final question, we conducted an additional post hoc comparison of DMN activity between isolated SOs and SO-spindle coupling events. Our results indicate that DMN activation during SOs was significantly lower than during SO-spindle coupling (t(106) = -4.17, p < 1e-4). This suggests that SO-spindle coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. We appreciate your constructive feedback and will integrate these expanded analyses and discussions into our revised manuscript.
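
      For reference, a within-subject comparison of this kind can be sketched as a paired t-test on per-subject activation estimates, which gives n - 1 = 106 degrees of freedom for 107 subjects. That a paired test was used is an assumption based on the reported degrees of freedom, and the simulated values and variable names below are hypothetical.

```python
import math
import random
import statistics


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical per-subject DMN estimates for the two event types.
rng = random.Random(7)
n_subjects = 107
dmn_so = [rng.gauss(-0.20, 0.30) for _ in range(n_subjects)]
dmn_coupling = [rng.gauss(0.00, 0.30) for _ in range(n_subjects)]

t_stat, dof = paired_t(dmn_so, dmn_coupling)
print(f"t({dof}) = {t_stat:.2f}")
```

      A negative t statistic in this setup means that the first condition (isolated SOs) has the lower mean, matching the direction of the reported effect.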

      Results, Page 11 Lines 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Discussion, Page 17-18 Lines 308-332

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.

      To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      Recommendations for the authors:

      Reviewing Editor Comment:

      The reviewers think that you are working on a relevant and important topic. They are praising the large sample size used in the study. The reviewers are not all in line regarding the overall significance of the findings, but they all agree the paper would strongly benefit from some extra work, as all reviewers raise various critical points that need serious consideration.

      We appreciate your recognition of the relevance and importance of our study, as well as your acknowledgment of the large sample size as a strength of our work. We understand that there are differing perspectives regarding the overall significance of our findings, and we value the constructive critiques provided. We are committed to addressing the key concerns raised by all reviewers, including refining our analyses, clarifying our interpretations, and incorporating additional discussions to strengthen the manuscript. Below, we address your specific recommendations and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We believe that these revisions will significantly enhance the rigor and impact of our study, and we sincerely appreciate your thoughtful feedback in helping us improve our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "overnight sleep" suggests an entire night, while these were rather "nocturnal naps". Please rephrase.

      Response: Thank you for pointing this out. We have revised the phrasing in our manuscript to "nocturnal naps" instead of "overnight sleep" to more accurately reflect the duration of the sleep recordings.

      (2) Sleep staging results (macroscopic sleep architecture) should be provided in more detail (at least min and % of the different sleep stages, sleep onset latency, total sleep duration, total recording duration), at least mean/SD/range.

      Thank you for this suggestion. We will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics. This information will help provide a clearer overview of the macroscopic sleep architecture in our dataset.

      Reviewer #2 (Recommendations for the authors):

      In order to allow for a better estimation of the reliability of the detected sleep events, please:

      (1) Provide densities and absolute numbers of all detected SOs and spindles (N1, NREM, and REM sleep).

      Thank you for pointing this out. We will provide comprehensive tables in the supplementary materials containing detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: 1) Duration of each sleep stage; 2) Number of detected SOs; 3) Number of detected spindles; 4) Number of detected SO-spindle coupling events; 5) Density of detected SOs; 6) Density of detected spindles; 7) Density of detected SO-spindle coupling events.

      Supplementary Materials, Page 43-54, Table S2-S4

      (2) Show ERPs for all detected SOs and spindles (per sleep stage).

      Thank you for the suggestion. We will provide ERPs for all detected SOs and spindles, separated by sleep stage (N1, N2&N3, and REM) in supplementary Fig. S2-S4. These ERP waveforms will help illustrate the characteristic temporal profiles of SOs and spindles across different sleep stages.

      Methods, Page 25, Line 525-532

      “Event-related potentials (ERP) analysis. After completing the detection of each sleep rhythm event, we performed ERP analyses for SOs, spindles, and coupling events in different sleep stages. Specifically, for SO events, we took the trough of the DOWN-state of each SO as the zero-time point, then extracted data in a [-2 s to 2 s] window from the broadband (0.1–30 Hz) EEG and used [-2 s to -0.5 s] for baseline correction; the results were then averaged across 107 subjects (see Fig. S2a). For spindle events, we used the peak of each spindle as the zero-time point and applied the same data extraction window and baseline correction before averaging across 107 subjects (see Fig. S2b). Finally, for SO-spindle coupling events, we followed the same procedure used for SO events (see Fig. 2a, Figs. S3–S4).”
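
      The ERP procedure quoted above (event-locked window, baseline correction, averaging) can be sketched as follows. This is only a minimal illustration in plain Python, not the authors' pipeline: the function names, the toy signal, and the sampling rate are hypothetical, and the sketch averages over events within one recording, whereas the manuscript additionally averages across subjects.

```python
from statistics import fmean


def extract_epochs(signal, event_samples, sfreq, tmin=-2.0, tmax=2.0):
    """Cut a [tmin, tmax] window (in seconds) around each event sample.

    Events whose window would run past either end of the recording are skipped.
    """
    pre = int(round(-tmin * sfreq))
    post = int(round(tmax * sfreq))
    epochs = []
    for s in event_samples:
        if s - pre >= 0 and s + post < len(signal):
            epochs.append(signal[s - pre : s + post + 1])
    return epochs


def baseline_correct(epoch, sfreq, tmin=-2.0, base_start=-2.0, base_end=-0.5):
    """Subtract the mean of the [base_start, base_end] interval from the epoch."""
    i0 = int(round((base_start - tmin) * sfreq))
    i1 = int(round((base_end - tmin) * sfreq))
    baseline = fmean(epoch[i0 : i1 + 1])
    return [v - baseline for v in epoch]


def erp(signal, event_samples, sfreq):
    """Average the baseline-corrected epochs sample by sample."""
    epochs = [baseline_correct(e, sfreq)
              for e in extract_epochs(signal, event_samples, sfreq)]
    if not epochs:
        raise ValueError("no complete epochs available")
    return [fmean(column) for column in zip(*epochs)]


# Toy recording (hypothetical): a flat signal with a dip at two "troughs".
# The event at sample 5 is too close to the start and is skipped.
sfreq = 10
signal = [5.0] * 200
for s in (50, 120):
    signal[s] = 3.0
waveform = erp(signal, [5, 50, 120], sfreq)
print(len(waveform), waveform[20])
```

      For spindle events, the same functions would apply with the spindle peak as the zero-time sample.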

      (3) Provide detailed info concerning sleep characteristics (time spent in each sleep stage etc.).

      Thank you for this suggestion. As in the response above, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics.

      Supplementary Materials, Page 42, Table S1 (same as above)

      (4) What would happen if more stringent parameters were used for event detection? Would the authors still observe a significant number of SO spindles during N1 and REM? Would this affect the fMRI-related results?

      Thank you for this suggestion. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).

      Furthermore, in order to explore the impact of this on our fMRI results, we conducted an additional sensitivity analysis by applying different detection parameters for SOs. Specifically, we adjusted amplitude percentile thresholds for SO detection (the parameter that has the greatest impact on the results). We used the hippocampal activation value during N2&N3 stage SO-spindle coupling as an anchor value and found that when the parameters gradually became stricter, the results were similar to or even better than the current results. However, when we continued to increase the threshold, the results began to gradually decrease until the threshold was increased to 80%, and the results were no longer significant. This indicates that our results are robust within a specific range of parameters, but as the threshold increases, the number of trials decreases, ultimately weakening the statistical power of the fMRI analysis.
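
      The trade-off described above, where a stricter amplitude percentile leaves fewer events and therefore less statistical power, can be illustrated with a small sketch. It assumes that candidate events have already been detected and that each has a single amplitude value. The function names and the toy amplitudes are hypothetical, and the filtering and duration criteria of the actual detection pipeline are not reproduced.

```python
def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (p / 100) * (len(ordered) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def events_above(amplitudes, p):
    """Indices of candidate events whose amplitude reaches the p-th percentile."""
    cutoff = percentile(amplitudes, p)
    return [i for i, a in enumerate(amplitudes) if a >= cutoff]


def sweep(amplitudes, thresholds):
    """Number of retained events for each percentile threshold."""
    return {p: len(events_above(amplitudes, p)) for p in thresholds}


# Toy candidate amplitudes (hypothetical): 1, 2, ..., 100
amplitudes = [float(a) for a in range(1, 101)]
counts = sweep(amplitudes, [75, 80, 90])
print(counts)
```

      In a sensitivity analysis of this kind, the downstream statistic would be recomputed on each retained event set, so that changes in the effect can be read against the shrinking event count.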

      Thank you again for your suggestions on sleep rhythm event detection. We will add the results in Supplementary and revise our manuscript accordingly.

      Results, Page 11, Line 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Finally, we sincerely thank you all again for your thoughtful and constructive feedback. Your insights have been invaluable in refining our analyses, strengthening our interpretations, and improving the clarity and rigor of our manuscript. We appreciate the time and effort you have dedicated to reviewing our work, and we are grateful for the opportunity to enhance our study based on your recommendations.

      References:

      Bergmann, T. O., Mölle, M., Diedrichs, J., Born, J., & Siebner, H. R. (2012). Sleep spindle-related reactivation of category-specific cortical regions after learning face-scene associations. NeuroImage, 59(3), 2733-2742. 

      Buzsáki, G. (2015). Hippocampal sharp wave‐ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188. 

      Caporro, M., Haneef, Z., Yeh, H. J., Lenartowicz, A., Buttinelli, C., Parvizi, J., & Stern, J. M. (2012). Functional MRI of sleep spindles and K-complexes. Clinical neurophysiology, 123(2), 303-309. 

      Coulon, P., Budde, T., & Pape, H.-C. (2012). The sleep relay—the role of the thalamus in central and decentral sleep regulation. Pflügers Archiv-European Journal of Physiology, 463, 53-71. 

      Crunelli, V., Lőrincz, M. L., Connelly, W. M., David, F., Hughes, S. W., Lambert, R. C., Leresche, N., & Errington, A. C. (2018). Dual function of thalamic low-vigilance state oscillations: rhythm-regulation and plasticity. Nature Reviews Neuroscience, 19(2), 107-118. 

      Czisch, M., Wehrle, R., Stiegler, A., Peters, H., Andrade, K., Holsboer, F., & Sämann, P. G. (2009). Acoustic oddball during NREM sleep: a combined EEG/fMRI study. PloS one, 4(8), e6749. 

      Diba, K., & Buzsáki, G. (2007). Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience, 10(10), 1241. 

      Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126. 

      Fogel, S., Albouy, G., King, B. R., Lungu, O., Vien, C., Bore, A., Pinsard, B., Benali, H., Carrier, J., & Doyon, J. (2017). Reactivation or transformation? Motor memory consolidation associated with cerebral activation time-locked to sleep spindles. PloS one, 12(4), e0174755. 

      Hahn, M. A., Heib, D., Schabus, M., Hoedlmoser, K., & Helfrich, R. F. (2020). Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. Elife, 9, e53730. 

      Halassa, M. M., Siegle, J. H., Ritt, J. T., Ting, J. T., Feng, G., & Moore, C. I. (2011). Selective optical drive of thalamic reticular nucleus generates thalamic bursts and cortical spindles. Nature Neuroscience, 14(9), 1118-1120. 

      Hale, J. R., White, T. P., Mayhew, S. D., Wilson, R. S., Rollings, D. T., Khalsa, S., Arvanitis, T. N., & Bagshaw, A. P. (2016). Altered thalamocortical and intra-thalamic functional connectivity during light sleep compared with wake. NeuroImage, 125, 657-667. 

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J., & Knight, R. T. (2019). Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572. 

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T., & Walker, M. P. (2018). Old brains come uncoupled in sleep: slow wave-spindle synchrony, brain atrophy, and forgetting. Neuron, 97(1), 221-230. e224. 

      Horovitz, S. G., Fukunaga, M., de Zwart, J. A., van Gelderen, P., Fulton, S. C., Balkin, T. J., & Duyn, J. H. (2008). Low frequency BOLD fluctuations during resting wakefulness and light sleep: A simultaneous EEG‐fMRI study. Human brain mapping, 29(6), 671-682. 

      Huang, Q., Xiao, Z., Yu, Q., Luo, Y., Xu, J., Qu, Y., Dolan, R., Behrens, T., & Liu, Y. (2024). Replay-triggered brain-wide activation in humans. Nature Communications, 15(1), 7185. 

      Ilhan-Bayrakcı, M., Cabral-Calderin, Y., Bergmann, T. O., Tüscher, O., & Stroh, A. (2022). Individual slow wave events give rise to macroscopic fMRI signatures and drive the strength of the BOLD signal in human resting-state EEG-fMRI recordings. Cerebral Cortex, 32(21), 4782-4796. 

      Laufs, H. (2008). Endogenous brain oscillations and related networks detected by surface EEG‐combined fMRI. Human brain mapping, 29(7), 762-769. 

      Laufs, H., Walker, M. C., & Lund, T. E. (2007). ‘Brain activation and hypothalamic functional connectivity during human non-rapid eye movement sleep: an EEG/fMRI study’—its limitations and an alternative approach. Brain, 130(7), e75. 

      Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., & Petrides, M. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, 113(44), 12574-12579. 

      Massimini, M., Huber, R., Ferrarelli, F., Hill, S., & Tononi, G. (2004). The sleep slow oscillation as a traveling wave. Journal of Neuroscience, 24(31), 6862-6870. 

      Moehlman, T. M., de Zwart, J. A., Chappel-Farley, M. G., Liu, X., McClain, I. B., Chang, C., Mandelkow, H., Özbay, P. S., Johnson, N. L., & Bieber, R. E. (2019). All-night functional magnetic resonance imaging sleep studies. Journal of neuroscience methods, 316, 83-98. 

      Mölle, M., Bergmann, T. O., Marshall, L., & Born, J. (2011). Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep, 34(10), 1411-1421.

      Ngo, H.-V., Fell, J., & Staresina, B. (2020). Sleep spindles mediate hippocampal-neocortical coupling during long-duration ripples. Elife, 9, e57011. 

      Picchioni, D., Horovitz, S. G., Fukunaga, M., Carr, W. S., Meltzer, J. A., Balkin, T. J., Duyn, J. H., & Braun, A. R. (2011). Infraslow EEG oscillations organize large-scale cortical–subcortical interactions during sleep: a combined EEG/fMRI study. Brain research, 1374, 63-72.

      Schabus, M., Dang-Vu, T. T., Albouy, G., Balteau, E., Boly, M., Carrier, J., Darsaud, A., Degueldre, C., Desseilles, M., & Gais, S. (2007). Hemodynamic cerebral correlates of sleep spindles during human non-rapid eye movement sleep. Proceedings of the National Academy of Sciences, 104(32), 13164-13169. 

      Schreiner, T., Kaufmann, E., Noachtar, S., Mehrkens, J.-H., & Staudigl, T. (2022). The human thalamus orchestrates neocortical oscillations during NREM sleep. Nature Communications, 13(1), 5231.

      Schreiner, T., Petzka, M., Staudigl, T., & Staresina, B. P. (2021). Endogenous memory reactivation during sleep in humans is clocked by slow oscillation-spindle complexes. Nature Communications, 12(1), 3112. 

      Singh, D., Norman, K. A., & Schapiro, A. C. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. Proceedings of the National Academy of Sciences, 119(44), e2123432119. 

      Spoormaker, V. I., Schröter, M. S., Gleiser, P. M., Andrade, K. C., Dresler, M., Wehrle, R., Sämann, P. G., & Czisch, M. (2010). Development of a large-scale functional brain network during human non-rapid eye movement sleep. Journal of Neuroscience, 30(34), 11379-11387. 

      Staresina, B. P., Bergmann, T. O., Bonnefond, M., van der Meij, R., Jensen, O., Deuker, L., Elger, C. E., Axmacher, N., & Fell, J. (2015). Hierarchical nesting of slow oscillations, spindles and ripples in the human hippocampus during sleep. Nature Neuroscience, 18(11), 1679-1686. 

      Staresina, B. P., Niediek, J., Borger, V., Surges, R., & Mormann, F. (2023). How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nature Neuroscience, 1-9. 

      Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature methods, 8(8), 665-670. 

      Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 1-12.

    1. Author response:

      The following is the authors’ response to the original reviews

      Main revision made to the manuscript

      The main revision made to the manuscript is to reconcile our findings with the line attractor model. The revision is based on Reviewer 1’s comment on reinterpreting our results as a superposition of an attractor model with fast timescale dynamics. We expanded our analysis regime to the start of a trial and characterized the overall within-trial dynamics to reinterpret our findings.

      We first acknowledge that our results are not in contradiction with evidence integration on a line attractor. As pointed out by the reviewers, our finding that the integration of reward outcome explains the reversal probability activity x_rev (Figure 3) is compatible with the line attractor model. However, the reward integration equation is an algebraic relation and does not characterize the dynamics of reversal probability activity. A closer analysis of the neural dynamics is therefore needed to assess the feasibility of a line attractor.

      In the revised manuscript, we show that x_rev exhibits two different activity modes (Figure 4). First, x_rev has substantial non-stationary dynamics during a trial, and this non-stationary activity is incompatible with the line attractor model, as claimed in the original manuscript. Second, we present new results showing that x_rev is stationary (i.e., constant in time) and stable (i.e., contracting) at the start of a trial. These two properties of x_rev support that it is a point attractor at the start of a trial and is compatible with the line attractor model. 
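
      The two properties named above, stationarity (constant in time) and stability (contracting), can be made concrete with a short sketch. This is one possible operationalization and is not taken from the manuscript: it assumes x_rev is available as a uniformly sampled one-dimensional trace, estimates its time derivative by finite differences, and treats "contracting" as the gap between two traces never growing. All names and tolerances are hypothetical.

```python
def finite_difference(x, dt):
    """Forward-difference estimate of dx/dt for a uniformly sampled 1-D trace."""
    return [(b - a) / dt for a, b in zip(x, x[1:])]


def is_stationary(x, dt, tol):
    """True if the mean absolute time derivative stays below tol."""
    if len(x) < 2:
        raise ValueError("need at least two samples")
    derivative = finite_difference(x, dt)
    return sum(abs(v) for v in derivative) / len(derivative) < tol


def is_contracting(x_a, x_b):
    """True if the gap between two traces never grows from one sample to the next."""
    gaps = [abs(a - b) for a, b in zip(x_a, x_b)]
    return all(later <= earlier for earlier, later in zip(gaps, gaps[1:]))


# Toy traces (hypothetical): a flat segment and a ramp sampled every 0.1 s
flat = [1.0] * 10
ramp = [0.1 * i for i in range(10)]
print(is_stationary(flat, 0.1, 1e-6), is_stationary(ramp, 0.1, 0.5))
```

      Under these assumptions, a flat segment at the start of a trial passes the stationarity check, while the within-trial ramp does not.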

      We further analyze how the two activity modes are linked (Figure 4, Support vector regression). We show that the non-stationary activity is predictable from the stationary activity if the underlying dynamics can be inferred. In other words, the non-stationary activity during a trial is generated by underlying dynamics whose initial condition is provided by the stationary state at the start of the trial.

      These results suggest an extension of the line attractor model where an attractor state at the start of a trial provides an initial condition from which non-stationary activity is generated during a trial by an underlying dynamics associated with task-related behavior (Figure 4, Augmented model). 

      The separability of non-stationary trajectories (Figures 5 and 6) is a property of the non-stationary dynamics that allows separable points in the initial stationary state to remain separable during a trial, thus making it possible to represent distinct probabilistic values in non-stationary activity.

      This revised interpretation of our results (1) retains our original claim that the non-stationary dynamics during a trial are incompatible with the line attractor model and (2) introduces an attractor state at the start of a trial, which is compatible with the line attractor model. Our analysis shows that the two activity modes are linked by underlying dynamics, and that the attractor state serves as the initial state that launches the non-stationary activity.

      Responses to the Public Reviews:

      Reviewer #1:

      (1) To better explain the reversal learning task and the network training method, we added a detailed description of the RNN and monkey task structure (Result Section 1), included a schematic of target outputs (Figure 1B), explained the rationale behind using an inhibitory network model (Method Section 1), and explained the supervised RNN training scheme (Result Section 1). This information can also be found in the Methods.

      (2) Our understanding is that the augmented model discussed on the previous page is aligned with the model suggested by Reviewer 1: “a curved line attractor, with faster timescale dynamics superimposed on this structure”. It is likely that the “fast” non-stationary activity observed during the trial is driven by task-related behavior and is thus transient. For instance, we do not observe such non-stationary activity in the inter-trial interval, when the task-related behavior is absent. For this reason, the non-stationary trajectories were not considered to be part of the attractor. Instead, they are transient activity generated by the underlying neural dynamics associated with task-related behavior. We believe such a characterization of the faster timescale dynamics is consistent with Reviewer 1’s view, and we wanted to clarify that there are two different activity modes.

      (3) We appreciate the reviewers’ (Reviewer 1 and Reviewer 2) comment that TDR may be limited in isolating the neural subspace of interest. Our study presents what can be learned from TDR, but TDR is by no means the only way to interpret the neural data. Applying other methods for isolating task-related neural activity is left for future work.

      We would appreciate it if the reviewers could share thoughts on what other alternative methods could better isolate the reversal probability activity.

      Reviewer #2:

      (1) (i) We respectfully disagree with Reviewer 2’s comment that “no action is required to be performed by neurons in the RNN”. In our network setup, the output of the RNN learns to choose a sign (+ or -), as Reviewer 2 pointed out, to make a choice. This is how the RNN takes an action. It is unclear to us what Reviewer 2 intended by “action” and how reaching a target value (not just taking a sign) would make a significant difference in how the network performs the task.

      (ii) From Reviewer 2’s comment that “no intervening behavior is thus performed by neurons”, we noticed that the term “intervening behavior” has caused confusion. It refers to task-related behavior, such as making choices or receiving reward, that the subject must perform across trials before reversing its preferred choice. These are the behaviors that intervene before the reversal of the preferred choice. To clarify its meaning, in the revised manuscript, we changed the term to “task-related behavior” and put it in context. For example, in the Introduction we state that “However, during a trial, task-related behavior, such as making decisions or receiving feedback, produced …”

      (iii) As pointed out by Reviewer 2, the lack of a fixation period in the RNN could lead to differences between the neural dynamics of the RNN and the PFC, especially at the start of a trial. We address this issue in Result Section 4, where we analyze the stationary activity at the start of a trial. We find that fixing the choice output to zero before making a choice promotes stationary activity and makes the RNN activity more similar to the PFC activity.

      Reviewer #3:

      (1) (i) In the previous study (Figure 1 in [Bartolo and Averbeck ‘20]), it was shown that neural activity can predict the behavioral reversal trial. This is the reason we examined the neural activity in the trials centered at the behavioral reversal trial. We explained in Result Section 2 that we followed this line of analysis in our study.

      (ii) We would like to emphasize that the main point of Figures 4 and 5 is to show the separability of neural trajectories: the entire trajectory shifts without overlapping. It is not obvious that high-dimensional neural population activity from two trials should remain separated when it is compressed into a one-dimensional subspace. The one-dimensional activities can easily collide because they are compressed into a low-dimensional space. We revised the manuscript to bring out these points. We added an opening paragraph that discusses the separability of trajectories and revised the main text to highlight the findings on separability.
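
      The notion of separability used here, that an entire trajectory shifts without overlapping another, can be stated as a simple check on one-dimensional trajectories. The sketch below is illustrative only and is not the analysis used in the manuscript. It assumes the two trajectories have already been projected onto the one-dimensional subspace and are sampled at the same time points, and the function name and toy values are hypothetical.

```python
def trajectories_separated(traj_a, traj_b):
    """True if two equally sampled 1-D trajectories never touch or cross.

    Separation means one trajectory stays strictly above the other at
    every time point.
    """
    if len(traj_a) != len(traj_b) or not traj_a:
        raise ValueError("trajectories must be non-empty and equally long")
    diffs = [a - b for a, b in zip(traj_a, traj_b)]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)


# Toy trajectories (hypothetical): a shifted copy stays separated,
# whereas a flat trace that the first one passes through does not.
rising = [1.0, 2.0, 3.0]
shifted = [0.0, 1.0, 2.0]
flat = [2.0, 2.0, 2.0]
print(trajectories_separated(rising, shifted), trajectories_separated(rising, flat))
```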

      (iii) We agree with Reviewer 3 that it would be interesting to look at what happens in other subspaces of neural activity that are not related to reversal probability and to characterize how different neural subspaces interact with each other. However, the focus of this paper is the reversal probability activity, and we consider these questions outside the scope of the current paper. We point out that, using the same dataset, neural activity related to other experimental variables was analyzed in other papers [Bartolo and Averbeck ’20; Tang, Bartolo and Averbeck ‘21].

      (2) (i) In the revised manuscript, we added an explanation of the rationale behind choosing the inhibitory network as a simplified model for the balanced state. In brief, a network with strong inhibitory recurrent connections and strong excitatory external input operates in the balanced state, as in the standard excitatory-inhibitory network. We included references that studied this inhibitory network. We also explained the technical reason (GPU memory) for choosing the inhibitory model.

      (ii) We thank the reviewer for pointing out that the original manuscript did not mention how the feedback and cue were initialized. They were random vectors sampled from a Gaussian distribution. We added this information in the revised manuscript. In our opinion, it is common to use random external inputs for training RNNs, as it is a priori unclear how to choose them. In fact, it is possible to analyze the effects of random feedback on the one-dimensional x_rev dynamics by projecting the random feedback vector onto the reversal probability vector. This is shown in Figure 4F.
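
      The projection mentioned above can be written as a scalar projection of the feedback vector onto the reversal probability vector. The sketch below is a minimal illustration and not the authors' code. The vector length, the zero-mean unit-variance Gaussian, and the randomly drawn stand-in for the reversal probability vector are assumptions, since the response states only that the inputs were sampled from a Gaussian distribution.

```python
import random


def gaussian_vector(n, rng, mean=0.0, std=1.0):
    """Draw an n-dimensional vector with independent Gaussian entries."""
    return [rng.gauss(mean, std) for _ in range(n)]


def scalar_projection(vec, axis):
    """Component of vec along axis (the axis is normalised internally)."""
    norm = sum(a * a for a in axis) ** 0.5
    if norm == 0:
        raise ValueError("axis must be non-zero")
    return sum(v * a for v, a in zip(vec, axis)) / norm


rng = random.Random(0)                  # fixed seed for reproducibility
feedback = gaussian_vector(200, rng)    # hypothetical random feedback input
rev_axis = gaussian_vector(200, rng)    # stand-in for the reversal probability vector
print(scalar_projection(feedback, rev_axis))
```

      The sign and size of this scalar indicate how strongly, and in which direction, a feedback input pushes activity along the reversal probability axis.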

      (iii) We agree that it would be more natural to train the RNN to solve the task without using the Bayesian model. We point out this issue in the Discussion in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1:

      (1) My understanding of network training was that a Bayesian ideal observer signaled target output based on previous reward outcomes. However, the authors never mention that networks are trained by supervised learning in the main text until the last paragraph of the discussion. There is no mention that there was an offset in the target based on the behavior of the monkeys in the main text. These are really important things to consider in the context of the network solution after training. I couldn't actually find any figure that presents the target output for the network. Did I miss something key here?

      In Result Section 1, we added a paragraph that describes in detail how the RNN is trained. We explained that the network is first simulated and then the choice outputs and reward outcomes are fed into the Bayesian model to infer the scheduled reversal trial. A few trials are added to the inferred reversal trial to obtain the behavioral reversal trial, as found in a previous study [Bartolo and Averbeck ‘20]. Then the network weights are updated by backpropagation-through-time via supervised learning. 

      In the original manuscript, the target output for the network was described in Methods Section 2.5, Step 4. To make this information readily accessible, we added a schematic in Figure 1B that shows the scheduled, inferred and behavioral reversal trials. It also shows how the target choice outputs are defined. They switch abruptly at the behavioral reversal trial.
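
      The construction of the target outputs described above can be sketched as follows. This is a hedged illustration rather than the procedure in Methods Section 2.5: trials are 0-indexed, the choice is encoded as +1 or -1 (following the sign-based output discussed elsewhere in this response), and the function name and offset value are hypothetical. The Bayesian inference step that produces the inferred reversal trial is not reproduced.

```python
def target_outputs(n_trials, inferred_reversal, offset, initial_sign=1):
    """Return one +1/-1 target per trial.

    The behavioral reversal trial is the inferred reversal trial plus a
    few trials (offset); targets flip sign abruptly at that trial.
    """
    behavioral_reversal = inferred_reversal + offset
    if not 0 <= behavioral_reversal <= n_trials:
        raise ValueError("behavioral reversal trial lies outside the block")
    return [initial_sign if t < behavioral_reversal else -initial_sign
            for t in range(n_trials)]


# Toy block (hypothetical): 6 trials, reversal inferred at trial 2, offset of 1 trial
targets = target_outputs(6, inferred_reversal=2, offset=1)
print(targets)
```

      In supervised training, a loss between the network's choice output and these targets would then be minimized by backpropagation through time.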

      (2) The role of block structure in the task is an important consideration. What are the statistics of block switches? The authors say on average the reversals are every 36 trials, but also say there are random block switches. The reviewer's notes suggest that both the networks and monkeys may be learning about the typical duration of blocks, which could influence their expectations of reversals. This aspect of the task design should be explained more thoroughly and considered in the context of Figure 1E and 5 results.

      We provided a more detailed description of the reversal learning task in Result Section 1. We clarified that (1) a task is completed by executing a block of a fixed number of trials and (2) the reversal of the reward schedule occurs at a random trial around the middle of a block. The differences in the number of trials in a block that the RNNs (36) and the monkeys (80) perform are also explained. We also pointed out the differences in how the reversal trial is randomly sampled.

      However, it is unclear what Reviewer 1 meant by random block switches. Our reversal learning task is completed when a block of a fixed number of trials is executed. Reversal of the reward schedule occurs only once, on a randomly selected trial in the block, and the reversed reward schedule is maintained until the end of the block. This differs from other versions of reversal learning in which the reward schedule switches multiple times across trials. We clarified this point in Result Section 1.
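
      The block structure described above (a fixed number of trials with a single reversal at a random trial near the middle) can be sketched as follows. This is an illustration under stated assumptions, not the task code: the uniform sampling window around the midpoint (half_window) is hypothetical, since the response does not specify how the reversal trial is sampled, and the schedule records only which of two options is currently the better one.

```python
import random


def reward_schedule(n_trials, half_window, rng):
    """Return (reversal_trial, schedule) for one block.

    schedule[t] is the better option (0 or 1) on trial t. The schedule
    reverses exactly once and stays reversed until the end of the block.
    """
    mid = n_trials // 2
    reversal = rng.randint(mid - half_window, mid + half_window)
    best = rng.randint(0, 1)
    schedule = [best if t < reversal else 1 - best for t in range(n_trials)]
    return reversal, schedule


rng = random.Random(0)                    # fixed seed for reproducibility
reversal, schedule = reward_schedule(36, half_window=5, rng=rng)
switches = sum(a != b for a, b in zip(schedule, schedule[1:]))
print(reversal, switches)
```

      A version of the task with multiple switches per block would instead draw several reversal trials, which is the design the response contrasts with.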

      (3) The relationship between the supervised learning approach used in the RNNs and reinforcement learning was confused in the discussion. "Although RNNs in our study were trained via supervised learning, animals learn a reversal-learning task from reward feedback, making it into a reinforcement learning (RL) problem." This is fundamentally not true. In the case of this work, the outcome of the previous trial updates the target output, rather than the trial and error type learning as is typical in reinforcement learning. Networks are not learning by reinforcement learning and this statement is confusing.

      We agree with Reviewer 1’s comment that the statement in the original manuscript is confusing. Our intention was to point out that our study used supervised learning, which differs from how animals learn by reinforcement learning in real life. We revised the sentence in the Discussion as follows:

      “The RNNs in our study were trained via supervised learning. However, in real life, animals learn a reversal learning task via reinforcement learning (RL), i.e., learn the task from reward outcomes.”

      (4) The distinction between line attractors and the dynamic trajectories described by the authors deserves further investigation. A significant concern arises from the authors' use of targeted dimensionality reduction (TDR), a form of regression, to identify the axis determining reversal probability. While this approach can reveal interesting patterns in the data, it may not necessarily isolate the dimension along which the RNN computes reversal probability. This limitation could lead to misinterpretation of the underlying neural dynamics.

      a) This manuscript cites work described in "Prefrontal cortex as a meta-reinforcement learning system," which examined a similar task. In that study, the authors identified a v-shaped curve in the principal component space of network states, representing the probability of choosing left or right.

      Importantly, this curve is topologically equivalent to a line and likely represents a line attractor. However, regressing against reversal probability in such a case would show that a single principal component (PC2) directly correlates with reversal probability.

      b) The dynamics observed in the current study bear a striking resemblance to this structure, with the addition of intervening loops in the network state corresponding to within-trial state evolution. Crucially, these observations do not preclude the existence of a line attractor. Instead, they may reflect the network's need to produce fast timescale dynamics within each trial, superimposed on the slower dynamics of the line attractor.

      c) This alternative interpretation suggests that reward signals could function as inputs that shift the network state along the line attractor, with information being maintained across trials. The fast "intervening behaviors" observed by the authors could represent faster timescale dynamics occurring on top of the underlying line attractor dynamics, without erasing the accumulated evidence for reversals.

      d) Given these considerations, the authors' conclusion that their results are better described by separable dynamic trajectories rather than fixed points on a line attractor may be premature. The observed dynamics could potentially be reconciled with a more nuanced understanding of line attractor models, where the attractor itself may be curved and coexist with faster timescale dynamics.

      We appreciate the insightful comments on (1) the similarity of the work by Wang et al ’18 with our findings and (2) an alternative interpretation that augments the line attractor with fast timescale dynamics. 

      (1) We added a discussion of the work by Wang et al ’18 in Result Section 2 to point out the similarity of their findings in the principal component space with ours in the x_rev and x_choice space. We commented that such network dynamics could emerge when learning to perform the reversal learning task, regardless of the training scheme.

      We also mention that the RL approach in Wang et al ’18 does not consider within-trial dynamics and therefore lacks the non-stationary activity observed during the trial in the PFC of monkeys and our trained RNNs.

      (2) We revised our original manuscript substantially to reconcile the line attractor model with the non-stationary activity observed during a trial.

      Here are the highlights of the revised interpretation of the PFC and RNN activity:

      - The dynamics of x_rev consist of two activity modes, i.e., stationary activity at the start of a trial and non-stationary activity during the trial. A schematic of the augmented model that reconciles the two activity modes is shown in Figure 4A. Analyses of the time derivative (dx_rev / dt) and the contractivity of the stationary state are shown in Figure 4B,C to demonstrate the two activity modes.

      - We discuss in Result Section 4 main text that the stationary activity is consistent with the line attractor model, but the non-stationary activity deviates from the model. 

      - The two activity modes are linked dynamically: there is an underlying dynamics that maps the stationary state to the non-stationary trajectory. This is shown by predicting the non-stationary trajectory from the stationary state using a support vector regression model. The prediction results are shown in Figure 4D,E,F.

      - We discuss in Result Section 4 an extension of the standard line attractor model: points on the line attractor can serve as initial states that launch non-stationary activity associated with task-related behavior.

      - The separability of neural trajectories presented in Result Section 5 is framed as a property of the non-stationary dynamics associated with task-related behavior.
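
      As a rough illustration of how a time derivative can separate the two activity modes listed above, the following Python sketch labels time bins of a one-dimensional trace as stationary when the magnitude of a finite-difference derivative stays below a tolerance. The trace values, the bin width `dt`, and the tolerance `tol` are hypothetical; they are not taken from the manuscript or its analysis code.

```python
def finite_difference(x, dt):
    """Forward-difference estimate of dx/dt for a 1-D trace sampled every dt."""
    return [(x[i + 1] - x[i]) / dt for i in range(len(x) - 1)]


def stationary_mask(x, dt, tol):
    """True for each time bin where |dx/dt| is below the tolerance."""
    return [abs(v) < tol for v in finite_difference(x, dt)]


# Hypothetical x_rev trace: flat at the start of the trial, then a transient.
dt = 0.02  # seconds per bin (assumed)
x_rev = [0.50, 0.50, 0.50, 0.50, 0.62, 0.80, 0.71, 0.55]

mask = stationary_mask(x_rev, dt, tol=1.0)  # tol is an assumed threshold
print(mask)
```

      In this toy trace the first three bins are flagged as stationary and the remaining bins as non-stationary; with real data the tolerance would have to be chosen relative to the noise level.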

      To strengthen their claims, the authors should:

      (1) Provide a more detailed description of their RNN training paradigm and task structure, including clear illustrations of target outputs.

      (2) Discuss how their findings relate to and potentially extend previous work on similar tasks, particularly addressing the similarities and differences with the v-shaped state organization observed in reinforcement learning contexts. (https://www.nature.com/articles/s41593-018-0147-8 Figure1).

      (3) Explore whether their results could be consistent with a curved line attractor model, rather than treating line attractors and dynamic trajectories as mutually exclusive alternatives.

      Our response to these three comments is described above.

      Addressing these points would significantly enhance the impact of the study and provide a more nuanced understanding of how reversal probabilities are represented in neural circuits.

      In conclusion, while this study provides interesting insights into the neural representation of reversal probability, there are several areas where the methodology and interpretations could be refined.

      Additional Minor Concerns:

      (1) Network Training and Reversal Timing: The authors mention that the network was trained to switch after a reversal to match animal behavior, stating "Maximum a Posterior (MAP) of the reversal probability converges a few trials past the MAP estimate." More explanation of how this training strategy relates to actual animal behavior would enhance the reader's understanding of the meaning of the model's similarity to animal behavior in Figure 1.

      In Method Section 2.5, we described how our observation that the running estimate of MAP converges a few trials after the actual MAP is analogous to the animal’s reversal behavior.

      “This observation can be interpreted as follows. If a subject performing the reversal learning task employs the ideal observer model to detect the trial at which reward schedule is reversed, the subject can infer the reversal of reward schedule a few trials past the actual reversal and then switch its preferred choice. This delay in behavioral reversal, relative to the reversal of reward schedule, is analogous to the monkeys switching their preferred choice a few trials after the reversal of reward schedule.”

      In Step 4, we also mentioned that the target choice outputs are defined based on our observation in Step 3.

      “We used the observation from Step 3 to define target choice outputs that switch abruptly a few trials after the reversal of reward schedule, denoted as $t^*$ in the following. An example of target outputs are shown in Fig.\,\ref{fig_behavior}B.”
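
      The running MAP estimate described in the quoted Methods text can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reward probability `p_high`, the block length `n_trials`, the uniform prior over the reversal trial, and the toy choice and reward sequences are all hypothetical.

```python
import math


def log_posterior(choices, rewards, p_high, n_trials):
    """Unnormalized log posterior over the reversal trial r (0..n_trials).

    Hypothesis r: option 0 is the high-value option on trials t < r and
    option 1 is the high-value option on trials t >= r. A uniform prior
    over r only adds a constant, so it is omitted.
    """
    log_post = []
    for r in range(n_trials + 1):
        ll = 0.0
        for t, (choice, reward) in enumerate(zip(choices, rewards)):
            high = 0 if t < r else 1
            p_reward = p_high if choice == high else 1.0 - p_high
            ll += math.log(p_reward if reward else 1.0 - p_reward)
        log_post.append(ll)
    return log_post


def running_map(choices, rewards, p_high, n_trials):
    """MAP estimate of the reversal trial after each observed trial.

    Ties (candidate reversals not yet constrained by data) are broken in
    favor of the earliest candidate.
    """
    estimates = []
    for t in range(1, len(choices) + 1):
        lp = log_posterior(choices[:t], rewards[:t], p_high, n_trials)
        estimates.append(max(range(len(lp)), key=lp.__getitem__))
    return estimates


# Toy block: the subject always picks option 0; rewards stop at trial 5.
choices = [0] * 10
rewards = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
estimates = running_map(choices, rewards, p_high=0.7, n_trials=10)
print(estimates)
```

      With noiseless toy outcomes the estimate settles on the true reversal trial as soon as post-reversal evidence arrives; with probabilistic rewards several post-reversal trials are typically needed, which is the delay the quoted passage refers to.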

      (2) How is the network simulated in step 1 of training? Is it just randomly initialized? What defines this network structure?

      The initial state at the start of a block was random. We think the initial state is less relevant, as the external inputs (i.e., cue and feedback) are strong and drive the network dynamics. We mentioned this setup and observation in Step 1 of training.

      “Step 1. Simulate the network starting from a random initial state, apply the external inputs, i.e., cue and feedback inputs, at each trial and store the network choices and reward outcomes at all the trials in a block. The network dynamics is driven by the external inputs applied periodically over the trials.”

      (3) Clarification on Learning Approach: More description of the approach in the main text would be beneficial. The statement "Here, we trained RNNs that learned from a Bayesian inference model to mimic the behavioral strategies of monkeys performing the reversal learning task [2, 4]" is somewhat confusing, as the model isn't directly fit to monkey data. A more detailed explanation of how the Bayesian inference model relates to monkey behavior and how it's used in RNN training would improve clarity.

      We described the learning approach in more detail, but also tried to be concise without going into technical details.

      We revised the sentence in Introduction as follows:

      “We sought to train RNNs to mimic the behavioral strategies of monkeys performing the reversal learning task. Previous studies \cite{costa2015reversal, bartolo2020prefrontal} have shown that a Bayesian inference model can capture a key aspect of the monkey's behavioral strategy, i.e., adhere to the preferred choice until the reversal of reward is detected and then switch abruptly. We trained the RNNs to replicate this behavioral strategy by training them on target behaviors generated from the Bayesian model.”

      We also added a paragraph in Result Section 1 that explains in detail how the training approach works.

      (4) In Figure 1B, it would be helpful to show the target output.

      We added a figure in Fig1B that shows a schematic of how the target output is generated.

      (5) An important point to consider is that a line attractor can be curved while still being topologically equivalent to a line. This nuance makes Figure 4A somewhat difficult to interpret. It might be helpful to discuss how the observed dynamics relate to potentially curved line attractors, which could provide a more nuanced understanding of the neural representations.

      As discussed above, we interpret the “curved” activity during the trial as non-stationary activity. We do not think this non-stationary activity would be characterized as an attractor. An attractor is (1) a minimal set of states that is (2) invariant under the dynamics and (3) attracting when perturbed into its neighborhood [Strogatz, Nonlinear dynamics and chaos]. If we consider the autonomous system without the behavior-related external input as the base system, then the non-stationary states could satisfy (2) and (3) but not (1), so they are not part of the attractor. If we add the behavior-related external input to the autonomous dynamics, then it may be possible that the non-stationary trajectories are part of the attractor. We adopted the former interpretation, as the behavior-related inputs are external and transient.

      (6) The results of the perturbation experiments seem to follow necessarily from the way x_rev was defined. It would be valuable to clarify if there's more to these results than what appears to be a direct consequence of the definition, or if there are subtleties in the experimental design or analysis that aren't immediately apparent.

      The neural activity x_rev is correlated to the reversal probability, but it is unclear if the activity in this neural subspace is causally linked to behavioral variables, such as choice output. We added this explanation at the beginning of Results Section 7 to clarify the reason for performing the perturbation experiments.

      “The neural activity $x_{rev}$ is obtained by identifying a neural subspace correlated to reversal probability. However, it remains to be shown if activity within this neural subspace is causally linked to behavioral variables, such as choice output.”

      Reviewer #2:

      Below is a list of things I have found difficult to understand, and been puzzled/concerned about while reading the manuscript:

      (1) It would be nice to say a bit more about the dataset that has been used for PFC analysis, e.g. number of neurons used and in what conditions is Figure 2A obtained (one has to go to supplementary to get the reference).

      We added information about the PFC dataset in the opening paragraph of Result Section 2 to provide an overview of what type of neural data we’ve analyzed. It includes information about the number of recorded neurons, recording method and spike binning process.

      (2) It would be nice to give more detail about the monkey task and better explain its trial structure.

      In Result Section 1 we added a description of the overall task structure (and its difference from other versions of the reversal learning task), the RNN / monkey trial structure, and the differences between the RNN and monkey tasks.

      (3) In the introduction it is mentioned that during the hold period, the probability of reversal is represented. Where does this statement come from?

      The fact that neural activity during a hold period, i.e., fixation period before presenting the target images, encodes the probability of reversal was demonstrated in a previous study (Bartolo and Averbeck ’20). 

      We realize that our intention was to state that, during the hold period, the reversal probability activity is stationary as in the line attractor model, rather than to emphasize that the probability of reversal is represented during this period. We revised the sentence to convey this message. In addition, we revised the entire paragraph to reinterpret our findings: there are two activity modes, where the stationary activity is consistent with the line attractor model but the non-stationary activity deviates from it.

      (4) "Around the behavioral reversal trial, reversal probabilities were represented by a family of rank-ordered trajectories that shifted monotonically". This sentence is confusing and hard to understand.

      Thank you for pointing this out. We rewrote the paragraph to reflect our revised interpretation. This sentence was removed, as it can be considered part of the result on separable trajectories.

      (5) For clarity, in the first section, when it is written that "The reversal behavior of trained RNNs was similar to the monkey's behavior on the same task" it would be nice to be more precise, that this is to be expected given the strategy used to train the network.

      We removed this sentence as it makes a blanket statement. Instead, we compared the behavioral outputs of the RNNs and the monkeys one by one.

      We added a sentence in Result Section 1 that the RNN’s abrupt behavioral reversal is expected as they are trained to mimic the target choice outputs of the Bayesian model.

      “Such abrupt reversal behavior was expected as the RNNs were trained to mimic the target outputs of the Bayesian inference model.”

      (6) What is the value of tau used in eq (1), and how does it compare to trial duration?

      We now state the value of the time constant tau in Eq (1) and discuss in Result Section 1 that tau = 20 ms is much shorter than the trial duration of 500 ms; thus, the persistent behavior seen in trained RNNs is due to learning.
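
      The separation of timescales can be made concrete with a short calculation. The sketch below assumes a standard leaky unit, dx/dt = -x / tau, with no recurrent or external input; Eq (1) of the manuscript is not reproduced here, so this form is an assumption used only to show how little of an initial state would survive 500 ms of passive decay with tau = 20 ms.

```python
import math


def decay_fraction(elapsed_ms, tau_ms):
    """Fraction of an initial state left after passive exponential decay."""
    return math.exp(-elapsed_ms / tau_ms)


def euler_decay(x0, tau_ms, total_ms, dt_ms=1.0):
    """Euler integration of dx/dt = -x / tau (assumed leak-only dynamics)."""
    x = x0
    for _ in range(int(total_ms / dt_ms)):
        x += dt_ms * (-x / tau_ms)
    return x


tau_ms = 20.0     # time constant quoted in the response
trial_ms = 500.0  # trial duration quoted in the response

frac = decay_fraction(trial_ms, tau_ms)      # analytic: exp(-25)
frac_euler = euler_decay(1.0, tau_ms, trial_ms)
print(frac, frac_euler)
```

      Both values are vanishingly small, so information carried across trials cannot be attributed to the intrinsic leak alone under this assumption.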

      (7) It would be nice to expand around the notion of « temporally flexible representation » to help readers grasp what this means.

      Instead of stating that the separable dynamic trajectories have a “temporally flexible representation”, we break down in what sense it is temporally flexible: separable dynamic trajectories can accommodate the effects that task-related behavior has on generating non-stationary neural dynamics.

      “In sum, our results show that, in a probabilistic reversal learning task, recurrent neural networks encode reversal probability by adopting, not only stationary states as in a line attractor, but also separable dynamic trajectories that can represent distinct probabilistic values while accommodating non-stationary dynamics associated with task-related behavior.”

      Reviewer #3:

      (1) Data:

      It would be useful to describe the experimental task, recording setup, and analyses in much more detail - both in the text and in the methods. What part of PFC are the recordings from? How many neurons were recorded over how many sessions? Which other papers have they been used in? All of these things are important for the reader to know, but are not listed anywhere. There are also some inconsistencies, with the main text e.g. listing the 'typical block length' as 36 trials, and the methods listing the block length as 24 trials (if this is a difference between the biological data and RNN, that should be more explicit and motivated).

      We provided a more detailed description of the monkey experimental task and PFC recordings in Result Section 1. We also added a new section in Methods 2.1 to describe the monkey experiment.

      The experimental analyses should be explained in more detail in the methods. There is e.g. no detailed description of the analysis in Figure 6F.

      We added a new section in Methods 6 to describe how the residual PFC activity is computed. It also describes the RNN perturbation experiments.

      Finally, it would be useful for more analyses of monkey behaviour and performance, either in the main text or supplementary figures.

      We did not pursue this comment as it is unclear how additional behavioral analyses would improve the manuscript.

      (2) Model:

      When fitting the network, 'step 1' of training in 2.3 seems superfluous. The posterior update from getting a reward at A is the same as that from not getting a reward at B (and vice versa), and it is therefore completely independent of the network choice. The reversal trial can therefore be inferred without ever simulating the network, simply by generating a sample of which trials have the 'good' option being rewarded and which trials have the 'bad' option being rewarded.

      We respectfully disagree with Reviewer 3’s comment that the reversal trial can be inferred without ever simulating the network. The only way for the network to know about the underlying reward schedule is to perform the task by itself. By simulating the network, it can sample the options and the reward outcomes. 

      Our understanding is that Reviewer 3 described a strategy that a human would use to perform this task. Our goal was to train the RNN to perform the task.

      Do the blocks always start with choice A being optimal? Is everything similar if the network is trained with a variable initial rewarded option? E.g. in Fig 6, would you see the appropriate swap in the effect of the perturbation on choice probability if choice B was initially optimal?

      Thank you for pointing out that the initial high-value option can be random. When setting up the reward schedule, the initial high-value option was chosen randomly from two choice outputs and, at the scheduled reversal, it was switched to the other option. We did not describe this in the original manuscript.

      We added a description in Training Scheme Step 4 that the initial high-value option is selected randomly. This is also explained in Result Section 1 when we give an overview of the RNN training procedure.

      (3) Content:

      It is rarely explained what the error bars represent (e.g. Figures 3B, 4C, ...) - this should be clear in all figures.

      We added that the error bars represent the standard error of the mean.

      Figure 2A: this colour scheme is not great. There are abrupt colour changes both before and after the 'reversal' trial, and both of the extremes are hard to see.

      We changed the color scheme to contrast pre- and post-reversal trials without the abrupt color change.

      Figure 3E/F: how is prediction accuracy defined?

      We added that the prediction accuracy is based on Pearson correlation.
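
      For readers unfamiliar with this measure, a Pearson-correlation accuracy between a predicted and an observed trajectory can be computed as in the sketch below. The function name and the toy vectors are hypothetical and only illustrate the definition; how the authors pooled time points or trials is not specified here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy example: a prediction that tracks the observed trace up to scale.
observed = [0.1, 0.4, 0.9, 0.7, 0.3]
predicted = [0.2, 0.8, 1.8, 1.4, 0.6]
accuracy = pearson_r(observed, predicted)
print(accuracy)
```

      Because the Pearson correlation is invariant to scale and offset, it rewards predictions that capture the shape of the trajectory rather than its absolute amplitude.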

      Figure 4B: why focus on the derivative of the dynamics? The subsequent plots looking at the actual trajectories are much easier to understand. Also - what is 'relative trial' relative to?

      The derivative was analyzed to demonstrate stationarity or non-stationarity of the neural activity. We think it will be clearer in the revised manuscript that the derivative allows us to characterize those two activity modes.

      Relative trial number indicates the trial position relative to the behavioral reversal trial. We added this description to the figures where “relative trial” is used.

      Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories? As it is now, there will presumably be more rewarded trials early and late in each block, and more unrewarded trials around the reversal point. Does this introduce biases in the analysis? A related question is (i) why the black lines are different in the top and bottom plots, and (ii) why the ends of the black lines are discontinuous with the beginnings of the red/blue lines.

      We could not understand what Reviewer 3 was asking in this comment. It would help if Reviewer 3 could clarify the following question:

      “Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories?”

      Question (i): We wanted to look at how the trajectory shifts in the subsequent trial if a reward is or is not received in the current trial. The top panel analyzed all the trials in which the subsequent trial did not receive a reward. The bottom panel analyzed all the trials in which the subsequent trial received a reward. So, the trials analyzed in the top and bottom panels are different, and the black lines (x_rev of “current” trial) in the top and bottom panels are different.

      Question (ii): The black line is from the trial preceding the red/blue lines, so if trials were designed to be continuous across the inter-trial interval, the black and red/blue lines would be continuous. However, in the monkey experiment, the inter-trial intervals were variable, so the end of the current trial does not match the start of the next trial. The neural trajectories presented in the manuscript did not include the activity in this inter-trial interval.

      Figure 6C: are the individual dots different RNNs? Claiming that there is a decrease in Delta x_choice for a v_+ stimulation is very misleading.

      Yes, individual dots are different RNN perturbations. We added an explanation of the dots in the Figure 7C caption.

      We agree with the comment that \Delta x_choice did not decrease. This sentence was removed. Instead, we revised the manuscript to state that x_choice for v_+ stimulation was smaller than x_choice for v_- stimulation. We performed a KS test to confirm statistical significance.

      Discussion: "...exhibited behaviour consistent with an ideal Bayesian observer, as found in our study". The RNN was explicitly trained to reproduce an ideal Bayesian observer, so this can only really be considered an assumption (not a result) in the present study.

      We agree that the statement in the original manuscript is inaccurate. It was revised to reflect that, in the other study, behavior outputs similar to a Bayesian observer emerged by simply learning to do the task, instead of directly mimicking the outputs of a Bayesian observer as done in our study.

      “Authors showed that trained RNNs exhibited behavior outputs consistent with an ideal Bayesian observer without explicitly learning from the Bayesian observer. This finding shows that the behavioral strategies of monkeys could emerge by simply learning to do the task, instead of directly mimicking the outputs of Bayesian observer as done in our study.”

      Methods: Would the results differ if your Bayesian observer model used the true prior (i.e. the reversal happens in the middle 10 trials) rather than a uniform prior? Given the extensive literature on prior effects on animal behaviour, it is reasonable to expect that monkeys incorporate some non-uniform prior over the reversal point.

      Thank you for pointing out the non-uniform prior. We haven’t conducted this analysis, but would guess that the convergence to the posterior distribution would be faster. We’d have to perform further analysis, which is out of the scope of this paper, to investigate whether the posterior distribution would be different from what we obtained with the uniform prior.

      Making the code available would make the work more transparent and useful to the community.

      The code is available in the following Github repository: https://github.com/chrismkkim/LearnToReverse

    1. Author response:

      Reviewer #1 (Public review):

      This study investigates the sex determination mechanism in the clonal ant Ooceraea biroi, focusing on a candidate complementary sex determination (CSD) locus-one of the key mechanisms supporting haplodiploid sex determination in hymenopteran insects. Using whole genome sequencing, the authors analyze diploid females and the rarely occurring diploid males of O. biroi, identifying a 46 kb candidate region that is consistently heterozygous in females and predominantly homozygous in diploid males. This region shows elevated genetic diversity, as expected under balancing selection. The study also reports the presence of an lncRNA near this heterozygous region, which, though only distantly related in sequence, resembles the ANTSR lncRNA involved in female development in the Argentine ant, Linepithema humile (Pan et al. 2024). Together, these findings suggest a potentially conserved sex determination mechanism across ant species. However, while the analyses are well conducted and the paper is clearly written, the insights are largely incremental. The central conclusion - that the sex determination locus is conserved in ants - was already proposed and experimentally supported by Pan et al. (2024), who included O. biroi among the studied species and validated the locus's functional role in the Argentine ant. The present study thus largely reiterates existing findings without providing novel conceptual or experimental advances.

      Although it is true that Pan et al., 2024 demonstrated (in Figure 4 of their paper) that the synteny of the region flanking ANTSR is conserved across aculeate Hymenoptera (including O. biroi), Reviewer 1’s claim that that paper provides experimental support for the hypothesis that the sex determination locus is conserved in ants is inaccurate. Pan et al., 2024 only performed experimental work in a single ant species (Linepithema humile) and merely compared reference genomes of multiple species to show synteny of the region, rather than functionally mapping or characterizing these regions.

      Other comments:

      The mapping is based on a very small sample size: 19 females and 16 diploid males, and these all derive from a single clonal line. This implies a rather high probability for false-positive inference. In combination with the fact that only 11 out of the 16 genotyped males are actually homozygous at the candidate locus, I think a more careful interpretation regarding the role of the mapped region in sex determination would be appropriate. The main argument supporting the role of the candidate region in sex determination is based on the putative homology with the lncRNA involved in sex determination in the Argentine ant, but this argument was made in a previous study (as mentioned above).

      Our main argument supporting the role of the candidate region in sex determination is not based on putative homology with the lncRNA in L. humile. Instead, our main argument comes from our genetic mapping (in Fig. 2), and the elevated nucleotide diversity within the identified region (Fig. 4). Additionally, we highlight that multiple genes within our mapped region are homologous to those in mapped sex determining regions in both L. humile and Vollenhovia emeryi, possibly including the lncRNA.

      In response to the Reviewer’s assertion that the mapping is based on a small sample size from a single clonal line, we want to highlight that we used all diploid males available to us. Although the primary shortcoming of a small sample size is to increase the probability of a false negative, small sample sizes can also produce false positives. We used two approaches to explore the statistical robustness of our conclusions. First, we generated a null distribution by randomly shuffling sex labels within colonies and calculating the probability of observing our CSD index values by chance (shown in Fig. 2). Second, we directly tested the association between homozygosity and sex using Fisher’s Exact Test (shown in Supplementary Fig. S2). In both cases, the association of the candidate locus with sex was statistically significant after multiple-testing correction using the Benjamini-Hochberg False Discovery Rate. These approaches are clearly described in the “CSD Index Mapping” section of the Methods.
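
      The second approach, a per-locus Fisher's Exact Test followed by Benjamini-Hochberg adjustment, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: the male counts (11 of 16 homozygous) come from the text above, the female counts (0 of 19 homozygous) are an assumption based on the reviewer's summary that the region is consistently heterozygous in females, and the list of p-values passed to the adjustment is invented for demonstration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from largest to smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Rows: males, females. Columns: homozygous, heterozygous at the locus.
p_value = fisher_exact_two_sided(11, 5, 0, 19)  # female counts assumed
print(p_value)

# Hypothetical p-values from several loci, adjusted together.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      In a genome-wide scan the adjustment would be applied to the p-values of all tested windows at once, so a locus is reported only if it remains significant after accounting for the number of tests.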

      We also note that, because complementary sex determination loci are expected to evolve under balancing selection, our finding that the mapped region exhibits a peak of nucleotide diversity lends orthogonal support to the notion that the mapped locus is indeed a complementary sex determination locus.

      The fourth paragraph of the results and the sixth paragraph of the discussion are devoted to explaining the possible reasons why only 11/16 genotyped males are homozygous in the mapped region. The revised manuscript will include an additional sentence (in what will be lines 384-388) in this paragraph that includes the possible explanation that this locus is, in fact, a false positive, while also emphasizing that we find this possibility to be unlikely given our multiple lines of evidence.

      In response to Reviewer 1’s suggestion that we carefully interpret the role of the mapped region in sex determination, we highlight our careful wording choices, nearly always referring to the mapped locus as a “candidate sex determination locus” in the title and throughout the manuscript. For consistency, the revised manuscript version will change the second results subheading from “The O. biroi CSD locus is homologous to another ant sex determination locus but not to honeybee csd” to “O. biroi’s candidate CSD locus is homologous to another ant sex determination locus but not to honeybee csd,” and will add the word “candidate” in what will be line 320 at the beginning of the Discussion, and will change “putative” to “candidate” in what will be line 426 at the end of the Discussion.

      In the abstract, it is stated that CSD loci have been mapped in honeybees and two ant species, but we know little about their evolutionary history. But CSD candidate loci were also mapped in a wasp with multi-locus CSD (study cited in the introduction). This wasp is also parthenogenetic via central fusion automixis and produces diploid males. This is a very similar situation to the present study and should be referenced and discussed accordingly, particularly since the authors make the interesting suggestion that their ant also has multi-locus CSD and neither the wasp nor the ant has tra homologs in the CSD candidate regions. Also, is there any homology to the CSD candidate regions in the wasp species and the studied ant?

      In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of diploid males being produced via losses of heterozygosity during asexual reproduction, the revised manuscript will include the following sentence: “Therefore, if O. biroi uses CSD, diploid males might result from losses of heterozygosity at sex determination loci (Fig. 1C), similar to what is thought to occur in other asexual Hymenoptera that produce diploid males (Rabeling and Kronauer 2012; Matthey-Doret et al. 2019).”

      We note, however, that in their 2019 study, Matthey-Doret et al. did not directly test the hypothesis that diploid males result from losses of heterozygosity at CSD loci during asexual reproduction, because the diploid males they used for their mapping study came from inbred crosses in a sexual population of that species.

      We address this further below, but we want to emphasize that we do not intend to argue that O. biroi has multiple CSD loci. Instead, we suggest that the existence of additional, undetected CSD loci is one possible explanation for the absence of diploid males from any clonal line other than clonal line A. In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of multi-locus CSD, the revised manuscript version will include the following additional sentence in the fifth paragraph of the discussion: “Multi-locus CSD has been suggested to limit the extent of diploid male production in asexual species under some circumstances (Vorburger 2013; Matthey-Doret et al. 2019).”

      Regarding Reviewer 1’s question about homology between the putative CSD loci from the (Matthey-Doret et al. 2019) study and O. biroi, we note that there is no homology. The revised manuscript version will have an additional Supplementary Table (which will be the new Supplementary Table S3) that will report the results of this homology search. The revised manuscript will also include the following additional sentence in the Results: “We found no homology between the genes within the O. biroi CSD index peak and any of the genes within the putative L. fabarum CSD loci (Supplementary Table S3).”

      The authors used different clonal lines of O. biroi to investigate whether heterozygosity at the mapped CSD locus is required for female development in all clonal lines of O. biroi (L187-196). However, given the described parthenogenesis mechanism in this species conserves heterozygosity, additional females that are heterozygous are not very informative here. Indeed, one would need diploid males in these other clonal lines as well (but such males have not yet been found) to make any inference regarding this locus in other lines.

      We agree that a full mapping study including diploid males from all clonal lines would be preferable, but as stated earlier in that same paragraph, we have only found diploid males from clonal line A. We stand behind our modest claim that “Females from all six clonal lines were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.” In the revised manuscript version, this sentence (in what will be lines 199-201) will be changed slightly in response to a reviewer comment below: “All females from all six clonal lines (including 26 diploid females from clonal line B) were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.”

      Reviewer #2 (Public review):

      The manuscript by Lacy et al. is well written, with a clear and compelling introduction that effectively conveys the significance of the study. The methods are appropriate and well-executed, and the results, both in the main text and supplementary materials, are presented in a clear and detailed manner. The authors interpret their findings with appropriate caution.

      This work makes a valuable contribution to our understanding of the evolution of complementary sex determination (CSD) in ants. In particular, it provides important evidence for the ancient origin of a non-coding locus implicated in sex determination, and shows that, remarkably, this sex locus is conserved even in an ant species with a non-canonical reproductive system that typically does not produce males. I found this to be an excellent and well-rounded study, carefully analyzed and well contextualized.

      That said, I do have a few minor comments, primarily concerning the discussion of the potential 'ghost' CSD locus. While the authors acknowledge (line 367) that they currently have no data to distinguish among the alternative hypotheses, I found the evidence for an additional CSD locus presented in the results (lines 261-302) somewhat limited and at times a bit difficult to follow. I wonder whether further clarification or supporting evidence could already be extracted from the existing data. Specifically:

      We agree with Reviewer 2 that the evidence for a second CSD locus is limited. In fact, we do not intend to advocate for there being a second locus, but we suggest that a second CSD locus is one possible explanation for the absence of diploid males outside of clonal line A. In our initial version, we intentionally conveyed this ambiguity by titling this section “O. biroi may have one or multiple sex determination loci.” However, we now see that this leads to undue emphasis on the possibility of a second locus. In the revised manuscript, we will split this into two separate sections: “Diploid male production differs across O. biroi clonal lines” and “O. biroi lacks a tra-containing CSD locus.”

      (1) Line 268: I doubt the relevance of comparing the proportion of diploid males among all males between lines A and B to infer the presence of additional CSD loci. Since the mechanisms producing these two types of males differ, it might be more appropriate to compare the proportion of diploid males among all diploid offspring. This ratio has been used in previous studies on CSD in Hymenoptera to estimate the number of sex loci (see, for example, Cook 1993, de Boer et al. 2008, 2012, Ma et al. 2013, and Chen et al., 2021). The exact method might not be applicable to clonal raider ants, but I think comparing the percentage of diploid males among the total number of (diploid) offspring produced between the two lineages might be a better argument for a difference in CSD loci number.

      We want to re-emphasize here that we do not wish to advocate for there being two CSD loci in O. biroi. Rather, we want to explain that this is one possible explanation for the apparent absence of diploid males outside of clonal line A. We hope that the modifications to the manuscript described in the previous response help to clarify this.

      Reviewer 2 is correct that comparing the number of diploid males to diploid females does not apply to clonal raider ants. This is because males are vanishingly rare among the vast numbers of females produced. We do not count how many females are produced in laboratory stock colonies, and males are sampled opportunistically. Therefore, we cannot report exact numbers. However, we will add the following sentence to the revised manuscript: “Despite the fact that we maintain more colonies of clonal line B than of clonal line A in the lab, all the diploid males we detected came from clonal line A.”

      (2) If line B indeed carries an additional CSD locus, one would expect that some females could be homozygous at the ANTSR locus but still viable, being heterozygous only at the other locus. Do the authors detect any females in line B that are homozygous at the ANTSR locus? If so, this would support the existence of an additional, functionally independent CSD locus.

      We thank the reviewer for this suggestion, and again we emphasize that we do not want to argue in favor of multiple CSD loci. We just want to introduce it as one possible explanation for the absence of diploid males outside of clonal line A.

      The 26 sequenced diploid females from clonal line B are all heterozygous at the mapped locus, and the revised manuscript will clarify this in what will be lines 199-201. Previously, only six of those diploid females were included in Supplementary Table S2, and that will be modified accordingly.

      (3) Line 281: The description of the two tra-containing CSD loci as "conserved" between Vollenhovia and the honey bee may be misleading. It suggests shared ancestry, whereas the honey bee csd gene is known to have arisen via a relatively recent gene duplication from fem/tra (10.1038/nature07052). It would be more accurate to refer to this similarity as a case of convergent evolution rather than conservation.

      In the sentence that Reviewer 2 refers to, we are representing the assertion made in the (Miyakawa and Mikheyev 2015) paper in which, regarding their mapping of a candidate CSD locus that contains two linked tra homologs, they write in the abstract: “these data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years.” In that same paper, Miyakawa and Mikheyev write in the discussion section: “As ants and bees diverged more than 100 million years ago, sex determination in honey bees and V. emeryi is probably homologous and has been conserved for at least this long.”

      As noted by Reviewer 2, this appears to conflict with a previously advanced hypothesis: that, because fem and csd were found in Apis mellifera, Apis cerana, and Apis dorsata, but only fem was found in Melipona compressipes, Bombus terrestris, and Nasonia vitripennis, the csd gene evolved after the honeybee (Apis) lineage diverged from other bees (Hasselmann et al. 2008). However, it remains possible that the csd gene evolved after ants and bees diverged from N. vitripennis, but before the divergence of ants and bees, and then was subsequently lost in B. terrestris and M. compressipes. This view was previously put forward based on bioinformatic identification of putative orthologs of csd and fem in bumblebees and in ants [(Schmieder et al. 2012), see also (Privman et al. 2013)]. However, subsequent work disagreed and argued that the duplications of tra found in ants and in bumblebees represented convergent evolution rather than homology (Koch et al. 2014). Distinguishing between these possibilities will be aided by additional sex determination locus mapping studies and functional dissection of the underlying molecular mechanisms in diverse Aculeata.

      Distinguishing between these competing hypotheses is beyond the scope of our paper, but the revised manuscript will include additional text to incorporate some of this nuance. We will include these modified lines below:

      “A second QTL region identified in V. emeryi (V.emeryiCsdQTL1) contains two closely linked tra homologs, similar to the closely linked honeybee tra homologs, csd and fem (Miyakawa and Mikheyev 2015). This, along with the discovery of duplicated tra homologs that undergo concerted evolution in bumblebees and ants (Schmieder et al. 2012; Privman et al. 2013) has led to the hypothesis that the function of tra homologs as CSD loci is conserved with the csd-containing region of honeybees (Schmieder et al. 2012; Miyakawa and Mikheyev 2015). However, other work has suggested that tra duplications occurred independently in honeybees, bumblebees, and ants (Hasselmann et al. 2008; Koch et al. 2014), and it remains to be demonstrated that either of these tra homologs acts as a primary CSD signal in V. emeryi.”

      (4) Finally, since the authors successfully identified multiple alleles of the first CSD locus using previously sequenced haploid males, I wonder whether they also observed comparable allelic diversity at the candidate second CSD locus. This would provide useful supporting evidence for its functional relevance.

      As is already addressed in the final paragraph of the results and in Supplementary Fig. S4, there is no peak of nucleotide diversity in any of the regions homologous to V.emeryiQTL1, which is the tra-containing candidate sex determination locus (Miyakawa and Mikheyev 2015). In the revised manuscript, the relevant lines will be 307-310. We want to restate that we do not propose that there is a second candidate CSD locus in O. biroi, but we simply raise the possibility that multi-locus CSD *might* explain the absence of diploid males from clonal lines other than clonal line A (as one of several alternative possibilities).

      Overall, these are relatively minor points in the context of a strong manuscript, but I believe addressing them would improve the clarity and robustness of the authors' conclusions.

      Reviewer #3 (Public review):

      Summary:

      The sex determination mechanism governed by the complementary sex determination (CSD) locus is one of the mechanisms that support the haplodiploid sex determination system evolved in hymenopteran insects. While many ant species are believed to possess a CSD locus, it has only been specifically identified in two species. The authors analyzed diploid females and the rarely occurring diploid males of the clonal ant Ooceraea biroi and identified a 46 kb CSD candidate region that is consistently heterozygous in females and predominantly homozygous in males. This region was found to be homologous to the CSD locus reported in distantly related ants. In the Argentine ant, Linepithema humile, the CSD locus overlaps with an lncRNA (ANTSR) that is essential for female development and is associated with the heterozygous region (Pan et al. 2024). Similarly, an lncRNA is encoded near the heterozygous region within the CSD candidate region of O. biroi. Although this lncRNA shares low sequence similarity with ANTSR, its potential functional involvement in sex determination is suggested. Based on these findings, the authors propose that the heterozygous region and the adjacent lncRNA in O. biroi may trigger female development via a mechanism similar to that of L. humile. They further suggest that the molecular mechanisms of sex determination involving the CSD locus in ants have been highly conserved for approximately 112 million years. This study is one of the few to identify a CSD candidate region in ants and is particularly noteworthy as the first to do so in a parthenogenetic species.

      Strengths:

      (1) The CSD candidate region was found to be homologous to the CSD locus reported in distantly related ant species, enhancing the significance of the findings.

      (2) Identifying the CSD candidate region in a parthenogenetic species like O. biroi is a notable achievement and adds novelty to the research.

      Weaknesses:

      (1) Functional validation of the lncRNA's role is lacking, and further investigation through knockout or knockdown experiments is necessary to confirm its involvement in sex determination.

      See response below.

      (2) The claim that the lncRNA is essential for female development appears to reiterate findings already proposed by Pan et al. (2024), which may reduce the novelty of the study.

      We do not claim that the lncRNA is essential for female development in O. biroi, but simply mention the possibility that, as in L. humile, it is somehow involved in sex determination. We do not have any functional evidence for this, so this is purely based on its genomic position immediately adjacent to our mapped candidate region. We agree with the reviewer that the study by Pan et al. (2024) decreases the novelty of our findings. Another way of looking at this is that our study supports and bolsters previous findings by partially replicating the results in a different species.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We would like to thank the reviewers for their comments; we see great value in the suggestions they made to strengthen our work. We are glad to see that they are in general positive about the manuscript. In the following, we include a point-by-point response to their comments, which are in general consistent with each other.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Sanchez-Cisneros and colleagues, examine how tracheal cell adhesion to the ECM underneath the epidermis helps shape the tracheal system. They show that if cell-ECM adhesion is perturbed the development of the tracheal system and the epidermis is disrupted. They also detect protrusions extending from the dorsal trunk cells towards the ECM. The work is novel, the figures are clear, and the questions are well addressed. However, I find that some of the claims are not completely supported by the data presented. I have some suggestions that will, I believe, clarify certain points.

      Major comments

      At the beginning of the results section as in the introduction the authors claim that "It is generally assumed that trunk displacement occurs due to tip cells pulling on the trunks so that they follow their path dorsally." This sentence is not referenced, and I do not know where it has been shown or proposed to be like this. In addition, the comparison with the ventral branches is also not referenced and the movie does not really show this. Forces generated by tracheal branch migration have been shown to drive intercalation (Caussinus E, Colombelli J, Affolter M. Tip-cell migration controls stalk-cell intercalation during Drosophila tracheal tube elongation. Curr Biol. 2008;18(22):1727-1734. doi:10.1016/j.cub.2008.10.062), but not dorsal trunk (DT) displacement.

      We agree that dorsal trunk displacement has not been discussed in previous works, just the fact that tip-cell migration influences stalk cell intercalation. We will rephrase this sentence, stating that dorsal trunk displacement has not been studied.

      However, to rule out the possibility that DT displacement and the phenotype observed in XXX is due to dorsal branch pulling forces, the authors should analyze what happens in the absence of dorsal branches (in condition of Dpp signalling inhibition as in punt mutants or Dad overexpression conditions).

      This is a great idea, and we thank the reviewer for suggesting this. We tried to achieve a similar goal by expressing a Dominant Negative FGFR (Breathless-DN) in the tracheal system, since its expression under btl-gal4 affects tip cell migration. However, the phenotype arises too late to have an effect in dorsal branch migration during the stages we were interested in analyzing. The alternative proposed by the reviewer should be more efficient, as blocking Dpp signalling prevents the formation of dorsal branches completely. We have just received flies carrying the UAS-Dad construct. We will express Dad under btl-gal4 and see how this affects dorsal trunk displacement.

      I am concerned about the TEM observations. The authors claim they can identify tracheal cells by their lumen (Fig. 2 C'). However, at stage 15, the tracheal lumen should be clearly identifiable, and the interluminal DT space should be wider relative to the size of the cells. In this case, there is nothing telling us that we are not looking at a dorsal branch or lateral trunk cell. Furthermore, at embryonic stage 15, the tracheal lumen is filled with a chitin filament, which is not visible in these micrographs. Also, there is quite a lot of tissue detachment and empty spaces between cells, which might be a sign of problems in sample fixing. Better images and more accurate identification of dorsal trunk cells is necessary to support the claim that "These experiments revealed a novel anatomical contact between the epidermis and tracheal trunks".

      The protocol that we use for TEM involves performing 1-μm sections that allow us to stage embryos and to identify the anatomical regions using light microscopy and then switch to ultra-thin sections for electron microscopy once we have found the right position within the sample. This approach also allows us to determine the integrity of the sample. We attach here a micrograph of the last section we analyzed before we decided to do the EM analysis. The asterisk (*) points to a region where the multicellular lumen of the trunk is visible. Due to its proximity to the posterior spiracles, we are confident this is the dorsal trunk and not the lateral trunk. We realize now, after comparing this image with an atlas of development (Campos-Ortega and Hartenstein, 2013), that the stage we chose to illustrate the interaction is a stage 14 embryo instead of the stage 15 we indicated in the manuscript. We will change the stage but given that dorsal closure has already started by stage 14, this does not affect our analysis. Still, we apologize for the mis-staging of the embryo.

      In the light-microscopy image, we have overlaid the EM section onto the corresponding region of interest. We agree that the lumen should be thicker compared to the length of the cells if the section were cutting the trunk through its largest diameter. However, the protrusions we see do not emerge from the middle part of the trunk where the lumen is found but are seen towards the dorsal side of the trunk, where the lumen will no longer be visible in a longitudinal section such as the ones we present. In the embryo shown in Figure 2A-C, our interpretation is that the section was done through a very shallow section of the lumen (represented below). We interpret this from the fact that we see abundant electron-dense areas which we think are adherens junctions from multiple cells. These junctions are visible in Figure 2C but are currently not labelled. We will add arrows to increase their visibility.

      Given that protruding cells lie at the base of dorsal branches, it would be expected that in some sections we would find the protrusions close to the dorsal branches. This is in fact what we show in the micrograph shown in Figure 2D, with a lower magnification overview image shown in Figure S2D. In this case, we see a cell in close proximity to the tendon cells on one side (Figure 2D), which is connected to a dorsal branch on the opposite side (shown in Figure S2D). This dorsal branch is clearly autocellular and chitin deposition is visible as expected for the developmental stage. Again, in Figure S2E we see an electron-dense patch near the lumen that corresponds to the adherens junctions that seal the lumen. We see that all this needs to be better explained in the manuscript, so we will elaborate on the descriptions, and incorporate the light microscopy micrograph to the supplemental figures. This should also aid with the anatomical descriptions requested by Reviewer #3. Nevertheless, we think these observations confirm that what we are describing are the contact points between the dorsal trunk and tendon cells.

      Timelapse imaging of the protrusions in DT cells is done with frames every 4 minutes (Video S3). This is not enough to properly show cellular protrusions and the images do not really show interaction with the epidermis. Video S4 has a better time resolution but it is very short and only shows the cut moment. Video S4 shows the cut, but the reported (and quantified) recoil is not clear. Nevertheless, the results are noteworthy and should be further analysed.

      We will acquire high temporal resolution time-lapse images using E-Cadherin::GFP and btl-gal4, UAS-PH::mCherry to show the behaviour of the protrusions on a short time scale.

      Provided these embryos survive, would it be possible to check if embryos after laser cutting will develop wavy DTs?

      We think it would be interesting to carry out this experiment, but the laser cut experiments were done during a collaborative visit and we would not be able to repeat them in the short term.

      What happens to the larvae under the genetic conditions presented in Fig.S3? Do they reach pupal stages? Do these animals reach adult stages?

      We have seen escapers out of these crosses, but we have not quantified the lethality of the experiment. We will analyse this and include it in the manuscript.

      The kayak phenotypes are very interesting and perhaps the authors could explore them more. As in inhibition of adhesion to the ECM, kay mutants display wavy dorsal trunks. Do they have defective adhesion? Fos being a transcription factor, this is a possibility. The authors should at least discuss the kay phenotypes more extensively and present a suitable hypothesis for the phenotype.

      We agree that the kayak experiments might bring more consequences than just preventing dorsal closure. We will complement this approach by blocking dorsal closure by other independent means. We will use pannier-gal4 (a lateral epidermis driver), engrailed-gal4 (a driver for epidermal posterior compartment), and 332-gal4 (an amnioserosa driver) to express dominant-negative Moesin. In our experience, this also delays dorsal closure and it should result in a tracheal phenotype similar to the one we see in kayak embryos.

      Minor comments

      Page 2 Line 9/10 The sentence "tracheal tubes branch and migrate over neighbouring tissues of different biochemical and mechanical properties to ventilate them." should be rewritten. Tracheal cells do not migrate over other tissues to ventilate them.

      We meant to say that tracheal cells migrate over other tissues at the same time as they branch and interconnect to allow gas exchange in their surroundings after tracheal morphogenesis is completed. Ventilation is used here as a synonym for gas exchange or breathing. We will rephrase this if the reviewer considers it confusing.

      Page 2 Line 24/25 The sentence "It has been generally assumed that trunks reach the dorsal side of the embryo because of the pulling forces of dorsal branch migration." needs to be backed up by a reference.

      As explained above, we will rephrase this sentence.

      Page 7 Line 32/23 In this sentence, the references are not related to dorsal closure "Similarly, the signals that regulate epidermal dorsal closure do not participate in tracheal development, or vice versa (Letizia et al., 2023; Reichman-Fried et al., 1994)."

      Our goal in this sentence was to explain that while JNK is required for proper epidermal dorsal closure, loss of JNK signaling in the trachea does not affect tracheal development (as shown by Letizia et al., 2023). At the same time, Reichman-Fried et al., 1994 described the phenotypes of loss of breathless (btl). We will remove this last reference as the work does not study the epidermis. We will rephrase the sentence as: “Similarly, the signals that regulate epidermal dorsal closure do not participate in tracheal development; namely, JNK signaling (Letizia et al., 2023).”

      Page 12 Line 1 "Muscles attach to epidermal tendon cells through a dense meshwork of ECM" this sentence must be referenced.

      We will add the corresponding references for this statement: (Fogerty et al., 1994; Prokop et al., 1998; Urbano et al., 2009). We will replace “dense” with “specialized”.

      Fig. S1- Single channel images (A'-C' and A'-C') should be presented in grayscale.

      Fig. S4- Single channel images (A'-D' and A'-D') should be presented in grayscale.

      We will add the grayscale, single-channel images for these figures.

      Reviewer #1 (Significance (Required)):

      The findings shown in this manuscript shed light on the interactions and cooperation between two organs, the tracheal system and the epidermis. These interactions are mediated by cell-ECM contacts which are important for the correct morphogenesis of both systems. The strengths of the work lie in its novelty and live analysis of these interactions. However, its weaknesses are related to some claims not completely backed by the data, some technical issues regarding imaging and some over-interpreted conclusions.

      This basic research work will be of interest to a broad cell and developmental biology community as it provides a functional advance on the importance of cell-ECM interactions for the morphogenesis of a tubular organ. It is of specific interest to the specialized field of tubulogenesis and tracheal morphogenesis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this paper, the authors explore the relationships between two Drosophila tissues - the epidermis and tracheal dorsal trunk (DT) - that get dorsally displaced during mid-late embryogenesis. They show a nice temporal correlation between the movements of the epithelia during dorsal closure and DT displacement. They also show a correlation between the movement of an endogenously tagged version of collagen and the DT, suggesting that the ECM may contribute to this coordinated movement. Through high magnification TEM, they show that tracheal cells make direct contact with the subset of epithelial cells, known as tendon cells, that also serve as muscle attachment sites. In between these contact sites, tracheae are separated from the epithelia by the muscles. Furthermore, the TEMs and confocal imaging of tracheal cells expressing a membrane marker at these contact sites show that the tracheal cells are extending filopodia toward the tendon cells. The authors then explore how a variety of perturbations to the ECM produced by the tendon and DT cells affect DT and epithelial movement. They find that expressing membrane-associated matrix metalloproteases (MMP1 or MMP2) in tendon cells as well as perturbations in integrin or integrin signaling components leads to delays in dorsal displacement as well as defective lengthening of the tracheal DT tubes. They find that defects in the association between the tracheal and epidermal ECM attachments affect dorsal displacement of the epidermis, disrupting dorsal closure.

      Major comments: I like the goals of this paper testing the idea that the ECM plays important roles in the coordination of tissue placement, and I think they have good evidence of that from this study. However, I disagree with the conclusions of the authors that disrupting contact between DT and the tendon cells has no effect on DT dorsal displacement. DT tracheal positioning is clearly delayed; the fact that it takes a lot longer indicates that the ECM does affect the process. It's just that there are likely backup systems in place - clearly not as good since the tracheal tubes end up being the wrong length.

      We agree with this view; in our deGradFP experiments we see a delayed DT displacement. We focused our analyses on the coordination with epidermal remodelling, which remained unaltered, but we in fact see a delayed progression in dorsal displacement of both tissues (Figure 5I-J). We will emphasize this in the corresponding section of the Results.

      It also seems important that the parts of the DT where the dorsal branches (DB) emanate are moving dorsally ahead of the intervening portions of the trachea. This suggests to me that the DB normally does contribute to DT dorsal displacement and that this activity may be what helps the DT eventually get into its final position. The authors should test whether the portions of the DT that contact the DB are under tension. If the DB migration is providing some dorsal pulling force on the DT, this may also contribute to the observed increases in DT length observed with the perturbations of the ECM between the tendon cells and the trachea - if tube lengthening is a consequence of the pulling forces that would be created by parts of the trachea moving dorsally ahead of the other parts. Here again, it would be good to test if the DT itself is under additional tension when the ECM is disrupted.

      We thank the reviewer for the suggested experiments. We agree that the dorsal branches should pull on the dorsal trunk and that this interaction should generate tension. Unfortunately, we are unable to test this with the experiments proposed by the reviewer, but we propose an alternative strategy to overcome this. We understand that the reviewer suggests we do laser cut experiments in dorsal branches to see if there is a recoil in the opposite direction of dorsal branch migration. We carried out our laser cut experiments using a 2-photon laser through a visit to the EMBL imaging facility, using funds from a collaborative grant. Funding a second visit would require us to apply for extra funding, which would delay the preparation of the experiments. We are aware of UV-laser setups within our university; however, UV-laser cuts would also affect the epidermis above the dorsal branches, which we think might contribute to the recoil we would expect to see.

      Instead of doing laser cuts, we have designed an experiment based on the suggestion of reviewer #1 of blocking Dpp signaling (with UAS-Dad), which would prevent the formation of dorsal branches. We expect that in this experimental setup, the trunk will bend ventrally in response to the pulling forces of the ventral branches. We will also co-express UAS-Dad (to prevent dorsal branch formation) and UAS-Mmp2 (to ‘detach’ the dorsal trunk from the epidermis), and we would expect to at least partially rescue the wavy trunk phenotype.

      Minor comments: The authors need to do a much better job in the intro and in the discussion of citing the work of the people who made many of the original findings that are relevant to this study. Many citations are missing (especially in the introduction) or the authors cite their own review (which most people will not have read) for almost everything (especially in the discussion). This fails to give credit to decades of work by many other groups and makes it necessary for someone who would want to see the original work to first consult the review before they can find the appropriate reference. I know it saves space (and effort) but I think citing the original work is important.

      The reviewer is right; we apologize for falling into this practice. We will reference the original works wherever it is needed.

      Figure 7 is not a model. It is a cartoon depicting what they see with confocal and TEM images.

      We will change the figure; we will include our interpretations of the phenotypes we observed under different experimental manipulations.

      Reviewer #2 (Significance (Required)):

      Overall, this study is one of the first to focus on how the ECM affects coordination of tissue placement. The coordination of tracheal movement with that of the epidermis is very nicely documented here and the observation that the trachea make direct contact with the tendon cells/muscle attachment sites is quite convincing. It is less clear from the data how exactly the cells of the trachea and the ECM are affected by the different perturbations of the ECM. It seems like this could be better done with immunostaining of ECM proteins (collagen-GFP?), cell type markers, and super resolution confocal imaging with combinations of these markers. What happens right at the contact site between the tendon cell and the trachea with the perturbation? I think that at the level of analysis presented here, this study would be most appropriate for a specialized audience working in the ECM or fly embryo development field.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The manuscript by Sanchez-Cisneros et al. provides a detailed description of the cellular interactions between cells of the Drosophila embryonic trachea and nearby tendon and epidermal cells. The researchers use a combination of genetic experiments, light sheet style live imaging and transmission electron microscopy. The live imaging is particularly clear and detailed, and reveals protruding cells. The results overall suggest that interactions mediated through the ECM contribute to development of trachea and dorsal closure of epidermis. One new aspect is the existence of dorsal trunk filopodia that are under tension and may impact tracheal morphogenesis through required integrin/ECM interactions.

      Major comments: - Are the key conclusions convincing? Generally, the key conclusions are well supported by the data, and the movies are very impressive. Interactions between the cell types are clearly shown, as are the correlations in their development. However, some of the images are challenging to decipher for a non-expert in Drosophila trachea, especially the EM images, and some of the data are indirect or a bit weak.

      We thank the reviewer for their observations. As mentioned above in response to Reviewer #1, we will add an overview image of the embryo we processed for TEM that is presented in Figure 2.

      The data related to failure of dorsal closure affecting trachea relies on one homozygous allele of one gene (kayak), and so this is somewhat weak evidence. Even though kay is not detected in trachea, there could be secondary effects of the mutation or another lesion on the mutant chromosome. The segments look a bit uneven in the mutant examples.


      The reviewer is right; as we proposed before, we will complement the kayak experiments with independent approaches that will delay dorsal closure.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Some of the experiments have low n values, especially in imaging experiments, so these may be more preliminary, but they are in concordance with other data.

      The problem we face in our live-imaging experiments is related to the probability of finding the experimental embryos. In most of our experiments we combine double-tissue labelling plus the expression of genetic tools. This generally corresponds to a very small proportion of the progeny. We will aim to have at least 4 embryos per condition.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Higher n-values would substantiate the claims. To strengthen the argument that dorsal closure affects trachea morphogenesis mechanically, the authors might consider using a combination of kay mutant alleles or other mutant genes in this pathway to provide stronger evidence. Or they could try a rescue experiment in epidermis and trachea separately for the kay mutants.

      We think our experiments delaying dorsal closure using the Gal4/UAS system and a variety of drivers should address the point of the possible indirect effects of kay in tracheal development.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. Imaging data can take a while to obtain, but the genetic experiments could be done in a couple of months, and the authors should be able to obtain any needed lines within a few weeks.

      The reviewer is correct, we will be able to plan our crosses for the proposed experiments within a couple of months.

      • Are the data and the methods presented in such a way that they can be reproduced? Generally, yes. For the deGrad experiments, it is not clear how the fluorescent intensity was normalized - was this against a reference marker?

      Briefly, we used signals from within the embryo as internal controls. In the case of en-gal4, we normalized the signal to the sections of the embryo where en is not expressed and therefore, beta-integrin levels should not be affected. In the case of btl-gal4, we normalized against the signal surrounding the trunks which should also not be affected by the deGradFP system. We will elaborate on these analyses in the methods section.
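
      The internal-control normalization described above can be illustrated with a minimal sketch. It assumes the simplest reading of the response, namely that the mean signal in the affected region is divided by the mean signal in an unaffected reference region of the same embryo. The function name and the example pixel values are hypothetical, and the authors' actual procedure may differ.

```python
from statistics import mean

def normalized_intensity(target_pixels, reference_pixels):
    """Ratio of mean target signal to mean reference signal in one embryo.

    target_pixels: intensities from the region expressing the degradation
        tool (for example en-expressing stripes or the tracheal trunks).
    reference_pixels: intensities from a region assumed to be unaffected.
    """
    reference = mean(reference_pixels)
    if reference == 0:
        raise ValueError("reference region has zero mean intensity")
    return mean(target_pixels) / reference

# Hypothetical pixel values, for illustration only.
ratio = normalized_intensity([40, 60, 50], [100, 110, 90])
```

      A ratio below 1 would indicate lower signal in the target region than in the internal reference.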

      Are the experiments adequately replicated and statistical analysis adequate? There are several experiments with low n values, so this could fall below statistical significance. For example, data shown in Fig 1G: n=3; Fig 4D n=4, n=3; Fig 6J n=4

      As mentioned above, we will increase our sample sizes.

      Minor comments: - Specific experimental issues that are easily addressable. To make the TEM images more easily interpreted, it would be helpful to provide a fluorescent image of all the relevant cell types (especially trachea, epidermis, muscle, and tendon cells, plus segmental boundaries) labelled accordingly, so that reader can correlate them more easily with the TEM images. They might also include a schematic of an embryo to show where the TEM field of view is.

      We believe this should be addressed by adding the light microscopy section of the embryo with the TEM image overlaid as illustrated above.

      It is hard to be confident that the EM images reflect the cells they claim and that the filopodia are in fact that, at least for people not used to looking at these types of images.

      As we explained in the response to Reviewer #1, we will elaborate on the descriptions of our TEM data. We think that adding the reference micrograph will aid with the interpretations of the TEM images.

      • Are prior studies referenced appropriately? yes
      • Are the text and figures clear and accurate? yes

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? The writing could be revised to be a bit clearer. Since the results of the experiments do not support the initial hypothesis, I found it a bit confusing as I read along. It may help to introduce an alternative hypothesis earlier to make the paper more logical and easy to follow. To be more specific, on page 3, the authors say they "show that dorsal trunk displacement is mechanically coupled to the remodelling of the epidermis" and also in the results comment that "With two opposing forces pulling the trunks other factors likely participate in their dorsal displacement, but so far these have remained unstudied." But that doesn't end up being what they find. The results from figure 5 and related interpretation on page 17 say "cell-ECM interactions are important for proper trunk morphology, but not for its displacement." So this was confusing to read and I would encourage the authors to frame the issues a bit differently in terms of tube morphogenesis.

      We see how this might be confusing. We will rewrite the introduction so that the work is easier to follow. To achieve this, we will state from the beginning the mechanisms we anticipate that regulate trunk displacement: 1) adhesion to the epidermis, 2) pulling forces from the dorsal branches and 3) a combination of both.

      Some minor presentation issues: What orientation is the cross-sectional view in figure 1C and movie 1?

      We will add a dotted box that indicates the region that we turned 90° to show the cross-section.

      On page 12, the authors say the "Electron micrographs also suggested high filopodial activity" but activity suggests dynamics that are not clear from EM. This could be re-phrased.

      As the reviewer indicates, we cannot conclude dynamics from a static image. We will replace “suggested high filopodial activity” with “revealed filopodial abundance”.

      Reviewer #3 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. The results of the paper are significant in that they characterize a mechanical interaction between two tissue types in development, which are linked by the extracellular matrix that sits between them. It is not clear to me that this describes a "novel mechanism for tissue coordination" as stated in the abstract, but it does characterize this type of interaction in a detailed cellular way.

      • Place the work in the context of the existing literature (provide references, where appropriate). For specialists, the work identifies a novel protruding cell type in the fly embryonic trachea, and provides beautiful and detailed imaging data on tracheal development. The "wavy" trachea phenotype is also uncommon and very interesting, so this result could be linked to the few papers that also describe this phenotype and be built up.

      • State what audience might be interested in and influenced by the reported findings. As it stands, this is most interesting for a specialized audience because it requires some understanding of the development of this system in particular. As it characterizes this to a new level of detail, it could be influential to those in the field. Some additional clarification of the results and re-framing could make the manuscript clearer and more interesting for non-specialists.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. I work with Drosophila and have studied embryonic and adult cell types, although not trachea specifically. I am familiar with all the genetic techniques and imaging techniques used here.

    1. Reviewer #3 (Public review):

      In a characteristically bold fashion, Lee Berger and colleagues argue here that markings they have found in a dark isolated space in the Rising Star Cave system are likely over a quarter of a million years old and were made intentionally by Homo naledi, whose remains nearby they have previously reported. As in a European and much later case they reference ('Neanderthal engraved 'art' from the Pyrenees'), the entangled issues of demonstrable intentionality, persuasive age and likely authorship will generate much debate among the academic community of rock art specialists. The title of the paper and the reference to 'intentional designs', however, leave no room for doubt as to where the authors stand, despite an avoidance of the word art, entering a very disputed terrain. Iain Davidson's (2020) 'Marks, pictures and art: their contributions to revolutions in communication', also referenced here, forms a useful and clearly articulated evolutionary framework for this debate. The key questions are: 'are the markings artefactual or natural?', 'how old are they?' and 'who made them?', questions often intertwined and here, as in the Pyrenees, completely inseparable. I do not think that these questions are definitively answered in this paper and I guess from the language used by the authors (may, might, seem etc.) that they do not think so either.

      Before considering the specific arguments of the authors to justify the claims of the title, we should recognise the shift in the academic climate of those concerned with 'ancient markings' that has taken place over the past two or three decades. Before those changes, most specialists would probably have expected all early intentional markings to have been made by Homo sapiens after the African diaspora as part of the explosion of innovative behaviours thought to characterise the 'origins of modern humans'. Now, claims for earlier manifestations of such innovations from a wider geographic range are more favourably received, albeit often fiercely challenged as the case for Pyrenean Neanderthal 'art' shows (White et al. 2020). This change in intellectual thinking does not, however, alter the strict requirements for a successful assertion of earlier intentionality by non-sapiens species. We should also note that stone, despite its ubiquity in early human evolutionary contexts, is a recalcitrant material not easily directly dated whether in the form of walling, artefact manufacture or potentially meaningful markings. The stakes are high but the demands no less so.

      Why are the markings not natural? Berger and co-authors seem to find support for the artefactual nature of the markings in their location along a passage connecting chambers in the underground Rising Star Cave system. The presumption is that the hominins passed by the marked panel frequently. I recognise the thinking but the argument is weak. More confidently they note that "In previous work researchers have noted the limited depth of artificial lines, their manufacture from multiple parallel striations, and their association into clear arrangement or pattern as evidence of hominin manufacture (Fernandez-Jalvo et al. 2014)". The markings in the Rising Star Cave are said to be shallow, made by repeated grooving with a pointed stone tool that has left striations within the grooves, and to form designs that are "geometric expressions" including crosshatching and cruciform shapes. "Composition and ordering" are said to be detectable in the set of grooved markings. Readers of this and their texts will no doubt have various opinions about these matters, mostly related to rather poorly defined or quantified terminology. I reserve judgement, but would draw little comfort from the similarities among equally unconvincing examples of early, especially very early, 'designs'. Two or even three half convincing arguments do not add up to one convincing one.

      The authors draw our attention to one very interesting issue: given the extensive grooving into the dolomite bedrock by sharp stone objects, where are these objects? Only one potential 'lithic artefact' is reported, a "tool-shaped rock [that] does resemble tools from other contexts of more recent age in southern Africa, such as a silcrete tool with abstract ochre designs on it that was recovered from Blombos Cave (Henshilwood et al. 2018)", also figured by Berger and colleagues. A number of problems derive from this comparison. First, 'tool-shaped rock' is surely a meaningless term: in a modern toolshed 'tool-shaped' would surely need to be refined into 'saw-shaped', 'hammer-shaped' or 'chisel-shaped' to convey meaning? The authors here seem to mean that the Rising Star Cave object is shaped like the Blombos painted stone fragment? But the latter is a painted fragment not a tool and so any formal similarity is surely superficial and offers no support to the 'tool-ness' of the Rising Star Cave object. Does this mean that Homo naledi took (several?) pointed stone tools down the dark passageways, used them extensively and, whether worn out or still usable, took them all out again when they left? Not impossible, of course. And the lighting?

      The authors rightly note that the circumstance of the markings "makes it challenging to assess whether the engravings are contemporary with the Homo naledi burial evidence from only a few metres away" and more pertinently, whether the hominins did the markings. Despite this honest admission, they are prepared to hypothesise that the hominin marked, without, it seems, any convincing evidence. If archaeologists took juxtaposition to demonstrate authorship, there would be any number of unlikely claims for the authorship of rock paintings or even stone tools. The idea that there were no entries into this Cave system between the Homo naledi individuals and the last two decades is an assertion not an observation and the relationship between hominins and designs no less so. In fact the only 'evidence' for the age of the markings is given by the age of the Homo naledi remains, as no attempt at the, admittedly very difficult, perhaps impossible, task of geochronological assessment, has been made.

      The claims relating to artificiality, age and authorship made here seem entangled, premature and speculative. Whilst there is no evidence to refute them, there isn't convincing evidence to confirm them.

      References:

      Davidson, I. 2020. Marks, pictures and art: their contribution to revolutions in communication. Journal of Archaeological Method and Theory 27: 3 745-770.

      Henshilwood, C.S. et al. 2018. An abstract drawing from the 73,000-year-old levels at Blombos Cave, South Africa. Nature 562: 115-118.

      Rodriguez-Vidal, J. et al. 2014. A rock engraving made by Neanderthals in Gibraltar. Proceedings of the National Academy of Sciences.

      White, Randall et al. 2020. Still no archaeological evidence that Neanderthals created Iberian cave art.

      Comments on latest version:

      The authors have not modified their stance or the authority of their arguments since the original paper.

    2. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their very constructive and helpful comments on the previous version of this manuscript. They have focused on some important issues and have raised many valuable questions that we expect to answer as research begins on these markings. As has been often the case with preprints, a number of experts beyond the four reviewers and editor have provided comments, questions, and suggestions, and we have taken these on board in our revision of the manuscript. In particular, Martinón-Torres et al. (2024) focused several comments upon this manuscript and raise some points that were not considered by the reviewers, and so we discuss those points here in addition to the reviewer comments.

      Some of us have been engaged in other aspects of the possible cultural activities of Homo naledi. After the discovery of these markings we considered it indefensible to publish further research on the activity of H. naledi within this part of the cave system without making readers aware that the H. naledi skeletal remains occur in a spatial context near markings on cave walls. Of course, the presence of markings leaves many questions open. A spatial context does not answer all questions about the temporal context. The situation of the Dinaledi Subsystem does entail some constraints that would not apply to markings within a more open cave or rock wall, and we discuss those in the text.

      We find ourselves in agreement with most of the reviewers on many points. As reflected by several of the reviewers, and most pointedly in the remarks by reviewer 1, the purpose of this preprint is a preliminary report on the observation of the markings in a very distinctive location. This initial report is an essential step to enable further research to move forward. That research requires careful planning due to the difficulty of working within the Dinaledi Subsystem where the markings are located. This pattern of initial publication followed by more detailed study is common with observations of rock art and other markings identified in South Africa and elsewhere. We appreciate that the reviewers have understood the role of this initial study in that process of research.

      Because of this, the revised manuscript represents relatively minimal changes, and all those at the advice of reviewers. Many thanks to all the reviewers for noting various typographic errors, missed references and other issues that we have done our best to fix in the revised manuscript.

      Expertise of authors. Reviewer 4 mentions that the expertise of the authors does not include previous publication history on the identification of rock art, and other reviewers briefly comment that experts in this area would enhance the description. AF does have several publications on ancient engravings and other markings; LRB has geological training and field experience with rock art. Notwithstanding this, we do take on board the advice to include a wider array of subject experts in this research, and this is already underway.

      Image enhancement. We appreciate the suggestions of some reviewers for possible strategies to use software filters to bring out details that may not be obvious even with our cross-polarization lighting and filtering. These are great ideas to try. In this manuscript we thought that going very far into software editing or image enhancement might be perceived by some readers as excessive manipulation, particularly in an age of AI. In future work we will experiment with the suggested approaches. 

      Natural weathering. In the process of review and commentary by experts and the public there has been broad acceptance that many of the markings illustrated in this paper are artificial and not a product of natural weathering of the dolomite rock. We deeply appreciate this. At the same time, we accept the comments from reviewers that some markings may be difficult to differentiate from natural weathering, and that some natural features that were elaborated or altered may be among the markings we recognize. On pages 3 and 4 we present a description of the process of natural subaerial weathering of dolomite, which we have rooted in several references as well as our own observations of the natural weathering visible on dolomite cave walls in the Rising Star cave system. This includes other cave walls within the Dinaledi Subsystem. We discuss the “elephant skin” patterning of natural dolomite surface weathering, how that patterning emerges, and how that differs from the markings that are the subject of this manuscript.

      Animal claw marks. Martinón-Torres et al. 2024 accept that some of the markings illustrated on Panel A are artificial, but they offer the hypothesis that some of those markings may be consistent with claw marks from carnivores or other mammals. They provide a photo of claw marks within a limestone cave in Europe to illustrate this point. On pages 5 and 6 of the revised manuscript we discuss the hypothesis of claw marks. We discuss the presence of animals in southern Africa that may dig in caves or mark surfaces. However the key aspect of the Malmani dolomite caves is that the hardness of dolomitic limestone rock is much greater than many of the limestone caves in other regions such as Europe and Australia, where claw marks have been noted in rock walls. As we discuss, we have not been able to find evidence of claw marks within the dolomite host bedrock of caves in this region, although carnivores, porcupines, and other animals dig into the soft sediments within and around caves. The form of the markings themselves also counter-indicates the hypothesis that they are claw marks. 

      Recent manufacture. One comment that occurs within the reviews and from other readers of the preprint is that recent human visitors to the cave, either in historic or recent prehistoric times, may have made these marks. We discuss this hypothesis on page 6 of the revised manuscript. The simple answer is that no evidence suggests that any human groups were in the Dinaledi Subsystem between the presence of H. naledi and the entry of explorers within the last 25 years. The list of all explorers and scientific visitors to have entered this portion of the cave system is presented in a table. We can attest that these people did not make the marks. More generally, such marks have not been known to be made by cavers in other contexts within southern Africa.

      Panels B and C. We have limited the text related to these areas, other than indicating that we have observed them. The analysis of these areas and quantification of artificial lines does not match what we have done for the Panel A area and we leave these for future work. 

      Presence of modern humans. We have observed no evidence of modern humans or other hominin populations within the Dinaledi Subsystem, other than H. naledi. Several reviewers raise the question of whether the absence of evidence is evidence of absence of modern humans in this area. This is connected by two of the reviewers to the observation that the investigation of other caves in recent years has shown that markings or paintings were sometimes made by different groups over tens of thousands of years, in some cases including both Neanderthals and modern humans. We have decided it is best for us not to attempt to prove a negative. It is simple enough to say that there is no evidence for modern humans in this area, while there is abundant evidence of H. naledi there.

      Association with H. naledi. Reviewer 2 made an incisive point that the previous version contained some text that appeared contradictory: on the one hand we argued that modern humans were not present in the subsystem due to the absence of evidence of them, yet we accepted that H. naledi may have been present for a longer time than currently established by geochronological methods.

      We appreciate this comment because it helped us to think through the way to describe the context and spatial association of these markings and the skeletal remains, and how it may relate to their timeline. Other reviewers also raised similar questions, whether the context by itself demonstrates an association with H. naledi. We have revised the text, in particular on pages 5 and 7, to simply state that we accept as the most parsimonious alternative at present the hypothesis that the engravings were made by H. naledi, which is the only hominin known to be present in this space.

      Age of H. naledi in the system. At one place in the previous manuscript we indicated that we cannot establish that H. naledi was only active in the cave system within the constraints of the maximum and minimum ages for the Dinaledi Subsystem skeletal remains (viz., 335 ka – 241 ka), because some localities with skeletal material are undated. We have adjusted this paragraph on page 7 to be clear that we are discussing this only to acknowledge uncertainty about the full range of H. naledi use of the cave system.

      Geochronological methods. Several reviewers discuss the issue of geochronology as applied to these markings. This is an area of future investigation for us after the publication of this initial report. As some reviewers note, the prospects for successful placement of these engraved features and other markings with geochronological methods depend on factors that we cannot predict without very high-resolution investigation of the surfaces. We have included greater discussion of the challenges of geochronological placement of engravings on page 6, including more references to previous work on this topic. We also briefly note the ethical problems that may arise as we go further with potentially invasive, destructive or contact studies of these engravings, which must be carefully considered by not just us, but the entire academy.

      Title. Some reviewers suggested that the title should be rephrased because this paper does not use chronological methods to derive date constraints for the markings. We have rephrased the title to reflect less certainty while hopefully retaining the clear hypothesis discussed in the paper.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1.1. It would be helpful if the authors could discuss whether there is any correlation between cryptic sites and the extent of experimental validation in the Phosphosite database (e.g. those that were only identified in one or a few MS experiments). It is difficult to determine stoichiometry of phosphorylation experimentally, but can any inference be made on the extent of phosphorylation of cryptic sites vs. more conventional sites located in IDRs or on the surface of globular domains?

      We thank the reviewer for this valuable suggestion. To investigate the extent of the experimental validation of phosphosites, we examined the number of supporting studies for each site reported in the PhosphoSitePlus database. Specifically, we summed the values of the LT_LIT (literature-based experiments), MS_LIT (mass spectrometry literature), and MS_CST (Cell Signaling Technology mass spectrometry) fields to count the number of independent studies supporting each phosphorylation site, either cryptic or non-cryptic. To visualize the results, we plotted the number of supporting references vs the relative solvent accessibility (RSA) distribution of phosphosites (Figure R1). The analysis revealed a direct correlation between the RSA of phosphosites and the number of studies supporting their phosphorylation. This observation may arise from an intrinsic difficulty in studying cryptic phosphosites due to their destabilizing effects on native proteins. Notably, no differences were observed in the number of supporting studies within cryptic phosphosites (Figure R1B). We have not mentioned these analyses in the new version of the manuscript. However, we would gladly add it if the editor or the reviewer advises accordingly.
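
      The counting step described above can be sketched as follows. The field names LT_LIT, MS_LIT and MS_CST are taken from the response; the record layout, the handling of blank fields and the example records are assumptions made for illustration and do not reproduce the authors' pipeline or real database entries.

```python
from typing import Mapping

# Field names follow the response text; the record layout is an assumption.
EVIDENCE_FIELDS = ("LT_LIT", "MS_LIT", "MS_CST")

def supporting_studies(site: Mapping[str, object]) -> int:
    """Sum the evidence counts for one phosphosite, treating blanks as zero."""
    total = 0
    for field in EVIDENCE_FIELDS:
        value = site.get(field)
        if value in (None, ""):
            continue  # missing or empty field contributes no studies
        total += int(value)
    return total

# Hypothetical records, for illustration only (not real database entries).
sites = [
    {"site": "example_S1", "LT_LIT": 2, "MS_LIT": 10, "MS_CST": 5},
    {"site": "example_Y2", "LT_LIT": "", "MS_LIT": 1, "MS_CST": None},
]
counts = {s["site"]: supporting_studies(s) for s in sites}
```

      The resulting per-site totals could then be plotted against RSA, as the response describes for Figure R1.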

      1.2. The authors note that a larger percentage of tyrosine phosphorylation sites are cryptic compared with serine/threonine sites. I assume that tyrosine itself is more highly enriched in the hydrophobic cores of proteins relative to serine or threonine, due to its bulky hydrophobic side chain. Is the increased proportion of cryptic tyrosine phosphorylation sites more, less, or the same as the proportion of tyrosine in hydrophobic cores relative to serine and threonine?

      We thank the reviewer for this insightful comment. As correctly noted, tyrosine residues tend to be enriched in the hydrophobic cores of proteins, as reflected by their generally lower relative solvent accessibility (RSA) values, regardless of phosphorylation state. This enrichment is likely due to the tyrosine side chain's bulky and partially hydrophobic nature. To address the reviewer's question, we compared the RSA distributions of phosphorylated tyrosine, serine, and threonine residues with that of the same residues non-phosphorylated in the human proteome (Figure R2). In order to statistically compare the two distributions, we employed the Mann-Whitney test. The large sample size inevitably yields very low p-values, even when the distributions differ mildly (pThr, pSer vs non-p Thr, Ser, p

      1.3. Fig. 5D and E: I had some trouble interpreting these figures. Indicating where the native state is in the plots would be helpful (stated in text as lower right, but a rectangle on the plot would make this more obvious). The text discusses three metastable intermediates, but what is the fourth one shown on the figures (well A, close to the native state)? This could be more explicitly explained.

      We added the missing rectangles to the original Fig. 5D and E (see below Figure R3 and R4). The three metastable intermediates discussed in the original text reflect protein conformers in which the cryptic site is exposed to the solvent. Conversely, the fourth state and the final native state are conformations in which the site is already partially or fully cryptic. The observation that the masking of cryptic sites coincides with the latest folding steps allows us to hypothesize a mechanism by which cryptic phosphorylation may regulate protein folding. Following the reviewer's suggestion, we now specify each conformation more explicitly in the new version of the legends of the relevant figures (text file with track changes, lines 950 and 1017).

      1.4. The fact that phosphomimetic mutations of cryptic sites in SMAD2 and CHK1 lead to lower expression levels and shorter half-lives is not surprising, given the expected disruption of the hydrophobic core by introduction of a charged residue. The results certainly show that if phosphorylated, these sites would decrease expression and half-life. With respect to half-life, however, if the authors are correct and cryptic sites are predominantly phosphorylated co-translationally, one would expect that the half-life curves for the wt protein would not be a simple exponential, but would instead reflect two distinct populations: those that are phosphorylated during translation, and are almost immediately degraded, and those that escape phosphorylation and have the same half-life as the non-phosphorylatable mutant. Are the actual experimental results consistent with this two-population model? If not, this would be evidence that some of these cryptic sites can be exposed post-translation, either by thermal fluctuation or biological interactions.

      We thank the reviewer for this insightful point. The readout employed in our study (i.e., western blotting) measures the aggregate signal from the total protein population in the cell culture. It thus reflects average protein levels rather than the dynamics of individual molecules. As such, it is not well-suited to resolving coexisting populations with distinct half-lives. We agree that if phosphorylation of cryptic sites occurs strictly co-translationally, one might expect a biphasic decay curve. However, due to methodological constraints, our assay provides only a single exponential fit to the global turnover kinetics. While we cannot entirely exclude the possibility that cryptic sites may become exposed post-translationally (e.g., due to thermal fluctuations or interactions), our molecular dynamics simulations did not reveal such exposure events within the simulated timescales. Therefore, while the two-population model remains plausible in principle, our results are consistent with a co-translational phosphorylation and degradation model. Forthcoming experiments aimed at characterizing the phosphorylation of ribosome-associated nascent chains in the human proteome may further validate this conclusion.
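
The two-population expectation raised by the reviewer can be written down explicitly. The sketch below is illustrative only: the fast-decaying fraction and both half-lives are hypothetical numbers, not measurements from this study. It shows that a mixture of a short-lived and a long-lived pool gives an apparent half-life that drifts upward with chase time, which is the signature a bulk readout fitted with a single exponential would not resolve.

```python
import math

def remaining_fraction(t_hours, frac_fast, half_life_fast, half_life_slow):
    """Fraction of protein left at time t for a mixture of two pools.

    frac_fast, half_life_fast and half_life_slow are hypothetical
    parameters chosen for illustration, not values from the study.
    """
    fast = frac_fast * 0.5 ** (t_hours / half_life_fast)
    slow = (1.0 - frac_fast) * 0.5 ** (t_hours / half_life_slow)
    return fast + slow

def apparent_half_life(t_hours, fraction_left):
    """Half-life implied by reading a single time point as one exponential."""
    return -t_hours * math.log(2) / math.log(fraction_left)

# Hypothetical example: 30% of molecules decay fast (0.5 h), 70% slowly (8 h).
for t in (1, 4, 8):
    left = remaining_fraction(t, 0.3, 0.5, 8.0)
    print(t, round(left, 3), round(apparent_half_life(t, left), 2))
```

With a single pool the apparent half-life is the same at every time point; with two pools it increases as the short-lived pool is depleted.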

      1.5. The authors make a point that cryptic phosphosites are more highly conserved than non-cryptic phosphosites, but it is not clear to me whether it is the side chain itself or its ability to be phosphorylated that is conserved. Supplemental Fig. 9, if I am interpreting it correctly, would suggest it is the residue itself and not its phosphorylation that is conserved. If so, wouldn't this suggest that phosphorylation of these cryptic sites is just an inevitable consequence of the conservation of serine, threonine, and tyrosine residues in hydrophobic core regions? If the authors have evidence that argues against this simple hypothesis, they should discuss it (e.g., cryptic phosphosites are more highly conserved in some cases than non-phosphorylated tyrosine, serine, and threonine residues that are not solvent accessible).

      We agree with the reviewer's interpretation. The higher conservation of cryptic phosphosites likely reflects the evolutionary constraint on hydrophobic core residues, which tend to be more conserved due to their role in structural stability. This conservation does not imply that phosphorylation at those sites is functionally selected across species. Instead, when such residues are phosphorylated, as we observe in the human proteome, the effect is often destabilizing and associated with protein degradation. Our analysis does not establish that the phosphorylation of cryptic residues is conserved across species, only that the residues themselves are. We appreciate the reviewer's suggestion and now explicitly discuss this point in the revised manuscript to clarify the distinction between residue conservation and phosphorylation conservation (text file with track changes, line 618).

      1.6. Regarding the evolutionary conservation of cryptic sites, have the authors taken into consideration that tyrosine-specific kinases, phosphatases, and reader domains first appeared in the first metazoans, and are for the most part not seen in non-metazoan eukaryotes? I notice some of the proteomes used for the conservation analysis include plants and yeast, which lack most tyrosine phosphorylation.

      We thank the reviewer for this insightful comment. In response to the suggestion, we have recalculated the entropic conservation score by restricting the analysis to metazoan species. This restriction ensures that the evolutionary context more accurately reflects the presence and functional relevance of tyrosine-specific kinases, phosphatases, and reader domains. Comparing the entropic score distributions calculated with and without non-metazoan orthologues shows statistically significant differences for serine/threonine as well as for tyrosine. However, the large sample sizes inevitably translate into statistically significant p-values, even when the differences in mean are minimal and the standard deviations relatively small. To better assess the practical relevance of these differences, we calculated Cohen's d as a measure of effect size (Table R1). The coefficient helps assess the size and biological significance of a difference (>0.2 = small effect; >0.5 = medium effect; >0.8 = large effect). The analysis indicates only a very modest deviation in entropic scores depending on whether non-metazoan orthologues are included.

      1.7. I find the argument that phosphorylation of exposed core residues is part of normal protein quality control/proteostasis to be convincing. Can the authors provide any experimental evidence to support this model (for example, greater phosphorylation of cryptic sites under stress conditions)? I don't think these experiments are necessary, but would seem to be a logical next step and could be done quite easily through collaboration.

      We appreciate the reviewer's suggestion and fully agree that showing more significant phosphorylation of cryptic sites under stress conditions could represent an exciting future direction. We are conducting experiments on individual tumor suppressors such as p53 and PTEN, which harbor cryptic phosphosites, to test whether cellular stress conditions enhance phosphorylation at these positions. These studies assess whether such modifications contribute to altered protein stability or function in stress or disease contexts, particularly cancer. We plan to communicate these results in forthcoming publications and are currently open to collaborations to broaden this line of investigation.

      1.8. The authors note at the end of the discussion that targeting cryptic phosphosites might be a strategy to selectively degrade some proteins in cancer. Practically, how would this work? I can't think of how, but perhaps the authors can provide more specific suggestions.

      We thank the reviewer for raising this important point. One promising approach to therapeutically exploit cryptic phosphosites builds on the PPI-FIT principles (Pharmacological Protein Inactivation by Folding Intermediate Targeting). This strategy targets transient structural pockets appearing only in folding intermediates (Spagnolli et al., Comm Biology 2021). In this context, kinases that phosphorylate cryptic sites could be modulated, either inhibited or redirected, so that misfolded or oncogenic proteins are selectively marked for degradation. For example, selectively enhancing the phosphorylation of a cryptic site on an oncogenic protein could destabilize it and promote its degradation via the proteasome. Conversely, preventing phosphorylation at a cryptic site on a tumor suppressor (e.g., by inhibiting the specific kinase) could enhance protein stability and restore function. While this concept is still emerging, it offers an exciting therapeutic avenue that complements our findings. We added a paragraph addressing this point in the discussion section of the new version of the manuscript (text file with track changes, line 716).

      1.9. Introduction: "It involves the addition of a phosphate to an hydroxyl group found in the side chain of specific amino acids, typically serine, threonine or tyrosine residues." Of course serine, threonine, and tyrosine are the only standard amino acids with a simple hydroxyl group, so "typically" is not needed here.

      We have removed the word "typically" to reflect the accurate chemical specificity of phosphorylation events (text file with track changes, line 82).

      1.10. In my view this is an important study, bringing rigor and a broad proteomic perspective to a phenomenon that (to my knowledge) had not been carefully examined previously. In terms of the big picture, I am of two minds. On the one hand, showing that phosphorylation of hydrophobic core residues exposed during translation or the early stages of folding can regulate steady state levels of some proteins provides an intriguing new mechanism to control the complement of proteins in the cell, and is potentially an area of regulation in normal physiology or in disease. On the other hand, if this is just part of the normal proteostatic mechanisms (hydrophobic core residues exposed for too long consign the protein to degradation, before it can lead to aggregation and other problems), that is a little less interesting to me. I think future work to tease out whether this mechanism is actually regulated and used by the cell to transmit information will be key. But the first step is showing that the phenomenon is real and widespread, and in my view this preprint accomplishes that goal very well.

      We appreciate the reviewer's thoughtful summary and agree that distinguishing between passive proteostatic clearance and active regulatory function is essential. Toward this goal, we plan to carry out a phosphoproteomic analysis of ribosome-associated nascent chains. By mapping phosphorylation events during translation, we aim to validate our cryptic phosphosite dataset in a co-translational context and potentially identify novel regulatory modifications. This approach will also help us assess whether phosphorylation at cryptic sites is modulated context-dependently, thereby supporting a role in regulated protein expression rather than solely quality control.

      2.1. Evolutionary comparison of whether cryptic and non-cryptic sites are differently conserved. Two distinct distributions for cryptic and non-cryptic phospho-sites are observed and Figure 6 shows two entropy distributions of cryptic vs. non-cryptic. Here it is unclear whether this is significant given the different distributions of the two types when not modified.

      We thank the reviewer for raising this critical point. Due to the large sample sizes in our analysis, statistical tests inevitably yield very low p-values, even when differences in mean are minimal and the standard deviations relatively small. To better assess the practical relevance of these differences, we calculated Cohen's d as a measure of effect size (Table R2). The comparison between cryptic and non-cryptic phosphosites yielded an effect size (Cohen's d = 0.4028) slightly lower than the one obtained for residues lying within protein cores or exposed on protein surfaces (Cohen's d = 0.5126), both indicating a modest but meaningful shift in entropic scores. In contrast, the comparisons between cryptic phosphosites and all core residues, as well as non-cryptic phosphosites and all surface residues, showed negligible effect sizes (Cohen's d = 0.0245 and 0.1326, respectively). These findings suggest that while statistical significance is achieved in all cases, only the difference between cryptic and non-cryptic phosphosites, or core and surface residues, reflects a meaningful biological signal. We have now included these data in the new version of the manuscript (text file with track changes, line 544).
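
As a minimal sketch of the effect-size calculation referred to above: the function below computes Cohen's d with the pooled sample standard deviation, which is an assumption, since the response does not state which variant was used. The labels follow the thresholds quoted earlier in this response (>0.2 small, >0.5 medium, >0.8 large); the function names are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(sample_a, sample_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)

def effect_label(d):
    """Map |d| onto the thresholds quoted in the response."""
    size = abs(d)
    if size > 0.8:
        return "large"
    if size > 0.5:
        return "medium"
    if size > 0.2:
        return "small"
    return "negligible"

# Effect sizes reported in the response (Table R2).
for d in (0.4028, 0.5126, 0.0245, 0.1326):
    print(d, effect_label(d))
```

Unlike a p-value, d does not shrink or grow with sample size alone, which is why it is the more informative quantity for proteome-scale comparisons.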

      2.2. The identification of buried modification sites and what the biological meaning / implications are is a very interesting topic. However PTM distribution on proteins is very skewed (many papers have identified clusters, hot spots, structural dependencies etc...) and therefore comparing modified sites on different residues and in different protein regions and with non-modified residues has to be very stringently controlled.

      We fully agree with the reviewer that PTM distribution is non-random and influenced by structural and functional constraints, making comparative analyses challenging. To ensure rigor, we implemented a robust computational pipeline. Unlike other PTMs found almost exclusively on solvent-exposed residues, phosphorylation uniquely showed a distinct subset of sites with extremely low solvent accessibility. This pattern held even after applying stringent structural and dynamical filters. Specifically, we excluded low-confidence residues, small or unstructured domains, and sites that become exposed due to thermal fluctuations, using the SPECTRUS-based dynamic analysis. While we cannot entirely rule out context-specific exposure in fully folded proteins (e.g., during protein-protein interactions), we validated selected cryptic sites experimentally, and our findings were consistent with the computational predictions. We believe this multilayered approach strengthens the reliability of our classification and distinguishes cryptic phosphosites from the broader PTM landscape.

      2.3. Very basic question: How did you assess the RSA value of the residues from the AlphaFold structure? If it is sequence based, then it is unclear what the AlphaFold structure actually contributes in this step. Although I assume it is structure based, it is not well described, only a reference.

      We calculated the RSA values using the Shrake-Rupley algorithm implemented in the MDTraj Python library. This is a structure-based metric: for each PTM-carrying residue, we evaluated the absolute SASA from the 3D AlphaFold structure and normalized it against the theoretical maximum exposure for that residue in a Gly-X-Gly tripeptide, as defined in Tien et al. (2013). Thus, AlphaFold structures directly provide the atomic coordinates necessary for solvent accessibility estimation. We have now revised the Methods section to describe this process more explicitly (text file with track changes, lines 110 and 113).
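
The normalization step described above can be sketched as follows. This is not the authors' code: the function name is hypothetical, and the maximum accessibility values are the theoretical Gly-X-Gly figures as commonly tabulated from Tien et al. (2013), which should be checked against the paper before use. The unit conversion reflects that MDTraj's `shrake_rupley` reports areas in nm².

```python
# Theoretical maximum accessible surface area in square angstroms for a
# residue X in a Gly-X-Gly tripeptide (values assumed from Tien et al. 2013;
# verify against the original table).
MAX_ASA_A2 = {"SER": 155.0, "THR": 172.0, "TYR": 263.0}

NM2_TO_A2 = 100.0  # 1 nm^2 = 100 A^2

def relative_solvent_accessibility(sasa_nm2, resname):
    """Normalize an absolute per-residue SASA (in nm^2) to an RSA value.

    sasa_nm2 is assumed to be the per-residue area obtained from a
    Shrake-Rupley calculation on the predicted structure.
    """
    return sasa_nm2 * NM2_TO_A2 / MAX_ASA_A2[resname.upper()]

# Hypothetical example: a serine exposing 0.12 nm^2 of surface.
print(round(relative_solvent_accessibility(0.12, "SER"), 3))
```

An RSA near 0 marks a buried side chain and a value near 1 a fully exposed one; values slightly above 1 can occur because the reference maxima are themselves estimates.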

      2.4. Given that the different residues S,T,Y but also K for glycosylations etc. have a very different baseline RSA distribution, the distributions of modified residues as such are not so informative. Are the distributions of residues with the alpha fold LOD 0.65 different between modified and non-modified?

      2.5. Same point: it is very clear that "tyrosine presenting a larger proportion of cryptic phosphor-sites", as they mainly are within folded domains to begin with. The pattern of phosphorylation and clustering is very different between the modified amino acid residue T,S,Y and needs consideration, given the large number of PTMs, a simple distribution is not sufficient to argue.

      As already discussed in point 1.2 above, and correctly noted also by this reviewer, tyrosine residues are generally enriched in the hydrophobic cores of proteins, which is reflected by their typically low RSA, regardless of phosphorylation status. This tendency likely arises from the bulky and partially hydrophobic nature of the tyrosine side chain. To address the reviewer's question, we compared the RSA distributions of phosphorylated tyrosine, serine, and threonine residues with those of all these amino acids in the human proteome. We found that phosphorylated residues consistently exhibit higher RSA values than the overall averages for their respective amino acids. This is expected, as phosphorylation within protein cores would likely be destabilizing. Indeed, the existence of low-RSA phosphorylated residues represents a significant deviation from the intrinsic tendency of tyrosine, serine, and threonine residues and suggests that cryptic sites may become accessible only transiently along protein folding pathways.

      2.6. Figure 3E (proteins need names in the figure): the cryptic site T222 (Chk1) is not in the quasi-rigid domain, it is in a light color region. What is actually the SPECTRUS cutoff? The Pidc is only one sentence in the main text? It says fewer than 80% intradomain contacts in rigid domains i.e. >0.8, right, but is the domain rigid?

      We have revised the original figure in the new version of the manuscript to include protein names, and clarified the domain assignments. The cryptic phosphosite T222 in Chk1 lies within a quasi-rigid domain, as identified by SPECTRUS. The colors in the image do not reflect any structural property; they are used only to distinguish different quasi-rigid domains. In particular, black regions identify unstructured domains, whereas shades from dark grey to white identify quasi-rigid domains. We apologize for the lack of clarity. We have corrected the figure legend accordingly (text file with track changes, line 912).

      There is no cutoff in SPECTRUS' identification of quasi-rigid domains. Non-quasi-rigid domains are simply regions of the protein that SPECTRUS cannot process properly, that is, regions that cannot be modelled as quasi-rigid because of their large intrinsic fluctuations.

      We also expanded the description of Pidc in the main text to clarify that it quantifies the proportion of intra-domain contacts made by the phosphosite's side chain, and that a cutoff of ≥0.8 was used to retain only residues well-integrated within rigid domains (text file with track changes, line 243).
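
The Pidc filter described above can be illustrated with a small sketch. The data layout is an assumption made for illustration: contacts are given as a list of partner residue indices, and `domain_of` is a hypothetical mapping from residue index to a quasi-rigid domain label (with `None` for residues outside any such domain). How contacts are actually defined in the manuscript is not restated in this response.

```python
def intra_domain_contact_fraction(site, contacts, domain_of):
    """Fraction of a site's contacts that lie in the site's own domain.

    site: residue index of the phosphosite.
    contacts: residue indices in contact with the site's side chain.
    domain_of: hypothetical mapping of residue index -> domain label or None.
    """
    site_domain = domain_of.get(site)
    if site_domain is None or not contacts:
        return 0.0
    same = sum(1 for partner in contacts if domain_of.get(partner) == site_domain)
    return same / len(contacts)

def passes_pidc_filter(site, contacts, domain_of, cutoff=0.8):
    """Keep a site only if its intra-domain contact fraction is >= cutoff."""
    return intra_domain_contact_fraction(site, contacts, domain_of) >= cutoff

# Hypothetical toy example: residues 10-14 in domain "A", residue 50 in "B".
domains = {10: "A", 11: "A", 12: "A", 13: "A", 14: "A", 50: "B"}
print(passes_pidc_filter(10, [11, 12, 13, 14, 50], domains))
print(passes_pidc_filter(10, [11, 12, 13, 50], domains))
```

The first call keeps the site (4 of 5 contacts are intra-domain) and the second rejects it (3 of 4), matching the ≥0.8 criterion stated in the text.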

      We hope these updates will resolve the ambiguities noted and more clearly define the criteria used in our filtering pipeline.

      2.7. The evolutionary comparison (which is not my core expertise), seems again like comparing different things. Why not comparing cryptic and non-cryptic sites in the same protein regions? Also p-Y are, evolutionarily speaking, very different to p-S and p-T. How is this possibly considered in one distribution. p-Y analysis needs to be separated from the p-T and p-S analyses here.

      We want to clarify that our evolutionary analyses compare residues at the aligned positions in orthologous proteins across multiple species. This approach ensures that each cryptic or non-cryptic phosphosite is assessed in its native structural and sequence context. Therefore, the comparison is not between different regions but evaluates the evolutionary conservation of specific sites across species, allowing for a direct and meaningful comparison of cryptic and non-cryptic phosphosites. To address the second point, we report below the entropic score distributions for serine/threonine and tyrosine separately (Figure R5).

      2.8. Have the authors thought of randomization of their data to see whether the distributions are significant?

      We are unsure we fully understand what the referee means by randomizing the data in this case.

      However, according to the mathematical definition of entropic score, the limit case in which, within each orthogroup, the phosphorylated amino acid is replaced by a completely random residue yields an entropic score of 1. The opposite limit, in which all members of the orthogroups have the same amino acid in the position of the phosphorylated amino acid, yields an ES of 0. We have added a paragraph in the methods to stress this point (text file with track changes, line 354).
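
The two limiting cases described above can be checked with a short sketch. The exact formula used in the manuscript is not reproduced in this response, so the code assumes a Shannon entropy over the residues observed at one aligned position, normalized by the logarithm of the alphabet size (20 amino acids); the function name is hypothetical.

```python
import math
from collections import Counter

def entropic_score(column, alphabet_size=20):
    """Normalized Shannon entropy of one alignment column (assumed definition).

    column: the residues found at the aligned position across an orthogroup,
    for example "SSSTS".
    """
    total = len(column)
    counts = Counter(column)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(alphabet_size)

conserved = entropic_score("SSSSSSSS")               # identical residue everywhere
uniform = entropic_score("ACDEFGHIKLMNPQRSTVWY")     # every amino acid once
mixed = entropic_score("SSSSTTSA")                   # partially conserved position
print(conserved, uniform, mixed)
```

Under this assumed definition a fully conserved position scores 0, a position where all 20 amino acids are equally represented scores 1, and real positions fall in between.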

      2.9. Labeling in Suppl Figures is insufficient. E.g. In S6 what are the various WT, A and D numbering, are these independent stable transfections/clones? Figure S7 what is R?

      Thank you for pointing this out. We have now corrected the missing information in the revised version of the manuscript (text file with track changes, from line 992 to 1008).

      2.10. Whether or not findings are "impressive" should be up to the reader, please remove these attributes in the text.

      We agree with the reviewer's suggestion. We have removed subjective language such as "impressive" from the revised manuscript to ensure an objective and neutral tone, allowing readers to independently evaluate the significance of our findings (text file with track changes, line 454).

      3.1. Residues with pLDDT scores below 65 were excluded from the analysis. The high-confidence measure applies to individual residues, regardless of whether the domains they belong to are also predicted with high confidence. Identifying the number of domains containing PTMs with overall high-confidence predictions could provide better insights into the orientation of modified residues within domain structures. To assess the relationship between residue-specific confidence and domain stability, we can analyze the correlation between high-confidence modified residues and the overall prediction accuracy of their domains. This could be quantified using the average error scores of domain residues. Additionally, using the average pLDDT score would indicate how many individual residues were predicted with high local structural confidence. In contrast, the average PAE (Predicted Aligned Error) score would provide insights into how well each residue's position is predicted relative to others within the domain, reflecting overall domain structural confidence.

      Our analysis excluded residues with pLDDT scores below 65 to ensure high local confidence. While pLDDT provides residue-level structural confidence, assessing domain-wide prediction quality offers additional insights into modified residues' spatial organization and exposure. However, a domain-level interpretation is currently limited by the format of AlphaFold structural predictions. Specifically, AlphaFold does not provide Predicted Aligned Error (PAE) matrices for sequences split into overlapping fragments, a method used for proteins longer than 2,700 amino acids. These fragment predictions are only available in the downloadable AlphaFold proteome archives, not through the web interface, and lack the global alignment metrics (such as PAE) necessary for analyzing domain stability or inter-residue confidence within the domain context.

      3.2. "Approximately 65% of proteins with cryptic phosphosites contained only one or two such residues, while less than 10% had five or more sites (Supp. Figure 3)." To better interpret this trend, it would be useful to analyze the total number of cryptic PTMs on proteins part of this study, including all modification types-not just phosphorylation. This would help determine whether the observed pattern is specific to phosphorylation or if it extends to other post-translational modifications as well.

      To compare the occurrence of different cryptic PTMs, we extended our analysis to include all cryptic post-translational modifications annotated in PhosphoSitePlus, including phosphorylation, glycosylation, methylation, sumoylation, and ubiquitination. The approach allowed us to assess whether the observed distribution of cryptic phosphosites is unique or represents a more general feature of all cryptic PTMs. We observed extensive variation among the different PTMs in the proportion of proteins carrying 1, 2, or more of the same cryptic PTM (see Table R3). However, it must be noted that the relatively low number of cryptic PTMs, excluding phosphorylation, could make it difficult to determine whether these patterns reflect actual biological trends or are simply influenced by the sample size. We have not included these data in the new version of the manuscript, but we would be willing to add them if the editor or the reviewer advises us accordingly.

      3.3. For the validation of cryptic sites, selecting domains under 200 amino acids was mentioned. However, was there also a minimum length threshold applied, similar to the filtering criteria used for false positives (less than 40 ignored)?

      The 40-residue threshold was applied because protein domains that are too small cannot be reliably subdivided into quasi-rigid domains. Trying to run SPECTRUS on structures with fewer than 40 residues inevitably returns a warning, reflecting the intrinsic cooperative nature of quasi-rigid domains. In fact, entities composed of too few amino acids cannot properly arrange themselves into 3D structures and tend to be disordered. The same reasoning was applied when choosing the proteins to simulate. In particular, for the refolding simulations, we selected protein domains possessing the following properties:

      1. Shorter than 200 amino acids to limit the computational demands.
      2. Long enough to fold into an ordered 3-dimensional conformation reliably.
      3. Have an experimentally determined NMR or X-ray crystal structure.

      3.4. To test their hypothesis that phosphorylation affects protein expression, they selected candidates for serine and threonine but excluded tyrosine. What were the reasons for not including tyrosine-related PTMs in their analysis?

      Our experimental assays relied on phosphomimetic substitutions to mimic the effect of phosphorylation. While serine/threonine phosphorylation can be reasonably mimicked by E or D substitutions, there is no reliable single-residue mimic for phosphotyrosine. Indeed, E or D substitutions do not recapitulate the structural or electronic features of pTyr. Given these limitations, we excluded tyrosine phosphosites from experimental validation to avoid generating inconclusive or misleading data.

      3.5. Do we know that the regulatory role of S300 on PYST1 is associated with the dual specificity of the phosphatase, and is this why it was selected as a negative regulator? While the regulatory roles of the other analyzed phosphosites on SMAD and CHK1 are discussed, there is limited mention of the specific role of S300 on PYST1 within the scope of the study.

      S300 of PYST1 was selected not due to known regulatory relevance, but for technical convenience. PYST1 is a relatively small protein, facilitating computational simulations. We also had suitable reagents for detection (i.e., expression vector), and importantly, S300 was identified as a false-positive cryptic phosphosite removed by our dynamic filtering. It was a practical and structurally matched negative control for validating our computational pipeline.

      3.6. When comparing the entropic scores between cryptic and non-cryptic residues, the medians are 0.43 and 0.52, respectively. Although this difference is not very high, they do observe that cryptic residues have lower scores than non-cryptic ones. The distributions also show greater overlap (Figure 6). I'm wondering if any statistical testing would help assess how distinct these two groups really are.

      We thank the reviewer for this comment, which was also raised by reviewer #2 and is answered above (see reply to #2.1). Briefly, given our large sample sizes, statistical tests often yield very low p-values even for minor differences. To assess the biological significance, we calculated Cohen's d (Table R2 above). The effect size between cryptic and non-cryptic phosphosites (d = 0.4028) was modest but meaningful, and slightly lower than between core and surface residues (d = 0.5126).

      3.7. Why did the authors choose to rely on AlphaFold data instead of examining PDB structures? I didn't see any explanation or rationale provided for preferring AlphaFold predictions over experimentally determined structures from the PDB.

      We appreciate the value of this comment. We focused on AlphaFold to maximize proteome-wide coverage. Indeed, although PDB structures offer experimentally validated conformations, their sparse and uneven proteome coverage (particularly for membrane proteins, low-abundance factors, and intrinsically disordered regions) precludes a truly global analysis. AlphaFold2 models, by contrast, deliver accurate, full-length structures for nearly the entire human proteome, enabling unbiased, large-scale mapping of cryptic phosphosites. Nonetheless, we performed the same analysis using high-resolution structures from the Protein Data Bank (PDB). The results agreed with those based on AlphaFold predictions, indicating that our findings hold across the two structure sources (see Figure R6 below).

      3.8. Novelty - The concept that cryptic site modifications can dysregulate signaling in cancer and other diseases is known, but systematically categorizing PTM sites into cryptic and non-cryptic to generate hypotheses for a wide range of identified PTMs remains an underdeveloped approach. This study establishes a framework for classifying PTMs based on their structural accessibility, integrating AlphaFold predictions, molecular dynamics simulations, solvent accessibility analysis, and phylogenetic conservation metrics. This approach not only enhances our understanding of PTM-mediated regulatory mechanisms but also provides a foundation for exploring how cryptic modifications contribute to protein function, stability, and disease progression.

      We appreciate the reviewer's comment. To our knowledge, this is the first study to introduce and define "cryptic phosphosites" as a structurally distinct and functionally relevant subset of phosphorylation sites. While some individual cases of buried amino acids influencing cancer-related proteins have been reported, no previous study has systematically mapped, filtered, and analyzed these sites across the human proteome using integrated structural, dynamical, evolutionary, and experimental criteria.

      3.9. The study relies primarily on predicted protein structures (e.g., AlphaFold), without exploring experimentally derived structures, which could provide more accurate and physiologically relevant insights.

      We have addressed this point above (see reply to #3.7).

      3.10. While the research demonstrates the impact of cryptic PTMs on protein function, it would be valuable to also investigate non-cryptic sites from their annotated data. By examining the effects of modifications on these non-cryptic sites, the study could further validate the importance of the cryptic versus non-cryptic classifications and help clarify the functional relevance of both types of sites.

      We thank the referee for this thoughtful suggestion. We compared the proportion of cryptic or non-cryptic phosphosites associated with cancer- and disease-related mutations in each group from the COSMIC and PTMVar datasets. The percentage of phosphosites associated with the two repositories is essentially the same for cryptic and non-cryptic sites. This observation suggests that, despite their different structural and regulatory features, both site types occur similarly in disease contexts (see Table R4). We have included these data in the new version of the manuscript (text file with track changes, line 1067; and new Supp. Table 3).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Review on Gasparotto et al "Mapping Cryptic Phosphorylation Sites in the Human Proteome"

      Gasparotto et al. assess the solvent accessibility of 87,138 post-translationally modified amino acids in the human proteome (from PhosphoSitePlus). Their initial observation is that a large fraction of modified sites are buried, a finding that is pronounced for phosphorylation but not other modifications. Their approach uses AlphaFold 3D structures (0.65 cut off) and RSA prediction to get a set of buried sites. Further refinement includes removing low-confidence segments (such as loops, linkers, or short disordered regions) and using SPECTRUS to identify quasi-rigid domains. The idea is that quasi-rigid domains may not breathe and thus will be modified during synthesis or folding.

      They generated a final dataset of 10,606 cryptic T, S and Y phospho-sites in 5,496 proteins and state that: "These data indicate that ~5% of all known phospho-sites are cryptic. Impressively, the number translates to ~33% of phosphorylated proteins in the human proteome presenting at least one cryptic phospho-site." They focus on S417 of SMAD2 and T382 of Chk1, known to be associated with loss of function effects or proteasomal degradation, and on S300 of PYST1 as a negative control. They stably express these proteins as phospho-mimetic or alanine substitutions in HEK293. Expression levels were reduced in the phospho-D mutant versions and upon cycloheximide treatment a reduction of the turnover time for the phospho-D CHK1 was observed. I think we are looking at a large clonal difference in the supplemental figures.

      The examples are supported by MD simulations that suggest that cryptic phospho-sites can occur during the folding process and affect protein homeostasis by drastically increasing degradation rate and leading to rapid turnover; essentially the phospho-versions show a solvent exposure. The evolutionary comparison asks whether cryptic and non-cryptic sites are differently conserved. Two distinct distributions for cryptic and non-cryptic phospho-sites are observed and Figure 6 shows two entropy distributions of cryptic vs. non-cryptic. Here it is unclear whether this is significant given the different distributions of the two types when not modified. Finally, overlay of the sites with cancer mutations lists 221 mutations in COSMIC associated with cryptic phosphosites that have been annotated as cancer-related and 138 mutations in PTMVar linked to cancer and other human pathologies. The identification of buried modification sites and what the biological meaning / implications are is a very interesting topic. However PTM distribution on proteins is very skewed (many papers have identified clusters, hot spots, structural dependencies etc...) and therefore comparing modified sites on different residues and in different protein regions and with non-modified residues has to be very stringently controlled.

      Points for consideration

      • Very basic question: How did you assess the RSA value of the residues from the AlphaFold structure? If it is sequence-based, then it is unclear what the AlphaFold structure actually contributes in this step. Although I assume it is structure-based, it is not well described; only a reference is given.
      • Given that the different residues S, T, Y, but also K for glycosylations etc., have a very different baseline RSA distribution, the distributions of modified residues as such are not so informative. Are the distributions of residues with the AlphaFold LOD 0.65 different between modified and non-modified?
      • Same point: it is very clear that "tyrosine presenting a larger proportion of cryptic phosphor-sites", as they mainly are within folded domains to begin with. The pattern of phosphorylation and clustering is very different between the modified amino acid residue T,S,Y and needs consideration, given the large number of PTMs, a simple distribution is not sufficient to argue.
      • Figure 3E (proteins need names in the figure): the cryptic site T222 (Chk1) is not in the quasi-rigid domain; it is in a light-colored region. What is actually the SPECTRUS cutoff? The Pidc is only one sentence in the main text? It says fewer than 80% intradomain contacts in rigid domains, i.e. >0.8, right, but is the domain rigid?
      • The evolutionary comparison (which is not my core expertise) seems again like comparing different things. Why not compare cryptic and non-cryptic sites in the same protein regions? Also, p-Y are, evolutionarily speaking, very different from p-S and p-T. How can these possibly be considered in one distribution? The p-Y analysis needs to be separated from the p-T and p-S analyses here.
      • Have the authors thought of randomization of their data to see whether the distributions are significant?
      • Labeling in Suppl Figures is insufficient. E.g., in S6, what are the various WT, A and D numberings; are these independent stable transfections/clones? In Figure S7, what is R?
      • Whether or not findings are "impressive" should be up to the reader, please remove these attributes in the text.
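
      The randomization asked for in the points above could, for instance, take the form of a label-shuffling (permutation) test on the per-site values of the two groups. The following is only a minimal sketch under stated assumptions: the function name `permutation_test`, the use of the difference in means as the test statistic, and the toy numbers are all hypothetical and not taken from the manuscript. In practice the shuffle would presumably be run separately for S, T and Y and within comparable protein regions.

```python
import random
from statistics import mean


def permutation_test(cryptic, non_cryptic, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the fraction of label shufflings whose absolute mean
    difference is at least as large as the observed one (with the
    usual +1 correction so the p-value is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(mean(cryptic) - mean(non_cryptic))
    pooled = list(cryptic) + list(non_cryptic)
    n = len(cryptic)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy values only (hypothetical per-site conservation scores).
toy_cryptic = [0.1 * i for i in range(10)]
toy_non_cryptic = [5 + 0.1 * i for i in range(10)]
p_value = permutation_test(toy_cryptic, toy_non_cryptic)
```

      A small p-value here only says that the group labels matter for the chosen statistic; it does not by itself address the baseline differences between residue types raised above.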

      Significance

      The identification of buried modification sites and what the biological meaning / implications are is a very interesting topic. However, PTM distribution on proteins is very skewed (many papers have identified clusters, hot spots, structural dependencies, etc.), and therefore comparing modified sites on different residues and in different protein regions and with non-modified residues has to be very stringently controlled.

      main conclusion: 5% of all known phospho-sites are cryptic, at least one in 1/3 of structured protein regions.

    1. Reviewer #1 (Public review):

      Bredenberg et al. aim to model some of the visual and neural effects of psychedelics via the Wake-Sleep algorithm. This is an interesting study with findings that go against certain mainstream ideas in psychedelic neuroscience (that I largely agree with). I cannot speak to the math in this manuscript, but it seems like quite a conceptual leap to set a parameter of the model in between wake and sleep and state that this is a proxy to acute psychedelic effects (point #20). My other concerns below are related to the review of the psychedelic literature:

      (1) Page 1, Introduction, "...they are agonists for the 5-HT2a serotonin receptor commonly expressed on the apical dendrites of cortical pyramidal neurons..." It is a bit redundant to say "5-HT2A serotonin receptor," as serotonin is already captured by its abbreviation (i.e., 5-HT).

      While psychedelic research has focused on 5-HT2A expression on cortical pyramidal cells, note that the 5-HT2A receptor is also expressed on interneurons in the medial temporal lobe (entorhinal cortex, hippocampus, and amygdala) with some estimates being >50% of these neurons (https://doi.org/10.1016/j.brainresbull.2011.11.006, https://doi.org/10.1007/s00221-013-3512-6, https://doi.org/10.7554/eLife.66960, https://doi.org/10.1016/j.mcn.2008.07.005, https://doi.org/10.1038/npp.2008.71, https://doi.org/10.1038/s41386-023-01744-8, https://doi.org/10.1016/j.brainres.2004.03.016, https://doi.org/10.1016/S0022-3565(24)37472-5, https://doi.org/10.1002/hipo.22611, https://doi.org/10.1016/j.neuron.2024.08.016). However, with ~1:4 ratio of inhibitory to excitatory neurons in the brain (https://doi.org/10.1101/2024.09.24.614724), this can make it seem as if 5-HT2A expression is negligible in the MTL. I think it might be important to mention these receptors, as this manuscript discusses replay.

      I see now that Figure 1 mentions that PV cells also express 5-HT2A receptors. This should probably be mentioned earlier.

      (2) Page 1, Introduction, "They have further been used for millennia as medicine and in religious rituals..." This might be a romanticization of psychedelics and indigenous groups, as anthropological evidence suggests that intentional psychedelic use might actually be more recent (see work by Manvir Singh and Andy Letcher).

      (3) When discussing oneirogens, it could be worth differentiating psychedelics from kappa opioid agonists such as ibogaine and salvinorin A, another class of hallucinogens that some refer to as "oneirogens" (similar to how "psychedelic" is the colloquial term for 5-HT2A agonists). Note that studies have found the effects of Salvia divinorum (which contains salvinorin A) to be described more similarly to dreams than psychedelics (https://doi.org/10.1007/s00213-011-2470-6). This makes me wonder why the present study is more applicable to 5-HT2A psychedelics than to kappa opioid agonists or other classes of hallucinogens (e.g., NMDA antagonists, muscarinic antagonists, GABAA agonists).

      (4) Page 2, Introduction, "Replay sequences have been shown to be important for learning during sleep [14, 15, 16, 17, 18]: we propose that mechanisms supporting replay-dependent learning during sleep are key to explaining the increases in plasticity caused by psychedelic drug administration." I'm not sure I follow the logic of this point. Dreams happen during REM sleep, whereas replay is most prominent during non-REM sleep. Moreover, while it's not clear what psychedelics do to hippocampal function, most evidence would suggest they impair it. As mentioned, most 5-HT2A receptors in the hippocampus seem to be on inhibitory neurons, and human and animal work finds that psychedelics impair hippocampal-dependent memory encoding (https://doi.org/10.1037/rev0000455, https://doi.org/10.1037/rev0000455, https://doi.org/10.3389/fnbeh.2014.00180, https://doi.org/10.1002/hipo.22712). One study even found that psilocin impairs hippocampal-dependent memory retrieval (https://doi.org/10.3389/fnbeh.2014.00180). Note that this is all in reference to the acute effects (psychedelics may post-acutely enhance hippocampal-dependent memory, https://doi.org/10.1007/s40265-024-02106-4).

      (5) Page 2, Introduction, "In total, our model of the functional effect of psychedelics on pyramidal neurons could provide a explanation for the perceptual psychedelic experience in terms of learning mechanisms for consolidation during sleep..." In contrast to my previous point, I think this could be possible. Three datasets have found that psychedelics may enhance cortical-dependent memory encoding (i.e., familiarity; https://doi.org/10.1037/rev0000455, https://doi.org/10.1037/rev0000455), and two studies found that post-encoding administration of psychedelics retroactively enhanced memory that may be less hippocampal-dependent/more cortical-dependent (https://doi.org/10.1016/j.neuropharm.2012.06.007, https://doi.org/10.1016/j.euroneuro.2022.01.114). Moreover, and as mentioned below, 5 studies have found decoupling between the hippocampus and the cortex (https://doi.org/10.3389/fnhum.2014.00020, https://doi.org/10.1002/hbm.22833, https://doi.org/10.1016/j.celrep.2021.109714, https://doi.org/10.1162/netn_a_00349, https://doi.org/10.1038/s41586-024-07624-5), something potentially also observed during REM sleep that is thought to support consolidation (https://doi.org/10.1073/pnas.2123432119). These findings should probably be discussed.

      (6) Page 2, Introduction, "In this work, we show that within a neural network trained via Wake-Sleep, it is possible to model the action of classical psychedelics (i.e. 5-HT2a receptor agonism)..." Note that 5-HT2A agonism alone is not sufficient to explain the effects of psychedelics, given that there are 5-HT2A agonists that are non-hallucinogenic (e.g., lisuride).

      (7) Page 2, Introduction, "...by shifting the balance during the wake state from the bottom-up pathways to the top-down pathways, thereby making the 'wake' network states more 'dream-like'." I could have included this in the previous point, but I felt that this idea deserved its own point. There has been a rather dogmatic assertion that psychedelics diminish top-down processing and/or enhance bottom-up processing, and I appreciate that the authors have not accepted this as fact. However, because this is an unfortunately prominent idea, I think it ought to be fleshed out more by first mentioning that it's one of the tenets of REBUS. REBUS has become a popular model of psychedelic drug action, but it's largely unfalsifiable (it's based on two unfalsifiable models, predictive processing and integrated information theory), so the findings from this study could tighten it up a bit. Second, there have now been a handful of studies that have attempted to study directionality in information flow under psychedelics, and the findings are rather mixed including increased bottom-up/decreased top-down effects (https://doi.org/10.7554/eLife.59784, https://doi.org/10.1073/pnas.1815129116; note that the latter "bottom-up" effect involves subcortical-cortical connections in which it's less clear what's actually "higher-/lower-level"), increased top-down/decreased bottom-up effects (https://doi.org/10.1038/s41380-024-02632-3, https://doi.org/10.1016/j.euroneuro.2016.03.018), or both (https://doi.org/10.1016/j.neuroimage.2019.116462, https://doi.org/10.1016/j.neuropharm.2017.10.039), though most of these studies are aggregating across largely inhomogeneous states (i.e., resting-state). 
Lastly, and somewhat problematically, facilitated top-down processing is also an idea proposed in psychosis that's based partially on findings with acute ketamine administration (note that all hallucinations to some degree might rely on top-down facilitation, as a hallucination involves a high-level concept that impinges on lower-level sensory areas; see work by Phil Corlett). While psychosis and the effects of ketamine have some similarities with psychedelics, there are certainly differences, and I think the goal of this manuscript is to uniquely describe 5-HT2A psychedelics (again, I'm left wondering why tweaking alpha in the Wake-Sleep algorithm is any more applicable to psychedelics than other hallucinogenic conditions).

      (8) Figure 2 equates alpha with a "psychedelic dose," but this is a bit misleading, as neither the algorithm nor an individual was administered a psychedelic. Alpha is instead a hypothetical proxy for a psychedelic dose. Moreover, if the model were recapitulating the effects of psychedelics, shouldn't these images look more psychedelic as alpha increases (e.g., they may look like images put through the DeepDream algorithm)?

      (9) Page 11, Methods, "...and the gate α ensures that learning only occurs during sleep mode... The (1 − α) gate in this case ensures that plasticity only occurs during the Wake mode." Much of the math escapes me, so perhaps I'm misunderstanding these statements, but learning and plasticity certainly happen during both wake and sleep, making me wonder what is meant by these statements. Moreover, if plasticity is simply neural changes, couldn't plasticity be synonymous with neural learning? Perhaps plasticity and learning are meant to refer to different types of neural changes. It might be worth clarifying this, as a general problem in psychedelic research is that psychedelics are described as facilitating plasticity when brains are changing at every moment (hence not experiencing every moment as the same), and psychedelics don't impact all forms of plasticity equally. For example, psychedelics may not necessarily enhance neurogenesis or the addition of certain receptor types, and they impair certain forms of learning (i.e., episodic memory encoding). What is typically meant by plasticity enhancements induced by psychedelics (and where there's the most evidence) is dendritic plasticity (i.e., the growth of dendrites and spines). Whatever is meant by "plasticity" should be clarified in its first instance in this manuscript.

      (10) Page 12, Methods, "During training, neural network activity is either dominated entirely by bottom-up inputs (Wake, α = 0) or by top-down inputs (Sleep, α = 1)." Again, I could be misunderstanding the mathematical formulation, but top-down inputs operate during wake, and bottom-up inputs can operate during sleep (people can wake up or even incorporate noise from their environments into sleep).

      (11) Page 4, Results, "Thus, we can capture the core idea behind the oneirogen hypothesis using the Wake-Sleep algorithm, by postulating that the bottom-up basal synapses are predominantly driving neural activity during the Wake phase (when α is low)." However, several pieces of evidence (and the first circuit model of psychedelic drug action) suggest that psychedelics enhance functional connectivity and potentially even effective connectivity from the thalamus to the cortex (https://doi.org/10.1093/brain/awab406). Note that psychedelics may not equally impact all subcortical structures. REBUS proposes the opposite of the current study, that psychedelics facilitate bottom-up information flow, with one of the few explicit predictions being that psychedelics should facilitate information flow from the hippocampus to the default mode network. However, as mentioned earlier, 5 studies have found that psychedelics diminish functional connectivity between the hippocampus and cortex (including the DMN but also V1).

      (12) Page 4, Results, "...and have an excitatory effect that positively modulates glutamatergic transmission..." Note that this may not be brainwide. While psychedelics were found to increase glutamatergic transmission in the cortex, they were also found to decrease hippocampal glutamate (consistent with inhibition of the hippocampus, https://doi.org/10.1038/s41386-020-0718-8).

      (13) Page 5, "...which are similar to the 'breathing' and 'rippling' phenomena reported by psychedelic drug users at low doses..." Although it's sometimes unclear what is meant by "low doses," the breathing/rippling effect of psychedelics occurs at moderate and high doses as well.

      (14) I watched the videos, and it's hard for me to say there was some stark resemblance to psychedelic imagery. In contrast, for example, when the DeepDream algorithm came out, it did seem to capture something quite psychedelic.

      (15) Page 5, "This form of strongly correlated tuning has been observed in both cortex and the hippocampus." If this has been observed under non-psychedelic conditions, what does this tell us about this supposed model of psychedelics?

      (16) Page 6, with regard to neural variability, "...but whether this phenomenon [increased variability] is general across tasks and cortical areas remains to be seen." First, is variability here measured as variance? In fMRI datasets that have been used to support the Entropic Brain Hypothesis, note that variance tends to decrease, though certain measures of entropy increase (e.g., Figure 4A here https://doi.org/10.1073/pnas.1518377113 shows global variance decreases, and this reanalysis of those data https://doi.org/10.1002/hbm.23234 finds some entropy increases). Thus, variance and entropy should not be confused (in theory, one could cycle through several more brain states that are, however, similar to each other, which would produce more entropy with decreased variance). Second, and perhaps more problematically for the EBH, is that the entropy effects of psychedelics completely disappear when one does a task, and unfortunately, the authors of these findings have misinterpreted them. What they'll say is that engaging in boring cognitive tasks or watching a video decreases entropy under psychedelics, but what you can see in Figure 1b of https://doi.org/10.1021/acschemneuro.3c00289 and Figure 4b of https://doi.org/10.1038/s41586-024-07624-5 is that entropy actually increases under sober conditions when you do a task. That is, it's a rather boring finding. Essentially, when resting in a scanner while sober, many may actually rest (including falling asleep, especially when subjects are asked to keep their eyes closed), and if you perform a task, brain activity should become more complex relative to doing nothing/falling asleep. When under a psychedelic, one can't fall asleep and thus, there's less change (though note that both of the above studies found numerical increases when performing tasks). 
Lastly, again I should note that the findings of the present study actually go against EBH/REBUS, given that the findings are increased top-down effects when EBH/REBUS predicts decreased top-down/increased bottom-up effects.
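
      To make the variance/entropy distinction in point (16) concrete, here is a small toy illustration. It is only a sketch: the two state sequences are invented numbers, not brain data, and "entropy" is taken here as the Shannon entropy of the empirical distribution over discrete states, which is just one of several entropy measures used in this literature.

```python
from collections import Counter
from math import log2
from statistics import pvariance


def shannon_entropy(states):
    """Shannon entropy (in bits) of the empirical state distribution."""
    counts = Counter(states)
    total = len(states)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Sequence A: two states that are far apart.
few_distant_states = [-1.0, 1.0] * 50
# Sequence B: four states that are close to each other.
many_similar_states = [-0.3, -0.1, 0.1, 0.3] * 25

entropy_a = shannon_entropy(few_distant_states)
entropy_b = shannon_entropy(many_similar_states)
variance_a = pvariance(few_distant_states)
variance_b = pvariance(many_similar_states)
```

      Sequence B visits more distinct states, so its entropy is higher, yet those states lie closer together, so its variance is lower than that of sequence A.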

      (17) Page 6, "Because psychedelic drug administration increases influence of apical dendritic inputs on neural activity in our model, we found that silencing apical dendritic activity reduced across stimulus neural variability more as the psychedelic drug dose increases." I again want to point out that alpha is not the equivalent of a psychedelic dose here, but rather a parameter in the model that is being proposed as a proxy.

      (18) Page 8, "Experimentally, plasticity dynamics which could, theoretically, minimize such a prediction error have been observed in cortex [66, 67], and it has also been proposed that behavioral timescale plasticity in the hippocampus could subserve a similar function [68]. We found that plasticity rules of this kind induce strong correlations between inputs to the apical and basal dendritic compartments of pyramidal neurons, which have been observed in the hippocampus and cortex [55, 56]." Note that the plasticity effects of psychedelics are sometimes not observed in the hippocampus or are even observed as decreases (reviewed in https://doi.org/10.1038/s41386-022-01389-z).

      (19) Page 9, as is mentioned, REBUS proposes that there should be a decrease in top-down effects under psychedelics, which goes against what is found here, but as I describe above, the effects of psychedelics on various measures of directionality have been quite mixed.

      (20) Unless I'm misunderstanding something, it seems to be a bit of a jump to infer that simply changing alpha in your model is akin to psychedelic dosing. Perhaps if the model implemented biologically plausible 5-HT2A expression and/or its behavior were constrained by common features of a psychedelic experience (e.g., fractal-like visuals imposed onto perception, inability to fall asleep, etc.), I'd be more inclined to see the parallels between alpha and psychedelics dosing. However, it would still need to recapitulate unique effects of psychedelics (e.g., impairments in hippocampal-dependent memory with sparing/facilitation of cortical memory). At the moment, it seems like whatever the model is doing is applicable to any hallucinogenic drug or even psychosis.

    2. Author response:

      We thank the reviewers for the valuable and constructive reviews. Thanks to these, we believe the article will be considerably improved. We have organized our response to address points that are relevant to both reviewers first, after which we address the unique concerns of each individual reviewer separately. We briefly paraphrase each concern and provide comments for clarification, outlining the precise changes that we will make to the text.

      Common Concerns (Reviewer 1 & Reviewer 2):

      Can you clarify how NREM and REM sleep relate to the oneirogen hypothesis?

      Within the submission draft we tried to stay agnostic as to whether mechanistically similar replay events occur during NREM or REM sleep; however, upon a more thorough literature review, we think that there is moderately greater evidence in favor of Wake-Sleep-type replay occurring during REM sleep which is related to classical psychedelic drug mechanisms of action.

      First, we should clarify that replay has been observed during both REM and NREM sleep, and dreams have been documented during both sleep stages, though the characteristics of dreams differ across stages, with NREM dreams being more closely tied to recent episodic experience and REM dreams being more bizarre/hallucinatory (see Stickgold et al., 2001 for a review). Replay during sleep has been studied most thoroughly during NREM sharp-wave ripple events, in which significant cortical-hippocampal coupling has been observed (Ji & Wilson, 2007). However, it is critical to note that the quantification methods used to identify replay events in the hippocampal literature usually focus on identifying what we term ‘episodic replay,’ which involves a near-identical recapitulation of neural trajectories that were recently experienced during waking experimental recordings (Tingley & Peyrache, 2020). In contrast, our model focuses on ‘generative replay,’ where one expects only a statistically similar reproduction of neural activity, without any particular bias towards recent or experimentally controlled experience. This latter form of replay may look closer to the ‘reactivation’ observed in cortex by many studies (e.g. Nguyen et al., 2024), where correlation structures of neural activity similar to those observed during stimulus-driven experience are recapitulated. Under experimental conditions in which an animal is experiencing highly stereotyped activity repeatedly, over extended periods of time, these two forms of replay may be difficult to dissociate.

      Interestingly, though NREM replay has been shown to couple hippocampal and cortical activity, a similar study in waking animals administered psychedelics found hippocampal replay without any obvious coupling to cortical activity (Domenico et al., 2021). This could be because the coupling was not strong enough to produce full trajectories in the cortex (psychedelic administration did not increase ‘alpha’ enough), in which case a causal manipulation of apical/basal influence in the cortex may be necessary to observe the increased coupling. Alternatively, as Reviewer 1 noted, it may be that psychedelics induce a form of hippocampus-decoupled replay, as one would expect from the REM stage of a recently proposed complementary learning systems model (Singh et al., 2022).

      Evidence in favor of a similarity between the mechanism of action of classical psychedelics and the mechanism of action of memory consolidation/learning during REM sleep is actually quite strong. In particular, studies have shown that REM sleep increases the activity of soma-targeting parvalbumin (PV) interneurons and decreases the activity of apical dendrite-targeting somatostatin (SOM) interneurons (Niethard et al., 2021), that this shift in balance is controlled by higher-order thalamic nuclei, and that this shift in balance is critical both for synaptic consolidation of monocular deprivation effects in early visual cortex (Zhou et al., 2020) and for the consolidation of auditory fear conditioning in the dorsal prefrontal cortex (Aime et al., 2022). These last studies were not discussed in the present manuscript; we will add them, in addition to a more nuanced description of the evidence connecting our model to NREM and REM replay.

      Can you explain how synaptic plasticity induced by psychedelics within your model relates to learning at a behavioral level?

      While the Wake-Sleep algorithm is a useful model for unsupervised statistical learning, it is not a model of reward or fear-based conditioning, which likely occur via different mechanisms in the brain (e.g. dopamine-dependent reinforcement learning or serotonin-dependent emotional learning). The Wake-Sleep algorithm is a ‘normative plasticity algorithm,’ that connects synaptic plasticity to the formation of structured neural representations, but it is not the case that all synaptic plasticity induced by psychedelic administration within our model should induce beneficial learning effects. According to the Wake-Sleep algorithm, plasticity at apical synapses is enhanced during the Wake phase, and plasticity at basal synapses is enhanced during the Sleep phase; under the oneirogen hypothesis, hallucinatory conditions (increased ‘alpha’) cause an increase in plasticity at both apical and basal sites. Because neural activity is in a fundamentally aberrant state when ‘alpha’ is increased, there are no theoretical guarantees that plasticity will improve performance on any objective: psychedelic-induced plasticity within our model could perhaps better be thought of as ‘noise’ that may have a positive or negative effect depending on the context.

      In particular, such ‘noise’ may be beneficial for individuals or networks whose synapses have become locked in a suboptimal local minimum. The addition of large amounts of random plasticity could allow a system to extricate itself from such local minima over subsequent learning (or with careful selection of stimuli during psychedelic experience), similar to simulated annealing optimization approaches. If our model were fully validated, this view of psychedelic-induced plasticity as ‘noise’ could have relevance for efforts to alleviate the adverse effects of PTSD, early life trauma, or sensory deprivation; it may also provide a cautionary note against repeated use of psychedelic drugs within a short time frame, as the plasticity changes induced by psychedelic administration under our model are not guaranteed to be good or useful in and of themselves without subsequent re-learning and compensation.

      We should also note that we have deliberately avoided connecting the oneirogen hypothesis model to fear extinction experimental results that have been observed through recordings of the hippocampus or the amygdala (Bombardi & Giovanni, 2013; Jiang et al., 2009; Kelly et al., 2024; Tiwari et al., 2024). Both regions receive extensive innervation directly from serotonergic synapses originating in the dorsal raphe nucleus, which have been shown to play an important role in emotional learning (Lesch & Waider, 2012); because classical psychedelics may play a more direct role in modulating this serotonergic innervation, it is possible that fear conditioning results (in addition to the anxiolytic effects of psychedelics) cannot be attributed to a shift in balance between apical and basal synapses induced by psychedelic administration. We will provide a more detailed review of these results in the text, as well as more clarity regarding their relation to our model.

      Reviewer 1 Concerns:

      Is it reasonable to assign a scalar parameter ‘alpha’ to the effects of classical psychedelics? And is your proposed mechanism of action unique to classical psychedelics? E.g. Could this idea also apply to kappa opioid agonists, ketamine, or the neural mechanisms of hallucination disorders?

      We will clarify that within our model ‘alpha’ is a parameter that reflects the balance between apical and basal synapses in determining the activity of neurons in the network. For the sake of simplicity we used a single ‘alpha’ parameter, but realistically, each neuron would have its own ‘alpha’ parameter, and different layers or individual neurons could be affected differentially by the administration of any particular drug; therefore, our scalar ‘alpha’ value can be thought of as a mean parameter for all neurons, disregarding heterogeneity across individual neurons.
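
      As a purely illustrative aid to the description above, the role of ‘alpha’ can be sketched as a per-neuron weighting between basal (bottom-up) and apical (top-down) drive. This is a minimal sketch under stated assumptions and not the model's actual equations: the linear interpolation, the function name `mixed_activity`, and all numerical values are hypothetical. It only shows how a scalar ‘alpha’ can stand in for the mean of heterogeneous per-neuron values, with alpha = 0 corresponding to Wake and alpha = 1 to Sleep.

```python
from statistics import mean


def mixed_activity(basal, apical, alpha):
    """Blend bottom-up (basal) and top-down (apical) drive per neuron.

    `alpha` may be a single number shared by all neurons or a list
    with one value per neuron. The linear blend is an assumption made
    for illustration only.
    """
    if isinstance(alpha, (int, float)):
        alpha = [float(alpha)] * len(basal)
    return [(1.0 - a) * b + a * t for a, b, t in zip(alpha, basal, apical)]


# Hypothetical drives for three neurons.
basal_drive = [1.0, 0.0, 2.0]
apical_drive = [0.0, 1.0, 4.0]

# Heterogeneous per-neuron values and the scalar that summarizes them.
per_neuron_alpha = [0.1, 0.3, 0.5]
scalar_alpha = mean(per_neuron_alpha)

heterogeneous = mixed_activity(basal_drive, apical_drive, per_neuron_alpha)
homogeneous = mixed_activity(basal_drive, apical_drive, scalar_alpha)
```

      The two outputs differ neuron by neuron, which is the heterogeneity that the scalar simplification disregards.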

      There are many different mechanisms that could theoretically affect this ‘alpha’ parameter, including: 5-HT2a receptor agonism, kappa opioid receptor binding, ketamine administration, or possibly the effects of genetic mutations underlying the pathophysiology of complex developmental hallucination disorders. We focused exclusively on 5-HT2a receptor agonism for this study because the mechanism is comparatively simple and extensively characterized, but similar mechanisms may well be responsible for the hallucinatory symptoms of a variety of drugs and disorders.

      Can you clarify the role of 5-HT2a receptor expression on interneurons within your model?

      While we mostly focused on the effects of 5-HT2a receptors on the apical dendrites of pyramidal neurons, these receptors are also expressed on soma-targeting parvalbumin (PV) interneurons. This expression on PV interneurons is consistent with our proposed psychedelic mechanism of action, because it could lead to a coordinated decrease in the influence of somatic and proximal dendritic inputs while increasing the influence of apical dendritic inputs. We will elaborate on this point, and will move the discussion earlier in the text.

      Discussions of indigenous use of psychedelics over millennia may amount to over-romanticization.

      We will take great care to conduct a more thorough literature review to reevaluate our statement regarding indigenous psychedelic use (including the citations you suggested), and will either provide a more careful statement or remove this discussion from our introduction entirely, as it has little bearing on the rest of the text. The Ethics Statement will also be modified accordingly.

      You isolate the 5-HT2a agonism as the mechanism of action underlying ‘alpha’ in your model, but there exist 5-HT2a agonists that do not have hallucinatory effects (e.g. lisuride). How do you explain this?

      Lisuride has much-reduced hallucinatory effects compared to other psychedelic drugs at clinical doses (though it does indeed induce hallucinations at high doses; Marona-Lewicka et al., 2002), and we should note that serotonin (5-HT) itself is pervasive in the cortex without inducing hallucinatory effects during natural function. Similarly, MDMA is a partial agonist for 5-HT2a receptors, but it has much-reduced perceptual hallucination effects relative to classical psychedelics (Green et al., 2003) in addition to many other effects not induced by classical psychedelics.

      Therefore, while we argue that 5-HT2a agonism induces an increase in influence of apical dendritic compartments and a decrease in influence of basal/somatic compartments, and that this change induces hallucinations, we also note that there are many other factors that control whether or not hallucinations are ultimately produced, so that not all 5-HT2a agonists are hallucinogenic. We will discuss two such factors in our revision: 5-HT receptor binding affinity and cellular membrane permeability.

      Importantly, many 5-HT2a receptor agonists are also 5-HT1a receptor agonists (e.g. serotonin itself and lisuride), while MDMA has also been shown to increase serotonin, norepinephrine, and dopamine release (Green et al., 2003). While 5-HT2a receptor agonism has been shown to reduce sensory stimulus responses (Michaiel et al., 2019), 5-HT1a receptor agonism inhibits spontaneous cortical activity (Azimi et al., 2020); thus one might expect the net effect of administering serotonin or a nonselective 5-HT receptor agonist to be widespread inhibition of a circuit, as has been observed in visual cortex (Azimi et al., 2020). Therefore, selective 5-HT2a agonism is critical for the induction of hallucinations according to our model, though any intervention that jointly excites pyramidal neurons’ apical dendrites and inhibits their basal/somatic compartments across a broad enough area of cortex would be predicted to have a similar effect. Lisuride has a much higher binding affinity for 5-HT1a receptors than, for instance, LSD (Marona-Lewicka et al., 2002).

      Secondly, it has recently been shown that both the head-twitch effect (a coarse behavioral readout of hallucinations in animals) and the plasticity effects of psychedelics are abolished when administering 5-HT2a agonists that are impermeable to the cellular membrane because of high polarity, and that these effects can be rescued by temporarily rendering the cellular membrane permeable (Vargas et al., 2023). This suggests that the critical hallucinatory effects of psychedelics (apical excitation according to our model) may be mediated by intracellular 5-HT2a receptors. Notably, serotonin itself is not membrane permeable in the cortex.

      Therefore, either of these two properties could play a role in whether a given 5-HT2a agonist induces hallucinatory effects. We will provide a considerably extended discussion of these nuances in our revision.

      Your model proposes that an increase in top-down influence on neural activity underlies the hallucinatory effects of psychedelics. How do you explain experimental results that show increases in bottom-up functional connectivity (either from early sensory areas or the thalamus)?

      Firstly, we should note that our proposed increase in top-down influence is a causal, biophysical property, not necessarily a statistical/correlative one. As such, we will stress that the best way to test our model is via direct intervention in cortical microcircuitry, as opposed to correlative approaches taken by most fMRI studies, which have shown mixed results with regard to this particular question. Correlative approaches can be misleading due to dense recurrent coupling in the system, and due to the coarse temporal and spatial resolution provided by noninvasive recording technologies (changes in statistical/functional connectivity do not necessarily correspond to changes in causal/mechanistic connectivity, i.e. correlation does not imply causation).

      There are two experimental results that appear to contradict our hypothesis that deserve special consideration in our revision. The first shows an increase in directional thalamic influence on the distributed cortical networks after psychedelic administration (Preller et al., 2018). To explain this, we note that this study does not distinguish between lower-order sensory thalamic nuclei (e.g. the lateral and medial geniculate nuclei receiving visual and auditory stimuli respectively) and the higher-order thalamic nuclei that participate in thalamocortical connectivity loops (Whyte et al., 2024). Subsequent more fine-grained studies have noted an increase in influence of higher-order thalamic nuclei on the cortex (Pizzi et al., 2023; Gaddis et al., 2022), and in fact extensive causal intervention research has shown that classical psychedelics (and 5-HT2a agonism) decrease the influence of incoming sensory stimuli on the activity of early sensory cortical areas, indicating decoupling from the sensory thalamus (Evarts et al., 1955; Azimi et al., 2020; Michaiel et al., 2019). The increased influence of higher-order thalamic nuclei is consistent with both the cortico-striatal-thalamo-cortical (CSTC) model of psychedelic action as well as the oneirogen hypothesis, since higher-order thalamic inputs modulate the apical dendrites of pyramidal neurons in cortex (Whyte et al., 2024).

      The second experimental result notes that DMT induces traveling waves during resting state activity that propagate from early visual cortex to deeper cortical layers (Alamia et al., 2020). There are several possibilities that could explain this phenomenon: 1) it could be due to the aforementioned difficulties associated with directed functional connectivity analyses, 2) it could be due to a possible high binding affinity for DMT in the visual cortex relative to other brain areas, or 3) it could be due to increases in apical influence on activity caused by local recurrent connectivity within the visual cortex which, in the absence of sensory input, could lead to propagation of neural activity from the visual cortex to the rest of the brain. This last possibility is closest to the model proposed by Ermentrout & Cowan (1979), and we believe it would be best explained within our framework by a topographically connected recurrent network architecture trained on video data, which is a potentially fruitful direction for future research.

      Shouldn’t the hallucinations generated by your model look more ‘psychedelic,’ like those produced by the DeepDream algorithm?

      We believe that the differences in hallucination visualization quality between our algorithm and DeepDream are mostly due to differences in the scale and power of the models used across these two studies. We are confident that with more resources (and potentially theoretical innovations to improve the Wake-Sleep algorithm’s performance) the produced hallucination visualizations could become more realistic, but we believe this falls outside the scope of the present study.

      We note that more powerful generative models trained with backpropagation are able to produce surreal images of comparable quality (Rezende et al., 2014; Goodfellow et al., 2020; Vahdat & Kautz, 2020), though these have not yet been used as a model of psychedelic hallucinations. However, the DeepDream model operates on top of large pretrained image processing models, and does not provide a biologically mechanistic/testable interpretation of its hallucination effects. When training smaller models with a local synaptic plasticity rule (as opposed to backpropagation), the hallucination effects are less visually striking due to the reduced quality of our trained generative model, though they are still strongly tied to the statistics of sensory inputs, as quantified by our correlation similarity metric (Fig. 5b). We will provide a more detailed explanation of this phenomenon when we discuss our model limitations in our revised manuscript.

      Your model assumes domination by entirely bottom-up activity during the ‘wake’ phase, and domination entirely by top-down activity during ‘sleep,’ despite experimental evidence indicating that a mixture of top-down and bottom-up inputs influence neural activity during both stages in the brain. How do you explain this?

      Our use of the Wake-Sleep algorithm, in which top-down inputs (Sleep) or bottom-up inputs (Wake) dominate network activity, is an over-simplification made within our model for computational and theoretical reasons. Models that receive a mixture of top-down and bottom-up inputs during ‘Wake’ activity do exist (in particular the closely related Boltzmann machine (Ackley et al., 1985)), but these models are considerably more computationally costly to train due to a need to run extensive recurrent network relaxation dynamics for each input stimulus. Further, these models do not generalize as cleanly to processing temporal inputs. For this reason, we focused on the Wake-Sleep algorithm, at the cost of some biological realism, though we note that our model should certainly be extended to support mixed apical-basal waking regimes. We will make sure to discuss this in our ‘Model Limitations’ section.

      Your model proposes that 5-HT2a agonism enhances glutamatergic transmission, but this is not true in the hippocampus, which shows decreases in glutamate after psychedelic administration.

      We should note that our model suggests only compartment-specific increases in glutamatergic transmission; as such, our model does not predict any particular directionality for measures of glutamatergic transmission that include signaling at both apical and basal compartments in aggregate, as was measured in the provided study (Mason et al., 2020).

      You claim that your model is consistent with the Entropic Brain theory, but you report increases in variance, not entropy. In fact, it has been shown that variance decreases while entropy increases under psychedelic administration. How do you explain this discrepancy?

      Unfortunately, ‘entropy’ and ‘variance’ are heavily overloaded terms in the noninvasive imaging literature, and the particularities of the method employed can exert a strong influence on the reported effects. The reduction in variance reported by (Carhart-Harris et al., 2016) is a very particular measure: they are reporting the variance of resting state synchronous activity, averaged across a functional subnetwork that spans many voxels; as such, the reduction in variance in this case is a reduction in broad, synchronous activity. We do not have any resting state synchronous activity in our network due to the simplified nature of our model (particularly an absence of recurrent temporal dynamics), so we see no reduction in variance in our model due to these effects.

      Other studies estimate ‘entropy’ or network state disorder via three different methods that we have been able to identify. 1) (Carhart-Harris et al., 2014) uses a different measure of variance: in this case, they subtract out synchronous activity within functional subnetworks, and calculate variability across units in the network. This measure reports increases in variance (Fig. 6), and is the closest measure to the one we employ in this study. 2) (Lebedev et al., 2016) uses sample entropy, which is a measure of temporal sequence predictability. It is specifically designed to disregard highly predictable signals, and so one might imagine that it is a measure that is robust to shared synchronous activity (e.g. resting state oscillations). 3) (Mediano et al., 2024) uses Lempel-Ziv complexity, which is, similar to sample entropy, a measure of sequence diversity; in this case the signal is binarized before calculation, which makes this method considerably different from ours. All three of the preceding methods report increases in sequence diversity, in agreement with our quantification method. Our best explanation for why the variance calculation in (Carhart-Harris et al., 2016) produces a variance reduction is therefore a reduction in low-rank synchronous activity in subnetworks during resting state.
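      To make the third measure concrete, the following is a minimal sketch of a Lempel-Ziv (LZ76-style) phrase count on a binarized signal. It is a generic illustration and not the pipeline of the cited study: the function names `binarize` and `lz76_complexity`, the choice of the median as the binarization threshold, and the absence of any normalization are all assumptions made here for illustration.

```python
from statistics import median


def binarize(signal):
    """Turn a numeric sequence into a '0'/'1' string (assumed threshold: the median)."""
    threshold = median(signal)
    return "".join("1" if x > threshold else "0" for x in signal)


def lz76_complexity(bits):
    """Count phrases in an LZ76-style parsing of a binary string.

    Each phrase is the shortest substring starting at the current position
    that cannot be copied from the text seen so far (overlap allowed).
    More phrases means a more diverse, less repetitive sequence.
    """
    n = len(bits)
    i = 0
    count = 0
    while i < n:
        k = 1
        # Extend the candidate while it still occurs in the preceding text.
        while i + k <= n and bits[i:i + k] in bits[:i + k - 1]:
            k += 1
        count += 1
        i += k
    return count


# Textbook example sequence, parsed as 0|001|10|100|1000|101 (6 phrases).
print(lz76_complexity("0001101001000101"))
# A constant sequence is maximally repetitive and parses into only 2 phrases.
print(lz76_complexity("00000000"))
```

      Because the count depends only on the binarized sequence, amplitude information is discarded, which is one way to see why this measure differs considerably from a variance-based quantification.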

      As for whether the entropy increase is meaningful: we share Reviewer 1’s concern that increases in entropy could simply be due to a higher degree of cognitive engagement during resting state recordings, due to the presence of sensory hallucinations or due to an inability to fall asleep. This could explain why entropy increases are much more minimal relative to non-hallucinating conditions during audiovisual task performance (Siegel et al., 2024; Mediano et al., 2024). However, we can say that our model is consistent with the Entropic Brain Theory without including any form of ‘cognitive processing’: we observe increases in variability during resting state in our model, but we observe highly similar distributions of activity when averaging over a wide variety of sensory stimulus presentations (Fig. 5b-c). This is because variability in our model is not due to unstructured noise: it corresponds to an exploration of network states that would ordinarily be visited by some stimulus. Therefore, when averaging across a wide variety of stimuli, the distribution of network states under hallucinating or non-hallucinating conditions should be highly similar.

      One final point of clarification: here we are distinguishing Entropic Brain Theory from the REBUS model–the oneirogen hypothesis is consistent with the increase in entropy observed experimentally, but in our model this entropy increase is not due to increased influence of bottom-up inputs (it is due instead to an increase in top-down influence). Therefore, one could view the oneirogen hypothesis as consistent with EBT, but inconsistent with REBUS.

      You relate your plasticity rule to behavioral-timescale plasticity (BTSP) in the hippocampus, but plasticity has been shown to be reduced in the hippocampus after psychedelic administration. Could you elaborate on this connection?

      When we were establishing a connection between our ‘Wake-Sleep’ plasticity rule and BTSP learning, the intended connection was exclusively to the mathematical form of the plasticity rule, in which activity in the apical dendrites of pyramidal neurons functions as an instructive signal for plasticity in basal synapses (and vice versa): we will clarify this in the text. Similarly, we point out that such a plasticity rule tends to result in correlated tuning between apical and basal dendritic compartments, which has been observed in hippocampus and cortex: this is intended as a sanity check of our mapping of the Wake-Sleep algorithm to cortical microcircuitry, and has limited further bearing on the effects of psychedelics specifically.

      Reduction in plasticity in the hippocampus after psychedelic administration could be due to a complementary learning systems-type model, in which the hippocampus becomes partly decoupled from the cortex during REM sleep (Singh et al., 2022); were this to be the case, it would not be incompatible with our model, which is mostly focused on the cortex. Notably, potentiating 5-HT2a receptors in the ventral hippocampus does not induce the head-twitch response, though it does produce anxiolytic effects (Tiwari et al., 2024), indicating that the hallucinatory and anxiolytic effects of classical psychedelics may be partly decoupled.

      Reviewer 2 Concerns:

      Could you provide visualizations of the ‘ripple’ phenomenon that you’re referring to?

      We will do this! For now, you can get a decent understanding of what the ‘ripple effect’ looks like from the ‘eyes closed’ hallucination condition for networks trained on CIFAR10 (Fig. 2d). The ripple effect that we are referring to is very similar, except it is superimposed on a naturalistic image under ordinary viewing conditions; to give a higher quality visualization of the ripple phenomenon itself, we will subtract out the static contribution of the image itself, leaving only the ripple phenomenon.

      Could you provide a more nuanced description of alternative roles for top-down feedback, beyond being used exclusively for learning as depicted in your model?

      For the sake of simplicity, we only treat top-down inputs in our model as a source of an instructive teaching signal, the originator of generative replay events during the Sleep phase, and as the mechanism of hallucination generation. However, as discussed in a response to a previous question, in the cortex pyramidal neurons receive and respond to a mixture of top-down and bottom-up processing.

      There are a variety of theories for what role top-down inputs could play in determining network activity. To name several, top-down input could function as: 1) a denoising/pattern completion signal (Kadkhodaie & Simoncelli, 2021), 2) a feedback control signal (Podlaski & Machens, 2020), 3) an attention signal (Lindsay, 2020), 4) ordinary inputs for dynamic recurrent processing that play no specialized role distinct from bottom-up or lateral inputs except to provide inputs from higher-order association areas or other sensory modalities (Kar et al., 2019; Tugsbayar et al., 2025). Though our model does not include these features, they are perfectly consistent with our approach.

      In particular, denoising/pattern completion signals in the predictive coding framework (closely related to the Wake-Sleep algorithm) also play a role as an instructive learning signal (Salvatori et al., 2021); and top-down control signals can play a similar role in some models (Gilra & Gerstner, 2017; Meulemans et al., 2021). Thus, options 1 and 2 are heavily overlapping with our approach, and are a natural consequence of many biologically plausible learning algorithms that minimize a variational free energy loss (Rao & Ballard, 1997; Ackley et al., 1985). Similarly, top-down attentional signals can exist alongside top-down learning signals, and some models have argued that such signals can be heavily overlapping or mutually interchangeable (Roelfsema & van Ooyen, 2005). Lastly, generic recurrent connectivity (from any source) can be incorporated into the Wake-Sleep algorithm (Dayan & Hinton, 1996), though we avoided doing this in the present study due to an absence of empirical architecture exploration in the literature and the computational complexity associated with training on time series data.

      To conclude, there are a variety of alternative functions proposed for top-down inputs onto pyramidal neurons in the cortex, and we view these additional features as mutually compatible with our approach; for simplicity we did not include them in our model, but we believe that these features are unlikely to interfere with our testable predictions or empirical results.

    1. Reviewer #3 (Public review):

      Summary:

      In this study, the authors were looking at neurocorrelates of behavioural differences within the genus Macaca. To do so, they engaged in real-world dissection of dead animals (unconnected to the present study) coming from a range of different institutions. They subsequently compare different brain areas, here the amygdala and the hippocampus, across species. Crucially, these species have been sorted according to different levels of social tolerance grades (from 1 to 4). 12 species are represented across 42 individuals. The sampling process has weaknesses ("only half" of the species contained by the genus, and Macaca mulatta, the rhesus macaque, representing 13 of the total number of individuals), but also strengths (the species are decently well represented across the 4 grades) for the given purpose and for the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist, and I will assume that the different interventions do not alter volume in any significant ways / or that the different conditions in which the bodies were kept led to the documented differences across species.

      There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have larger amygdala, compared to the hippocampus, which remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the amygdala relative volume decreases across the lifespan, while in intolerant species, the contrary occurs. The results look quite strong, although the authors could bring up some more clarity in their replies regarding the data they are working with. From one figure to the other, we switch from model-calculated ratio to model-predicted volume. Note that if one was to sample a brain at age 20 in all the grades according to the model-predicted volumes, it would not seem that the difference for amygdala would differ much across grades, mostly driven with Grade 1 being smaller (in line with the main result), but then with Grade 2 bigger than Grade 3, and then Grade 4 bigger once again, but not that different from Grade 2.

      Overall, despite this, I think the results are pretty strong, the correlations are not to be contested, but I also wonder about their real meaning and implications. This can be seen under 3 possible aspects:

      (1) Classification of the social grade

      While it may be familiar to readers of Thierry and collaborators, or to researchers of the macaque world, there is no list included of the 18 behavioral traits used to define the three main cognitive requirements (socio-cognitive demands, predictability of the environment, inhibitory control). It would be important to know which of the different traits correspond to what, whether they overlap, and crucially, how they are realized in the 12 study species, as there could be drastic differences from one species to the next. For now, we can only see from Table S1 where the species align to, but it would be a good addition to have them individually matched to, if not the 18 behavioral traits, at least the 3 different broad categories of cognitive requirements.

      (2) Issue of nature vs nurture

      Another way to look at the debate between nature vs nurture is to look at phylogeny. For now, there is no phylogenetic tree that shows where the different grades are realized. For example, it would be illuminating to know whether more related species, independently of grades, have similar amygdala or hippocampus sizes. Then the question will go to the details, and whether the grades are realized in particular phylogenetic subdivisions. This would go in line with the general point of the authors that there could be general species differences.

      With respect to nurture, it is likely more complicated: one needs to take into account the idiosyncrasies of the life of the individual. For example, some of the cited literature in humans or macaques suggests that the bigger the social network, the bigger the brain structure considered. Right, but this finding is at the individual level with a documented life history. Do we have any of this information for any of the individuals considered (this is likely out of the scope of this paper to look at this, especially for individuals that did not originate from CdP)?

      (3) Issue of the discussion of the amygdala's function

      The entire discussion/goal of the paper, states that the amygdala is connected to social life. Yet, before being a "social center", the amygdala has been connected to the emotional life of humans and non-humans alike. The authors state L333/34 that "These findings challenge conventional expectations of the amygdala's primary involvement in emotional processes and highlight the complexity of the amygdala's role in social cognition". First, there is no dichotomy between social cognition and emotion. Emotion is part of social cognition (unless we and macaques are robots). Second, there is nowhere in the paper a demonstration that the differences highlighted here are connected to social cognition differences per se. For example, the authors have not tested, say, if grade 4 species are more afraid of snakes than grade 1 species. If so, one could predict they would also have a bigger amygdala, and they would probably also find it in the model. My point is not that the authors should try to correlate any kind of potential aspect that has been connected to the amygdala in the literature with their data (see for example the nice review by Domínguez-Borràs and Vuilleumier, https://doi.org/10.1016/B978-0-12-823493-8.00015-8), but they should refrain from saying they have challenged a particular aspect if they have not even tested it. I would rather engage the authors to try and discuss the amygdala as a multipurpose center, that includes social cognition and emotion.

      Strengths:

      Methods & breadth of species tested.

      Weaknesses:

      Interpretation, which can be described as 'oriented' and should rather offer additional views.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank reviewer 1 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.

      It would be difficult to answer from this study whether Drp1 plays a role beyond mitochondrial fission in zygotes. However, the reasons why Drp1 KO zygotes differ from the somatic Drp1 KO model can be discussed as follows.

      First, the reviewer mentioned that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria, but in fact, unlike in somatic cells, the EM images of Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al., Curr Biol. 2014, PMID: 25264261, Fig. 2C and Fig. S1B-D), as in the case of zygotes in this study. Mitochondria in oocytes/zygotes have the shape of a small sphere with irregular cristae located peripherally. These structural features may cause insensitivity or resistance to inner membrane fusion and the resultant failure to form tubular mitochondria as seen in somatic cell models. Nonetheless, quantitative analysis of EM images in the revised version confirmed that the mitochondria of Drp1-depleted embryos were not only enlarged but also significantly elongated (Figure 2J-2M). Therefore, in Drp1-depleted embryos, significant structural and functional (e.g., asymmetry between daughters) changes in mitochondria were observed, and these are expected to lead to defects in embryonic development.

      As for mitochondrial transport, we do not fully understand the intent of this question, but we do not entirely rule out mitochondrial transport. At least clustered mitochondria did not disperse again, but how mitochondria behave through the cytoskeleton within clusters will require further study, as the reviewer pointed out.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors show no effect of Myo19 Trim-Away, yet it remains unclear whether myo19 is involved in the positioning of mitochondria around the spindle. Judging by their co-localization during that stage, it might be. Therefore, in the absence of myo19, mitochondria might remain evenly distributed throughout mitosis, thus passively resulting in equal partitioning to daughter cells, with no severe developmental defects. Could the authors show a video of the whole process and discuss it?

      We have newly performed live imaging of mitochondria and chromosomes in Myo19 Trim-Away zygotes (n=13). As shown in Figure 1-figure supplement 2 and Figure 1-Video 2, there were no obvious changes in mitochondrial (and chromosomal) dynamics throughout the first cleavage and no significant mitochondrial asymmetry was observed. Therefore, we conclude that depletion of Myo19 does not cause mitochondrial asymmetry during embryonic cleavage. These results are described in the revised manuscript (Line 218-221).

      (2) Mitochondrial aggregation upon Drp1 depletion should be characterized in more detail: for example, % of mitochondria free, % in small clusters (> X diameter), and % in big clusters (>Y diameter).

      In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control, Drp1 Trim-Away and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). In control embryos, mitochondria were interspersed in a large number of small clusters, while in Drp1-depleted embryos, mitochondria became highly aggregated into a small number of large clusters, an effect that was reversed by expression of mCh-Drp1. These results are described in the revised manuscript (Line 242-245).

      (3) The discrepancies with parthenogenetic embryos derived from Drp1 (-/-) parthenotes should be commented on. Quantification of the dimensions of the clusters would help establish the degree of similarity/difference. Could the authors comment on their hypothesis as to why the clusters are remarkably larger in Drp1 depleted zygotes?

      In the revised version, we have quantified the mitochondrial aggregation in Drp1 KO parthenotes (Figure 2-figure supplement 1; the data for Drp1 KO parthenotes have been reorganized into the supplemental figure, due to lack of space in Figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). The size of mitochondrial clusters in Drp1 KO parthenotes was significantly increased compared to controls, but as the reviewer noted, mitochondrial aggregation appears to be moderate compared to that in Drp1-depleted embryos. The phenotypic discrepancies between the two Drp1-deficient embryo models are discussed below.

      First, it is clear that the phenotypic severity of Drp1 KO oocytes depends on the age of the female. Indeed, oocytes collected from 8-week-old females arrested in meiosis after NEB, mainly due to marked mitochondrial aggregation (Udagawa et al., Curr Biol. 2014, PMID: 25264261), whereas oocytes from juvenile females completed meiosis (Adhikari et al., Sci Adv. 2022, PMID: 35704569), and thus Drp1 KO parthenotes were obtained from juvenile females in the present study. Comparison of mitochondrial morphology in Drp1 KO oocytes in both papers also suggests that mitochondrial aggregation in adult mice is more intense (Udagawa et al., Curr Biol. Fig. 2A) than in juvenile mice (Adhikari et al., Sci Adv. 2022: Fig. 1G, 1H), and appears to be similar to that in Drp1-depleted embryos in this study (Figure 2E). There may be differences in the level of Drp1 depletion in these Drp1-deficient oocytes/zygotes. Similar differences between juvenile and adult KO females have been reported in a previous paper (Yueh et al., Development 2021, PMID: 34935904): adult-derived Smc3<sup>Δ/Δ</sup> zygotes arrested at the 2-cell stage, whereas juvenile-derived Smc3<sup>Δ/Δ</sup> zygotes had developmental competence comparable to the wild type. Remarkably, the SMC3 protein level in juvenile Smc3<sup>Δ/Δ</sup> oocytes was also comparable to that in Smc3<sup>fl/fl</sup> oocytes. The authors surmised that the decline in maternal SMC3 between the juvenile stage and sexual maturity is probably due to the continuous induction of the promoter-Cre driver, suggesting that similar induction may also occur in Drp1 KO oocytes. In addition, we observed not only age differences but also batch differences in Drp1 KO oocytes (and resulting embryos), such that little mitochondrial aggregation was observed in oocytes collected from some juvenile KO colonies.
      Therefore, for KO models showing age (sexual maturation)-dependent gradual phenotypic changes, Trim-Away may be an approach that provides more reproducible results, as it induces acute degradation of maternal proteins.

      (4) Mitochondrial clusters in Drp1 trim-away zygotes resemble those seen when defects in mitochondrial positioning are obtained by TRAK2 induction (PMID: 38917013), pointing again to a role of actin in the clustering process. Could the authors explore the role of actin further?

      TRAK2 and microtubule-dependent mechanisms may also be involved in mitochondrial dynamics during the first cleavage division, possibly in association with migration of the two pronuclei. Although the mitochondrial aggregation induced by TRAK2 overexpression is similar to that in Drp1-depleted embryos, it is unlikely that it involves the changes seen at the EM level in Drp1-depleted embryos (enlarged mitochondria, etc.). In addition, in TRAK2-overexpressing embryos, rather than uneven partitioning of mitochondria, the daughter blastomeres themselves were uneven in size after cleavage, making it difficult to precisely assess the similarity between the two models.

      Regarding the role of F-actin, we show that the subcellular distribution of cytoplasmic actin overlaps with that of mitochondria throughout the first cleavage and seems to accumulate in aggregated mitochondria, particularly during the mitotic phase, when a higher correlation was observed (Figure 1E). Although we did not observe that actin and the Myo19 motor regulate mitochondrial partitioning, as reported in somatic cell-based studies, it is possible that actin accumulated at mitochondria is indirectly involved in mitochondrial dynamics via mitochondrial fission. For example, inverted formin 2 (INF2) enhances actin polymerization and is required for efficient mitochondrial fission, acting upstream of Drp1 (Korobova et al., Science 2013, PMID: 23349293). In the revised manuscript, we have added a description of this point (Line 452-456).

      (5) Electron microscopy images showed indeed aberrant morphology of the mitochondria, yet not a hyperfused morphology. Aspect ratio (long/short axis) quantification should be included, besides the current measurement, since mitochondria in Drp1 trim-away look bigger yet as round as in the control.

      In the revised version, detailed quantitative data on EM images have been added (Figure 2J-2M). In Drp1-depleted embryos, significant increases were observed in both the major and minor axes of mitochondria. Like the reviewer, we had assumed that mitochondria in depleted embryos were enlarged rather than elongated, but the quantification of aspect ratio shows that significant elongation occurred. These results have been described in the revised manuscript (Line 252-256).

      (6) Why are mitochondria in golgi-mcherry-expressing cells showing a different morphology of the clusters?

      As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.

      (7) Authors comment on ROS being enriched (highly accumulated) in mitochondria. However, while quantification is missing, it might seem that ROS are equally distributed in control or Drp1 Trim-Away embryos. Could the authors quantify ROS signal inside and outside of the mitochondria, perhaps using a mask drawn by mitotracker? Furthermore, it would make these data more convincing to artificially induce/deplete ROS to validate the sensitivity of the technique to variations. Also, why is ROS pattern referred to as ectopic?

      Thank you for your useful suggestions. In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E). The term ectopic was used to mean excessive accumulation of ROS in the mitochondria compared to normal embryos, but it has been deleted as it is not very accurate.
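
      The mask-based quantification described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the threshold value, the toy pixel arrays, and the function names `make_mask` and `mean_signal_inside_outside` are all assumptions chosen for illustration, and real images would first need background correction and a justified threshold.

```python
# Hypothetical sketch: quantify a ROS signal inside vs. outside a
# mitochondrial mask. All names and numbers are illustrative assumptions.

def make_mask(mito_image, threshold):
    """Return a binary mask that is True where the mitochondrial
    channel exceeds the (assumed) intensity threshold."""
    return [[pixel > threshold for pixel in row] for row in mito_image]


def mean_signal_inside_outside(ros_image, mask):
    """Return (mean inside mask, mean outside mask) for the ROS channel."""
    inside, outside = [], []
    for ros_row, mask_row in zip(ros_image, mask):
        for value, in_mito in zip(ros_row, mask_row):
            (inside if in_mito else outside).append(value)
    mean_in = sum(inside) / len(inside) if inside else float("nan")
    mean_out = sum(outside) / len(outside) if outside else float("nan")
    return mean_in, mean_out


# Toy 3x4 images: bright mitochondrial pixels sit in the upper right.
mito = [[0, 10, 200, 220],
        [5, 15, 210, 230],
        [0, 5, 10, 15]]
ros = [[10, 12, 80, 90],
       [11, 13, 85, 95],
       [9, 10, 12, 14]]

mask = make_mask(mito, threshold=100)
mean_in, mean_out = mean_signal_inside_outside(ros, mask)
enrichment = mean_in / mean_out  # >1 means ROS is enriched in the mask
```

      Reporting the inside/outside ratio per embryo, rather than raw intensities, makes the comparison between control and Drp1-depleted embryos less sensitive to differences in overall dye loading.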

      Minor comments:

      (A) Video 1: images at t=-00:20 and t=00:00 of the mtGFP are actually the same images as H2B-mCherry.

      A faulty filter/shutter control probably failed to capture GFP fluorescence at these times. It appears that the autocontrast function detected a small amount of mCherry fluorescence leakage. It would be possible to replace it with another video, but as the relevant frames were unrelated to the analysis, the previous video was used as is. The same problem also occurs in the newly added Myo19-depleted zygote movie (Figure 1-Video 2, 03:15).

      (B) Could you calculate the degree of colocalization between mt-GFP and ER-mCherry in ctrl and Drp1 trim-away? While it is apparent that ER is somehow more associated with mitochondrial clusters, it would be informative to quantify it.

      Since the ER is partially confined to the mitochondrial aggregation site, it was difficult to calculate correlation coefficients from fluorescence images of mt-GFP and ER-mCherry to quantitatively assess colocalization. Instead, line scan analysis of whole mitochondrial clumps showed that the peak of the ER-mCherry signal overlaps with that of mt-GFP, but this is not the case for Golgi-mCherry or peroxisome-mCherry (Figure 2-figure supplement 2A-2C).

      (C) Regarding the developmental arrest: The quantification of the different stages at each developmental time could be more informative. For example, at E4.5 how many embryos are at each stage (2-cell, 4-cell, ... blastocyst)? Also, could the authors comment on the reduction in developmental competence in Figure 4C, regarding the blastocyst stage?

      Many arrested embryos do not maintain their morphologies and undergo a unique degenerative process over time, known as cell fragmentation. Therefore, it is difficult to accurately determine the number of each developmental stage at, for example, E4.5 days. In this study, the 2-cell stage was observed at E1.5, the 4-8 cell at E2.5-E3.0, morula at E3.5 and the blastocyst at E4.5.

      Although the rate of embryos reaching the blastocyst stage was reduced compared to that of normal embryos, the overexpression of mCh-Drp1 may explain the incomplete restoration of developmental competence, since embryos injected solely with mCh-Drp1 mRNA also showed reduced developmental competence. For rescue experiments, the comparison with internal controls is more important, and we have therefore described it as follows: This is a specific effect of Drp1 deletion because none of the internal control conditions increased arrest at the 2-cell stage and arrest was completely reversed by microinjecting Trim-away insensitive exogenous mCh-Drp1 mRNA (Line 337-340).

      (D) In lines 103 to 105, proliferation should be changed to division or development.

      In the revised version, proliferation has been changed to division (Line 103).

      (E) Could the authors reference the statement in lines 168-169?

      The following 3 references have been added (Hardy et al., 1993, PMID: 8410824; Meriano et al., 2004, PMID: 15588469; Seikkula et al., 2018, PMID: 29525505).

      (F) Line 448: "Cells lacking Drp1 have highly elongated mitochondria that cannot be divided into transportable units,..." This is clearly not the case for zygotes, so why are then these mitochondria still clustering and not transported elsewhere?

      Although it is difficult to answer this reviewer's question precisely, EM images of Drp1-depleted embryos suggest that individual mitochondria are not only enlarged but also have increased outer membrane attachment due to excessive aggregation. These large mitochondrial clumps may therefore be preventing transport.

      Reviewer #2 (Public review):

      We thank reviewer 2 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      In the revised version, the time after hCG has been indicated (Line 176-182). In subsequent Drp1 depletion experiments, the revised version notes that “no significant delay in cell cycle progression was observed following Drp1 depletion (data not shown) compared to control embryos (Figure 1A)” (Line 291-193). There was a slight discrepancy in the time post-hCG between live imaging and immunofluorescence analysis (Figure 1-figure supplement 1A), which may be due to manipulation of zygotes outside incubator during the microinjection of mRNA.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      As the reviewer pointed out, the depletion of Drp1 is likely to have occurred at an earlier stage. In this study, due to the injection of various mRNAs to visualize organelles such as mitochondria and chromosomes, observations were started after about 5 h of incubation for their fluorescent proteins to be sufficiently expressed. Therefore, for the Western blot analysis, samples were prepared according to the time of the start of the observation.

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control, Drp1 Trim-Away and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). We have also quantified the mitochondrial aggregation in Drp1<sup>fl/fl</sup> and Drp1<sup>Δ/Δ</sup> parthenotes (Figure 2-figure supplement 1; note that the data for Drp1 KO parthenotes have been moved to the supplemental figure, due to lack of space in Figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). Mitochondria appear to be slightly more aggregated in Drp1<sup>fl/fl</sup> embryos than in control, but no significant differences in cluster size or number were observed (data not shown). On the other hand, mitochondrial clusters in Drp1 Trim-Away embryos were remarkably larger than those in Drp1<sup>Δ/Δ</sup> parthenotes. Please refer to the response to reviewer 1's comment (3) for discussion of this discrepancy.

      As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      In the revised version, the band intensities in Western blot analysis were quantified, which validated the previous results (Figure 1H for Myo19 depletion, Figure 2B for Drp1 expression during preimplantation development, Figure 2D for Drp1 depletion). The number of embryos analyzed is described in the figure legends (pooled samples ranging from 20 to 100 embryos were used).

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E).

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      In the revised manuscript, we have discussed this reference (Zhou et al., Nature Communications, PMID: 36513638) (Line 482-483).

      Reviewer #2 (Recommendations For The Authors):

      The authors report that disruption of F-actin organization led to asymmetry in mitochondrial inheritance, however depletion of Myo19 does not impact inheritance. The authors note in the discussion that loss of another mitochondrial motor protein, Miro, has been shown to affect mitochondrial inheritance. They suggest this may be due to reduced levels of Myo19, despite data from the present study suggesting a lack of involvement of Myo19. Given that Miro1 also interacts with microtubules, and crosstalk between actin filaments and microtubules has been reported, have the authors considered whether other motor proteins, such as KIF5, may be involved in mitochondrial movement in the zygote and therefore inheritance? Myo19 also plays a role in mitochondrial architecture. Were any differences noted at the EM level?

      During oocyte meiosis and early embryonic cleavage, kinesin-5 has been reported to be important for the formation of bipolar spindles (Fitzharris, Curr Biol., 2009, PMID: 19465601) and may have some involvement in mitochondrial dynamics. Given that the migration of the two pronuclei towards the zygotic centre occurs in a dynein-dependent manner (Scheffler, Nat Commun. 2021, PMID: 33547291), dynein may also be involved in the process of mitochondrial accumulation around the pronuclei. Nevertheless, whether microtubule-dependent mechanisms regulate mitochondrial partitioning remains controversial. Mitochondria basically diverge from microtubules at the onset of mitosis, and indeed Miro1-deleted zygotes did not show asymmetric mitochondrial partitioning (Lee et al., Front Cell Dev Biol. 2022, PMID: 36325364). More recently, it was reported that overexpression of TRAK2 causes significant mitochondrial aggregation in embryos (Lee et al., Proc Natl Acad Sci U S A. 2024, PMID: 36325364), but since overexpression might disrupt the regulatory balance with other motor/adaptor complexes, further investigation using TRAK2-deficient embryos is needed.

      As noted by the reviewer, myo19 seems to be important for the maintenance of mitochondrial cristae architecture and, consequently, for the regulation of mitochondrial function (Shi et al., Nat Commun. 2022, PMID: 35562374). We have not observed the EM images in myo19-depleted embryos, but we examined their membrane potential and ROS by TMRM and H2DCF staining, respectively, and confirmed that they were comparable to control embryos (data not shown). The loss of myo19 in zygotes/embryos did not cause any functional changes in mitochondria, suggesting that mitochondrial architecture may not be substantially affected either.

      Transcriptomic analysis would be useful to identify alterations in cell cycle checkpoint regulators, as well as immunofluorescence to identify changes in spindle assembly checkpoint protein recruitment.

      The present results showed that the majority of Drp1-depleted embryos arrest at the G2 stage, possibly due to cell cycle checkpoint mechanisms. Transcriptome analysis would certainly be beneficial, but eventually more detailed analysis of proteins and their phosphorylation modifications, etc. is needed for accurate assessment. These studies will be the subject of future work.

      Minor comments:

      There are many instances where the English could be improved, particularly the overuse of the word 'the'.

      We have carefully rechecked the manuscript and hope that the English has been improved.

      Line 144: replace 'took' with 'take'.

      We have corrected this in the revised version (Line 140).

      Line 157: it is unclear what is meant by 'hinders the functional importance of Drp1 in mature oocytes and embryos'.

      This description has been corrected to “complicates the functional analysis of Drp1 in mature oocytes and embryos” (Line 152-153)

      Line 198: replace with 'displayed a mitochondrial distribution pattern closely associated with'

      We have corrected this in the revised version (Line 195-196).

      Line 200: provide a time to clarify when the cytoplasmic meshwork was 'subsequently reorganized'

      In the revised version, “at the metaphase” has been added (Line 198).

      Line 204: replace 'to' with 'for'

      We have corrected this in the revised version (Line 203).

      Lines 285-87: consider rearranging the text to improve the flow.

      To improve the flow of the surrounding text, the following sentence has been added: "We postulated that this asymmetry was due to non-uniformity in the distribution of mitochondria around the spindle" (Line 295-297).

      Line 418: replace 'central' with 'centre'

      We have corrected this in the revised version (Line 430).

      Line 427: replace 'pertaining' with 'partitioning'

      We have corrected this in the revised version (Line 438).

      Line 574: clarify to what '1-5% of that of the oocytes' refers

      We have corrected it to “1-5% of the total volume of the zygote.” (Line 587-588).

      Line 619: indicate the dilution used

      We apologize for the previous incorrect description. We used a part of the extract as the template, not a dilution, and have corrected it to be accurate (Line 631-632).

      Line 634: replace 'on' with 'in' and detail in which medium embryos were mounted.

      We have corrected this in the revised version (Line 647).

      Please check all spelling in the figures.

      Figure 1J - inheritance is spelt incorrectly.

      Figure-Suppl 1, D: Interphase (PN) and (2-cell) is spelt incorrectly. G: inheritance is spelt incorrectly.

      Figure 5F - bottom section prior to cytokinesis, spindle is spelt 'spincle'

      Ensure consistency in abbreviation use (e.g. use of NEB and NEBD).

      Thank you for your careful correction of typographical errors. In the revised version, all points raised by the reviewers have been corrected.

      Reviewer #3 (Public review):

      We thank reviewer 3 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Seemingly, there are few apparent shortcomings. Following are the specific comments to activate the further open discussion.

      Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.

      In the revised manuscript, we have added the following comment: swollen or partially elongated mitochondria with lamellar cristae structures in the inner membrane were observed in Drp1-depleted embryos. In addition, the quantification of aspect ratio (long/short axis) shows that significant mitochondrial elongation occurred (Figure 2M). These results have been described in the revised manuscript (Line 251-256).

      - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.

      Thank you for your very useful comments. Although it would be interesting to investigate whether alterations in ATP levels occurred in localized areas (e.g., around the spindle), the present study used a conventional fluorescence microscope instead of a confocal laser microscope to observe ATeam fluorescence, in order to quantify the fluorescence intensity in the whole embryo (or whole blastomere), and thus we currently cannot provide the images that the reviewer expected. As shown in Figure-figure supplement 1C, the ATP levels tend to be higher at the cell periphery in control embryos and at the mitochondrial aggregation areas in Drp1-depleted embryos, but high-resolution confocal images would be needed to show this clearly.

      - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?

      Review of multiple videos shows that aggregated mitochondria were localized toward the cell center, but did not exhibit the behavior of preferentially concentrating near the female pronucleus.

      - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca<sup>2+</sup> response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?

      We think that the reviewer's comments are mostly correct. It is clear that there is a bias in Ca<sup>2+</sup> store levels between blastomeres of Drp1-depleted embryos. However, since mitochondria were not stained simultaneously in this experiment, we cannot draw detailed conclusions, such as that daughter blastomeres that inherit more mitochondria have higher Ca<sup>2+</sup> stores, or that blastomeres with more aggregated mitochondria have lower Ca<sup>2+</sup> stores.

      - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?

      The marked centration of mitochondrial clusters in Drp1-depleted embryos appears to be associated with migration of the pronuclei toward the cell center, which is unique to the first embryonic cleavage. Since the assembly of the male and female pronuclei at the cell center is also unique to the first cleavage, binucleation due to mitochondrial misplacement was observed only in the first cleavage. Therefore, if Drp1 is depleted at the 2-cell or 4-cell stage, chromosome segregation errors may be less frequent. However, since unequal partitioning of mitochondria is thought to occur, some abnormalities in embryonic development are likely to be observed.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments

      - Line 262: "Since mitochondrial dynamics are spatially coordinated at the ER-mitochondria MCSs," adequate ref. would better be added.

      We have added an adequate reference to the revised manuscript (Friedman et al., 2011, PMID: 21885730).

      - Line 333-336: "...as assessed by the presence of the nuclear envelope." Do authors show the data? In Figure 4-figure supplement 1A, the difference of the phosphoH3-ser10 signal between control and Trim-Away group might be weak. For clarity, it would be helpful if authors indicate the different points to note in the figure.

      Although the data are not shown, nuclear staining of arrested 2-cell stage embryos exhibited clear nuclear membranes, similar to the DAPI image in Figure 4-figure supplement 1A. We have indicated that the data are not shown in the revised version (Line 345). Based on a report that phosphorylated histone H3 (Ser10) localizes in pericentromeric heterochromatin that can be visualized by DAPI staining in late G2 interphase cells (Hendzel et al., 1997, Chromosoma, PMID: 9362543), this study qualitatively estimated the G2 phase from the phosphorylated histone H3 signal and the DAPI counterstained images. We have noted this point in the revised figure legend (Line 1012-1014).

      Typos or points for reword/rephrase

      - Line 149: "molecular identification" may better be " molecular characteristics".

      We have corrected this in the revised version (Line 145).

      - Line 157: "hinders the functional importance" would be "implies the functional importance" or "complicates the functional analysis".

      We have corrected this in the revised version (Line 152-153).

      - Line 208: "Since the role of F-actin in many cellular events, such as cytokinesis, preclude them as targets for experimentally manipulating mitochondrial distribution, " may better be "Given many cellular roles, disruption of F-actin per se was unsuitable as a strategy for manipulating mitochondrial distribution", for example.

      We have corrected this in the revised version (Line 207-208).

      - Line 260: "with MCSs with the plasma.." may better be "with MCSs such as with the plasma..".

      We have corrected this in the revised version (Line 267-268).

      - Line 312: "distribution and segregation" may better be "distribution and the resulting segregation of the inter-organelle contacts".

      We have corrected this in the revised version (Line 324-325).

      - Line 427: "pertaining" might be "partitioning".

      We have corrected this in the revised version (Line 438).

      Line 463: "loss of Drp1 induced mitochondrial aggregation disturbs" may better be "mitochondrial aggregation induced by the loss of Drp1 disturbs".

      We have corrected this in the revised version (Line 478-479).

      - Line 752: "endoplasmic reticulum (pink) " would be " endoplasmic reticulum (aqua) ".

      We have corrected this in the revised version (Line 780).

      - Figure 5E: "(Noma 2-cell embryos)" would be "(Normal 2-cell embryos)".

      - Figure 5F: "Mitochondrial centration prevents dual spincle assembly" would be "Mitochondrial centration prevents dual spindle assembly".

      Thank you for your careful correction of typographical errors. We have corrected all the words/expressions the reviewer pointed out in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the “most likely” inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the “Streetlight effect”. It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      We sincerely thank the reviewer for their perceptive critique highlighting potential bias in selecting the inactivated conformation. We recognize that over-relying on preconceived traits could limit exploration of diverse inactivated states, and we appreciate the opportunity to address this concern.

      Although we selected the model with the flipped V625 in the selectivity filter (SF) from the first round of inactivated-state sampling as the template for the second round, the resulting models still exhibited substantial diversity in their SF conformations. This selection primarily served to steer predictions away from the open-state configuration observed in the PDB 5VA2 SF, and we have clarified this rationale in the Methodology section. To assess conformational variability, we examined backbone dihedral angles (phi φ and psi ψ) at key residues in the selectivity filter (S624 – G628) and in the drug-binding region on the pore-lining S6 segment (Y652, F656) for all 100 models sampled in the subsequent inactivated-state-sampling attempt. By overlaying the φ and ψ dihedral angles from different models, including the open state (PDB 5VA2-based), the closed state, and representative models from AlphaFold inactivated-state-sampling Cluster 2 and Cluster 3, we found that these conformations consistently fall within or near high-probability regions of the dihedral angle distributions. This indicates that these structural states are well represented within the ensemble of conformations sampled by AlphaFold within the scope of this study, particularly at functionally critical positions.
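
      For readers unfamiliar with the quantity being compared, the sketch below shows how a backbone dihedral angle can be computed from four atom positions. It is a generic geometric illustration, not the analysis code used in this study: the function name `dihedral_degrees` and the toy coordinates are assumptions. For φ the four atoms would be C(i-1), N(i), CA(i), C(i), and for ψ they would be N(i), CA(i), C(i), N(i+1).

```python
import math

# Generic sketch of a signed dihedral angle from four 3D points.
# Helper names and example coordinates are illustrative assumptions.

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)          # normal of the plane p0, p1, p2
    n2 = _cross(b1, b2)          # normal of the plane p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    b1_hat = (b1[0] / b1_len, b1[1] / b1_len, b1[2] / b1_len)
    x = _dot(n1, n2)
    y = _dot(b1_hat, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


# Toy geometry: central bond along z, first atom along +x.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
gauche = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

      Using `atan2` rather than `acos` keeps the sign of the angle, which matters when overlaying models on a φ/ψ distribution because mirror-image backbone conformations would otherwise be indistinguishable.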

      Following the analysis above and consistent with the reviewer’s suggestion, we evaluated the top representative model from inactivated-state-sampling Cluster 3 (named “AF ic3”), which we had initially excluded. In this model, the carbonyl oxygen of SF residue G626 is flipped away from the conduction pathway, hinting at a potential impact on ion conduction, yet its pore region structurally resembled the open state (Figure S9a, b). To test this objectively, we ran molecular dynamics (MD) simulations (2 runs, 1 μs long each, with applied 750 mV voltage) with varied initial ion/water configurations in the SF, finding it consistently open and conducting throughout (Figure S9c, d), consistent with our previous observations in Figure S11 that ion conduction can still occur when the upper SF is dilated. Drug docking (Figure S12) further revealed that the model exhibited binding affinities similar to those for the PDB 5VA2-based open-state structure. These findings combined led us to classify it as a possible alternative open-state conformation.

      Models from Cluster 4 were not tested due to extensive steric clashes, where residues in the SF overlapped with neighboring residues from adjacent subunits. The remaining models displayed SF conformations that combined features from earlier clusters. However, due to subunit-to-subunit variability, where individual subunits adopted differing conformations, they were classified as outliers. This combination of features may be valuable to investigate further in a follow-up study.

      We acknowledge that our approach is just one of many ways to sample different states, and alternative strategies, such as generating more models, varying multiple sequence alignment (MSA) subsampling, or testing different templates, might reveal improved models. Given that hERG channel inactivation likely spans a spectrum of conformations, our resource limitations may have restricted us to exploring and validating only part of this diversity. Nevertheless, the putative inactivated (AlphaFold Cluster 2) model’s non-conductivity and improved affinity for drugs targeting the inactivated state observed in our study suggest that this approach may be capturing relevant features of the inactivated-state conformation. We look forward to investigating other possibilities in greater depth in a future study and are grateful for the reviewer’s feedback.

      (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.

      We appreciate the reviewer’s insightful suggestion. To address this, we extended our analysis by incorporating an alternative AlphaFold2-predicted model from inactivated-state-sampling cluster 3 as a structural control. This model was established in a previously discussed analysis to be open and conducting as a follow up to comment #1, so we will call it Open (AF ic3) to differentiate it from Open (PDB 5VA2). We evaluated this new model in single-state and multi-state contexts alongside our original open-state model based on the experimental PDB 5VA2 structure. Additionally, we expanded the drug docking procedure to explore a broader region around the putative drug binding site by increasing the sampling space, and we adopted an improved approach for selecting representative docking poses to better capture relevant binding modes.

      Shown in Figure 7 are comparisons of experimental drug potencies with the binding affinities from the molecular docking calculations under the following conditions:

      (a) Single-state docking using the experimentally derived open-state structure (PDB 5VA2)

      (b) Multi-state docking incorporating open (PDB 5VA2), inactivated, and closed-state conformations weighted by experimentally observed state distributions

      (c) Single-state docking using an alternative AlphaFold-predicted open-state (inactivated-state-sampling cluster 3, AF ic3)

      (d) Multi-state docking combining the AlphaFold-predicted open state (inactivated-state-sampling cluster 3, AF ic3) with the inactivated- and closed-state conformations

      Using only the open-state model (PDB 5VA2) yielded a moderate correlation with experimental data (R<sup>2</sup> = 0.43, r = 0.66, Figure 7a). Incorporating multi-state binding (weighted by their experimental distributions) improved the correlation substantially (R<sup>2</sup> = 0.63, r = 0.79, Figure 7b), boosting predictive power by 47% and underscoring the value of multi-state modeling. Importantly, this improvement was achieved without considering potential drug-induced allosteric effects on the hERG channel conformation and gating, which will be addressed in future work.

      Next, we substituted the PDB 5VA2-based open-state model with the AF ic3 open-state model. Docking to this alternative model alone produced similar performance (R<sup>2</sup> = 0.44, r = 0.66, Figure 7c), and incorporating it into the multi-state ensemble further improved the correlation with experiments (R<sup>2</sup> = 0.64, r = 0.80, Figure 7d), representing a 45% gain in R<sup>2</sup> and matching the performance of multi-state docking results based on the PDB 5VA2-derived model.
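      The percentage gains quoted above can be reproduced from the reported R<sup>2</sup> values as relative improvements. The short Python check below is a sketch under that assumption; the helper name `relative_gain` is hypothetical, and because the R<sup>2</sup> values are themselves rounded to two decimals, the percentages are approximate.

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# R^2 values as reported in the response text.
gain_5va2 = relative_gain(0.43, 0.63)   # single-state vs multi-state, PDB 5VA2 open state
gain_af_ic3 = relative_gain(0.44, 0.64) # single-state vs multi-state, AF ic3 open state
```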

      These findings suggest that the predictive power of computational drug docking is enhanced not merely by the accuracy of individual models, but by the structural diversity and complementarity provided by an ensemble of protein conformations. Rather than relying solely on a single experimentally determined protein structure, the ensemble benefits from incorporating AlphaFold-predicted models that capture alternative conformations identified through our state-specific sampling approach. These diverse protein models reflect different structural features, which together offer a more comprehensive representation of the ion channel’s binding landscape and enhance the predictive performance of computational drug docking. Overall, these results reinforce that multi-state modeling offers a more realistic and predictive framework for understanding drug – ion channel interactions than traditional single-state approaches, emphasizing the value of both individual model evaluation and their collective integration. We are grateful for the reviewer’s suggestion.

      (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g. Figure 3d).

      We appreciate the reviewer’s comment on the statistical significance assessment in Figure 3d. To clarify, the comparisons shown in the subpanels are based on three selected representative models for each state, rather than a broader population sample (similarly for Figure 3b). In the closed-state predicted models, the strong convergence of the voltage-sensing domain (VSD), with an all-atom RMSD of 0.36 Å between cluster 1 and 2 closed-state sampling models and 0.95 Å to the outlier cluster, indicates minimal structural variation. The RMSD values reported in the manuscript text demonstrate good convergence and by themselves provide a statistical significance assessment of those models. This trend extends to the open-state and inactivated-state AlphaFold models, which show similarly limited differences in their VSD regions. This convergence suggests that population-based statistical analysis may not reveal meaningful deviations, as the low variability among models limits the insights beyond those obtained from comparing representative structures.

      Nonetheless, we acknowledge this limitation. In future studies, we plan to explore alternative modeling approaches to introduce greater variability, enabling a more robust statistical evaluation of state-specific trends in the predictions.
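      For readers who want to check convergence values such as the VSD RMSDs quoted above, the calculation can be sketched in a few lines of Python. This is a minimal illustration that assumes the two coordinate sets are already superimposed and list the same atoms in the same order; the function name `rmsd` and the example coordinates are hypothetical and are not taken from the manuscript's analysis scripts.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superimposed coordinate sets.

    Each argument is a sequence of (x, y, z) tuples in angstroms, with atoms
    listed in matching order. No fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom displaced by 1 angstrom along z.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(model_1, model_2)
```

      In practice the structures would first be aligned on the region of interest, since an RMSD computed without superposition also includes rigid-body offsets.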

      (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allow for much more robust comparisons of structural differences.

      We have explored these conformational state dynamics through MD simulations for the Open (5VA2-based), Inactivated (AlphaFold Cluster 2), and Closed-state models, as presented in Figures S7, S8, S10, and S11. These figures provide detailed insights: Figures S7 and S8 analyze SF and pore conformation dynamics, including averaged pore radii with and without voltage and superimposed conformational ensembles; Figure S10 tracks cross-subunit distances between protein backbone carbonyl oxygens, revealing sequential SF dilation steps near residues F627 and G628; and Figure S11 illustrates this SF dilation process over time, highlighting residue F627 carbonyl flipping and SF expansion. We appreciate the opportunity to clarify our approach.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) Protein fragments are used to model the closed and inactivated states of hERG, but the choices of fragments are not well justified. For instance, in Figure 1a, helices from 8EP1 (deactivated voltage-sensing domain) and a helix+loop from 5VA2 (selectivity filter) are used. Why just the selectivity filter and not the cytosolic domain, for instance? Why not some parts of the helices attached to the selectivity filter, or the whole membrane inserted domain of 8EP1? Same for the inactivated conformation in Figure 1c: why the cytosolic domain only?

      We thank the reviewer for their thoughtful questions regarding our choice of protein fragments for modeling the closed and inactivated states of hERG in Figures 1a and 1c, and we appreciate the opportunity to justify these selections more clearly. Our approach to template selection was guided by our experience that providing AlphaFold2 with larger templates often leads it to overly constrain predictions to the input structure, reducing its flexibility to explore alternative conformations. In contrast, smaller, targeted fragments increase the likelihood that AlphaFold2 will incorporate the desired structural features while predicting the rest of the protein. We have provided a more detailed discussion of this in the methodology section.

      For the closed state (Figure 1a), we chose the deactivated voltage-sensing domain (VSD) from the rat EAG channel (PDB 8EP1) to inspire AlphaFold2 to predict a similarly deactivated VSD conformation characteristic of hERG channel closure, as this domain’s downward shift is a hallmark of potassium channel closure. We paired this with the selectivity filter (SF) and adjacent residues from the open-state hERG structure (PDB 5VA2) to maintain its conductive conformation, as it is generally understood that K<sup>+</sup> channel closure primarily involves the intracellular gate rather than significant SF distortion. Including additional helices (e.g., S5–S6) or the entire membrane domain from PDB 8EP1 risked biasing the model toward the EAG channel’s pore structure, which differs from hERG’s, while omitting the cytosolic domain ensured focus on the VSD-driven closure without over-constraining cytoplasmic domain interactions.

      For the inactivated state (Figure 1c), we initially used only the cytosolic domain from PDB 5VA2 to anchor the prediction while allowing AlphaFold2 to freely sample transmembrane domain conformations, particularly the SF, where the inactivation occurs via its distortion. Excluding the SF or attached helices at this stage avoided locking the model into the open-state SF, and the cytosolic domain alone provided a minimal scaffold to maintain hERG’s intracellular architecture without dictating pore dynamics. Following the initial prediction, we initiated more extensive sampling by using one of the predicted SFs that differs from the open-state SF (PDB 5VA2) as a structural seed, aiming to guide predictions away from the open-state configuration. The VSD and cytosolic domain were also included in this state to discourage pore closure during prediction. Using larger fragments, such as the full membrane-spanning domains or additional cytosolic regions from the open-state structure, might reduce AlphaFold2’s ability to deviate from the open-state conformation, undermining our goal of capturing more diverse, state-specific features.

      It is worth noting that multiple strategies could potentially achieve the predicted models in our study, and here we only present examples of the paths we took and validated. It is likely that many of the steps may be unnecessary and could be skipped, and future work building on our approach can further explore and streamline this process. A consistent theme underlies our choices: for the closed state, we know the VSD should adopt a deactivated (“down”) conformation, so we provide AlphaFold2 with a specific fragment to guide this outcome; for the inactivated state, we recognize that the SF must change to a non-conductive conformation, so we grant AlphaFold2 flexibility to explore diverse conformations by minimizing initial constraints on the transmembrane region.

      With greater sampling and computational resources, it is possible we could identify additional plausible, non-conductive conformations that might better represent an inactivated state, as hERG inactivation may encompass a spectrum of states. In this study, due to resource limitations, we focused on generating and validating a subset of conformations. Still, we acknowledge that broader exploration could further refine these models, which could be pursued in future studies. We updated the Methods and Discussion sections to reflect this perspective, and we are grateful for the reviewer’s input, which encourages us to clarify our rationale and highlight the adaptability of our approach.

      To demonstrate the broader feasibility of this approach, we applied it to another ion channel system, voltage-gated sodium channel Na<sub>V</sub>1.5, as illustrated in Figure S14. In this example, a deactivated VSD II from the cryo-EM structure of a homologous ion channel Na<sub>V</sub>1.7 (PDB 6N4R) (DOI: 10.1016/j.cell.2018.12.018), which was trapped in a deactivated state by a bound toxin, was used as a structural template. This guided AlphaFold to generate a Na<sub>V</sub>1.5 model in which all four voltage sensor domains (VSD I–IV) exhibit S4 helices in varying degrees of deactivation. Compared to the cryo-EM open-state Na<sub>V</sub>1.5 structure (PDB 6LQA) (DOI: 10.1002/anie.202102196), the predicted model displays a visibly narrower pore, representing a plausible closed state. This example underscores the versatility of our strategy in modeling alternative conformational states across diverse ion channels.

      (2) While the authors rely on AF2 (ColabFold) for the closed and inactivated states, they use Rosetta to model loops of the open state. Why not just supply 5VA2 as a template to ColabFold and rebuild the loops that way? Without clear explanations, these sorts of choices give the impression that the authors were looking for specific answers that they knew from their extensive knowledge of the hERG system. While the modeling done in this paper is very nice, its generalizability is not obvious.

      We appreciate the reviewer’s question about our use of Rosetta to model loops in the open-state hERG channel (PDB 5VA2) rather than rebuilding it entirely with ColabFold. In the study, we conducted a control experiment supplying parts of PDB 5VA2 to ColabFold to rebuild the loops, generating 100 models (Figure 2a: predicted open state). The top-ranked model (by pLDDT) differed from our Rosetta-modelled structure by only 0.5 Å RMSD, primarily due to the flexible extracellular loops as expected, with the pore and selectivity filter (our areas of focus) remaining nearly identical. We chose the Rosetta-refined cryo-EM structure as this structure and approach have been widely used as an open-state reference in our other hERG channel studies, such as by Miranda et al. (DOI: 10.1073/pnas.1909196117) and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404), to ensure that our results are more directly comparable to prior work in the field. Nonetheless, as both models (with loops modeled by Rosetta or AlphaFold) were virtually identical, we would expect no significant differences if either were used to represent the open state in our study. We have incorporated this clarification into the main text.

      (3) pLDDT scores were used as a measure of reliable and accurate predictions, but pLDDT is not always reliable for selecting new/alternative conformations (see https://doi.org/10.1038/s41467-024-51507-2 and https://www.nature.com/articles/s41467-024-51801-z).

      We acknowledge that while pLDDT is a valuable indicator of structural confidence in AlphaFold2 predictions, its limitations warrant consideration. In our revision, we mitigated this by not relying solely on pLDDT, but we also performed protein backbone dihedral angle analysis of the protein regions of focus in all predicted models to ensure comprehensive coverage of conformational variations. From our AlphaFold modeling results, we tested a model from cluster 3 of the inactivated-state sampling process, which exhibited lower pLDDT scores, and included these results in our revised analysis. We included a note in the revised manuscript’s Discussion section: “As noted in recent studies, pLDDT scores are not reliable indicators for selecting alternative conformations (DOI: 10.1038/s41467-024-51507-2 and DOI: 10.1038/s41467-024-51801-z). To address this, we performed a protein backbone dihedral angle analysis in the regions of interest to ensure that our evaluation captured a representative range of sampled conformations.”
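      The backbone dihedral angle analysis mentioned above rests on computing torsion angles (such as φ and ψ) from four consecutive backbone atoms. The Python sketch below shows one standard way to compute a single dihedral from Cartesian coordinates; it is a generic illustration, the function name `dihedral_deg` and the test points are hypothetical, and it is not taken from the authors' analysis scripts.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points.

    For a backbone phi angle the points would be C(i-1), N(i), CA(i), C(i);
    for psi they would be N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("central bond has zero length")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy geometries: eclipsed (cis) and anti (trans) arrangements.
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

      Collecting such angles for the selectivity filter residues of every predicted model gives a distribution that can be clustered independently of pLDDT ranking.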

      (4) Extensive work has been done using AF2 to model alternative protein conformations (https://www.biorxiv.org/content/10.1101/2024.05.28.596195v1.abstract, along with some references the authors cite, such as work by McHaourab); another group recently modeled the ion channel GLIC (https://www.biorxiv.org/content/10.1101/2024.09.05.611464v1.abstract). Therefore, this work, though generally solid and thorough, seems more like a variation on a theme than a groundbreaking new methodology, especially because of the generalizability issues mentioned above.

      We sincerely thank the reviewer for acknowledging the solidity of our study and for drawing our attention to the impressive recent efforts using AlphaFold2 to explore alternative protein conformations. These studies are valuable contributions that highlight the versatility of AlphaFold2, and we are grateful for their context in evaluating our work.

      Building on these efforts, our approach not only enhances the prediction of conformational diversity but also introduces a twist by incorporating structural templates to guide AlphaFold2 toward specific functional protein states. More significantly, our study advances beyond structural modeling alone: we rigorously validate these conformations using multiple simulation results tested against experimental data, showing that AlphaFold-predicted conformations can align with distinct physiological ion channel states. A key finding is that drug binding predictions using AlphaFold-derived hERG channel states substantially improve correlation with experimental data. This has been a longstanding challenge in computational screening of multi-state proteins like the hERG channel, for which previous structural models have been mostly limited to the open state based on the cryo-EM structures. Our approach not only captures this critical state dependence but also reveals potential molecular determinants underlying enhanced drug binding during hERG channel inactivation, a phenomenon observed experimentally but poorly understood. These insights advance drug safety assessment by improving predictive screening for hERG-related cardiotoxicity, a major cause of drug attrition and withdrawal.

      We view our methodology as a natural evolution of the advancements cited by the reviewer, offering an approach that predicts diverse hERG channel conformational states and links them to meaningful functional and pharmacological outcomes. To address the reviewer’s concern about generalizability, we have expanded the methodology section to make it easier to follow and include additional details. As an example, we show how our approach can be applied to model another ion channel system, Na<sub>V</sub>1.5, in Figure S14.

      Furthermore, to enhance the applicability of our methodology, we have uploaded the scripts for analyzing AlphaFold-predicted models to GitHub (https://github.com/k-ngo/AlphaFold_Analysis), ensuring they are adaptable for a wide range of scenarios with extensive documentation. This enables users, even those not focused on ion channels, to effectively apply our tools to analyze AlphaFold predictions for their own projects and produce publication-ready figures.

      While it is likely that multiple modeling approaches could lead AlphaFold to model alternative protein conformations, the key challenge lies in validating the physiological relevance of those predicted states. This study is intended to support other researchers in applying our template-guided approach to different protein systems and, more importantly, in rigorous in silico testing and validation of the biological significance of the conformation-specific structural models they generate.

      Minor concerns:

      (1) The authors mention in the Introduction section that capturing conformational states, especially for membrane proteins that may be significant as drug targets, is crucial. It would be helpful to relate their work to the NMR studies of domains of the hERG channel, particularly the N-terminal “eag” domain, which is crucial for channel function and can provide insights into conformational changes associated with different channel states (https://doi.org/10.1016/j.bbrc.2010.10.132).

      We appreciate the reviewer’s insightful comment regarding the PAS domain and the potential influence of other regions, such as the N-linker and distal C-region, on drug binding and state transitions.

      The PAS domain did appear in the starting templates used for initial structural modeling (as shown in Figure 1a, b, c), but it was not included in the final models used for subsequent analyses. The omission was primarily due to hardware-imposed constraints, as including these additional regions would exceed the memory capacity of our current graphics processing unit (GPU) card, leading to failures during the prediction step.

      The PAS domain, even if not serving as a conventional direct drug-binding site, can influence the gating kinetics of hERG channels. By altering the probability and duration with which channels occupy specific states, it can indirectly affect how well drugs bind. For example, if the presence of the PAS domain shifts hERG channel gating so that more channels enter (and remain in) the inactivated state as was shown previously (e.g., DOI: 10.1085/jgp.201210870), drugs with a higher affinity for that state would appear to bind more potently, as observed in previous electrophysiological experiments (e.g., DOI: 10.1111/j.1476-5381.2011.01378.x). It is also plausible that the PAS domain could exert allosteric effects that alter the conformational landscape of the hERG channel during gating transitions, potentially impacting drug accessibility or binding stability. This is an intriguing hypothesis and an important avenue for future research.

      With access to more powerful computational resources, it would be valuable to explore the full-length hERG channel, including the PAS domain and associated regions, to assess their potential contributions to drug binding and gating dynamics. We incorporated a discussion of these points into the main text, acknowledging the limitations of our current models and highlighting the need for future studies to explore these regions in greater detail. The addition reads: “…Our models excluded the N-terminal PAS domain due to GPU memory limitations, despite its inclusion in initial templates. This omission may overlook its potential roles in gating kinetics and allosteric effects on drug binding (e.g., PMID: 21449979, PMID: 23319729, PMID: 29706893, PMID: 30826123, DOI:10.4103/jpp.JPP_158_17). Future research will explore the full-length hERG channel with enhanced computational resources to assess these regions’ contributions to conformational state transitions and pharmacology.”

      (2) In the second-to-last paragraph of the Introduction, the authors describe how AlphaFold2 works. They write, “AlphaFold2 primarily requires the amino acid sequence of a protein as its input, but the method utilizes other key elements: in addition to the amino acid sequence, AlphaFold2 can also utilize multiple sequence alignments (MSAs) of similar sequences from different species, templates of related protein structures when available, and/or homologous proteins (Jumper et al., 2021a). Evolutionarily conserved regions over multiple isoforms and species indicated that the sequence is crucial for structural integrity”. The last sentence is confusing; if the authors mean that all information required to fold the protein into its 3D structure is present in its primary sequence, that has been the paradigm. It is unclear from this paragraph what the authors wanted to convey.

      We apologize for any confusion caused by this phrasing. Our intent was not to restate the well-established paradigm that a protein’s primary sequence contains the information needed for its 3D structure, but rather to emphasize how AlphaFold2 leverages evolutionary conservation, via multiple sequence alignments (MSAs), to infer structural constraints beyond what a single sequence alone might reveal. Specifically, we aimed to highlight that conserved regions across species and isoforms provide additional context that AlphaFold2 uses to enhance the accuracy of its predictions, complementing the use of templates and homologous structures as described in Jumper et al. (2021). To clarify this, we revised the sentence in the manuscript to read: “AlphaFold2 primarily requires a protein's amino acid sequence as input, but it also leverages other critical data sources. In addition to the sequence, it incorporates multiple sequence alignments (MSAs) of related proteins from different species, available structural templates, and information on homologous proteins. While the primary sequence encodes the 3D structure, AlphaFold2 harnesses evolutionary conservation from MSAs to reveal structural insights that extend beyond what a single sequence can provide.” We thank the reviewer for pointing out this ambiguity.

      (3) In the Results section, the authors state that the predictions generated by their method are evaluated by standard accuracy metrics, please elaborate - what standard metrics were used to judge the predictions and why (some references would be a nice addition). Further, on Page 6, the sentence “There are fewer differences between the open- and closed-state models (Figure S2b, d)” is confusing, fewer differences than what? or there are a few differences between the two states/models? Please clarify.

      The original sentence referring to “standard accuracy metrics” is somewhat misplaced, as our intent was not to apply any conventional “benchmarking” to judge the predictions, but rather to evaluate functional and structural relevance in a physiologically meaningful context. Specifically, we assessed drug binding affinities from molecular docking simulations (in Rosetta Energy Units, R.E.U.) against experimental drug potency data (e.g., IC<sub>50</sub> values converted to free energies in kcal/mol, Figure 7), analyzed differences in interaction networks across states in relation to known mutations affecting hERG inactivation (Figure 4, Table 2), validated ion conduction properties through MD simulations with the applied voltage against expected state-dependent hERG channel behavior (Figure 5), and compared predicted structural models to available experimental cryo-EM structures (Figure 3). We clarified in the text that our assessment emphasized the physiological plausibility of the generated conformations, drawing on evidence from existing computational and experimental studies at each step of the analysis above.
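      The conversion of IC<sub>50</sub> values to free energies mentioned above is commonly done with ΔG = RT ln(IC<sub>50</sub>), with IC<sub>50</sub> expressed in mol/L relative to a 1 M standard state. The Python sketch below illustrates that relationship under explicit assumptions: the temperature of 298.15 K, the use of IC<sub>50</sub> as a stand-in for a dissociation constant, and the function name `ic50_to_dg` are all illustrative choices, not details confirmed by the manuscript.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ic50_to_dg(ic50_molar, temperature_k=298.15):
    """Approximate binding free energy (kcal/mol) from an IC50 in mol/L.

    Treats IC50 as an approximation of the dissociation constant and uses
    a 1 M standard state; the default temperature is an assumption.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return R_KCAL * temperature_k * math.log(ic50_molar)


# Hypothetical drug with an IC50 of 1 micromolar.
dg_example = ic50_to_dg(1e-6)
```

      Lower (more negative) values correspond to more potent block, which is what allows experimental potencies to be compared against docking scores on a common energy-like axis.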

      As for the sentence on page 6, “There are fewer differences between the open- and closed-state models,” we apologize for the ambiguity; we meant that the hydrogen bond networks in the selectivity filter region exhibit fewer differences between the open and closed states compared to the more pronounced variations seen between the open and inactivated states. We revised this sentence to read: “The open- and closed-state models show fewer differences in their selectivity filter hydrogen bond networks compared to those between the open and inactivated states,” to enhance readability.

      (4) In the Discussion, the authors reiterate that this methodology can be extended to sample multiple protein conformations, and their system of choice was hERG potassium channel. I think this methodology can be applied to a system when there is enough knowledge of static structures, and some information on dynamics (through simulations) and mutagenesis analysis available. A well-studied system can benefit from such a protocol to gauge other conformational states.

      We agree that this approach is well-suited to systems with sufficient static structures, dynamic insights from simulations, and mutagenesis data, as seen with the hERG channel. We appreciate the reviewer’s implicit concern about generalizability to less-characterized systems and addressed this in the Discussion as a limitation, noting that the method’s effectiveness may depend on prior knowledge. Future studies can explore whether the advent of AlphaFold3 and other deep learning approaches can enhance its applicability to systems with more limited data. We have added this comment to the Discussion: “…A limitation of our methodology is its reliance on well-characterized systems with ample static structures, molecular dynamics simulation data, and mutagenesis insights, as demonstrated with the hERG channel, which may limit its applicability to less-studied proteins.”

      (5) The Methods section must be broken down into steps to make it easier to follow for the reader (if they want to implement these steps for themselves on their system of choice).

      a. Is it possible to share example scripts and code used to piece templates together for AF2? Also, since the AF3 code is now available, the authors may comment on how their protocol can be applicable there or have plans to implement their protocol using AF3 (which is designed to work better for binding small molecules). Please see https://github.com/google-deepmind/alphafold3 for the recently released code for AF3.

      We appreciate the reviewer’s suggestion to improve the Methods section and their comments on scripts and AlphaFold3 (AF3). We revised the Methods to separate it into clear steps (e.g., template preparation, AF2 setup, clustering, and refinement) for better readability and reproducibility, and uploaded the sample scripts along with the instructions to GitHub (https://github.com/k-ngo/AlphaFold_Analysis).

      Regarding AF3’s recent code release, we plan to explore the applicability of our methodology to AF3 in a follow-up study, leveraging its advanced features to refine conformational predictions and state-specific drug docking, and added a brief comment to the Discussion to reflect this future direction: “…Following the recent release of AlphaFold3’s source code, we plan to explore the applicability of our template-guided methodology in a follow-up study, leveraging AF3’s advanced diffusion-based architecture to enhance protein conformational state predictions and state-specific drug docking, particularly given its improved capabilities for modeling small molecule – protein interactions…”

      b. The authors modified the hERG protein by removing a segment, the N-terminal PAS domain (residues M1 - R397) because of graphics card memory limitation. Would the removal of the PAS domain affect the structure and function of the channel protein? HERG and other members of the “eag K<sup>+</sup> channel” family contain a PAS domain on their cytoplasmic N terminus. Removal of this domain alters a physiologically important gating transition in HERG, and the addition of the isolated domain to the cytoplasm of cells expressing truncated HERG reconstitutes wild-type gating. (see https://doi.org/10.1371/journal.pone.0059265). Please elaborate on this.

      We thank the reviewer for raising an important point about the removal of the N-terminal PAS domain and for highlighting its physiological role in hERG channel gating transitions. In our study, unlike experimental settings where PAS removal alters gating, we believe this omission has minimal impact on our key analyses.

      The drug docking procedure focuses on optimizing drug binding poses with minor protein structural refinement around the putative drug binding site, which in our case is the hERG channel pore region, where hERG-blocking drugs predominantly bind. The cytoplasmic PAS domain, located distally from this site, remains outside the protein structure refinement zone during drug docking simulations. However, one aspect we have not yet considered is the potential effect of drug modulation of the hERG channel gating and vice versa, particularly given the PAS domain’s role in gating. This interplay could be significant but requires investigation beyond our current drug docking framework. We plan to explore this in future studies using alternative simulation methodologies, such as extended MD simulations or enhanced sampling techniques, to comprehensively capture these dynamic protein-ligand interactions.

      Similarly, in our 1 μs long MD simulations assessing ion conductivity (Figure 4), the timescale is too short for PAS-mediated gating changes to propagate through the protein and meaningfully influence ion conduction and channel activation dynamics, which occur on a millisecond time scale (see e.g., DOI: 10.3389/fphys.2018.00207). To fully address this limitation, we plan to explore the inclusion of the PAS domain in a follow-up study with enhanced computational resources, allowing us to investigate its structural and functional contributions more comprehensively.

      (6) The first paragraph of the Methods reads as though AF2 has layers that recycle structures. We doubt that the authors meant it that way. Please update the language to clarify that recycling is an iterative process in which the pairwise representation, MSA, and predicted structures are passed (“recycled”) through the model multiple times to improve predictions.

      We agree that the phrasing might suggest physical layers recycling structures, which was not our intent. Instead, we meant to describe AlphaFold2’s iterative refinement process, where intermediate outputs, such as the pairwise residue representations, multiple sequence alignments (MSAs), and predicted structures, are iteratively passed (or “recycled”) through the model to enhance prediction accuracy. To clarify this, we revised the relevant sentence to read: “A critical feature of AlphaFold2 is its iterative refinement, where pairwise residue representations, MSAs, and initial structural predictions are recycled through the model multiple times, improving accuracy with each iteration.”

      Reviewer #3 (Recommendations for the authors):

      The authors should integrate the very recently published CryoEM experimental data of hERG inhibition by several drugs (Miyashita et al., Structure, 2024; DOI: 10.1016/j.str.2024.08.021).

      We thank the reviewer for the suggestion. Here, we compare drug binding in our open-state models (PDB 5VA2-derived and an additional AlphaFold-predicted model from cluster 3 of the inactivated-state sampling, named “AF ic3”) and inactivated-state model, using the cationic forms of astemizole and E-4031, with the corresponding experimental structures (Figure S13). Drug binding in the closed state is excluded as the pore architecture deviates too much from those in the cryo-EM structures. Experimental data (DOI: 10.1124/mol.108.049056) indicate that both astemizole and E-4031 bind more potently to the inactivated state of the hERG channel.

      Astemizole (Figure S13a):

      - In the PDB 5VA2-derived open-state model, astemizole binds centrally within the pore cavity, adopting a bent conformation that allows both aromatic ends of the molecule to engage in π–π stacking with the side chains of Y652 from two opposing subunits. Hydrophobic contacts are observed with S649 and F656 residues.

      - In the AF ic3 open-state model, the ligand is stabilized through multiple π–π stacking interactions with Y652 residues from three subunits, forming a tight aromatic cage around its triazine and benzimidazole rings. Hydrophobic interactions are observed with hERG residues T623, S624, Y652, F656, and S660.

      - In the inactivated-state model, astemizole adopts a compact, horizontally oriented pose deeper in the channel pore, forming the most extensive interaction network among all the states. The ligand is tightly stabilized by multiple π–π stacking interactions with Y652 residues across three subunits, and forms hydrogen bonds with residues S624 and Y652. Additional hydrophobic contacts are observed with residues F557, L622, S649, and Y652.

      - Consistent with our findings, an electrophysiology study by Saxena et al. identified hERG residues F557 and Y652 as crucial for astemizole binding, as determined through mutagenesis (DOI: 10.1038/srep24182).

      - In the cryo-EM structure (PDB 8ZYO) (DOI: 10.1016/j.str.2024.08.021), astemizole is stabilized by π–π stacking with Y652 residues. However, no hydrogen bonds are detected, which may reflect limitations in cryo-EM resolution rather than a true absence of contacts. Additional hydrophobic interactions are observed with L622 and G648 residues.

      E-4031 (Figure S13b):

      - In the PDB 5VA2-derived open-state model, E-4031 binds within the central cavity primarily through polar interactions. It forms a π–π stacking interaction with residue Y652, anchoring one end of the molecule. Polar interactions are observed with residues A653 and S660. Additional hydrophobic contacts are observed with residues A652 and Y652.

      - In the AF ic3 open-state model, E-4031 adopts a slightly deeper pose within the central cavity stabilized by dual π–π stacking interactions between its aromatic rings and hERG residue Y652. Additional hydrogen bonds are observed with residues S624 and Y652, and hydrophobic contacts are observed with residues T623 and S624.

      - In the inactivated-state model, E-4031 adopts its deepest and most stabilized binding pose, consistent with its experimentally observed preference for this state. The ligand is stabilized by multiple π–π stacking interactions between its aromatic rings and hERG residues Y652 from opposing subunits. The sulfonamide nitrogen engages in hydrogen bonding with residue S649, while the piperidine nitrogen hydrogen bonds with residue Y652. Hydrophobic contacts with residues S624, Y652, and F656 further reinforce the binding, enclosing the ligand in a densely packed aromatic and polar environment.

      - A previous mutagenesis study showed that mutations involving hERG residues F557, T623, S624, Y652, and F656 affect E-4031 binding (DOI: 10.3390/ph16091204).

      - In the cryo-EM structure (PDB 8ZYP) (DOI: 10.1016/j.str.2024.08.021), E-4031 engages in a single π–π stacking interaction with hERG residue Y652, anchoring one end of the molecule. The remainder of the ligand is stabilized predominantly through hydrophobic contacts involving residues S621, L622, T623, S624, M645, G648, S649, and additional Y652 side chains, forming a largely nonpolar environment around the binding pocket.

      In both cryo-EM structures, astemizole and E-4031 adopt binding poses that closely resemble the inactivated-state model in our docking study, consistent with experimental evidence that these drugs preferentially bind to the inactivated state (DOI: 10.1124/mol.108.049056). This raises the possibility that the cryo-EM structures may capture an inactivated-like channel state. However, closer examination of the SF reveals that the cryo-EM conformations more closely resemble the open-state PDB 5VA2 structure (DOI: 10.1016/j.cell.2017.03.048), which has been shown to be conductive here and in previous studies (DOI: 10.1073/pnas.1909196117, 10.1161/CIRCRESAHA.119.316404).

      The conformational differences between the cryo-EM and open-state docking results may reflect limitations of the docking protocol itself, as GALigandDock assumes a rigid protein backbone and cannot account for large ligand-induced conformational changes. In our open-state models, the hydrophobic pocket beneath the selectivity filter is too small to accommodate bulky ligands (Figure 3a, b), whereas the cryo-EM structures show a slight outward shift in the S6 helix that expands this space (Figure S13). These allosteric rearrangements, though small, fall outside the scope of the current docking protocol, which lacks the flexibility to capture such local, ligand-induced adjustments (DOI: 10.3389/fphar.2024.1411428).

      In contrast, docking to the AlphaFold-predicted inactivated-state model reveals a reorganization beneath the selectivity filter that creates a larger cavity, allowing deeper ligand insertion. Notably, neither our inactivated-state docking nor the available cryo-EM structures show strong interactions with F656 residues. However, in the AlphaFold-predicted inactivated model, the more extensive protrusion of F656 into the central cavity may further occlude the drug’s egress pathway, potentially trapping the ligand more effectively. This could explain why mutation of F656 significantly reduces the binding affinity of E-4031 (DOI: 10.3390/ph16091204). These findings suggest that inactivation may trigger a series of modular structural rearrangements that influence drug access and binding affinity, with different aspects potentially captured in various computational and experimental studies, rather than resulting from a single, uniform conformational change.

      Discussion of the original Wang and Mackinnon finding, DOI: 10.1016/j.cell.2017.03.048 regarding C-inactivation, pore mutation S631A and F627 rearrangement is likely warranted. Since hERG inactivation is present at 0 mV in WT channels (the likely voltage for the CryoEM study) please discuss how this might affect interpretations of starting with this structure as a template for models presented here, perhaps as part of Figure S1.

      We sincerely thank the reviewer for bringing up the insightful findings from Wang and MacKinnon regarding hERG C-type inactivation as well as the voltage context of their cryo-EM structure (PDB 5VA2). We recognize that WT hERG exhibits inactivation at 0 mV, likely the condition of the cryo-EM study, raising the possibility that PDB 5VA2, while classified as an open state, might subtly reflect features of inactivation. Notably, PDB 5VA2 has been widely adopted in numerous studies and consistently found to represent a conducting state, such as in Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) and Miranda et al. (DOI: 10.1073/pnas.1909196117). Our MD simulations further support this, showing K<sup>+</sup> conduction in the 5VA2-based open-state model (Figure 4a, c), consistent with its selectivity filter conformation (Figure S1a). Although we used PDB 5VA2 as a starting template for predicting inactivated and closed states, our AlphaFold2 predictions did not rigidly adhere to this structure, as evidenced by distinct differences in hydrogen bond networks, drug binding affinities, pore radii, and ion conductivity between our state-specific hERG channel models (Figures S2, 5, 3b, 4). Nevertheless, this does not preclude the possibility that potential inactivated-like traits of PDB 5VA2 at 0 mV could subtly influence our predictions elsewhere in the model, which warrants further exploration in future studies. In our revised analysis, we also tested an alternative AlphaFold-predicted conformation, referred to as Open (AlphaFold cluster 3), which, while sharing some similarities with PDB 5VA2, exhibits subtle differences in the selectivity filter and pore conformations. This structure was also found to conduct ions and showed a drug binding profile similar to that of the PDB 5VA2-based open-state model. We greatly appreciate this feedback, which helped us refine and strengthen our analysis.

      Page 8, the significance of 750 and 500 mV in terms of physiological role?

      We appreciate this opportunity to clarify the methodological rationale. Although these voltages significantly exceed typical physiological membrane potentials, their use in MD simulations is a well-established practice to accelerate ion conduction events. This approach helps overcome the inherent timescale limitations of conventional MD simulations, as demonstrated in previous studies of hERG and other ion channels. For instance, Miranda et al. (DOI: 10.1073/pnas.1909196117), Lau et al. (DOI: 10.1038/s41467-024-51208-w), and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) applied similarly high voltages (500~750 mV) to study hERG K<sup>+</sup> conduction. The hERG single-channel conductance is notably small under physiological conditions at ~2 pS (DOI: 10.1161/01.CIR.94.10.2572), necessitating amplification to observe meaningful permeation within nanosecond-to-microsecond timescales. Likewise, studies of other K<sup>+</sup> ion channels, such as Woltz et al. (DOI: 10.1073/pnas.2318900121) on the small-conductance calcium-activated K<sup>+</sup> channel SK2 and Wood et al. (DOI: 10.1021/acs.jpcb.6b12639) on the Shaker K<sup>+</sup> channel, have used elevated voltages (250~750 mV) to probe ion conduction mechanisms via MD simulations. In addition, the typical timescale of these simulations (1 μs) is too short to capture major structural effects such as those leading to inactivation or deactivation, which occur over milliseconds in physiological conditions.
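
      The timescale argument can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a purely ohmic channel (I = g·V) with the ~2 pS conductance quoted above. The 100 mV comparison point and the helper name `ions_per_microsecond` are illustrative assumptions, and real hERG currents are strongly rectifying, so the numbers are order-of-magnitude guides only.

```python
# Rough estimate of K+ permeation events per microsecond for an ohmic channel.
# Assumption: I = g * V with a constant single-channel conductance.

ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge carried by one K+ ion, in coulombs


def ions_per_microsecond(conductance_pS, voltage_mV):
    """Expected number of monovalent ions crossing the channel per microsecond."""
    current_A = (conductance_pS * 1e-12) * (voltage_mV * 1e-3)  # I = g * V
    ions_per_second = current_A / ELEMENTARY_CHARGE_C
    return ions_per_second * 1e-6


for voltage_mV in (100.0, 500.0, 750.0):
    rate = ions_per_microsecond(2.0, voltage_mV)
    print(f"{voltage_mV:.0f} mV: about {rate:.1f} ions per microsecond")
```

      Under these assumptions a 1 μs trajectory at around 100 mV would contain only about one permeation event, whereas 500 to 750 mV yields several, which is the practical reason elevated voltages are used.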

      The abstract could be edited a bit to more clearly state the novel findings in this study.

      We thank the reviewer for their suggestion. We have revised the abstract to read: “To design safe, selective, and effective new therapies, there must be a deep understanding of the structure and function of the drug target. One of the most difficult problems to solve has been resolution of discrete conformational states of transmembrane ion channel proteins. An example is K<sub>V</sub>11.1 (hERG), comprising the primary cardiac repolarizing current, I<sub>kr</sub>. hERG is a notorious drug antitarget against which all promising drugs are screened to determine potential for arrhythmia. Drug interactions with the hERG inactivated state are linked to elevated arrhythmia risk, and drugs may become trapped during channel closure. While prior studies have applied AlphaFold to predict alternative protein conformations, we show that the inclusion of carefully chosen structural templates can guide these predictions toward distinct functional states. This targeted modeling approach is validated through comparisons with experimental data, including proposed state-dependent structural features, drug interactions from molecular docking, and ion conduction properties from molecular dynamics simulations. Remarkably, AlphaFold not only predicts inactivation mechanisms of the hERG channel that prevent ion conduction but also uncovers novel molecular features explaining enhanced drug binding observed during inactivation, offering a deeper understanding of hERG channel function and pharmacology. Furthermore, leveraging AlphaFold-derived states enhances computational screening by significantly improving agreement with experimental drug affinities, an important advance for hERG as a key drug safety target where traditional single-state models miss critical state-dependent effects. By mapping protein residue interaction networks across closed, open, and inactivated states, we identified critical residues driving state transitions validated by prior mutagenesis studies. 
This innovative methodology sets a new benchmark for integrating deep learning-based protein structure prediction with experimental validation. It also offers a broadly applicable approach using AlphaFold to predict discrete protein conformations, reconcile disparate data, and uncover novel structure-function relationships, ultimately advancing drug safety screening and enabling the design of safer therapeutics.”

      Many of the Supplemental figures would fit in better in the main text, if possible, in my opinion. For instance, the network analysis (Fig. S2) appears to be novel and is mentioned in the abstract so may fit better in the main text. The discussion section could be focused a bit more, perhaps with headers to highlight the key points.

      Yes, we agree with the reviewer and have made the suggested changes. We moved Figure S2 into the main text as a new figure.

      Additionally, we revised the Discussion section to improve focus and clarity.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Hama et al. explored the molecular regulatory mechanisms underlying the formation of the ULK1 complex. By employing the AlphaFold structural prediction tool, they showed notable differences in the complex formation mechanisms between ULK1 in mammalian cells and Atg1 in yeast cells. Their findings revealed that in mammalian cells, ULK1, ATG13, and FIP200 form a complex with a stoichiometry of 1:1:2. These predicted interaction regions were validated through both in vivo and in vitro assays, enhancing our understanding of the molecular mechanisms governing ULK1 complex formation in mammalian cells. Importantly, they identified a direct interaction between ULK1 and FIP200, which is crucial for autophagy. However, some aspects of this manuscript require further clarification, validation, and correction by the authors.

      Thank you for your thorough evaluation of our manuscript. We have carefully revised the manuscript to address your concerns by performing extra experiments and providing additional clarifications, validations, and corrections as written below.

      Reviewer #2 (Public review):

      Summary:

      This is important work that helps to uncover how the process of autophagy is initiated - via structural analyses of the initiating ULK1 complex. High-resolution structural details and a mechanistic insight of this complex have been lacking and understanding how it assembles and functions is a major goal of a field that impacts many aspects of cell and disease biology. While we know components of the ULK1 complex are essential for autophagy, how they physically interact is far from clear. The work presented makes use of AlphaFold2 to structurally predict interaction sites between the different subunits of the ULK1 complex (namely ULK1, ATG13, and FIP200). Importantly, the authors go on to experimentally validate that these predicted sites are critical for complex formation by using site-directed mutagenesis and then go on to show that the three-way interaction between these components is necessary to induce autophagy in cells.

      Strengths:

      The data are very clear. Each binding interface of ATG13 (ATG13 with FIP200/ATG13 with ULK1) is confirmed biochemically with ITC and IP experiments from cells. Likewise, IP experiments with ULK1 and FIP200 also validate interaction domains. A real strength of the work is in their analyses of the consequences of disrupting ATG13's interactions in cells. The authors make CRISPR KI mutations of the binding interface point mutants. This is not a trivial task and is the best approach as everything is monitored under endogenous conditions. Using these cells, the authors show that ATG13's ability to interact with both ULK1 and FIP200 is essential for a full autophagy response.

      Thank you for your thoughtful review and for highlighting the importance of our approach.

      Weaknesses:

      I think a main weakness here is the failure to acknowledge and compare results with an earlier preprint that shows essentially the same thing (https://doi.org/10.1101/2023.06.01.543278). Arguably this earlier work is much stronger from a structural point of view as it relies not only on AlphaFold2 but also actual experimental structural determinations (and takes the mechanisms of autophagy activation further by providing evidence for a super complex between the ULK1 and VPS34 complexes). That is not to say that this work is not important, as at the very least it independently helps to build a consensus for ULK1 complex structure. Another weakness is that the downstream "functional" consequences of disrupting the ULK1 complex are only minimally addressed. The authors perform a Halotag-LC3 autophagy assay, which essentially monitors the endpoint of the process. There are a lot of steps in between, knowledge of which could help with mechanistic understanding. Not least is the kinase activity of ULK1 - how is this altered by disrupting its interactions with ATG13 and/or FIP200?

      Thank you for this valuable feedback. In response, we performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model. We have summarized both the similarities and differences in newly included figures (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text. Furthermore, to address the downstream consequences of ULK1 complex disruption, we have investigated the impact on ULK1 kinase activity, specifically examining how mutations affecting ATG13 or FIP200 interaction alter ULK1’s phosphorylation of a key substrate ATG14. In addition, we analyzed the effect on ATG9 vesicle recruitment. We provide the corresponding data as Figure S3C-E and detailed discussions in the revised manuscript.

      Reviewer #3 (Public review):

      In this study, the authors employed the protein complex structure prediction tool AlphaFold-Multimer to obtain a predicted structure of the protein complex composed of ULK1-ATG13-FIP200 and validated the structure using mutational analysis. This complex plays a central role in the initiation of autophagy in mammals. Previous attempts at resolving its structure have failed to obtain high-resolution structures that can reveal atomic details of the interactions within the complex. The results obtained in this study reveal extensive binary interactions between ULK1 and ATG13, between ULK1 and FIP200, and between ATG13 and FIP200, and pinpoint the critical residues at each interaction interface. Mutating these critical residues led to the loss of binary interactions. Interestingly, the authors showed that the ATG13-ULK1 interaction and the ATG13-FIP200 interaction are partially redundant for maintaining the complex.

      We are grateful for your positive evaluation of our work.

      The experimental data presented by the authors are of high quality and convincing. However, given the core importance of the AlphaFold-Multimer prediction for this study, I recommend the authors improve the presentation and documentation related to the prediction, including the following:

      (1) I suggest the authors consider depositing the predicted structure to a database (e.g. ModelArchive) so that it can be accessed by the readers.

      We have deposited the AlphaFold model to ModelArchive with the accession code ma-jz53c, which is indicated in the revised manuscript.

      (2) I suggest the authors provide more details on the prediction, including explaining why they chose to use the 1:1:2 stoichiometry for ULK1-ATG13-FIP200 and whether they have tried other stoichiometries, and explaining why they chose to use the specific fragments of the three proteins and whether they have used other fragments.

      We appreciate your suggestion. As we noted in the original manuscript, previous studies have shown that the C-terminal region of ULK1 and the C-terminal intrinsically disordered region of ATG13 bind to the N-terminal region of the FIP200 homodimer (Alers, Loffler et al., 2011; Ganley, Lam du et al., 2009; Hieke, Loffler et al., 2015; Hosokawa, Hara et al., 2009; Jung, Jun et al., 2009; Papinski and Kraft, 2016; Wallot-Hieke, Verma et al., 2018). We relied on these findings when determining the specific regions to include in our complex prediction and when selecting a 1:1:2 stoichiometry for ULK1–ATG13–FIP200 which was reported previously (Shi et al., 2020). We also used AlphaFold2 to predict the structures of the full-length ULK1–ATG13 complex and the complex of the FIP200N dimer with full-length ATG13, confirming that there were no issues with our choice of regions (revised Figure S1A-C). In the revised manuscript, we have provided a more detailed explanation of our rationale based on the previous reports and additional AlphaFold predictions.

      (3) I suggest the authors present the PAE plot generated by AlphaFold-Multimer in Figure S1. The PAE plot provides valuable information on the prediction.

      We provided the PAE plot in the revised Figure S1C.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1D, the labels for the input and IP of ATG13-FLAG should be corrected to ATG13-FLAG FIP3A.

      We thank the reviewer for pointing out these labeling mistakes. We revised the labels based on the suggestions.

      (2) In the discussion section, the authors should address why ATG13-FLAG ULK1 2A in Fig. 2D leads to a significantly lower expression of ULK1 and provide possible explanations for this observation.

      ATG13 and ATG101, both core components of the ULK1 complex, are known to stabilize each other through their mutual interaction. Loss or reduction of one protein typically leads to the destabilization of the other. In this context, ULK1 is similarly stabilized by binding to ATG13. Therefore, the ATG13-FLAG ULK2A mutant, which has reduced binding to ULK1, likely loses this stabilizing activity, so that ULK1 becomes destabilized and its expression level falls. We added this discussion to the revised manuscript.

      (3) In Figure 4B, the authors should explain why Atg13-FLAG KI significantly affects the expression of endogenous ULK1. Could Atg13-FLAG KI be interfering with its binding to ULK1? Experimental evidence should be provided to support this. Additionally, does Atg13-FLAG KI affect autophagy? Wild-type HeLa cells should be included as a control in Figure 4C and 4D to address this question.

      Thank you for your constructive suggestion. We found a technical error in the ULK1 blot of Figure 4B and therefore repeated the experiment. The results show that ULK1 expression did not significantly change in the ATG13-FLAG KI. These findings are consistent with Figure S3A. We have replaced Figure 4B with these new data.

      We agree that including wild-type HeLa cells as a control is essential to determine whether ATG13-FLAG KI affects autophagy. We performed the same experiments in wild-type HeLa cells and found that ATG13-FLAG KI does not significantly impact autophagic flux. Accordingly, we have replaced Figures 4D and 4E with these new data.

      (4) In Figure 3C, the authors used an in vitro GST pulldown assay to detect a direct interaction between ULK1 and FIP200, which was also confirmed in Figure 3E. However, since FLAG-ULK1 FIP2A affects its binding with ATG13 (Fig. 3E), it is possible that ULK1 FIP2A inhibits autophagy by disrupting this interaction. The authors should therefore use an in vitro GST pulldown assay to determine whether GST-ULK1 FIP2A affects its binding with ATG13. Additionally, the authors should investigate whether the interaction between ULK1 and FIP200 in cells requires the involvement of ATG13 by using ATG13 knockout cells to confirm if the ULK1-FIP200 interaction is affected in the absence of ATG13.

      Thank you for the valuable suggestion. We examined the effect of the FIP2A mutation on the ULK1–ATG13 interaction using isothermal titration calorimetry (ITC) to obtain quantitative binding data. The results showed that the FIP2A mutation does not markedly alter the affinity between ULK1 and ATG13 (revised Figure S2B), suggesting that FIP2A mainly weakens the ULK1–FIP200 interaction. Regarding experiments in ATG13 knockout cells, ULK1 becomes destabilized in the absence of ATG13, making it technically difficult to assess how the ULK1–FIP200 interaction is affected under those conditions.

      Reviewer #2 (Recommendations for the authors):

      I feel the manuscript would benefit from a more detailed comparison with the Hurley lab paper - are the structural binding interfaces the same, or are there differences?

      We appreciate the suggestion to compare our results more closely with the work from the Hurley lab. We performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text.

      As mentioned, what happens downstream of disrupting the ULK1 complex? How is ULK1 activity changed, both in vitro and in cells? Does disruption of the ULK1 complex binding sites impair VPS34 activity in cells (for example by looking at PtdIns3P levels/staining)?

      Thank you for your insightful comments. We focused on elucidating how disrupting the ULK1 complex leads to impaired autophagy. To assess ULK1 activity, we measured ULK1-dependent phosphorylation of ATG14 at Ser29 (PMID: 27046250; PMID: 27938392). In FIP3A and FU5A knock-in cells, ATG14 phosphorylation was significantly reduced, indicating decreased ULK1 activity (revised Figure S3D, E). This observation is consistent with previous work showing that FIP200 recruits the PI3K complex. Notably, in ATG13 knockout cells, ATG14 phosphorylation became almost undetectable, though the underlying mechanism remains to be fully investigated. Altogether, these data point to reduced ULK1 activity as a key factor explaining the autophagy deficiency observed in FU5A knock-in cells.

      We also explored possible downstream mechanisms. One well-established function of ATG13 is to recruit ATG9 vesicles (PMID: 36791199). These vesicles serve as an upstream platform for the PI3K complex, providing the substrate for phosphoinositide generation (PMID: 38342428). To clarify how our mutations impact this step, we starved ATG13-FLAG knock-in cells and observed ATG9 localization. Unexpectedly, even in FU5A knock-in cells where ATG13 is almost completely dissociated from the ULK1 complex, ATG9A still colocalized with FIP200 (revised Figure S3C). These puncta also overlapped with p62, likely because p62 bodies recruit both FIP200 and ATG9 vesicles. Although we suspect that ATG9 recruitment is nonetheless impaired under these conditions, we were unable to definitively demonstrate this experimentally and consider it an important avenue for future study.

      Reviewer #3 (Recommendations for the authors):

      Here are some additional minor suggestions:

      (1) The UBL domains are only mentioned in the abstract but not anywhere else in the manuscript. I suggest the authors add descriptions related to the UBL domains in the Results section.

      We thank the reviewer for pointing out the lack of description of the UBL domains, which we have added to the Results section of the revised manuscript.

      (2) The authors may want to consider adding a diagram in Figure 1A to show the domain organization of the three full-length proteins and the ranges of the three fragments in the predicted structure.

      We have added the proposed diagram as Figure 1A.

      (3) I suggest the authors consider highlighting in Figure 1A the positions of the binding sites shown in Figure 1B, for example, by adding arrows in Figure 1A.

      We have added arrows in the revised Figure 1B (which was Figure 1A in the original submission).

      (4) In Figure 1D, "Atg13-FLAG" should be "Atg13-FLAG FIP3A".

      We have revised the labeling in Figure 1D.

      (5) "the binding of ATG13 and ULK1 to the FIP200 dimer one by one" may need to be re-phrased. "One by one" conveys a meaning of "sequential", which is probably not what the authors meant to say.

      We have revised the sentence as “the binding of one molecule each of ATG13 and ULK1 to the FIP200 dimer”.

      (6) In "Wide interactions were predicted between the four molecules", I suggest changing "wide" to "extensive".

      We have changed “wide” to “extensive” in the revised manuscript.

      (7) In "which revealed that the tandem two microtubule-interacting and transport (MIT) domains in Atg1 bind to the tandem two MIT interacting motifs (MIMs) of ATG13", I suggest changing the two occurrences of "tandem two" to "two tandem" or simply "tandem".

      We simply used "tandem" in the revised manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      General Statements

      We sincerely thank all three reviewers for their thoughtful and constructive feedback. Your comments were invaluable in improving the clarity and quality of our work.

      In this study, we revisit a previously overlooked lipophilic dye, demonstrating its utility for live-cell imaging: it is transported through a non-vesicular pathway and labels autophagy-related structures. Against the backdrop of increasing attention to membrane contact sites (MCSs), bridge-like lipid transfer proteins (BLTPs), and organelle biogenesis, we aim to propose the possibility of a reversible one-way phospholipid transfer activity that actually takes place in living cells.

      As Reviewer #1 noted, recent cryo-EM studies (e.g., Oikawa et al.) have highlighted the importance of lipids in autophagosome formation, and several in vitro studies exist. However, we believe it is important to consider how well simplified in vitro reconstitutions reflect the complex cellular environment. In addition, to our knowledge, no studies have directly tracked lipid flow dynamics over time in living cells. We believe our work helps fill this gap by combining three technical approaches: (a) R18 as a lipid-tracing dye, (b) FRAP analysis on the isolation membrane, and (c) the use of Ape1 overexpression to stall autophagosome closure, enabling us to visualize reversible lipid flow in vivo. While these techniques may not appear "fancy," we hope they offer new insights that can inspire further exploration of lipid dynamics in a real cellular environment.

      We appreciate Reviewer #2's comments on our high imaging quality and Reviewer #3's recognition of our approach as an elegant way to study lipid transfer. We have revised the manuscript accordingly and included additional explanations, figure clarifications, and planned experiments to address remaining concerns.

      As two key concerns were raised repeatedly by all reviewers, we would like to address them here:

      1. Regarding the concern that the evidence for reversible lipid transfer from the IM to the ER is not sufficiently strong:

      We are deeply grateful to Reviewer #2 for the insightful suggestion to compare the fluorescence recovery of the adjacent bleached ER to that of the ER-IM MCS, to exclude the possibility that recovery at the ER-IM MCS originates from nearby ER rather than from the IM. Following this suggestion, we performed a quantitative analysis using unbleached ER as a background. Interestingly, in every sample, the adjacent bleached ER consistently showed a significantly lower fluorescence recovery than the ER-IM MCS. When we instead used the IM as the background for normalization, the difference became even more pronounced, further supporting the idea that the adjacent ER could not be the source of the recovery signal at the ER-IM MCS. These findings strengthen our conclusion that phospholipid recovery at the MCS could be derived from the IM. The updated analysis and corresponding figure panels (Figure 5K, 5L, and 5M), along with the relevant text (lines 384-396), have been revised accordingly.
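
      The kind of comparison described above can be sketched as a generic reference-normalized FRAP calculation. This is not the authors' analysis pipeline: the function names, the choice of the first post-bleach frame as the bleach depth, and all intensity values below are invented purely for illustration.

```python
# Generic reference-normalized FRAP sketch. All numbers are made-up examples,
# not measurements from the study.

def normalized_recovery(roi, reference, prebleach_frames):
    """Divide the ROI by an unbleached reference, then scale pre-bleach to 1."""
    ratio = [r / ref for r, ref in zip(roi, reference)]
    baseline = sum(ratio[:prebleach_frames]) / prebleach_frames
    return [x / baseline for x in ratio]


def recovery_fraction(trace, prebleach_frames):
    """Fraction of the bleached signal regained by the final frame."""
    bleached = trace[prebleach_frames]  # first frame after the bleach
    return (trace[-1] - bleached) / (1.0 - bleached)


PREBLEACH = 2
reference = [100.0, 100.0, 98.0, 96.0, 94.0, 92.0]  # slow acquisition bleaching
mcs_roi = [50.0, 50.0, 9.8, 24.0, 32.9, 36.8]       # hypothetical ER-IM contact site
er_roi = [50.0, 50.0, 9.8, 14.4, 16.92, 18.4]       # hypothetical adjacent bleached ER

mcs_fraction = recovery_fraction(
    normalized_recovery(mcs_roi, reference, PREBLEACH), PREBLEACH)
er_fraction = recovery_fraction(
    normalized_recovery(er_roi, reference, PREBLEACH), PREBLEACH)
print(f"MCS recovery: {mcs_fraction:.2f}, adjacent ER recovery: {er_fraction:.2f}")
```

      Dividing by an unbleached reference removes intensity loss caused by imaging itself, so a higher recovery fraction at one region than at a neighbouring bleached region reflects a difference in lipid supply rather than in photobleaching.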

      2. Regarding the concern that the evidence for R18 transfer via Atg2 as a bridge-like lipid transfer protein is not sufficiently direct:

      In addition to the evidence presented in this manuscript, we have now cited our parallel study currently under revision (Sakai et al., bioRxiv 2025.05.24.655882v1), where we provide direct evidence that Atg2 indeed functions as a bridge-like lipid transfer protein, rather than a shuttle. Importantly, we also show in that study that R18 transfer requires the bridge-like structure of Atg2. This new reference has been cited in the revised manuscript, and relevant textual explanations have been added to provide further support.

      We hope that the revisions and our revision plan address the reviewers' key concerns. Please find our detailed point-by-point responses below.

      Response to Reviewer #1

      In their study, Hao and colleagues exploited the fluorescent fatty acid R18 to follow phospholipid (PL) transfer in vivo from the endoplasmic reticulum to the IM during autophagosome formation. Although the results are interesting, especially the retrograde transport of PLs, based on the provided data, additional control experiments are needed to firmly support the conclusions.

      We sincerely thank the reviewer for the positive assessment and agree that additional controls are necessary to support our conclusion. Detailed responses and corresponding revisions are provided below.

      An additional point is that the authors also study the internalization of R18 into cells and found a role of lipid flippases and oxysterol binding proteins. While this information could be useful for researchers using this dye, these analyses/findings have no specific connection with the topic of the manuscript, i.e. the PL transfer during autophagosome formation. Therefore, they must be removed.

      We thank the reviewer for the thoughtful comment. We understand the concern that the R18 internalization analysis may appear peripheral to the manuscript's main focus on phospholipid transfer during autophagosome formation. However, we respectfully believe that this section is critical for establishing the mechanistic basis as this study represents the first detailed in vivo application of R18 for tracing lipid dynamics. We believe it is interesting that R18 entry is not due to chemically passive diffusion or non-specific adsorption, but occurs through a biologically regulated, non-vesicular lipid transport pathway. This mechanistic context underpins the reliability of using R18 to monitor ER-to-IM lipid transport in the autophagy pathway.

      To improve clarity and coherence, we have added explanatory text in the Introduction and at the start of the Results section to explicitly link the internalization assay to the subsequent autophagy-related experiments (lines 94-98, 185-187). We hope this helps guide the reader through the rationale and relevance of this part of the study.

      Major points:

      1) In general, the quality of the microscopy images is quite poor, and this makes it difficult to assess some of the authors' conclusions.

      We thank the reviewer for the feedback. To better address this concern, we would appreciate clarification regarding which specific images or figure panels were found to be of low quality. Overall, we believe the microscopy data presented are of sufficient resolution and clarity to support our main conclusions, as also noted by Reviewer #2 ("the high-quality images and FRAP experiments").

      We acknowledge that certain phenomena, such as occasional R18 labeling of the vacuole, were not clearly explained in the original manuscript. We have now included additional clarification in the Results section and mentioned this limitation in the Discussion (lines 170-171, 436-438), along with a note on ongoing experiments to further investigate this point.

      2) It would be important to perform some lipidomics analysis to determine into which PLs and other lipids or lipid intermediates R18 is incorporated. First, it will be important to know which major PL species are labelled under the conditions of the experiments done in this study. Second, the authors assume that all the R18 is exclusively incorporated into PLs and this is what they follow in their in vivo experiments. What about acyl-CoA, which has been shown to be a key player in the IM elongation (Graef lab, Cell)?

      We thank the reviewer for raising this point. However, we believe this is based on a misunderstanding of the chemical nature of R18. R18 is not a free fatty acid analog and cannot be incorporated into phospholipids or acyl-CoA via metabolic pathways. Due to its chemical structure (a bulky rhodamine headgroup attached to a long alkyl chain), it cannot undergo enzymatic conjugation or incorporation into membrane lipids. This is why we did not pursue lipidomics analysis. Instead, we focused on characterizing the biological behavior of R18 through a range of live-cell assays, including temperature and ATP dependency and the involvement of flippases, OSBP proteins, and Atg2, all of which support a regulated, non-vesicular lipid transport pathway. Additionally, the AF3 structural model presented in this study is consistent with this interpretation, showing no evidence of R18 forming chemical bonds with phospholipids.

      3) Figure 1A and 1B. The authors conclude that Atg2 is involved in the lipid transfer since R18 does not localize to the PAS/ARS in the atg2KO cells. However, another possible explanation is that in those cells the IM is not formed and does not expand, and consequently R18 is present in low amounts not detectable by fluorescence microscopy. To support their conclusion, the authors must assess PAS-labelling with R18 in cells lacking another ATG gene in which Atg2 is still recruited to the PAS.

      We thank the reviewer for this important suggestion. As noted, the absence of R18 at the PAS in atg2Δ cells may reflect a lack of membrane formation rather than impaired lipid transfer. However, in support of our interpretation, our previous work (Hirata E, Ohya Y, Suzuki K, 2017) has shown that R18 accumulates at PAS-like structures in delipidation mutants, where the IM fails to expand but Atg2 is still recruited (please refer to the attached revision plan for further details). This suggests that the presence of Atg2, rather than the mere existence of a mature IM, contributes to R18 localization.

      To address this, we revised our statement to the more cautious: "R18 was undetectable at the PAS in atg2Δ cells," to avoid overinterpretation (lines 119-120).

      4) Figure 2. As written, the paragraph describing this figure seems to indicate that flippases are directly involved in the translocation of R18 from the PM to the ER. As correctly indicated by the authors, flippases flip PLs, not fatty acids. Moreover, there is no PL synthesis at the PM and thus R18 is probably not flipped upon incorporation into PL. As a result, the relevance of flippases in R18 internalization is probably indirect. This must be explained clearly to avoid confusion/misunderstandings.

      We thank the reviewer for this important clarification. We fully agree that flippases act on phospholipids, not fatty acids, and that R18 is not metabolically incorporated into phospholipids at the plasma membrane. However, our ongoing work (Rev. Figure 1) shows that R18 has a preferential labeling affinity for PS and PE in vivo (in yeast phospholipid synthesis mutants), consistent with its flippase-dependent localization. Flippases are known to specifically flip PS and PE. While R18 itself is not enzymatically modified or incorporated into phospholipids, its membrane distribution may thus depend on the lipid environment and the activity of lipid-translocating proteins.

      Preliminary data supporting this observation are included in the "Supplementary Figures for reviewer reference only" and are not part of the public submission.

      5) A couple of manuscripts have shown a (partial) role of Drs2 in autophagy. The authors must explain the discrepancy between their own results and what has been published, especially because they use the GFP-Atg8 processing assay, which is less sensitive than the Pho8delta60 assay used in the other studies.

      We thank the reviewer for raising this important point. We are aware of prior reports implicating Drs2 in autophagy and in fact discussed this work directly with the authors during the course of our experiments, who kindly provided helpful suggestions. While our GFP-Atg8 processing assay did not show significant defects upon Drs2 deletion, strain background differences may explain this discrepancy. We also appreciate the suggestion to use the Pho8Δ60 assay and plan to include it in future experiments.

      Additionally, the authors should check whether the Atg2 and Atg18 proteins are present at the IM-ER membrane contact sites at the same rates after nutrient replenishment as when cells are nitrogen-starved, since this complex would determine the lipid transfer dynamics at this membrane contact site.

      We thank the reviewer for the helpful suggestion. We plan to perform additional experiments to monitor Atg18 localization during the nutrient replenishment assay.

      6) The authors used a predicted Atg2 lipid-transfer mutant (Srinivasan et al, J Cel Biol, 2024), but provide no direct proof that this mutant is defective for this activity. As previously done for other Atg2/ATG2-related manuscripts (Osawa et al, Nat Struct Mol Biol, 2019; Valverde et al, J Cel Biol, 2019), this must be measured in vitro. Moreover, they do not show whether other known functions of Atg2 are unaffected when expressing this Atg2 mutant, e.g. formation of the IM-ER MCSs, Atg2 interaction with Atg9 and localization at the extremity of the IM...

      We thank the reviewer for this concern. The lipid-transfer-deficient Atg2 mutant used here is based on the same structural rationale as in our recent parallel study (Sakai et al., bioRxiv 2025; https://www.biorxiv.org/content/10.1101/2025.05.24.655882v1, currently under revision). In that study, we addressed whether Atg2 indeed functions as a bridge-like lipid transfer protein, and also used R18 to directly demonstrate the lipid transfer defect of this Atg2 mutant in vivo.

      We therefore believe that referencing this study provides mechanistic support for the use of this Atg2 mutant in the current manuscript. A citation and brief explanation have now been added to the revised text (lines 315-316, 439-441). We also plan to perform the lipid transfer assay in vitro.

      7) The mNG-Atg8 signal is not recovered in the fluorescence recovery assays. Based on the observation that the R18 signal comes back after photobleaching, the authors suggest that the supply of Atg8 is not required for IM expansion. This idea is opposite to data where the levels of Atg8 and deconjugation of lipidated Atg8 determine the size of the forming autophagosomes (e.g., Xie et al, Mol Biol Cell, 2008; Nair et al, Autophagy, 2012). Similar results have also been obtained in mammalian cells (Lazarou and Mizushima results in cells lacking components of the two ubiquitin-like conjugation systems). This discrepancy requires an explanation.

      We thank the reviewer for pointing out this imprecise interpretation, and we sincerely apologize for the confusion it may have caused. We fully agree that Atg8 is essential for the expansion of the isolation membrane (IM), as supported by previous studies. In our FRAP data, mNG-Atg8 showed gradual recovery at the later timepoints, indicating that Atg8 can be replenished over time. The reason why R18 recovery appears much more rapid is likely due to the inherently fast lipid transfer activity of Atg2, the bridge-like lipid transport protein. In contrast, Atg8 signal recovery may have been delayed for two reasons: (1) slower recruitment kinetics to the IM, and (2) partial depletion of the available mNG-Atg8 protein pool due to photobleaching during the experiment.

      We have revised the relevant paragraph in the manuscript (lines 326-330) to clarify these points and avoid potential misinterpretation.

      8) Although the authors claim that there is a retrograde lipid transfer from the IM to the ER, based on the data, it is quite difficult to draw this conclusion, as they show a decrease in the lipid flow dynamics rather than an inversion of the lipid flow per se. Can the authors exclude that ER microdomains are formed at the ERES in contact with the IM, and consequently that what they measure is a slow diffusion of R18-labeled lipid from other parts of the ER to these ERES?

      We appreciate the reviewer's insightful comment. Indeed, we are also considering the possibility that lipid-enriched microdomains may form in the ER and contribute to complex lipid dynamics at contact sites. However, direct visualization of such domains in cells remains technically challenging, and this is one of the important directions we aim to pursue in future studies. While our current data do not allow us to definitively state that all recovered lipids originate from the IM, our FRAP experiments provide indirect yet strong support for the possibility that at least a substantial portion of the recovered lipid signal in the ER derives from the IM. Moreover, following Reviewer #2's major point No. 4, we performed a direct comparison of R18 fluorescence recovery between the photobleached ER-IM MCS region and the adjacent bleached ER region (Figure 5K and 5M). Interestingly, each sample consistently showed lower fluorescence recovery in the adjacent bleached ER near the ER-IM MCS (mean = 0.20), compared to the ER-IM MCS region (mean = 0.28). To further validate this observation, we also used the IM as a background reference for normalization. This analysis revealed a more pronounced difference, with the adjacent bleached ER near the ER-IM MCS showing a lower recovery (mean = 0.47) than the ER-IM MCS (mean = 0.80).

      As Reviewer #2 pointed out, these results support our reversible lipid transfer model by demonstrating that fluorescence recovery at the ER-IM MCS is due to the signal coming from the IM, rather than from the adjacent bleached ER, which recovers more slowly and less efficiently. We have incorporated this new analysis into Figure 5, and accordingly revised the figure legend and main text (lines 384-396).

      9) The retrograde PL transfer is studied in cells overexpressing Ape1, in which IM elongation is stalled. This is a non-physiological experimental setup and consequently it is unclear whether what is observed applies to normal IM/autophagosomes. This event should be shown to occur in WT cells as well.

      We thank the reviewer for this point. Indeed, it remains technically difficult to visualize lipid flow during normal IM expansion in vivo, as this process is rapid and transient. To date, there are no reports directly addressing lipid flow in this process.

      However, the Ape1 overexpression system provides a strategic advantage by temporally extending the IM elongation phase and spatially enlarging the IM, thus offering a unique opportunity to capture membrane behavior that would otherwise be transient and difficult to resolve. Importantly, this system arrests autophagosome closure, which we leveraged to investigate the potential reversibility of phospholipid transfer in a controlled and prolonged context. Without this system, it would be exceedingly difficult for researchers to examine the directionality of lipid flow in living cells.

      Furthermore, the use of Ape1 overexpression has been widely employed in previous high-impact autophagy studies. We emphasize that our aim is to understand Atg2-mediated lipid transfer, and in this context, the Ape1 system provides a valuable and informative tool without compromising the validity of our conclusions.

      10) From the images provided, it appears that R18 also labels the vacuole. The vacuole forms MCSs with the IM. Can the authors exclude a passage of R18 from the vacuole to the IM?

      We thank the reviewer for the insightful comment. Our data suggest that R18 traffics from the plasma membrane to the ER, then to autophagy-related structures. Subsequently, as is well known, autophagosomes eventually reach and fuse with the vacuole. This explains the occasional weak R18 signals at the vacuole membrane, particularly in late-stage cells. We have revised the figure and clarified this point in the text to avoid oversimplification of R18 localization (lines 169-171, 426-428).

      Here we also add results from our ongoing work (in preparation). R18 tends to accumulate in a dot-like compartment after prolonged rapamycin treatment and incubation (Rev. Figure 2). The vacuolar labeling of R18 correlates with the degradation status of autophagosomes, rather than with reverse lipid transport from the vacuole to the IM (Rev. Figure 2). Taken together, we believe that R18 transport from the vacuole back to the IM is unlikely.

      Preliminary data supporting this response are included in the "Supplementary Figures for reviewer reference only" and are not part of the public submission.

      Minor points:

      1) L66. One report has indicated that Vps13 may also play a role in the transfer of lipids from the ER to the IM (Graef lab, J. Cell Biol).

      Thank you for pointing this out. Their excellent work also suggested that the inherent lipid transfer activity of Atg2 is required for IM expansion. We have revised the sentence (lines 67-68, 312-314) and included the appropriate citation at these two places.

      2) L70. It must be indicated that IM is also called phagophore.

      We have revised the sentence (lines 70-71). Thank you for pointing this out.

      3) L74. It is mentioned "Additionally, a hydrophobic cavity in the N-terminal region of Atg2 directly tethers Atg2 to the ER, particularly the ER exit site (ERES), which is considered a key hub for autophagosome biogenesis", but there is no experimental evidence supporting that Atg2 is involved in the tethering with the ERES.

      Thank you for pointing this out. We have removed the N-terminal region part and revised the sentence accordingly (lines 79-81) to avoid overstatement.

      4) L90. PAS must be listed among the ARS.

      We have revised the sentence (lines 97-98). Thank you for pointing this out.

      5) Upon deletion of ATG39 and ATG40, there is a pronounced reduction of mNG-Atg8 labelled with R18. This would suggest that these two ER-phagy receptors are required for the PL transfer from the ER to the IM, which is not the case as autophagy is mildly affected by the absence of them (e.g., Zhang et al, Autophagy, 2020).

      We thank the reviewer for the important comment and agree that Atg39 and Atg40 are not required for phospholipid transfer from the ER to the IM. We have revised the text (lines 155-157). We would appreciate it if the reviewer could provide the DOI or PubMed ID for this paper.

      6) The authors state that "no direct evidence has been found to confirm lipid transfer at the ER-IM MCS in living cells" (lines 282-283). However, a recent paper has shown that de novo-synthesized phosphatidylcholine is incorporated from the ER into autophagosomes and autophagic bodies (Orii et al, J Cel Biol, 2021). This reference should be mentioned in the manuscript.

      Thank you for your insightful reminder. This paper beautifully demonstrated the importance of de novo-synthesized phosphatidylcholine in autophagy using electron microscopy. We have now included its citation and a brief discussion in the revised manuscript (lines 74-76, 297-298). However, we respectfully note that direct observation of lipid transfer at the ER-IM MCS in living cells is still lacking.

      7) In lines 252-253, the sentence "R18 transport from the PM to the ER was partially impaired in osh1Δ osh2Δ, osh6Δ osh7Δ, and oshΔ osh4-1 cells (Figure S3). These results suggest that Osh proteins participate in transferring R18 from the PM to the ER" does not recapitulate what is observed in Fig. S3. Moreover, the Emr lab has generated a tetradeletion mutant in which the PM-ER MCSs are abolished. The authors could examine this mutant.

      We thank the reviewer for this helpful comment and sincerely apologize for the lack of clarity in our original description. Our conclusion was primarily based on the partial PM accumulation of R18 observed in some osh mutant strains shown in Figure S3, which motivated us to further investigate this pathway using the OSW-1 inhibitor. We have revised the corresponding text to improve the logic and clarity of this section.

      We appreciate the recommendation of the tether∆ mutant. Our preliminary tests indicate that R18 still properly labels the ER in tether∆ cells, suggesting that its localization is not due to passive diffusion at membrane contact sites, but rather involves specific transport mechanisms. As this is an initial observation, we plan to confirm the result and include it in a future revision.

      Reviewer #1 (Significance (Required)):

      General assessment: Strength: potential new system to monitor lipid flow. Limitations: indirect evidence and, in the case of the retrograde transport of phospholipids, a possible artefact of the employed experimental approach. Advance: little advance, because this has in part already been shown in vitro. No new mechanisms uncovered. Audience: autophagy and membrane contact site fields.

      We sincerely thank the reviewer for the overall evaluation. We agree that our current system offers indirect but promising evidence for lipid transfer events at ER-IM contact sites in vivo. While Atg2-mediated lipid transport has been proposed in vitro, our study adds value by (1) establishing a live-cell imaging approach to monitor lipid flow in a non-vesicular transport pathway, (2) proposing a model of reversible one-way lipid transfer activity, and (3) addressing whether findings from simplified in vitro reconstitution accurately reflect the dynamics in the more complex real cellular environment.

      We recognize the limitations of our current approach and plan to include additional analyses to more cautiously interpret the observed retrograde movement. Although we do not claim to identify a new mechanism, we believe our work provides an interesting framework to inspire future efforts aimed at directly probing lipid flow at membrane contact sites in vivo.

      We also sincerely appreciate the reviewer's recognition of the potential value of this system for the autophagy and membrane contact site communities.

      Response to Reviewer #2

      Non-vesicular lipid transfer plays an essential role in organelle biogenesis. Compared to vesicular lipid transfer, it is faster and more efficient at maintaining proper lipid levels in organelles. In this study, Hao et al. introduced a highly lipophilic dye, octadecyl rhodamine B (R18), which specifically labels the ER structures and autophagy-related structures in yeast and mammalian cells. They characterised its distinct lipid entry into yeast cells via the lipid flippases Neo1 and Drs2 on the plasma membrane, rather than through the endocytic pathway. They then demonstrated that R18 intracellular trafficking from the plasma membrane to the ER depends on "box-like" lipid transfer Osh proteins. They further looked into the "bridge-like" lipid transfer protein Atg2, using R18 as a lipid probe to track lipid transfer from the ER to the isolation membrane (IM) during membrane expansion and reversible lipid transfer through the IM to the ER-IM membrane contact sites (MCS) when autophagy is terminated by nutrient replenishment. The authors provide an interesting model of reversible directionality of Atg2 lipid transfer during autophagy induction and termination.

      We sincerely thank the reviewer for the thoughtful and constructive summary of our work. We are grateful for the recognition of the novelty of using R18 to visualize non-vesicular lipid transfer in vivo and for highlighting the conceptual contribution of our proposed model of reversible Atg2-mediated transport during autophagy.

      In response to the reviewer's valuable suggestions, we have revised key parts of the manuscript and prepared a detailed revision plan to address the specific concerns. We truly appreciate the reviewer's insights, which have been instrumental in improving the clarity of our study.

      Major points:

      1. Lines 299-309: The FRAP assays were interesting and well performed. The authors photobleached the R18 and Atg8 signals and found fluorescence recovery of R18 but not of Atg8, which suggests that lipid transfer occurs between the ER and the IM and is faster than the Atg8 lipidation process during IM expansion. These results gave clear evidence that R18 can be transferred during IM expansion. The supply of Atg8 may not be trackable within this time frame, or the recovered amount of Atg8 may not be visible owing to the detection threshold of confocal microscopy. This does not imply that the supply of Atg8 to the IM is not required during IM expansion. This should be clarified.

      We thank the reviewer for this valuable comment and fully agree that Atg8 is essential for IM expansion. We apologize for any ambiguity that may have suggested otherwise.

      As pointed out, the lack of mNG-Atg8 recovery in our FRAP assay likely reflects the slower turnover of lipidated Atg8, limited observation time, and photobleaching of the existing protein pool. Notably, we observed a weak but gradual signal recovery at later time points, supporting this view. We have revised the relevant paragraph in the manuscript (lines 326-330) to clarify these points and avoid potential misinterpretation.

      Please clarify how the length of the IM is measured and determined in Figure 4H and Figure 5D.

      We thank the reviewer for the valuable comment. We have now clarified the method for quantifying IM length in the revised manuscript. Specifically, we modified the Statistical Analysis section of the Methods (lines 642-643).

      Line 336-342: The description of the results should be clarified. Based on Figure 5H, the authors observed a significant decrease in the mNG-Atg8 signal during photobleaching of the R18 signal.

      We thank the reviewer for pointing out the ambiguity. We have now clarified the description in the revised manuscript. The sentence has been modified (lines 360-362) as follows: "To determine whether nutrient replenishment terminates autophagy, we selectively photobleached the R18 signal and monitored the R18 (photobleached) and mNG-Atg8 (without photobleaching) signal following nutrient replenishment."

      The authors photobleached the ER-IM MCS and the ER region (boxed region in Figure 5J) and quantified fluorescence recovery, normalized to the IM region and an ER control. The ER control was taken from another cell. It would be helpful to compare and analyse the fluorescence recovery of R18 in the bleached ER region near the ER-IM MCS to that in the ER-IM MCS. This would help to confirm that the ER-IM MCS fluorescence recovery is due to signal coming from the IM.

      We sincerely thank the reviewer for this insightful suggestion. We have now performed the suggested comparison. Interestingly, each sample consistently showed lower fluorescence recovery in the adjacent bleached ER near the ER-IM MCS (mean = 0.20), compared to the ER-IM MCS region (mean = 0.28). To further validate this observation, we also used the IM as a background reference for normalization. This analysis revealed a more significant difference, with the adjacent bleached ER near the ER-IM MCS showing a lower recovery (mean = 0.47) than the ER-IM MCS (mean = 0.80).

      As the reviewer pointed out, these results support our reversible lipid transfer model by demonstrating that fluorescence recovery at the ER-IM MCS is due to the signal coming from the IM, rather than from the adjacent bleached ER, which recovers more slowly and less efficiently. We have incorporated this new analysis into Figure 5, and accordingly revised the figure legend and main text (lines 384-396). Again, we appreciate this constructive and helpful suggestion.

      In Figure 5K, the autophagic structure or IM labelled by R18 seems to be maintained when the mNG-Atg8 signal decreases or dissociates from the IM. Could the authors comment on how they interpret the termination of the prolonged IM structure and IM shrinkage?

      We thank the reviewer for this insightful observation. Based on our live-cell imaging, we speculate that following the initial dissociation of Atg8, the IM membrane undergoes a relatively slow disassembly process, potentially retracting toward the ER-IM MCS, which often localizes near ER exit sites (ERES). This suggests that IM shrinkage may proceed via Atg8-independent mechanisms. Although the precise pathway remains unclear, we occasionally observed vesiculation events during this phase, supporting the idea that membrane remodeling continues even in the absence of Atg8. In response to this comment, we have revised our manuscript to reflect these interpretations (lines 494-496).

      The authors have shown that the Atg2Δ and Atg2LT lipid transfer mutants impair R18 labelling of autophagic structures in Figure 4C. However, the evidence supporting that R18 fluorescence recovery at the ER-IM MCS is mediated by reversible Atg2 lipid transfer is not direct. It would be helpful to clarify whether Atg2 stays on the enlarged autophagic membranes when the membrane has reached its maximum length and no longer grows.

      We thank the reviewer for this important suggestion. As noted in our response to Reviewer 1 (Major Point 8-2), clarifying whether Atg2/Atg18 remains at the ER-IM contact sites after IM expansion is indeed important for supporting the reversible lipid transfer model. We plan to monitor the localization of Atg18 during the nutrient replenishment assay.

      Minor points:

      1. In Figure 2A, the "Dpm-GFP" label is missing. The number of experimental replicates in Figure 2M should be indicated.

      We thank the reviewer for pointing out these issues. The label for "Dpm-GFP" has been added in Figure 2A, and the number of experimental replicates for Figure 2M is now indicated in the figure legend.

      Figure S2, the magenta panel should be "R18".

      We thank the reviewer for catching this labeling error. We have corrected the magenta panel label in Figure S2 to "R18" in the revised version of the figure.

      Line 341-342: "Figure 5H and 5J" should be "Figure 5H and 5I"

      We thank the reviewer for pointing out this error. The citation has been corrected from "Figure 5H and 5J" to "Figure 5H and 5I" in the revised manuscript.

      Please describe how the lipid docking model of Atg2 is generated.

      We thank the reviewer for this question. We have added a description of the modeling approach in the Methods section of the revised manuscript (lines 640-646). We also added the configuration files of AlphaFold3 to the supplementary information.

      Reviewer #2 (Significance (Required)):

      Currently, lipid probes are emerging as powerful tools to understand membrane dynamics, integrity, and the lipid-mediated cellular functions. In this manuscript, the authors performed a detailed characterisation of octadecyl rhodamine B (R18) as a potential lipid probe, which specifically labels ER and autophagic membranes. They present high quality imaging data and performed FRAP experiments to monitor the membrane dynamics and investigate the lipid transfer directionality between the ER and autophagic structure. However, the evidence of Atg2-mediated reversible lipid transfer may not be direct and sufficient. The proposed reversible lipid transfer model is interesting and provides an explanation of lipid level regulation during autophagosome formation.

      We sincerely thank the reviewer for the positive assessment of our work and for acknowledging the potential of R18 as a lipid probe, as well as the quality of our imaging and FRAP experiments. We are particularly grateful that the reviewer found the proposed model of reversible lipid transfer both interesting and relevant to the broader question of lipid regulation during autophagosome formation.

      Regarding the reviewer's concern that the evidence for Atg2-mediated reversible lipid transfer may not be sufficiently direct, we agree this is a critical point. While technical limitations currently prevent direct visualization of lipid flow reversal at single-molecule resolution in vivo, we hope our revision plan will strengthen the proposed model and better convey its biological relevance, while also acknowledging the current limitations and the need for further mechanistic work.

      Response to Reviewer #3

      The authors address the question of how autophagic membrane seeds expand into autophagosomes. After nucleation, IMs expand in a manner dependent on the bridge-like lipid transfer protein Atg2, which has been shown to tether the IM to the ER. Several studies have shown in vitro evidence for direct lipid transfer by Atg2 between tethered membranes, and previous evidence has shown that the hydrophobic groove of Atg2 implicated in lipid transfer is required for autophagosome biogenesis in vivo in yeast and mammalian cells.

      In this manuscript, the authors take advantage of the dye R18, which they show accumulates mainly in the ER after a few minutes. They show specifically that the import of R18 into cells and its transfer to the ER depend on the activity of flippases in the plasma membrane and OSBP-related lipid transporters. Using different sets of FRAP experiments, the authors track the fluorescence recovery of R18 in the IM, the IM-ER membrane contact site and the neighboring ER. From these experiments the authors conclude that (a) R18 is transferred to IMs from the ER when IMs expand and (b) can be transferred from IMs back to the ER when autophagy is deactivated.

      The use of a lipophilic dye to monitor lipid dynamics during IM expansion or dissolution is an elegant way to probe the mechanisms of lipid transfer across ER-IM contact sites. Quantitative in vivo data is critically needed to address this fundamental question in autophagy and contact site biology. However, the study remains limited in providing direct evidence that it is indeed the lipid transfer activity of Atg2, which underlies the R18 dynamics in IMs in vivo.

      We sincerely thank the reviewer for this thoughtful and encouraging summary. We appreciate the recognition of our approach using R18 to visualize lipid dynamics at ER-IM contact sites, and agree that in vivo quantitative data are critically needed to advance our understanding of autophagic membrane expansion.

      We also fully agree with the reviewer that our current study provides indirect, but conceptually informative, support for Atg2-mediated reversible one-way lipid transfer. While prior in vitro studies have demonstrated the lipid transfer capability of Atg2, our goal here was to develop a live-cell system that allows the dynamic tracking of lipid flow in vivo, and to explore the possibility of reversible transport during autophagy termination. We hope our study will offer unique insights for future work aiming to directly probe lipid transfer mechanisms in live cells.

      Regarding the reviewer's concern about the lack of direct evidence that Atg2's lipid transfer activity underlies the observed R18 dynamics, we fully acknowledge this limitation. To address this point, we would like to cite our parallel study currently under revision (Sakai et al., bioRxiv 2025.05.24.655882v1), which provides additional mechanistic evidence linking R18 dynamics to the lipid transfer function of Atg2. Further details and planned revisions are described in the responses below.

      Major points:

      (1) The authors use R18 in FRAP experiments to follow its transfer from the ER into IMs. However, whether this transfer is mediated by Atg2 via its inherent lipid transfer activity remains indirect. The only evidence that implicates Atg2 directly is the observation that a lipid transfer deficient Atg2 variant fails to support IM expansion and autophagosome biogenesis. A similar full-length Atg2 mutant has previously been shown to block autophagosome formation in Dabrowski et al. 2023 in yeast, which the authors do not cite or discuss, suggesting the inherent lipid transfer activity of Atg2 is required for IM expansion. However, aside from this experiment, the mechanisms underlying R18 transfer remain unclear and, while they likely depend on or are at least partially mediated by Atg2, they may involve alternative mechanisms including vesicle transport or continuous membrane contacts. Moreover, for the assays with stalled or dissolving IM, it is essential for the authors to test whether Atg2 is still associated with these IMs. It is quite possible that Atg2 dissociates from maximally expanded or dissolving IMs, which would make their interpretation of the data very unlikely. Thus, it will be critical to provide consistent evidence that lipid transfer from the IM to the ER is mediated by Atg2. Ideally, the authors would label IM with BFP-Atg8, R18, and Atg2-GFP and perform their in vivo analysis.

      We sincerely thank the reviewer for the critical comments and valuable suggestions. To further support the link between R18 transfer and Atg2, we would like to highlight two complementary findings. As noted in our response to Reviewer 1 (Major Point 3), R18 can still label the PAS even when Atg2 is recruited but IM expansion is impaired, suggesting that R18 trafficking occurs in an Atg2-dependent manner. In addition, in our parallel study (bioRxiv, 2025.05.24.655882v1), we demonstrated that Atg2 acts as a bridge-like lipid transfer protein. Notably, when we mutated the bridge-forming region of Atg2, R18 transport to the IM was also disrupted.

      We greatly appreciate the reviewer's reminder regarding the study by Dabrowski et al., 2023, which we have now cited and discussed in the revised manuscript (lines 66-68, 312-314). Their findings that the inherent lipid transfer activity of Atg2 is required for autophagosome formation in vivo strongly reinforce our model.

      Regarding the possibility of vesicle transport, we consider this contribution minimal based on R18's preferential labeling of continuous membranes and its divergence from FM4-64 staining. As for the role of continuous membrane contacts, as also mentioned in our response to Reviewer 1, our preliminary tests indicate that R18 still properly labels the ER in tether∆ cells, suggesting that its localization is not due to passive diffusion at membrane contact sites, but rather involves specific transport mechanisms. As this is an initial observation, we plan to confirm the result and include it in a future revision.

      We also thank the reviewer for the suggestion to monitor Atg2 localization at the dissolving IM. As similarly pointed out by two other reviewers, we plan to track Atg18 during the nutrient replenishment assay.

      Finally, we appreciate the idea of triple-labeling with BFP-Atg8, R18, and Atg2-GFP. While our preliminary attempts encountered technical difficulties such as abnormal BFP-Atg8 localization and severe bleaching during long-term imaging in yeast, we plan to optimize this approach in future experiments.

      (2) Given that the ER forms contact sites with many organelles using bridge-like lipid transfer proteins, how do the authors explain the preferential accumulation of R18 in ARS and not in, for example, the PM (Fmp27), mitochondria, endosomes or vacuole (Vps13)? Why should R18 be specifically transferred by Atg2 and not, or at a much lower rate, by Fmp27 or Vps13?

      We sincerely thank the reviewer for raising this insightful question. Indeed, we have carefully considered this point. Our data indicate that R18 labeling of autophagy-related structures (ARS) depends on Atg2, as demonstrated in the present manuscript and supported by our parallel study currently under revision (bioRxiv, 2025.05.24.655882v1).

      We speculate that the preferential accumulation of R18 in ARS may arise from structural and contextual differences among bridge-like LTPs, such as Atg2, Vps13, and Fmp27. Although all are capable of mediating lipid transfer, these proteins differ in their membrane tethering modes, cargo specificity, and spatial regulation. For example, Atg2 localizes specifically to ER-IM contact sites during autophagosome formation, where membrane expansion requires rapid lipid supply. In contrast, Vps13 and Fmp27 may function at more stable or less dynamic contacts, where lipid turnover or probe accessibility is more limited. We have added a brief discussion of this point in the revised manuscript to reflect this important consideration (lines 439-444).

      (3) Does R18 label autophagic bodies after they are formed? Could the authors add R18 after autophagic bodies have formed in atg15 or pep4 cells?

      We thank the reviewer for this excellent suggestion. To address whether R18 can label autophagic bodies post-formation, we plan to perform additional experiments by adding R18 after autophagic bodies have accumulated in atg15Δ or pep4Δ cells. This will help clarify whether R18 incorporates into pre-formed autophagic bodies or requires earlier membrane dynamics for its labeling.

      (4) Since Neo1- or OSBP-defective cells do not transfer R18 from the PM to the ER or other membranes, the authors should include these strains as controls for ER-dependent R18 transfer to ARSs.

      We thank the reviewer for this insightful suggestion. To further validate the ER-dependency of R18 transfer to autophagy-related structures, we plan to include Neo1- and OSBP-deficient strains as additional controls.

      Comments:

      The authors neglect to mention or discuss important recent literature directly related to their study:

      Schutter et al., Cell (2020); Orii et al., JCB (2021); Polyansky et al., EMBO J (2022); Dabrowski et al., JCB (2023); Shatz et al., Dev Cell (2024)

      We sincerely thank the reviewer for pointing out these important and highly relevant studies. We apologize for our oversight in not citing them earlier. Each of these works has provided valuable insights that are directly related to and have greatly informed our current study. We have now cited and discussed these references in appropriate sections of the revised manuscript.

      Figure 1A and B: The authors need to describe how these cells were stained with R18 in the figure legend or text to help the reader to understand how these experiments were performed. Figure legends need to indicate at which time point after rapamycin treatment cells were analyzed.

      Thank you for the helpful suggestion. We have now added the corresponding information to the figure legends to clarify the staining procedure and time points.

      The authors need to clarify whether mNG-Atg8 colocalization with R18 was included for dot- and ring-like structures for WT cells as shown separately in 1A but not in 1B.

      Thank you for the comment. The quantification in Figure 1B includes both dot- and ring-like structures of mNG-Atg8 colocalized with R18 in WT cells, as shown in Figure 1A. We have now clarified this point in the revised figure legend.

      Figure 1C: The figure legend needs to describe the conditions cells were treated with and when cells were analyzed after rapamycin treatment (presumably).

      Thank you for the helpful suggestion. We have now added the corresponding information to the figure legends.

      Figure 1C: The authors should combine atg15 and pep4 deletions with atg2 or atg7 as controls in which autophagic bodies are not formed.

      Thank you for the valuable suggestion. We plan to perform these experiments that combine atg15 and pep4 deletions with atg2 or atg7 as controls.

      Figure 1E and F: R18 stains more than just the ER in the cells shown. In addition to atg39 and atg40, authors should include atg11 to inhibit all forms of selective autophagy.

      Thank you very much for the insightful comment. We agree and plan to include the atg11Δ mutant to inhibit all forms of selective autophagy.

      Figure S2A and B: The figures are mislabeled. Instead of FM4-64 it should say R18. In addition to the ER, in several images it is obvious to see R18 staining the vacuole membrane (for example Figure 2A 30 degrees) and others. Thus, the strong thresholding in S2 may give the reader an oversimplified view on R18 localization. This needs to be corrected.

      Thank you very much for pointing this out. We have corrected the labeling error in Figure S2A and B. Regarding the observation that R18 occasionally labels the vacuole membrane, we agree with the reviewer's comment. Based on our data, we believe that this signal likely reflects autophagosomes that have reached and fused with the vacuole, as expected in the later stages of autophagy. We have clarified this point in the text to avoid oversimplification of R18 localization (lines 169-171, 426-428).

      Figure 1G and H: In 1G, there are a number of R18-stained patches not co-labeled by GFP-ER. What are these patches and which organelles do they represent? In 1H, given the tight association of the ER (omegasome) with forming IMs, it is difficult to discern whether R18 labels the surrounding ER membrane or the IM itself. This needs to be more closely analyzed. The authors need to quantify these data similarly to the yeast data.

      Thank you for the suggestion. We plan to perform additional quantification and colocalization analysis to clarify the identity of R18-positive signals in 1G and 1H.

      Figure 4A-C: A full-length PLT-deficient variant of Atg2 has been analyzed by Dabrowski et al, JCB 2023 in vivo. This work needs to be cited and discussed. The analysis needs to include punctate Atg8 structures for WT cells to exclude effects due to expansion defects.

      Thank you for the suggestion. We have now cited and discussed the work by Dabrowski et al., JCB 2023 in the revised manuscript (lines 67-68, 312-314). In addition, we have included an analysis of punctate Atg8 structures in WT cells to address the concern regarding potential expansion defects.

      Figure 4F-H: To measure the size changes in IMs, the authors would need to perform these experiments without bleaching the mNG-Atg8 signals.

      We apologize for the lack of clarity. The method for measuring IM size has now been added to the revised manuscript. In Figure 4, we note that mNG-Atg8 fluorescence actually shows a slow recovery over time. This limited recovery likely reflects both the slower turnover of Atg8 and the fact that the pre-existing Atg8 pool at the IM was partially photobleached. We have now revised the main text to clarify this point and included additional explanation (lines 326-330).

      Figure 5C: The authors need to indicate the bleached areas in the mNG-Atg8 image for easier orientation. It looks to me that the area that the authors mark as IM-ER MCS is really the IM in proximity to the ER. Thus, if lipid transfer to the IM has ceased, I would not expect recovery here. If the IM-ER MCS area includes IM and the ER to similar extent, I would expect exactly what the authors show: IM does not recover while ER quickly recovers. On average, we would observe reduced recovery as shown in 5D.

      Thank you for the helpful suggestion, and we apologize for the oversight during figure preparation. We have now clearly indicated the bleached areas in the merged image in Figure 5C for better orientation. Additionally, we have carefully re-examined the defined ER-IM MCS region and confirm that the quantified area indeed corresponds to the contact site between the ER and the IM. We have also double-checked that the measurements shown in the figure remain correct.

      Figure 5L: Since mNG-Atg8 signal homogenously disappears from the IM, it is meaningless to measure size. How do the authors measure the size of something they cannot detect?

      Thank you for pointing this out. We agree with the reviewer's comment and have removed the panel from the revised version accordingly.

      Figure 5K: The authors need to show the whole bleached area over time for the reader to be able to see where the recovered R18 signal might be coming from. Currently, it is impossible to discern whether the signal comes from the IM or from slow recovery from neighboring ER.

      We appreciate this insightful comment. To address the concern and following the suggestion from Reviewer 2 (Major Point No.4), we have now revised the figure to include an additional measurement of fluorescence recovery in the adjacent bleached ER (Figure 5K and 5M) (lines 384-396). These results further support our reversible lipid transfer model by demonstrating that fluorescence recovery at the ER-IM MCS originates from the IM, rather than from the adjacent bleached ER, which shows slower and less efficient recovery.

      We have also added time-lapse videos to the supplementary information due to space limitations in the main figure.

      Reviewer #3 (Significance (Required)):

      The use of a lipophilic dye to monitor lipid dynamics during IM expansion or dissolution is an elegant way to probe the mechanisms of lipid transfer across ER-IM contact sites. Quantitative in vivo data is critically needed to address this fundamental question in autophagy and contact site biology. However, the study remains limited in providing direct evidence that it is indeed the lipid transfer activity of Atg2, which underlies the R18 dynamics in IMs in vivo.

      We sincerely thank the reviewer for this encouraging and thoughtful comment. We appreciate the recognition that our live-cell approach using a lipophilic dye provides a valuable framework to visualize lipid dynamics during autophagosome biogenesis. As the reviewer pointed out, quantitative in vivo evidence is critically needed in this field, and we hope our study contributes meaningfully toward that goal.

      We also fully acknowledge the limitation. While our current data offer indirect evidence for Atg2-mediated lipid transfer, we intend to support this conclusion through our revision plan and through our parallel study (bioRxiv, 2025.05.24.655882v1), which shows that Atg2 is indeed a bridge-like LTP and that R18 transfer is lost in the bridge-structure defective strain. Together, we hope these results indicate that the lipid transfer activity of Atg2 underlies the observed R18 dynamics in vivo.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment and that the integration of the two pathways reflects a prioritization for generating behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.

      Strengths:

      This study creatively utilizes previous findings that the hawkmoth partitions their visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.

      Weaknesses:

      The work would be further clarified and strengthened by additional explanation included in the main text, figure legends, and methods that would permit the reader to draw their own conclusions more feasibly. It would be helpful to have all figure panels referenced in the text and referenced in order, as they are currently not. In addition, it seems that sometimes the incorrect figure panel is referenced in the text, Figure S2 is mislabeled with D-E instead of A-C and Table S1 is not referenced in the main text at all. Table S1 is extremely important for understanding the figures in the main text and eliminating acronyms here would support reader comprehension, especially as there is no legend provided for Table S1. For example, a reader that does not specialize in vision may not know that OF stands for optic flow. Further detail in figure legends would also support the reader in drawing their own conclusions. For example, dashed red lines in Figures 3 and 4 A and B are not described and the letters representing statistical significance could be further explained either in the figure legend or materials to help the reader draw their own conclusions.

      We appreciate the suggestions to improve the clarity of the manuscript. We have extensively re-structured the entire manuscript. Among other changes, we have referenced all figure panels in the text in the order they appear. To do so, we combined the optic flow and contrast measurements of our setup with the methods description of the behavioural experiments (formerly Figs. 5 and 2, respectively). This new Fig. 2 now introduces the methods of the study, while the remainder of the former Fig. 2, which presented the experiments that investigated the ventrolateral and dorsal response in more detail, is now a separate figure (Fig. 3). This arrangement also better balances the amount of information contained in each figure.

      Reviewer #2 (Public review):

      Summary:

      Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response: instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight.

      Strengths:

      The data are very interesting, unique, and compelling. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.

      Weaknesses:

      While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?

      We thank the reviewer for the feedback, and the suggestions for improvement of the manuscript (our implementations are detailed below). We fully agree that this study raises several intriguing questions regarding the dorsal visual response, including how the animals perceive and respond to rotational optic flow in their dorsal visual field, particularly since rotational optic flow may be processed separately from translational optic flow.

      In our free-flight setup, it was not possible to generate rotational optic flow in a controlled manner. To explore this aspect more systematically, a tethered-flight setup would be ideal, or alternatively, a free-flight setup integrated with virtual reality. This would be a compelling direction for a follow-up study.

      Reviewer #3 (Public review):

      The central goal of this paper as I understand it is to extract the "integration hierarchy" of stimulus in the dorsal and ventrolateral visual fields. The segregation of these responses is different from what is thought to occur in bees and flies and was established in the authors' prior work. Showing how the stimuli combine and are prioritized goes beyond the authors' prior conclusions that separated the response into two visual regions. The data presented do indeed support the hierarchy reported in Figure 5 and that is a nice summary of the authors' work. The moths respond to combinations of dorsal and lateral cues in a mixed way but also seem to strongly prioritize avoiding dorsal optic flow which the authors interpret as a closed and potentially dangerous ecological context for these animals. The authors use clever combinations of stimuli to put cues into conflict to reveal the response hierarchy.

      My most significant concern is that this hierarchy of stimulus responses might be limited to the specific parameters chosen in this study. Presumably, there are parameters of these stimuli that modulate the response (spatial frequency, different amounts of optic flow, contrast, color, etc). While I agree that the hierarchy in Figure 5 is consistent for the particular stimuli given, this may not extend to other parameter combinations of the same cues. For example, as the contrast of the dorsal stimuli is reduced, the inequality may shift. This does not preclude the authors' conclusions but it does mean that they may not generalize, even within this species. For example, other cue conflict studies have quantified the responses to ranges of the parameters (e.g. frequency) and shown that one cue might be prioritized or up-weighted in one frequency band but not in others. I could imagine ecological signatures of dorsal clutter and translational positioning cues could depend on the dynamic range of the optic flow, or even having spatial-temporal frequency-dependent integration independent of net optic flow.

      We absolutely agree that, in principle, an observed integration hierarchy is only valid for the stimuli tested. Yet, we do believe that we provide good evidence that our key observations are also robust for stimuli related to the ones tested:

      Most importantly, we found that both pathways act in parallel (and are not mutually exclusive, or winner-takes-all, for example), when the animals can enact the locomotion induced by the dorsal and ventrolateral pathway. We tested this with the same dorsal cue (the line switching direction), but different behavioural paradigms (centring vs unilateral avoidance), and different ventrolateral stimuli (red gratings of one spatial frequency, and 100% nominal contrast black-and-white checkerboard stimuli which comprised a range of spatial frequencies) – and found the same integration strategy.

      Certainly, if the contrast of the visual cues was reduced to the point that the dorsal or ventrolateral responses became weaker, we would expect this to be visible in the combined responses, with the respective reduction in response strength for either pathway, to the same degree as they would be reduced when stimuli were shown independently in the dorsal and ventrolateral visual field.

      For testing whether the animals would show a weighting of responses when it was not possible to enact locomotion to both pathways, we felt it was important to use similar external stimuli to be able to compare the responses, so that we can confidently interpret their responses in terms of integration. Indeed, how this is translated to responses in the two pathways depends a) on the spatiotemporal tuning, contrast sensitivity and exact receptive fields of the two systems, b) on the geometry of the setup and stimulus coverage, and therefore the ability of the animals to enact responses to both pathways independently, and c) on the integration weights.

      It would indeed be fascinating to obtain this tuning and the receptive fields, and having these, test a large array of combinations of stimuli and presentation geometries, so that one could extract integration weights for different presentation scenarios from the resulting flight responses in a future study.

      We also expanded the respective discussion section to reflect these points (l. 391-417), and updated the former Fig. 5, now Fig. 6, to reflect this discussion.

      The second part of this concern is that there seems to be a missed opportunity to quantify the integration, especially when the optic flow magnitude is already calculated. The discussion even highlights that an advantage of the conflict paradigm is that the weights of the integration hierarchy can be compared. But these weights, which I would interpret as stimulus-responses gains, are not reported. What is the ratio of moth response to optic flow in the different regions? When the moth balances responses in the dorsal and ventrolateral region, is it a simple weighted average of the two? When it prioritizes one over the other is the response gain unchanged? This plays into the first concern because such gain responses could strongly depend on the specific stimulus parameters rather than being constant.

      Indeed, we set up stimuli that are comparable, as they are all in the visual domain, and since we can calculate their external optic flow and contrast magnitudes, to control for imbalances in stimulus presentation, which is important for the interpretation of the resulting data.

      As we discussed above, we are confident that we are observing general principles of the integration of the two parallel pathways. However, we refrained from calculating integration weights, because these might be misleading for several reasons:

      (1) In situations where the animals can enact responses to both pathways, we show that they do so at the full original magnitudes. So there are no “weights” of the hierarchy in this case.

      (2) Only when responses to both systems are not possible in parallel, do we see a hierarchy. However, combined with point (1), this hierarchy likely depends on the geometry of the moths’ environment: it will be more pronounced the less both systems can be enacted in parallel.

      (3) The hierarchy also does not affect all features of the dorsal or ventrolateral pathway equally. The hawkmoths still regulate their perpendicular distance to ventral gratings with dorsal gratings present, to the same degree as with only ventral gratings, because perpendicular distance regulation is not a feature of the dorsal response. And while the hawkmoths show a significant reduction in their position adjustment to dorsal contrast when it is in conflict with lateral gratings (Fig. 4C), they show exactly the same amount of lateral movement and speed adjustment as for dorsal gratings alone, when not combined with lateral ones (Fig. 4D and Fig. S3A). So even for one particular setup geometry and stimulus combination, there clearly is not one integration weight for all features of the responses.

      We extended the discussion section to clarify these points: “The benefit of our study system is that the same cues activate different control pathways in different regions of the visual field, so that the resulting behaviour can directly be interpreted in terms of integration weights” (l. 448-451; see also l. 391-417). We also updated the former Fig. 5, now Fig. 6, to reflect this discussion.

      The authors do explain the choice of specific stimuli in the context of their very nice natural scene analysis in Fig. 1 and there is an excellent discussion of the ecological context for the behaviors. However, I struggled to directly map the results from the natural scenes to the conclusions of the paper. How do they directly inform the methods and conclusions for the laboratory experiments? Most important is the discussion in the middle paragraph of page 12, which suggests a relationship with Figure 1B; this seems provocative but lacks quantification with respect to the laboratory stimuli.

      We show that contrast cues and translational optic flow are not homogeneously distributed in the natural environments of hawkmoths. This relates directly to our laboratory findings on the responses to these stimuli in different parts of the visual field. In order to interpret the results of these behavioural experiments with respect to the visual stimuli, we did perform measurements of translational optic flow and contrast cues in the laboratory setup. As a result, we make several predictions about the animals’ use of translational optic flow and contrast cues in natural settings:

      a) Hawkmoths in the lab responded strongest to ventral optic flow, even though it was not stronger in magnitude, given our measurements, than lateral optic flow. Thus, we propose that the stronger response to ventral optic flow might be an evolutionary adaptation to the natural distribution of translational optic flow cues.

      b) In the natural habitats of hawkmoths, dorsal coverage is much less frequent than ventrolateral structures generating translational optic flow, yet the hawkmoths responded with a much higher weight to the former. Moreover, in our flight tunnel experiments, the animals responded with the same or higher weights to dorsal cues, which had a lower magnitude of translational optic flow and contrast than the same cues in the ventrolateral visual field. So we showed, by combining behavioural experiments and stimulus measurements in the lab, that the weighting of dorsal and ventrolateral cues did not follow their stimulus magnitude in the lab. Moreover, comparing to the natural cue distributions, we suggest that the integration weights also did not evolve to match the prevalence of these cues in natural habitats.

      We integrated the measurements of natural visual scene statistics in the new Fig. 6, to relate the behavioural findings to the natural context also in the figure structure, and sequence logic of the text, as they are discussed here.

      The central conclusion of the first section of the results is that there are likely two different pathways mediating the dorsal and the ventrolateral response. This seems reasonable given the data, however, this was also the message that I got from the authors' prior paper (ref 11). There are certainly more comparisons being done here than in that paper and it is perfectly reasonable to reinforce the conclusion from that study but I think what is new about these results needs to be highlighted in this section and differentiated from prior results. Perhaps one way to help would be to be more explicit with the open hypotheses that remain from that prior paper.

      We appreciate the suggestion to highlight more clearly which open questions this study addresses. As a result, we have entirely restructured the introduction, added sections to the discussion and fundamentally changed the graphical result summary in Fig. 6, to reflect the following new findings (and differences to the previous paper):

      The previous paper demonstrated that there are two different pathways in hummingbird hawkmoths that mediate visual flight guidance, and newly described one of them, the dorsal response. This established flight guidance in hummingbird hawkmoths as a model for the questions asked in the current study, which are very different in nature from the previous paper.  

      The main question addressed in the current study is how these two flight guidance pathways interact to generate consistent behaviour. Throughout the literature of parallel sensory and motor pathways guiding behaviour, there are different solutions – from winner-takes-all to equal mixed responses. We tested this fundamental question using the hummingbird hawkmoth flight guidance systems as a model.

      This is the main question addressed in the various conflict experiments in this study, and we show that indeed, the two systems operate in parallel. As long as the animals can enact both dorsal and optic-flow responses, they do so at the original strengths of the responses. Only when this is not possible, hierarchies become visible. We carefully measured the optic flow and contrast cues generated by the different stimuli to ensure that the hierarchies we observed were not generated by imbalances of the external stimuli.

      - Does the interaction hierarchy of the two pathways follow the statistics of natural environments? We did show qualitatively previously how optic flow and contrast cues are distributed across the visual field in natural habitats of the hummingbird hawkmoth. In this study, we quantitatively analysed the natural image data, including a new analysis for the contrast edges, and statistically compared the results across conditions. This quantitative analysis supported the previous qualitative assessment that the prevalence of translational optic flow was highest in the ventral and lowest in the dorsal visual field in all natural habitat types. The distribution of contrast edges across the visual field depended on habitat type much more strongly than was visible in the qualitative analysis in the previous paper. When compared to the magnitude of the behavioural responses, and considering that the hummingbird hawkmoth is predominantly found in open and semi-open habitats, the natural distributions of optic flow and contrast edges did not align with the response hierarchy observed in our laboratory experiments. Dorsal cues elicited much stronger responses relative to ventrolateral optic flow responses than would be expected.

      To provide a more complete picture of the dorsal pathway, which will be important to understand its nature and to compare it to other species, we conducted additional experiments that were specifically set up to test for response features known from the translational optic flow response, in order to compare and contrast the two systems. These experiments allowed us to show that the dorsal response is not simply a translational optic flow reduction response that creates much stronger output than the ventrolateral optic flow response. We particularly show that the dorsal response was lacking the perpendicular distance regulation of the optic flow response, while it did provide alignment with prominent contrasts (possibly to reduce the perceived translational optic flow), which is not observed in the ventrolateral optic flow response. The strong avoidance of any dorsal contrast cues, not just those inducing translational optic flow, is another feature not found in the ventrolateral pathway.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Many comparisons between visual conditions are made and it was confusing at times to know which conditions the authors were comparing. Thinking of a way to label each condition with a letter or number so that the authors could specify which conditions are specifically being compared would greatly enhance comprehension and readability.

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      Consider adding in descriptive words to the y-axis labels for the position graphs that would help the reader quickly understand what a positive or negative value means with respect to the visual condition.

      We did now change the viewpoint on the example tracks in Figs. 2-5, to take a virtual viewpoint from the top, not as the camera recorded from below, which requires some mental rotation to reconcile the left and right sides. Moreover, we noticed that the example track axes were labelled in mm, while the axes for the plots showing median position in the tunnel were labelled in cm. We reconciled the units as well. This will make it easier to see the direct equivalent of the axis (as well as positive and negative values) in the example tracks in those figures, and the median positions, as well as the cross-index.

      There are no line numbers provided so it is a bit challenging to provide feedback on specific sentences but there are a handful of typos in the manuscript, a few examples:

      (1) Cue conflict section, first paragraph: "When both cues were presented to in combination, ..." (remove to)

      (2) The ecological relevance section, first paragraph, first sentence: "would is not to fly"

      (3) Figure S3 legend: explanation for C is labeled as B and B is not included with A

      We apologise for the missing line numbers. We added these and resolved the issues 1-3.

      Reviewer #2 (Recommendations for the authors):

      - The pictograms in Fig. 1a were at first glance not clear to me, maybe adding l, r, d, v to the first pictogram could make the figure more immediately accessible.

      We added these labels to make it more accessible.

      - I would suggest noting in the main text that the red patterns were chosen for technical reasons (see Methods), if this is correct.

      We added this information and a reference to the methods in the main text (lines 100-102).

      - "Thus, hawkmoths are currently the only insect species for which a partitioning of the visual field has been demonstrated in terms of optic-flow-based flight control [33-35]." I think that is a bit too strong and maybe it would be more interesting to connect the current data to connected data in other insects to perhaps discuss important similarities. Ref 32 for example shows that fruit flies weigh ventral translational optic flow considerably more than dorsal translational optic flow. Reichardt 1983 (Naturwissenschaften) showed that stripe fixation in large flies (a behaviour relying in part on the motion pathway) is confined to the ventral visual field, etc...

      We have changed this sentence to acknowledge partitioning in other insects, and motivating the use of our model species for this study: While fruit flies weight ventral translational optic flow stronger than dorsal optic flow, the most extreme partitioning of the visual field in terms of  optic-flow-based flight control has been observed in hawkmoths [33-35]. (lines 60-62)

      - I think the statistical differences group mean differences could be described in more detail at least in Fig. 2 (to me the description was not immediately clear, in particular with the double letters).

      We added an explanation of the letter nomenclature to all respective figure legends:

      Black letters show statistically significant differences in group means or median, depending on the normality of the test residuals (see Methods, confidence level: 5%). The red letters represent statistically significant differences in group variance from pairwise Brown–Forsythe tests (significance level 5%). Conditions with different letters were significantly different from each other. The white boxplots depict the median and 25% to 75% range, the whiskers represent the data exceeding the box by more than 1.5 interquartile ranges, and the violin plots indicate the distribution of the individual data points shown in black.

      - "When translational optic flow was presented laterally" I would use a more wordy description, since it is the hawkmoth that is controlling the optic flow and in addition to translational optic flow, there might also be rotational components, retinal expansion etc.

      We extended the description to explain that the moths were generating the optic flow percept based on stationary gratings in different orientations, by way of their flight through the tunnel. Lines 127-129

      - While it is clearly stated that the measure of the perpendicular distance from the ventral and dorsal pattern via the size of the insect as seen by the camera is indirect, I would suggest to determine the measurement uncertainty of distance estimate.

      - Connected to above - is the hawkmoth area averaged over the entire flight and is the variance across frames similar in all the stimuli conditions? Is it, in principle, conceivable that the hawkmoths' pitch (up or down) is different across conditions, e.g. with moths rising and falling more frequently in a certain condition, which could influence the area in addition to distance?

      There are a number of sources that generate variance in the distance estimate (which was based on the size of the moth in each video frame, after background subtraction): the size of the animal, the contrast with which the animal was filmed (which also depended on the type of pattern in the tunnel – it was lower with ventral or dorsal patterns as a background than with lateral ones), and the speed of the animal, as motion blur could impact the moth’s image on the video. The latter is hard to calibrate, but the uncertainty related to animal size and pattern types could theoretically be estimated. However, since we moved between finishing the data acquisition for this study and publishing the paper, the original setup has been dismantled. We could attempt to recreate it as faithfully as possible, but would be worried to introduce further noise. We therefore decided to not attempt to characterise the uncertainty, to not give a false impression of quantifiability of this measure. For the purpose of this study, it will have to remain a qualitative, rather than a quantitative measure. If we should use a similar measure again, we will make sure to quantify all sources of uncertainty that we have access to.

      The variance in area is different between conditions. Most likely, the animals vary their flight height differently for different dorsal and ventral patterns, as they vary their lateral flight straightness with different lateral visual input. For the reasons mentioned above, we cannot disentangle the effects of variations in flight height and other sources of uncertainty relating to animal size in the video frames. We therefore averaged the extracted area across the entire flight, to obtain a coarse measure of their flight height. Future studies focusing specifically on the vertical component or filming in 3D will be required to determine the exact amount of vertical flight variation.

      - Results second paragraph, suggestion: pattern wavelength or spatial frequency instead of spatial resolution.

      - Same paragraph, suggestion: For an optimal wavelength/spatial frequency of XX

      We corrected these to spatial frequency.

      - Above Fig 3- "this strongly suggests a different visual pathway". In my opinion it would be better to say sensory-motor /visuomotor pathway or to more clearly define visual pathway? Could one in principle imagine a uniform set of local motion sensitive neurons across the entire visual field that connect differentially to descending/motor neurons.

      We appreciate this point and changed this, and further instances in the manuscript to visuomotor pathway.

      - If I understood correctly, you calculated the magnitude of optic flow in the different tunnel conditions based on the image of a fisheye camera moving centrally in the tunnel, equidistant from all walls. I did not understand why the magnitude of optic flow should differ between the four quadrants showing the same squarewave patterns. Apologies if I missed something, but maybe it is worth explaining this in more detail in the manuscript.

      We recognize that this point may not have been immediately clear and have therefore provided additional clarification in the Methods and results section (lines 106-111, 543-549). We anticipated differences in the magnitude of optic flow due to potential contrast variations arising from the way the stimuli were generated—being mounted on the inner surfaces of different tunnel walls while the light source was positioned above. On the dorsal wall, light from the overhead lamps passed through the red material. For laterally mounted patterns, the animals perceived mainly reflected light, as these tunnel walls were not transparent.

      A similar principle applied to the background, which consisted of a white diffuser allowing light to pass through dorsally, but white non-transmissive paper laterally, with a 5% contrast random checkerboard pattern. The ventral side presented a more complex scenario, as it needed to be partially transparent for the ventrally mounted camera. Consequently, the animals perceived a combination of light reflections from the red patterns and the white gauze covering the ventral tunnel side, against the much darker background of the surrounding room.

      To ensure that the observed flight responses were not artifacts of deviations in visual stimulation from an ideal homogeneous environment, we used the camera to quantify the magnitude of optic flow and contrast patterns under these real experimental conditions. This approach also allowed us to directly relate the optic flow measurements taken indoors to those recorded outdoors, as we employed the same camera and analytical procedures for both datasets.

      Reviewer #3 (Recommendations for the authors):

      In addition to the considerations above I had a few minor points:

      There are so many different directions of stimuli and response that it is quite challenging to parse the results. Can this be made a little easier for the reader?

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      One suggestion (only a suggestion): I found myself continuously rotating the violin plots in my head so that the lateral position axis lined up with the lateral position of the tunnel icons below. Consider if rotating the plots 90 degs would help interpretability. It was challenging to keep track of which side was side.

      We did discuss this with a number of test-readers, and tried multiple configurations. They all have advantages and drawbacks, but the configuration preferred by the majority of testers was the current one. To help the mental transformations from the example flight tracks in the figures, we now present the example flight tracks in Figs. 2-5 in the same reference frame as the figures showing median position (so positive and negative values on those axes correspond directly), and changed the view from a below-the-tunnel to an above-the-tunnel view, as this is the more typical depiction. We hope that this enhances readability.

      Are height measurements sensitive to the roll and pitch of the animal? I suspect this is likely small but worth acknowledging.

      They are indeed. These effects are likely small but contribute to the overall inaccuracy, which we could not quantify in this particular setup (see also response to reviewer 2 on that point), which is why the height measurements have to be considered a qualitative approximation rather than a quantification of flight height. We added text to acknowledge the effects of roll and pitch specifically (lines 657-658)

      The Brown-Forsythe test was reported as paired but this seems odd because the same moths were not used in each condition. Maybe the authors meant something different by "paired" than a paired statistical design?

      Indeed, the data was not paired in the sense that we could attribute individual datapoints to individual moths across conditions. We applied the Brown-Forsythe test in a pairwise manner, comparing the variance of each condition with that of each other condition, to test if the variance in position differed across conditions. We did phrase this misleadingly, and have corrected it to “The variance in the median lateral position (in other words, the spread of the median flight position) was statistically compared between the groups using the pairwise Brown–Forsythe tests” (l. 187-188).
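
      To illustrate what "pairwise" means in this response, the following is a minimal sketch, not the authors' analysis code. It assumes the standard definition of the Brown–Forsythe statistic (a one-way ANOVA F statistic computed on absolute deviations from each group's median) and uses hypothetical condition labels and position values. Only the statistic is computed; a p-value would additionally require the F distribution with (k - 1, N - k) degrees of freedom, for instance via `scipy.stats.levene(..., center="median")`.

```python
from itertools import combinations
from statistics import mean, median


def brown_forsythe_statistic(*groups):
    """F statistic of a one-way ANOVA on absolute deviations from each group median."""
    # Replace every observation by its absolute distance to its own group median.
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    n_total = sum(len(d) for d in deviations)
    k = len(deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    group_means = [mean(d) for d in deviations]
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((z - m) ** 2
                 for d, m in zip(deviations, group_means) for z in d)
    return (between / (k - 1)) / (within / (n_total - k))


def pairwise_brown_forsythe(conditions):
    """Apply the two-group statistic to every unordered pair of conditions."""
    return {(a, b): brown_forsythe_statistic(conditions[a], conditions[b])
            for a, b in combinations(sorted(conditions), 2)}


# Hypothetical median lateral positions (cm); each flight track is an
# independent sample, so nothing is paired across conditions.
conditions = {
    "A": [-1.0, 0.0, 1.0, -0.5, 0.5],
    "B": [-4.0, 0.0, 4.0, -2.0, 2.0],
    "C": [-1.0, 0.0, 1.0, -0.5, 0.5],
}
results = pairwise_brown_forsythe(conditions)
for pair, f_value in results.items():
    print(pair, round(f_value, 3))
```

      In this toy data, conditions A and C have identical spread, so their statistic is zero, whereas A and B differ in spread and give a clearly positive value. The sketch applies no correction for the number of pairwise comparisons.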

      There is some concern about individual moth preferences and bias due to repeated measures. I appreciate that the individual moth's identity was not likely known in most cases, but can the authors provide an approximate breakdown of how many individual moths provided the N sample trajectories?

      This is a very valid concern, and indeed one we did investigate in a previous study with this setup. We confirmed that the majority of animals (70%, 68% and 53% out of 40 hawkmoths, measured on three consecutive days) crossed the tunnel within a randomly picked window of 3h (Stöckl et al. 2019). We now state this explicitly in the methods section (lines 594-597). Thus, for the sample sizes in our study, statistically, each moth would have contributed a small number of tracks compared to the overall number of tracks sampled.

      The statistics section of the methods said that both Tukey-Kramer (post-hoc corrected means) and Kruskal-Wallis (non-parametric medians) were done. It is sometimes not clear which test was done for which figure, and where the Kruskal-Wallis test was done there does not seem to be a corrected statistical significance threshold for the many multiple comparisons (Fig. 2). It is quite possible I am just missing the details and they need to be clarified. I think there also needs to be a correction for the Brown-Forsythe tests but I don't know this method well.

      We first performed an ANOVA, and if the test residuals were not normally distributed, we used a Kruskal-Wallis test instead. For the post-hoc tests of both we used Tukey-Kramer to correct for multiple comparisons. The figure legends did indeed miss this information. We added it to clarify our statistical analysis strategy and refer to the methods section for more details (i.e. l. 185-186). All statistical results, including the type of statistical test used, have been uploaded to the data repository as well.
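
      As a hedged illustration of the rank-based alternative mentioned above, the sketch below computes the Kruskal–Wallis H statistic from scratch. It is not the authors' code: the function names and the example data are hypothetical, tied values receive average ranks, and the usual tie correction factor is omitted for brevity. The normality check on the ANOVA residuals and the post-hoc comparisons are not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical example: three fully separated groups of three values each.
h_value = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h_value, 3))
```

      Under the null hypothesis, H is approximately chi-square distributed with (number of groups - 1) degrees of freedom, which is how a p-value would be obtained in practice.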

      The connection to stimulus reliability in the discussion seems to conflate reliability with prevalence or magnitude.

      We have rephrased the respective discussion sections to clearly separate the prevalence and magnitude of stimuli, which was measured, from an implied or hypothesized reliability (lines 510-511).

      Line numbers would be helpful for future review.

      We apologize for missing the line numbers and have added them to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Reviewer #1 (Recommendations For The Authors):

      (1) At several places in the reply to reviewers and the manuscript, when discussing the new simulations conducted, the authors mention they break the 180 trials into a train/test split of 108/108 - is this value correct? If so, how? (pg 19 of updated manuscript)  

      Thank you for pointing this out; it was not clearly explained. We have now added the explanation to the Methods section: 

      “For each iteration, we randomly selected 108 responses from the full set of 180 for training, and then independently sampled another 108 from the same full set for testing. This ensured that the same orientation could appear in both sets, consistent with the structure of the original experiment.”
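
      The sampling scheme described in this quotation can be sketched as follows. This is only an illustration under stated assumptions: the names `N_TRIALS`, `N_SPLIT` and `draw_split` are hypothetical, and each draw is assumed to be without replacement within itself. Because the two draws are independent, the training and test sets share trials, which is how a 108/108 split of 180 trials is possible.

```python
import random

N_TRIALS = 180  # full set of responses
N_SPLIT = 108   # size of both the training and the test draw


def draw_split(rng):
    """Draw training and test indices independently from the same full set."""
    trials = range(N_TRIALS)
    train = rng.sample(trials, N_SPLIT)  # without replacement within the draw
    test = rng.sample(trials, N_SPLIT)   # independent second draw
    return train, test


rng = random.Random(0)
train, test = draw_split(rng)
overlap = len(set(train) & set(test))

# Each trial lands in a given draw with probability 108/180, so the expected
# number of shared trials is 108 * 108 / 180.
expected_overlap = N_SPLIT * N_SPLIT / N_TRIALS
print(overlap, expected_overlap)
```

      By the pigeonhole principle, at least 108 + 108 - 180 = 36 trials must appear in both sets on every iteration.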

      (2) I appreciate the authors have added the variance explained of principal components to the axes of Fig. 3, though it took me a while to notice this, and this isn't described in the figure caption at all. It would likely help readers to directly explain what the % means on each axis of Fig. 3.

      Thank you, we have now added a description in both Fig. 2 and 3:

      “The axes represent the first two principal components, with labels indicating the percent of total explained variance.”

      (3) I believe there is a typo/missing word in the new paragraph on pg 15: "neural visual WM representations in the early visual cortices are [[biased]] towards distractors" (I think the bracketed word may be omitted as a typo)

      Thank you - fixed.

  2. social-media-ethics-automation.github.io
    1. https://en.wikipedia.org/w/index.php?title=Luddite&oldid=1189255462 (visited on 2023-12-10). [u3] Ted Chiang. Will A.I. Become the New McKinsey? The New Yorker, May 2023. URL:

      This article argues that AI is more beneficial for the bourgeoisie and the corporate world than for the working class. It even makes a comparison to McKinsey in order to further its argument. I think this post makes a really good point, as we can see a lot of entry-level jobs becoming more competitive or being downright replaced by AI so that corporations can cut costs, making the rich even richer.

    1. Multivariate predictive models play a crucial role in enhancing our understanding of complex biological systems and in developing innovative, replicable tools for translational medical research. However, the complexity of machine learning methods and extensive data pre-processing and feature engineering pipelines can lead to overfitting and poor generalizability. An unbiased evaluation of predictive models necessitates external validation, which involves testing the finalized model on independent data. Despite its importance, external validation is often neglected in practice due to the associated costs. Here we propose that, for maximal credibility, model discovery and external validation should be separated by the public disclosure (e.g. pre-registration) of feature processing steps and model weights. Furthermore, we introduce a novel approach to optimize the trade-off between efforts spent on training and external validation in such studies. We show on data involving more than 3000 participants from four different datasets that, for any “sample size budget”, the proposed adaptive splitting approach can successfully identify the optimal time to stop model discovery so that predictive performance is maximized without risking a low powered, and thus inconclusive, external validation. The proposed design and splitting approach (implemented in the Python package “AdaptiveSplit”) may contribute to addressing issues of replicability, effect size inflation and generalizability in predictive modeling studies.

      A version of this preprint has been published in the Open Access journal GigaScience (see paper: https://doi.org/10.1093/gigascience/giaf036), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

      Original version

      Reviewer 1: Qingyu Zhao

      The manuscript discusses an interesting approach that seeks optimal data split for the pre-registration framework. The approach adaptively optimizes the balance between predictive performance of discovery set and sample size of external validation set. The approach is showcased on 4 applications, demonstrating advantage over traditional fixed data split (e.g., 80/20). I generally enjoyed reading the manuscript. I believe pre-registration is one important tool for reproducible ML analysis and the ideology behind the proposed framework (investigating the balance between discovery power and validation power) is urgently needed. My main concerns are all around Fig. 3, which represents the core quantitative analysis but lacks many details.

      1. Fig. 3 is mostly about external validation. What about training? For each n_total, which stopping rule is activated? What is the training accuracy? What does l_act look like? What is \hat{s_total}?
      2. Results section states "the proposed adaptive splitting strategy always provided equally good or better predictive performance than the fixed splitting strategies (as shown by the 95% confidence intervals on Figure 3)". I'm confused by this because the blue curve is often below other methods in accuracy (e.g., comparing with 90/10 split in ABIDE and HCP).
      3. Why does the half split have the lowest accuracy but the highest statistical power?
      4. How was the range of x-axis (n_total) selected? E.g., HCP has 1000 subjects, why was 240-380 chosen for analysis?
      5. The lowest n_total for BCW and IXI is approximately 50. If n_act starts from 10% of n_total, how is it possible to train (nested) cross-validation on 5 samples or so?

      Two other general comments are: 1. How can this be applied to retrospective data or secondary data analysis where the collection is finished? 2. Is there a guidance on the minimum sample size that is required to perform such an auto-split analysis? It is surprising that the authors think the two studies with n=35 and n=38 are good examples of training generalizable ML models. It is generally hard to believe any ML analysis can be done on such low sample sizes with thousands of rs-fMRI features. By the way, I believe n=25 in Kincses 2024 if I read it correctly.

      Reviewer 2: Lisa Crossman

      External validation of machine learning models - registered models and adaptive sample splitting. Gallitto et al. The manuscript describes a methodology and algorithm aimed at better choosing a train-test validation split of data for scikit-learn models. A Python package, adaptivesplit, was built as part of this MS as a tool for others to use. The package is proposed to be used together with a suggested workflow to integrate an approach invoking registered models as a full design for better prospective modelling studies. Finally, the work is evaluated on four alternative publicly available datasets of health research data and comprehensive results are presented. There is a trade-off in the split between the amount of sample data to be used for training and the amount of data to use for validation. Ideally the content of each must be balanced in order for the trained model to be representative and equally for the validation set to be representative. This manuscript is therefore very timely due to the large increase in the use of AI models and provides important information and methodology.

      This reviewer does not have the specific expertise to provide detailed comments on the statistical rule methods.

      Main Suggested Revision: 1. The Python implementation of the "adaptivesplit" package is described as available on GitHub (Gallitto et al., n.d.). One of the major points of the paper is to provide the Python package "adaptivesplit"; however, this package does not have a clear hyperlink, is not found by simple Google searches, and appears not to be available yet. It is therefore not possible to evaluate it at present. A website with a preprint of this MS was found after further Google searches, https://pnilab.github.io/adaptivesplit/; however, adaptivesplit is shown there as an interactive Jupyter-type notebook example and not as Python library code. Therefore, it is not clear how available the package is for others' use. Can the authors comment on the code availability?

      Minor comments: 1. Apart from the 80:20 Pareto split of train-test data, other splits are commonly used in ratios such as 75:25 (the scikit-learn default split if ratio is unspecified), and 70:30. Also the cross-validation strategy with train-test-validation split 60:20:20, yet these strategies have not been mentioned or included in the figures such as Fig 3. The splits provided in the figure and discussed are 50:50, 80:20 and 90:10 only. Could the authors discuss alternative split ratios?

    1. I think that the students’ voice is not always heard entirely, even through dialogue. I feel that by doing this journal we can make a difference with our personal experience and touch the heart of someone who is willing to stand by us. I also wanted to get the attention of other students who may be feeling the same frustration I have felt

      Rashida’s words remind me that being asked to speak is not the same as being truly heard. Even when dialogue happens, students’ insights can be filtered or dismissed by adults who hold more power. Her hope that personal experience can move someone to take action reveals a quiet kind of strength. It’s thoughtful and brave—she’s using her voice not just to describe injustice, but to change who listens and how they respond

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Hussain and collaborators aims at deciphering the microtubule-dependent ribbon formation in zebrafish hair cells. By using confocal imaging, pharmacology tools, and zebrafish mutants, the group of Katie Kindt convincingly demonstrated that ribbon, the organelle that concentrates glutamate-filled vesicles at the hair cell synapse, originates from the fusion of precursors that move along the microtubule network. This study goes hand in hand with a complementary paper (Voorn et al.) showing similar results in mouse hair cells.

      Strengths:

      This study clearly tracked the dynamics of the microtubules and of the microtubule-associated ribbons, and demonstrated ribbon fusion events. In addition, the authors have identified the critical role of kinesin Kif1aa in the fusion events. The results are compelling and the images and movies are magnificent.

      Weaknesses:

      The lack of functional data regarding the role of Kif1aa. Although it is difficult to probe and interpret the behavior of zebrafish after nocodazole treatment, I wonder whether deletion of kif1aa in hair cells may result in a functional deficit that could be easily tested in zebrafish?

      We have examined functional deficits in kif1aa mutants in another paper that was recently accepted: David et al. 2024. https://pubmed.ncbi.nlm.nih.gov/39373584/

      In David et al., we found that in addition to a subtle role in ribbon fusion during development, Kif1aa plays a major role in enriching glutamate-filled synaptic vesicles at the presynaptic active zone of mature hair cells. In kif1aa mutants, synaptic vesicles are no longer enriched at the hair cell base, and there is a reduction in the number of synaptic vesicles associated with presynaptic ribbons. Further, we demonstrated that kif1aa mutants also have functional defects including reductions in spontaneous vesicle release (from hair cells) and evoked postsynaptic calcium responses. Behaviorally, kif1aa mutants exhibit impaired rheotaxis, indicating defects in the lateral-line system and an inability to accurately detect water flow. Because our current paper focuses on microtubule-associated ribbon movement and dynamics early in hair-cell development, we have only discussed the effects of Kif1aa directly related to ribbon dynamics during this time window. In our revision, we have referenced this recent work. Currently it is challenging to disentangle how the subtle defects in ribbon formation in kif1aa mutants contribute to the defects we observe in ribbon-synapse function.

      Added to results:

      “Recent work in our lab using this mutant has shown that Kif1aa is responsible for enriching glutamate-filled vesicles at the base of hair cells. In addition this work demonstrated that loss of Kif1aa results in functional defects in mature hair cells including a reduction in evoked post-synaptic calcium responses (David et al., 2024). We hypothesized that Kif1aa may also be playing an earlier role in ribbon formation.”

      Impact:

      Synaptogenesis in the auditory sensory cell remains elusive. Here, this study indicates that the formation of the synaptic organelle is a dynamic process involving the fusion of presynaptic elements. This study will undoubtedly boost a new line of research aimed at identifying the specific molecular determinants that target ribbon precursors to the synapse and govern the fusion process.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors set out to resolve a long-standing mystery in the field of sensory biology - how large, presynaptic bodies called "ribbon synapses" migrate to the basolateral end of hair cells. The ribbon synapse is found in sensory hair cells and photoreceptors, and is a critical structural feature of a readily-releasable pool of glutamate that excites postsynaptic afferent neurons. For decades, we have known these structures exist, but the mechanisms that control how ribbon synapses coalesce at the bottom of hair cells are not well understood. The authors addressed this question by leveraging the highly-tractable zebrafish lateral line neuromast, which exhibits a small number of visible hair cells, easily observed in time-lapse imaging. The approach combined genetics, pharmacological manipulations, high-resolution imaging, and careful quantifications. The manuscript commences with a developmental time course of ribbon synapse development, characterizing both immature and mature ribbon bodies (defined by position in the hair cell, apical vs. basal). Next, the authors show convincing (and frankly mesmerizing) imaging data of plus end-directed microtubule trafficking toward the basal end of the hair cells, and data highlighting the directed motion of ribbon bodies. The authors then use a series of pharmacological and genetic manipulations showing the role of microtubule stability and one particular kinesin (Kif1aa) in the transport and fusion of ribbon bodies, which is presumably a prerequisite for hair cell synaptic transmission. The data suggest that microtubules and their stability are necessary for normal numbers of mature ribbons and that Kif1aa is likely required for fusion events associated with ribbon maturation. Overall, the data provide a new and interesting story on ribbon synapse dynamics.

      Strengths:

      (1) The manuscript offers comprehensive Introduction and Discussion sections that will inform generalists and specialists.

      (2) The use of Airyscan imaging in living samples to view and measure microtubule and ribbon dynamics in vivo represents a strength. With rigorous quantification and thoughtful analyses, the authors generate datasets often only obtained in cultured cells or more diminutive animal models (e.g., C. elegans).

      (3) The number of biological replicates and the statistical analyses are strong. The combination of pharmacology and genetic manipulations also represents strong rigor.

      (4) One of the most important strengths is that the manuscript and data spur on other questions - namely, do (or how do) ribbon bodies attach to Kinesin proteins? Also, and as noted in the Discussion, do hair cell activity and subsequent intracellular calcium rises facilitate ribbon transport/fusion?

      These are important strengths and, as stated, we are currently investigating what other kinesins and adaptors transport ribbons. We have ongoing work examining how hair-cell activity impacts ribbon fusion and transport!

      Weaknesses:

      (1) Neither the data nor the Discussion addresses a direct or indirect link between Kinesins and ribbon bodies. Showing Kif1aa protein in proximity to the ribbon bodies would add strength.

      This is a great point. Previous immunohistochemistry work in mice demonstrated that ribbons and Kif1a colocalize in mouse hair cells (Michanski et al, 2019). Unfortunately, the antibody used in that study did not work in zebrafish. To further investigate this interaction, we also attempted to create a transgenic line expressing a fluorescently tagged Kif1aa to directly visualize its association with ribbons in vivo. To date, we have been unable to detect transient expression of Kif1aa-GFP or establish a transgenic line using this approach. While we will continue to work towards understanding whether Kif1aa and ribbons colocalize in live hair cells, currently this goal is beyond the scope of this paper. In our revision we discuss this caveat.

      Added to discussion:

      “In addition, it will be useful to visualize these kinesins by fluorescently tagging them in live hair cells to observe whether they associate with ribbons.”

      (2) Neither the data nor the Discussion addresses the functional consequences of loss of Kif1aa or ribbon transport. Presumably, both manipulations would reduce afferent excitation.

      Excellent point. Please see the response above to Reviewer #1 public response weaknesses.

      (3) It is unknown whether the drug treatments or genetic manipulations are specific to hair cells, so we can't know for certain whether any phenotypic defects are secondary.

      This is correct and a caveat of our Kif1aa and drug experiments. In our recently published work, we confirmed that Kif1aa is expressed in hair cells and neurons, while kif1ab is present just in neurons. Therefore, it is likely that the ribbon formation defects in kif1aa mutants are restricted to hair cells. We added this expression information to our results:

      “ScRNA-seq in zebrafish has demonstrated widespread co-expression of kif1ab and kif1aa mRNA in the nervous system. Additionally, both scRNA-seq and fluorescent in situ hybridization have revealed that pLL hair cells exclusively express kif1aa mRNA (David et al., 2024; Lush et al., 2019; Sur et al., 2023).”

      Non-hair cell effects are a real concern in our pharmacology experiments. To mitigate this in our pharmacological experiments, we have performed drug treatments at 3 different timescales: long-term (overnight), short-term (4 hr) and fast (30 min) treatments. The fast experiments were done after 30 min nocodazole drug treatment, and after this treatment we observed reduced directional motion and fusions. This fast drug treatment should not incur any long-term changes or developmental defects as hair-cell development occurs over 12-16 hrs. However, we acknowledge that drug treatments could have secondary phenotypic effects or effects that are not hair-cell specific. In our revision, we discuss these issues.

      Added to discussion:

      “Another important consideration is the potential off-target effects of nocodazole. Even at non-cytotoxic doses, nocodazole toxicity may impact ribbons and synapses independently of its effects on microtubules. While this is less of a concern in the short- and medium-term experiments (30-70 min and 4 hr), long-term treatments (16 hrs) could introduce confounding effects. Additionally, nocodazole treatment is not hair cell-specific and could disrupt microtubule organization within afferent terminals as well. Thus, the reduction in ribbon-synapse formation following prolonged nocodazole treatment may result from microtubule disruption in hair cells, afferent terminals, or a combination of the two.”

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses live imaging to study the role of microtubules in the movement of ribeye aggregates in neuromast hair cells in zebrafish. The main findings are that

      (1) Ribeye aggregates, assumed to be ribbon precursors, move in a directed motion toward the active zone;

      (2) Disruption of microtubules and kif1aa increases the number of ribeye aggregates and decreases the number of mature synapses.

      The evidence for point 2 is compelling, while the evidence for point 1 is less convincing. In particular, the directed motion conclusion is dependent upon fitting of mean squared displacement, which can be prone to error and variance due to stochasticity, which is not accounted for in the analysis. Only a small subset of the aggregates meets this criterion, and one wonders whether the focus on this subset misses the bigger picture of what is happening with the majority of spots.

      Strengths:

      (1) The effects of Kif1aa removal and nocodazole on ribbon precursor number and size are convincing and novel.

      (2) The live imaging of Ribeye aggregate dynamics provides interesting insight into ribbon formation. The movies showing the fusion of ribeye spots are convincing and the demonstrated effects of nocodazole and kif1aa removal on the frequency of these events are novel.

      (3) The effect of nocodazole and kif1aa removal on precursor fusion is novel and interesting.

      (4) The quality of the data is extremely high and the results are interesting.

      Weaknesses:

      (1) To image ribeye aggregates, the investigators overexpressed Ribeye-a TAGRFP under the control of a MyoVI promoter. While it is understandable why they chose to do the experiments this way, expression is not under the same transcriptional regulation as the native protein, and some caution is warranted in drawing some conclusions. For example, the reduction in the number of puncta with maturity may partially reflect the regulation of the MyoVI promoter with hair cell maturity. Similarly, it is unknown whether overexpression has the potential to saturate binding sites (for example motors), which could influence mobility.

      We agree that overexpression of transgenes using a non-endogenous promoter in transgenic lines is an important consideration. Ideally, we would do these experiments with endogenously expressed fluorescent proteins under a native promoter. However, this was not technically possible for us. The decrease in precursors is likely not due to regulation by the myo6a promoter. Although the myo6a promoter comes on early in hair cell development, the promoter only gets stronger as the hair cells mature. This would lead to a continued increase rather than a decrease in puncta numbers with development.

      Protein tags such as tagRFP always have the caveat of impacting protein function. This is partly why we complemented our live imaging with analyses in fixed tissue without transgenes (kif1aa mutants and nocodazole/taxol treatments).

      In our revision, we did perform an immunolabel on myo6b:riba-tagRFP transgenic fish and found that Riba-tagRFP expression did not impact ribbon synapse numbers or ribbon size. This analysis argues that the transgene is expressed at a level that does not impact ribbon synapses. This data is summarized in Figure 1-S1.

      Added to the results:

      “Although this latter transgene expresses Riba-TagRFP under a non-endogenous promoter, neither the tag nor the promoter ultimately impacts cell numbers, synapse counts, or ribbon size (Figure 1-S1A-E).”

      Added to methods:

      “Tg(myo6b:ctbp2a-TagRFP)<sup>idc11Tg</sup> reliably labels mature ribbons, similar to a pan-CTBP immunolabel at 5 dpf (Figure 1-S1B). This transgenic line does not alter the number of hair cells or complete synapses per hair cell (Figure 1-S1A-D). In addition, myo6b:ctbp2a-TagRFP does not alter the size of ribbons (Figure 1-S1E).”

      (2) The examples of punctae colocalizing with microtubules look clear (Figures 1 F-G), but the presentation is anecdotal. It would be better and more informative, if quantified.

      We did attempt a co-localization analysis between microtubules and ribbons but did not move forward with it due to several issues:

      (1) Hair cells have an extremely crowded environment, especially since the nucleus occupies the majority of the cell. All proteins are pushed together in the small space surrounding the nucleus and ultimately, we found that co-localization analyses were not meaningful because the distances were too small.

      (2) We also attempted to segment microtubules in these images and quantify how many ribbons were associated with microtubules, but 3D microtubule segmentation was not accurate in hair cells due to highly varying filament intensities, filament dynamics and the presence of diffuse cytoplasmic tubulin signal.

      Because of these challenges, we concluded that the best evidence of ribbon-microtubule association is through visualization of ribbons and their association with microtubules over time (in our timelapses). We see that ribbons localize to microtubules in all our timelapses, including the examples shown (Movies S2-S10). The only instance of ribbon dissociation is when ribbons switch from one filament to another. We did not observe free-floating ribbons in our study.

      (3) It appears that any directed transport may be rare. Simply having an alpha >1 is not sufficient to declare movement to be directed (motor-driven transport typically has an alpha approaching 2). The randomness of a random walk and errors in fits to imperfect data will yield some spread even in movement driven by Brownian motion. Many of the tracks in Figure 3H look as though they might be reasonably fit by a straight line (i.e. alpha = 1).

      (4) The "directed motion" shown here does not really resemble motor-driven transport observed in other systems (axonal transport, for example) even in the subset that has been picked out as examples here. While the role of microtubules and kif1aa in synapse maturation is strong, it seems likely that this role may be something non-canonical (which would be interesting).

      Yes, it is true that directed transport of ribbon precursors is relatively rare. Only a small subset of the ribbon precursors moves directionally (α > 1, 20 %) or has a displacement distance > 1 µm (36 %) during the time windows we are imaging. The majority of the ribbons are stationary. To emphasize this result, we have added bar graphs to Figure 3I,K and now state the underlying numbers more clearly.

      “Upon quantification, 20.2 % of ribbon tracks show α > 1, indicative of directional motion, but the majority of ribbon tracks (79.8 %) show α < 1, indicating confinement on microtubules (Figure 3I, n = 10 neuromasts, 40 hair cells, and 203 tracks).

      To provide a more comprehensive analysis of precursor movement, we also examined displacement distance (Figure 3J). Here, as an additional measure of directed motion, we calculated the percent of tracks with a cumulative displacement > 1 µm. We found 35.6 % of tracks had a displacement > 1 µm (Figure 3K; n = 10 neuromasts, 40 hair cells, and 203 tracks).”

      We cannot say for certain what is happening with the stationary ribbons, but our hypothesis is that these ribbons eventually exhibit directed motion sufficient to reach the active zone. This idea is supported by the fact that we see ribbons that are stationary begin movement, and ribbons that are moving come to a stop during the acquisition of our timelapses (Movies S4 and S5). It is possible that ribbons that are stationary may not have enough motors attached, or there may be a ‘seeding’ phase where Ribeye aggregates are condensing on the ribbon.

      We also reexamined our MSD α values, as the α values we observed in hair cells were lower than those seen in canonical motor-driven transport (where α approaches 2). One reason for this difference may arise from the dynamic microtubule network in developing hair cells, which could affect directional ribbon movement. In our revision we plotted the distribution of α values, which confirmed that in control hair cells, the majority of the α values we see are typically less than 2 (Figure 7-S1A). Interestingly, we also compared the distribution of α values between control and taxol-treated hair cells, where the microtubule network is more stable, and found that the distribution shifted towards higher α values (Figure 7-S1A). We also plotted only ‘directional’ tracks (with α > 1) and observed significantly higher α values in taxol-treated hair cells (Figure 7-S1B). This is an interesting result which indicates that although the proportion of directional tracks (with α > 1) is not significantly different between control and taxol-treated hair cells (which could be limited by the number of motor/adapter proteins), the ribbons that move directionally do so with greater velocities when the microtubules are more stable. This supports our idea that the stability of the microtubule network could be why ribbon movement does not resemble canonical motor transport. This analysis is presented as a new figure (Figure 7-S1A-B) and is referred to in the text in the results and the discussion.

      Results:

      “Interestingly, when we examined the distribution of α values, we observed that taxol treatment shifted the overall distribution towards higher α values (Figure 7-S1A). In addition, when we plotted only tracks with directional motion (α > 1), we found significantly higher α values in hair cells treated with taxol compared to controls (Figure 7-S1B). This indicates that in taxol-treated hair cells, where the microtubule network is stabilized, ribbons with directional motion have higher velocities.”

      Discussion:

      “Our findings indicate that ribbons and precursors show directed motion indicative of motor-mediated transport (Figure 3 and 7). While a subset of ribbons moves directionally with α values > 1, canonical motor-driven transport in other systems, such as axonal transport, can achieve even higher α values approaching 2 (Bellotti et al., 2021; Corradi et al., 2020). We suggest that relatively lower α values arise from the highly dynamic nature of microtubules in hair cells. In axons, microtubules form stable, linear tracks that allow kinesins to transport cargo with high velocity. In contrast, the microtubule network in hair cells is highly dynamic, particularly near the cell base. Within a single time frame (50-100 s), we observe continuous movement and branching of these networks. This dynamic behavior adds complexity to ribbon motion, leading to frequent stalling, filament switching, and reversals in direction. As a result, ribbon transport appears less directional than the movement of traditional motor cargoes along stable axonal filaments, resulting in lower α values compared to canonical motor-mediated transport. Notably, treatment with taxol, which stabilizes microtubules, increased α values to levels closer to those observed in canonical motor-driven transport (Figure 7-S1). This finding supports the idea that the relatively lower α values in hair cells are a consequence of a more dynamic microtubule network. Overall, this dynamic network gives rise to a slower, non-canonical mode of transport.”
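
      The α values discussed above are the exponent of the standard power-law relation between mean squared displacement and lag time, MSD(τ) ∝ τ^α. The following is a minimal sketch of how such an exponent can be estimated from a single track, assuming a time-averaged MSD and a least-squares fit in log-log space. It is not the authors' analysis pipeline; the function names, the synthetic tracks, the step sizes and the choice of 10 lags are all assumptions made for illustration.

```python
import math
import random


def time_averaged_msd(track, max_lag):
    """Mean squared displacement of one 2D track for lags 1..max_lag (in frames)."""
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd


def fit_alpha(msd, dt=1.0):
    """Least-squares slope of log(MSD) against log(lag time): alpha in MSD ~ tau**alpha."""
    xs = [math.log((k + 1) * dt) for k in range(len(msd))]
    ys = [math.log(value) for value in msd]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


# Synthetic, noise-free directed track: constant velocity, so alpha should be close to 2.
directed = [(0.05 * t, 0.02 * t) for t in range(200)]

# Synthetic diffusive track: independent Gaussian steps, so alpha should be near 1.
rng = random.Random(1)
x = y = 0.0
diffusive = [(x, y)]
for _ in range(2000):
    x += rng.gauss(0.0, 0.05)
    y += rng.gauss(0.0, 0.05)
    diffusive.append((x, y))

alpha_directed = fit_alpha(time_averaged_msd(directed, 10))
alpha_diffusive = fit_alpha(time_averaged_msd(diffusive, 10))
print(f"directed alpha = {alpha_directed:.2f}, diffusive alpha = {alpha_diffusive:.2f}")
```

      With short or noisy tracks the fitted exponent scatters around these idealized values, which is the spread Reviewer #3 points to when cautioning against treating any α > 1 as evidence of directed motion.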

      (5) The effect of acute treatment with nocodazole on microtubules in movie 7 and Figure 6 is not obvious to me and it is clear that whatever effect it has on microtubules is incomplete.

      When using nocodazole, we worked to optimize the concentration of the drug to minimize cytotoxicity, while still being effective. While the more stable filaments at the cell apex remain largely intact after nocodazole treatment, there are almost no filaments at the hair cell base, which is different from the wild-type hair cells. In addition, nocodazole-treated hair cells have more cytoplasmic YFP-tubulin signal compared to wild type. We have clarified this in our results. To better illustrate the effect of nocodazole and taxol we have also added additional side-view images of hair cells expressing YFP-tubulin (Figure 4-S1F-G), that highlight cytoplasmic YFP-tubulin and long, stabilized microtubules after 3-4 hr treatment with nocodazole and taxol respectively. In these images we also point out microtubules at the apical region of hair cells that are very stable and do not completely destabilize with nocodazole treatment at concentrations that are tolerable to hair cells.

      “We verified the effectiveness of our in vivo pharmacological treatments using either 500 nM nocodazole or 25 µM taxol by imaging microtubule dynamics in pLL hair cells (myo6b:YFP-tubulin). After a 30-min pharmacological treatment, we used Airyscan confocal microscopy to acquire timelapses of YFP-tubulin (3 µm z-stacks, every 50-100 s for 30-70 min, Movie S8). Compared to controls, 500 nM nocodazole destabilized microtubules (presence of depolymerized YFP-tubulin in the cytosol, see arrows in Figure 4-S1F-G) and 25 µM taxol dramatically stabilized microtubules (indicated by long, rigid microtubules, see arrowheads in Figure 4-S1F,H) in pLL hair cells. We did still observe a subset of apical microtubules after nocodazole treatment, indicating that this population is particularly stable (see asterisks in Figure 4-S1F-H).”

      To further address concerns about verifying the efficacy of nocodazole and taxol treatment on microtubules, we added a quantification of our immunostaining data comparing the mean acetylated-α-tubulin intensities between control, nocodazole and taxol-treated hair cells. Our results show that nocodazole treatment reduces the mean acetylated-α-tubulin intensity in hair cells. This is included as a new figure (Figure 4-S1D-E) and this result is referred to in the text. To better illustrate the effect of nocodazole and taxol we have also added additional side-view images of hair cells after overnight treatment with nocodazole and taxol (Figure 4-S1A-C).

      “After a 16-hr treatment with 250 nM nocodazole we observed a decrease in acetylated-α-tubulin label (qualitative examples: Figure 4A,C, Figure 4-S1A-B). Quantification revealed significantly less mean acetylated-α-tubulin label in hair cells after nocodazole treatment (Figure 4-S1D). Less acetylated-α-tubulin label indicates that our nocodazole treatment successfully destabilized microtubules.”

      “Qualitatively more acetylated-α-tubulin label was observed after treatment, indicating that our taxol treatment successfully stabilized microtubules (qualitative examples: Figure 4-S1A,C). Quantification revealed an overall increase in mean acetylated-α-tubulin label in hair cells after taxol treatment, but this increase did not reach significance (Figure 4-S1E).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The manuscript is fairly dense. For instance, some information is repeated (page 3: ribbon synapses form along a condensed timeline in zebrafish hair cells: 12-18 hrs, and page 5: These hair cells form 3-4 ribbon synapses in just 12-18 hrs). Perhaps the authors could condense some of the ideas? The introduction could be shortened.

      We have eliminated this repeated text in our revision. We have shortened the introduction from 1275 to 1038 words (with references).

      (2) The mechanosensory structure on page 5 is not defined for readers outside the field.

      Great point. We have added additional information to define this structure in the results:

      “We staged hair cells based on the development of the apical, mechanosensory hair bundle. The hair bundle is composed of actin-based stereocilia and a tubulin-based kinocilium. We used the height of the kinocilium (see schematic in Figure 1B), the tallest part of the hair bundle, to estimate the developmental stage of hair cells as described previously…”

      (3) Figure 1E is quite interesting but I'd rather show Figure S1 B/C as they provide statistics. In addition, the authors define 4 stages : early, intermediate, late, and mature for counting but provide only 3 panels for representative examples by mixing late/mature.

      We were torn about which ribbon quantification graph to show. Ultimately, we decided to keep the summary data in Figure 1E. This is primarily because the supplementary Figure will be adjacent to the main Figure in the eLife format, and the statistics will be easy to find and view.

      Figure 1 now provides a representative image for both late and mature hair cells.

      (4) The ribbon that jumps from one microtubule to another one is eye-catching. Can the authors provide any statistics on this (e.g. percentage)?

      Good point. In our revision, we have added quantification for these events. We observe 2.8 switching events per neuromast during our fast timelapses. This information is now in the text and is also shown in a graph in Figure 3-S1D.

      “Third, we often observed that precursors switched association between neighboring microtubules (2.8 switching events per neuromast, n= 10 neuromasts; Figure 3-S1C-D, Movie S7).”

      (5) With regard to acetyl-α-tub immunocytochemistry, I would suggest obtaining a profile of the fluorescence intensity on a horizontal plane (at the apical part and at the base).

      (6) Same issue with microtubule destruction by nocodazole. Can the authors provide fluorescence intensity measurements to convince readers of microtubule disruption for long- and short-term application?

      Regarding quantification of microtubule disruption using nocodazole and taxol: we did attempt to create profiles of the acetylated tubulin or YFP-tubulin label along horizontal planes at the apex and base, but the amount of variability among cells and the angle of the cell in the images made this type of display and quantification challenging. In our revision, as stated above in our response to Reviewer #3’s public comment, we have added representative side-view images to show the disruptions to microtubules more clearly after short- and long-term drug experiments (Figure 4-S1A-C, F-H). In addition, we quantified the reduction in acetylated tubulin label after overnight treatment with nocodazole and found the signal was significantly reduced (Figure 4-S1D-E). Unfortunately, we were unable to do a similar quantification for YFP-tubulin because its intensity varied with differences in mounting. The following text has been added to the results:

      “Quantification revealed significantly less mean acetylated-α-tubulin label in hair cells after nocodazole treatment (Figure 4-S1D).”

      “Quantification revealed an overall increase in mean acetylated-α-tubulin label in hair cells after taxol treatment, but this increase did not reach significance (Figure 4-S1A,C,E).”

      (7) It is a bit difficult to understand that the long-term (overnight) microtubule destabilization leads to a reduction in the number of synapses (Figure 4F) whereas short-term (30 min) microtubule destabilization leads to the opposite phenotype with an increased number of ribbons (Figure 6G). Are these ribbons still synaptic in short-term experiments? What is the size of the ribbons in the short-term experiments? Alternatively, could the reduction in synapse number upon long-term application of nocodazole be a side-effect of the toxicity within the hair cell?

      Agreed, this is a bit confusing. In our revision, we have changed our analyses so that the comparisons are more similar between the short- and long-term experiments: we examined the number of ribbons and precursors per cell (apical and basal) in both experiments (changed the panels in Figure 4G, Figure 4-S2G and Figure 5G). In our live experiments we cannot be sure that ribbons are synaptic as we do not have a postsynaptic co-label. Also, we are unable to reliably quantify ribbon and precursor size in our live images due to variability in mounting. We have changed the text to clarify as follows:

      Results:

      “In each developing cell, we quantified the total number of Riba-TagRFP puncta (apical and basal) before and after each treatment. In our control samples we observed on average no change in the number of Riba-TagRFP puncta per cell (Figure 6G). Interestingly, we observed that nocodazole treatment led to a significant increase in the total number of Riba-TagRFP puncta after 3-4 hrs (Figure 6G). This result is similar to our overnight nocodazole experiments in fixed samples, where we also observed an increase in the number of ribbons and precursors per hair cell. In contrast to our 3-4 hr nocodazole treatment, similar to controls, taxol treatment did not alter the total number of Riba-TagRFP puncta over 3-4 hrs (Figure 6G). Overall, our overnight and 3-4 hr pharmacology experiments demonstrate that microtubule destabilization has a more significant impact on ribbon numbers compared to microtubule stabilization.”

      Discussion:

      “Ribbons and microtubules may interact during development to promote fusion, to form larger ribbons. Disrupting microtubules could interfere with this process, preventing ribbon maturation. Consistent with this, short-term (3-4 hr) and long-term (overnight) nocodazole increased ribbon and precursor numbers (Figure 6AG; Figure 4G), suggesting reduced fusion. Long-term treatment (overnight) resulted in a shift toward smaller ribbons (Figure 4H-I), and ultimately fewer complete synapses (Figure 4F).”

      Nocodazole toxicity: in response to Reviewer # 2’s public comment we have added the following text in our discussion:

      Discussion:

      “Another important consideration is the potential off-target effects of nocodazole. Even at non-cytotoxic doses, nocodazole toxicity may impact ribbons and synapses independently of its effects on microtubules. While this is less of a concern in the short- and medium-term experiments (30 min to 4 hr), long-term treatments (16 hrs) could introduce confounding effects. Additionally, nocodazole treatment is not hair cell-specific and could disrupt microtubule organization within afferent terminals as well. Thus, the reduction in ribbon-synapse formation following prolonged nocodazole treatment may result from microtubule disruption in hair cells, afferent terminals, or a combination of the two.”

      (8) Does ribbon motion depend on size or location?

      It is challenging to reliably quantify the actual area of precursors in our live samples, as there is variability in mounting and precursors are quite small. But we did examine where in the cell ribbon precursors with motion were located (using tracks > 1 µm, as these tracks can easily be linked to cell location in Imaris). We found evidence of ribbons with tracks > 1 µm throughout the cell, both above and below the nucleus. This is now plotted in Figure 3M. We have also added the following text to the results:

      “In addition, we examined the location of precursors within the cell that exhibited displacements > 1 µm. We found that 38.9 % of these tracks were located above the nucleus, while 61.1 % were located below the nucleus (Figure 3M).”

      Although this is not an area or size measurement, this result suggests that both smaller precursors that are more apical, and larger precursors/ribbons that are more basal all show motion.

      (9) The fusion event needs to be analyzed in further detail: when one ribbon precursor fuses with another one, is there an increase in size or intensity (this should follow the law of mass conservation)? This is important to support the abstract sentence "ribbon precursors can fuse together on microtubules to form larger ribbons".

      As mentioned above, it is challenging to accurately estimate the absolute size or intensity of ribbon precursors in our live preparation. But we did examine whether there is a relative increase in area after ribbons fuse. We have plotted the change in area (within the same samples) for the two fusion events shown in Figure 8-S1A-B. In these examples, the area of the puncta after fusion is larger than either of the two precursors that fuse. Although the areas are not additive, these plots do provide some evidence that fusion does act to form larger ribbons. To accompany these plots, we have added the following text to the results:

      “Although we could not accurately measure the areas of precursors before and after fusion, we observed that the relative area resulting from the fusion of two smaller precursors was greater than that of either precursor alone. This increase in area suggests that precursor fusion may serve as a mechanism for generating larger ribbons (see examples: Figure 8-S1A-B).”

      Because we were unable to provide more accurate evidence of precursor fusion resulting in larger ribbons, we have removed this statement from our abstract and lessened our claims elsewhere in the manuscript.

      (10) The title in Figure 8 is a bit confusing. If fusion events reflect ribbon precursors fusion, it is obvious it depends on ribbon precursors. I'd like to replace this title with something like "microtubules and kif1aa are required for fusion events"

      We have changed the figure title as suggested, good idea.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1C. The purple/magenta colors are hard to distinguish.

      We have made the magenta color much lighter in the Figure 1C to make it easier to distinguish purple and magenta.

      (2) There are places where some words are unnecessarily hyphenated. Examples: live-imaging and hair-cell in the abstract, time-course in the results.

      In our revision, we have done our best to remove unnecessary hyphens, including the ones pointed out here.

      (3) Figure 4H and elsewhere - what is "area of Ribeye puncta?" Related, I think, in the Discussion the authors refer to "ribbon volume" on line 484. But they never measured ribbon volume so this needs to be clarified.

      We have done our best to clarify what is meant by area of Ribeye puncta in the results and the methods:

      Results:

      “We also observed that the average area of individual Ribeyeb puncta (from 2D max-projected images) was significantly reduced compared to controls (Figure 4H). Further, the relative frequency of individual Ribeyeb puncta with smaller areas was higher in nocodazole treated hair cells compared to controls (Figure 4I).”

      Methods:

      “To quantify the area of each ribbon and precursor, images were processed in a FIJI ‘IJMacro_AIRYSCAN_simple3dSeg_ribbons only.ijm’ as previously described (Wong et al., 2019). Here each Airyscan z-stack was max-projected. A threshold was applied to each image, followed by segmentation to delineate individual Ribeyeb/CTBP puncta. The watershed function was used to separate adjacent puncta. A list of 2D objects of individual ROIs (minimum size filter of 0.002 μm2) was created to measure the 2D areas of each Ribeyeb/CTBP puncta.”

      We did refer to ribbon volume once in the discussion, but volume is not reflected in our analyses, so we have removed this mention of volume.

      (4) More validation data showing gene/protein removal for the crispants would be helpful.

      Great suggestion. As this is a relatively new method, we have created a figure that outlines how we genotype each individual crispant animal analyzed in our study (Figure 6-S1). In the methods we have also added the following information:

      “fPCR fragments were run on a genetic analyzer (Applied Biosystems, 3500XL) using LIZ500 (Applied Biosystems, 4322682) as a dye standard. Analysis of this fPCR revealed an average peak height of 4740 a.u. in wild type, and an average peak height of 126 a.u. in kif1aa F0 crispants (Figure 6-S1). Any kif1aa F0 crispant without robust genomic cutting or a peak height > 500 a.u. was not included in our analyses.”

      Reviewer #3 (Recommendations For The Authors):

      Lines 208-209--should refer to the movie in the text.

      Movie S1 is now referenced here.

      It would be helpful if the authors could analyze and quantify the effect of nocodazole and taxol on microtubules (movie 7).

      See responses above to Reviewer #1’s similar request.

      Figure 7 caption says "500 mM" nocodazole.

      Thank you, we have changed the caption to 500 nM.

      One problem with the MSD analysis is that it is dependent upon fits of individual tracks that lead to inaccuracies in assigning diffusive, restricted, and directed motion. The authors might be able to get around these problems by looking at the ensemble averages of all the tracks and seeing how they change with the various treatments. Even if the effect is on a subset of ribeye spots, it would be reassuring to see significant effects that did not rely upon fitting.

      We are hesitant to average the MSD tracks as not all tracks have the same number of time steps (ribbons move in and out of the z-stack during the timelapse). This makes it challenging for us to look at the ensemble averages of all tracks accurately, especially for the duration of the timelapse. This is the main reason why we added another analysis, displacements > 1 µm, as another readout of directional motion, a measure that does not rely upon fitting.
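
      For readers unfamiliar with the reviewer's suggestion, the sketch below illustrates one common way to compute an ensemble-averaged MSD when tracks differ in length: at each time lag, squared displacements are pooled over every track that is long enough, and the number of contributing pairs is reported so that poorly sampled lags can be recognized. It is an illustration only and not part of the analysis described in this response; the function name, the track representation (lists of (x, y, z) positions at consecutive frames without gaps) and the pooling scheme are assumptions.

```python
def ensemble_msd(tracks, max_lag):
    """Pooled (time- and ensemble-averaged) MSD over tracks of unequal length.

    tracks  : list of tracks; each track is a list of (x, y, z) positions
              sampled at consecutive frames (assumed gap-free).
    max_lag : largest time lag, in frames, to evaluate.

    Returns a list of (lag, msd, n_pairs). Lags with no data are skipped.
    """
    results = []
    for lag in range(1, max_lag + 1):
        total, count = 0.0, 0
        for track in tracks:
            # Short tracks simply contribute nothing at long lags.
            for i in range(len(track) - lag):
                total += sum((b - a) ** 2 for a, b in zip(track[i], track[i + lag]))
                count += 1
        if count:
            results.append((lag, total / count, count))
    return results


# Hypothetical example: one 3-frame track and one 2-frame track.
example_tracks = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
msd_table = ensemble_msd(example_tracks, max_lag=3)
```

      The n_pairs column makes the concern raised above concrete: long lags are supported by only a few long tracks, so the averaged curve becomes less reliable toward the end of the timelapse.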

      The abstract states that directed movement is toward the synapse. The only real evidence for this is a statement in the results: "Of the tracks that showed directional motion, while the majority move to the cell base, we found that 21.2 % of ribbon tracks moved apically." A clearer demonstration of this would be to do the analysis of Figure 2G for the ribeye aggregates.

      It was not possible to do the same analysis for ribbon tracks that we did for the EB3-GFP analysis in Figure 2. In Figure 2 we did a 2D tracking analysis and measured the relative angles in 2D. In contrast, the ribbon tracking was done in 3D in Imaris, where it is not possible to obtain angles in the same way. Further, the MSD analysis was done outside of Imaris, making it extremely difficult to link ribbon trajectories to the 3D cellular landscape in Imaris. Instead, we examined the direction of the 3D vectors in Imaris for tracks > 1 µm and determined the direction of the motion (apical, basal or undetermined). For clarity, these data are now included as a bar graph in Figure 3L. In our results, we have clarified the results of this analysis:

      “To provide a more comprehensive analysis of precursor movement, we also examined displacement distance (Figure 3J). Here, as an additional measure of directed motion, we calculated the percent of tracks with a cumulative displacement > 1 µm. We found 35.6 % of tracks had a displacement > 1 µm (Figure 3K; n = 10 neuromasts, 40 hair cells and 203 tracks). Of the tracks with displacement > 1 µm, the majority of ribbon tracks (45.8 %) moved to the cell base, but we also found a subset of ribbon tracks (20.8 %) that moved apically (33.4 % moved in an undetermined direction) (Figure 3L).”
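
      The displacement readout quoted above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Imaris workflow used in the study: displacement is taken as the straight-line distance between the first and last position of a track, the apical-basal axis is assumed to be the z axis with apical as positive, and the tolerance used to call a direction (min_axial_um) is a hypothetical parameter.

```python
import math


def net_displacement(track):
    """Straight-line distance (µm) between the first and last position."""
    return math.dist(track[0], track[-1])


def classify_direction(track, axis=2, min_axial_um=0.5):
    """Label a track apical, basal or undetermined from its net axial shift.

    axis and min_axial_um are illustrative assumptions.
    """
    shift = track[-1][axis] - track[0][axis]
    if shift >= min_axial_um:
        return "apical"
    if shift <= -min_axial_um:
        return "basal"
    return "undetermined"


def summarize_tracks(tracks, threshold_um=1.0):
    """Percent of tracks displaced more than threshold_um, plus direction counts."""
    moving = [t for t in tracks if net_displacement(t) > threshold_um]
    counts = {"apical": 0, "basal": 0, "undetermined": 0}
    for track in moving:
        counts[classify_direction(track)] += 1
    percent_moving = 100.0 * len(moving) / len(tracks)
    return percent_moving, counts


# Hypothetical tracks, positions in µm.
example_tracks = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],    # moves apically
    [(0.0, 0.0, 0.0), (0.0, 0.0, -1.5)],   # moves basally
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.1)],    # moves mostly sideways
    [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)],    # below the 1 µm threshold
]
percent_moving, direction_counts = summarize_tracks(example_tracks)
```

      Because the readout depends only on track endpoints, it does not require fitting a motion model to each track, which is the property the response emphasizes.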

      Some more detail about the F0 crispants should be provided. In particular, what degree of cutting was observed and what was the criteria for robust cutting?

      See our response to Reviewer 2 and the newly created Figure 6-S1.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements [optional]

      We would like to thank all the reviewers for their positive comments and valuable feedback. In addition, we would like to address Reviewer 1's query on novelty, which was not questioned by the other two reviewers. Our study uncovered two main aspects of hypoxia biology: first, we addressed the contribution of NF-kappaB to the transcriptome changes in hypoxia, and second, this revealed a previously unknown aspect, that NF-kappaB is required for gene repression in hypoxia. While we know a lot about gene induction in hypoxia, much less is known about repression of genes. In times of energy preservation, gene repression is as important as gene induction.


      2. Point-by-point description of the revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The work from Shakir et al uses different cell line models to investigate the role of NF-kB in the transcriptional adaptation of cells to hypoxia, which is relevant. In addition, the manuscript contains a large amount of data that could be of interest and even useful for researchers in the field of hypoxia and NF-kB. However, in my opinion, there are several concerns that should be revised and additional experiments that could be included to strengthen the relevance of the work.

      We thank this reviewer for their positive comments.

      Specific issues: In Figure 1A, the authors examine which of the genes induced by hypoxia require NF-kB by RNA sequencing analysis of cells knocked down for specific NF-kB subunits and exposed to hypoxia for 24 hours. The knockdown is about 40-60% at the RNA level, but it would be helpful to show the effect of knockdown at the protein level.

      We agree with this and have added Western blot data (Sup. Figure S1F), which shows the effects of the siRNA are much more pronounced at the protein level.

      All the data regarding genes induced by hypoxia in control or NF-kB siRNA-treated cells are somewhat confusing. If I understand correctly, when the data from the three different siRNAs are crossed, only 1070 genes are upregulated and 295 are downregulated in an NF-kB-independent manner. If this is the case, I think it would be easier to use this information in Figure 2 to define the hypoxia-induced genes that are NF-kB-dependent by simply considering those induced in the control that are not in the NF-kB-independent subset (rather than repeating the integration of the data without additional explanation). If the authors do this simple analysis, are the resulting genes the same or similar? In any case, the way these numbers are obtained should be shown more clearly (i.e., a new Venn diagram showing genes up- or down-regulated in the siRNA control that are not up- or down-regulated in any of the siRNA-NF-kB treatments).

      Figure 1 shows the effects of hypoxia on gene expression in control and NF-kB subunit-depleted cells compared to normoxia control cells. Figures 1F/1G compare genes up/downregulated in hypoxia when RelA, RelB, and cRel are depleted, compared to normoxia control. Figure 1 does not display NF-kB-dependent/independent hypoxia-responsive genes, but rather the overall effect of siRNA control and siNF-kB treatments in hypoxia, compared to siRNA control in normoxia. Figure 2 then defines NF-kB-dependent and independent hypoxia-responsive genes. We actually define these exactly as the reviewer suggested and agree that we should show the way these numbers are obtained more clearly. We have added the suggested Venn diagrams (Sup. Figure S2) and added extra information to the methods section (page 5 of revised manuscript). We felt it was important to show all the data upfront in Figure 1 and then integrate and focus on NF-kB-dependent hypoxia-induced genes in Figure 2.
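
      The set logic behind this definition can be sketched as follows. This is one reading of the reviewer's suggestion, labeled as an assumption: a hypoxia-responsive gene is called NF-kB-independent if it is still changed in every knockdown, and NF-kB-dependent if it is changed in the siRNA control but lost in at least one knockdown. The function name and the gene identifiers are hypothetical placeholders.

```python
def split_by_nfkb_dependence(control_genes, knockdown_genes):
    """Partition hypoxia-responsive genes from the siRNA control.

    control_genes   : set of genes changed by hypoxia in siRNA control cells.
    knockdown_genes : dict mapping a knockdown label to the set of genes
                      still changed by hypoxia in that knockdown.

    Returns (dependent, independent). A gene is independent only if it
    remains changed in every knockdown (an assumed definition).
    """
    independent = set(control_genes)
    for genes in knockdown_genes.values():
        independent &= genes
    dependent = set(control_genes) - independent
    return dependent, independent


# Hypothetical gene lists for illustration.
control = {"g1", "g2", "g3", "g4"}
knockdowns = {
    "siRelA": {"g1", "g2", "g5"},
    "siRelB": {"g1", "g3"},
    "sicRel": {"g1", "g2", "g3"},
}
dependent, independent = split_by_nfkb_dependence(control, knockdowns)
```

      Running the same function separately on the upregulated and downregulated lists would give the four groups that a Venn diagram of this kind displays.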

      Figure 2H shows that approximately 80% of the NF-kB-dependent genes up- or down-regulated in hypoxia were identified as RelA targets, which is statistically significant compared to RelB or cRel targets. However, what is the proportion of genes identified as RelA targets in the subset of NF-kB-independent hypoxia-induced genes? And in a randomly selected set of 500-600 genes? In my opinion, this statistical analysis should be included to demonstrate a relationship between NF-kB recruitment and hypoxia-induced upregulation (expected) and downregulation (unexpected). In this context, it is surprising that HIF consensus sites are preferentially detected in the genes that are supposed to be NF-kB dependent instead of RelA consensus.

      We thank the reviewer for this question, which is really helpful. We realize now that the way we displayed the stars on the graph for Figure 2H was slightly misleading. As such, we have amended the graph. RelA, RelB, and cRel bound genes (from the ChIP atlas) are all significantly enriched within our NF-kB-dependent hypoxia-responsive genes; there is no statistical testing between RelA bound vs RelB bound or cRel bound. We have also performed this analysis on the NF-kB-independent hypoxia-responsive genes and see the same trend (Sup. Figure S5B). This indicates that the enrichment of Rel binding sites from the ChIP atlas is not specific to NF-kB-dependent hypoxia-responsive genes. We have moved Figure 2H to Sup. Figure S5A and amended our description of the finding. This showcases how DNA binding does not necessarily mean functionality. We have also noted this as a limitation of the study.

      Figure 3 is just a confirmation by qPCR of the data obtained in the RNA-seq analysis, which in my opinion should be included as supplementary information. Moreover, both the effects of hypoxia and reversion by RelB siRNA are modest in several of the genes tested. The same is true for Figures 4 and 5 with very modest and variable results across cell types and genes.

      We appreciate this comment; we would like to keep this as a main figure for full transparency and show validation of our RNA-sequencing results.

      Figure 6 shows the effect of NF-kB knockdown on the induction of ROS in response to hypoxia. In the images provided, the effect of hypoxia is minimal in control cells, with the only clear differences shown in RelA-depleted cells.

      The quantification of the IF data (Figure 6B) shows ROS induction in hypoxia which is reduced in Rel-depleted cells, with RelA depletion having the strongest effect. ROS generation in hypoxia, although counterintuitive, is well documented and used for important signalling events. We believe our data support the previously reported levels of ROS induction (reviewed in {Alva, 2024}) in hypoxia and, importantly, that NF-kB depletion can at least partially reverse this.

      In 6B it is not clear what the three asterisks in the normoxia control represent (compared to the hypoxia siRNA control?). This should be clarified in the figure legend or text.

      We apologize for the lack of clarity; we have now added this information to the figure legend.

      In the Western blot of 6C, there are no differences in the levels of SOD1 after RelA depletion. Again, there is no reason not to include the NF-kB subunits in the Western blot analysis.

      We have added the Western blot analysis to this figure. We were trying to simplify it. Although depletion of RelA does not rescue the hypoxia-induced repression of SOD1, depletion of RelB does. Furthermore, cRel although not statistically significant, has a trend for the rescue of this effect, see Figure 6C-D.

      Finally, regarding Figure 7, the authors mention that "we confirmed that hypoxia led to a reduction in several proteins represented in this panel (of proteins involved in oxidative phosphorylation), such as UQCRC2 and IDH1 (Figure 7A-B)". The authors cannot say this because it is not seen in the Western blot in 7A or in the quantification shown in 7B. In my personal opinion, stating something that is not even suggested in the experiments is very negative for the credibility of the whole message.

      We really do not agree with this comment. We do see reductions in the levels of the proteins we mentioned. We have made the figure less complex given that some proteins are very abundant while others are not. We hope the changes are now clear and apparent. We have changed the quantification normalisation to reflect this as well and modified our description of the results, see Figure 7 and Sup. Figure S18.

      In conclusion, this paper contains a large amount of relevant information, but i) non-essential data should be moved to Supplementary, ii) protein levels of relevant players need to be shown in addition to RNA, iii) minimal or undetectable differences need to be considered as no-differences, and iv) a model showing what is the interpretation of the data provided is needed to better understand the message of the paper. I mean, is it p65 or RelB binding to some of these genes leading to their activation or repression, or is it RelA or RelB inducing HIF1beta leading to NF-kB-dependent gene activation by hypoxia? If this were the case, experimental evidence that NF-kB regulates a subset of hypoxia genes through HIF1beta would make the story more understandable.

      We apologise, but we do not know why the reviewer mentions HIF1beta. For gene induction, there is cooperation with the HIF system in some genes but not all. The most interesting and unexpected finding is that NF-kappaB is required for gene repression in hypoxia. We have added a new figure investigating how HDAC inhibition could reverse the repression, a mechanism known to be employed by NF-kappaB when repressing genes. We have added all the blots for NF-kB, clarified the quantification and included other approaches, including CRISPR KO cell lines for both IKKs. We hope this is now clear.

      Reviewer #1 (Significance (Required)):

      The work presented here is interesting but does not provide a major advance over previous publications, the main message being that a subset of hypoxia-regulated genes are NF-kB dependent. However, there is no mechanistic explanation of how this regulation is achieved and several data that are not clearly connected. A more comprehensive analysis of the data and additional experimental validation would greatly enhance the significance of the work.

      We politely disagree with the reviewer. Our main finding is that NF-kB does play an important role in gene regulation in hypoxia but, unexpectedly, this occurs mostly via gene repression. While there is vast knowledge on gene induction in hypoxia, gene repression, which typically does not occur directly via HIF, is virtually unknown. A previous study had identified Rest as a transcriptional repressor {PMID: 27531581} but this could only account for 20% of gene repression. Our findings reveal that up to 60% of genes repressed in hypoxia require NF-kB; hence this is a significant finding and a major advance over previous knowledge. Furthermore, we feel this paper is an excellent data resource for the field, as it is, to our knowledge, the first study characterising the extent to which NF-kB is required for hypoxia-induced gene changes on a transcriptome-wide scale. We have also validated this across multiple cell types and used different approaches to investigate the role of NF-kB in the hypoxia transcriptional response. We are happy that the other reviewers agree with our novel findings.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors have interrogated the role of NF-kappaB in the cellular transcriptional response to hypoxia. While HIF is considered the master regulator of the cellular response to hypoxia, it has long been known that multiple transcription factors also play a role both independently of HIF and through the regulation of HIF-1alpha levels. Chief amongst these is NF-kappaB, a regulator of cell death and inflammation amongst other things. While NF-kappaB has been known to be activated in hypoxia through altered PHD activity, the impact of this on global gene expression has remained unclear and this study addresses this important question. Of particular interest, genes downregulated in hypoxia appear to be repressed in a NF-kappaB-dependent manner. Overall, this nice study reveals an important role for NF-kappaB in the control of the global cellular transcriptional response to hypoxia.

      We thank this reviewer for their positive comments.

      Reviewer #2 (Significance (Required)):

      Some questions for the authors to consider with experiments or discussion: -One caveat of the current study which should be discussed is that while interesting and extensive, the analysis is restricted to cancer cell lines which have dysfunctional gene expression systems which may differ from "normal" cells. This should be discussed.

      We thank the reviewer for these comments. This is indeed an important aspect, which we now expand on in the discussion section. We also took advantage of RNA-seq datasets for HUVECs (a non-transformed cell line) in response to hypoxia (Sup. Figure S15) and to TNF-alpha with and without RelA depletion (Sup. Figure S16). These data support our findings that in hypoxia NF-kB is important for transcriptional repression, with some contributions to gene induction, even in a non-transformed cell system.

      In the publicly available data sets analyzed, were the same hypoxic conditions used as in this study? This information should be included.

      We apologize if this was not clear. The hypoxia RNA-seq studies used the same oxygen level and time (1%, 24 hours); this is stated in the legends of Figure 4A and Sup. Figure S9 and in Sup. Table S2. We have also added this information to the main text.

      • What is known about NF-kappaB as a transcriptional repressor in other systems such as the control of cytokine or infection driven inflammation? This is briefly discussed but should be expanded. This is important as a key question in the study of hypoxia is what regulates gene repression.

      We have included this in the discussion and also analysed available data in HUVECs in response to cytokine stimulation with and without RelA depletion (Sup. Figure S16). This analysis revealed equal importance of RelA for activation and repression of genes upon TNF-alpha stimulation. Around 40% of genes require RelA for their induction or repression in response to TNF-a. In the discussion we have also included other references where NF-kappaB has been found to repress genes.

      NF-kappaB has previously been shown to regulate HIF-1alpha transcription. What are the effects of NF-kappaB subunit siRNAs on basal HIF-1alpha transcription? In figure 7, it appears that NF-kappaB subunit siRNA is without effect on hypoxia-induced HIF protein expression. Could this account for some of the effects of NF-kappaB depletion on the hypoxic gene signature? This point needs to be clarified in light of the data presented.

      We have included data for HIF-1α RNA levels in HeLa cells with/without NF-kB depletion followed by 24 hours of hypoxia (Sup. Figure S20) and we see a small reduction (~10-20%). The reviewer is correct, there was not much effect of NF-kB depletion on HIF-1α protein levels following 24 hours of hypoxia in HeLa cells. Effects of NF-kappaB depletion are usually found with shorter hypoxia exposure or when more than one subunit is depleted at the same time. We have added this as a discussion point in the revised manuscript.

      NRF-2 is a key cellular sensor of oxidative stress in a similar way to HIF being a hypoxia sensor. The authors demonstrate using a dye that ROS are paradoxically increased in hypoxia (a more controversial finding than the authors present). It would be of interest to know if NRF-2 is induced in hypoxia as a marker of cellular oxidative stress. Similarly, it would be interesting to determine by metabolic analysis whether oxidative phosphorylation (O2 consumption) is decreased as the transcriptional signature would suggest (although the difficulty of performing metabolic analysis in hypoxia is acknowledged).

      To investigate if NRF2 is induced, we performed a western blot at 0, 1, and 24 hours of 1% oxygen, but did not see any induction of NRF2 protein levels (Sup. Figure S17A). We also overlapped our hypoxia upregulated genes with NRF2 target genes from {PMID: 24647116 and PMID: 38643749} (Sup. Figure S17B) and found limited evidence of NRF2 target genes being induced. Based on these findings, it seems that NRF2 is not being induced in hypoxia, at least not at the hypoxia level/time point we have analysed. We also agree it would be ideal to measure oxygen consumption in hypoxia, but unfortunately, we do not have the technical ability to do this at present.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Strengths

      This manuscript attempts to integrate multiple strands of data to determine the role of NFkB in hypoxia-induced gene expression. This analysis looks at multiple NFkB subunits in multiple cell lines to convincingly demonstrate that NFkB does indeed play a central role in the regulation of hypoxia-induced gene expression. This broad approach integrates new experimental data with findings from the published literature.

      A significant amount of work has been performed both experimentally and bioinformatically to test experimental hypotheses.

      We thank this reviewer for their positive comments.

      Limitations

      The main analysis in the paper involves comparing the impact of knocking down different NFkB family members in hypoxia and comparing transcriptional responses. I am surprised that the authors did not include the impact of knockdown of the NFkB family members in normoxia too. The absence of these control experiments allows us to understand the role of NFkB in hypoxia, but does not give us information as to how many of those impacts are specific/ induced in hypoxic conditions. i.e. many of the observed effects of NFkB knockdown could be due to basal suppression of NFkB target genes that happen to be hypoxia sensitive. This finding is obviously important, but it would be nice to know how many of those genes are only / preferentially regulated by NFkB in hypoxia. This would give a much deeper insight into the role of NFkB in hypoxia induced gene expression.

      We agree this would have been ideal. For financial reasons we limited our analysis to hypoxia samples. We have performed qPCR analysis depleting RelA, RelB and cRel under normal oxygen conditions in HeLa (Sup. Figure S8). We find that the majority of the validated genes in HeLa cells which require NF-kB for gene changes in hypoxia are not regulated by NF-kB under normal oxygen conditions. We have also added this limitation into our discussion section.

      The broad experimental approach while a strength of the paper in many ways also has its limitations e.g. Motif analysis revealing e.g. HIF-1a binding site enrichment in RelA and RelB-dependent DEGs is correlative observation and does not prove HIF involvement in NFkB-dependent hypoxia induced gene activation. Comparing responses with responses seen in one cell type with responses that have been described in a database comprised of many studies in a variety of different cells also has some limitations. These points can be described more fully in the discussion

      We agree these are mere correlations and hence a limitation, and we have not formally tested the involvement of HIF. We have included this in the discussion as suggested. For HIF binding site correlation, we do also compare to HIF ChIP-seq in HeLa cells exposed to 1% oxygen, albeit at 8 hours and not 24 hours (Sup. Figure S4).

      For siRNA transfections, single oligonucleotide sequences were used for RelA, RelB and cRel. This increases the potential likelihood of 'off targets' compared to pooled oligos delivered at lower concentrations. This limitation should at least be mentioned.

      We agree and have now included this as a limitation in the discussion section. We have now also included analysis using wild type and 2 different IKK double KO CRISPR cell lines generated in the following publication {PMID: 35029639}. Out of the 9 genes we identified as NF-kB-dependent hypoxia upregulated genes from HeLa cell RNA-seq and validated by qPCR, which are also hypoxia-responsive in HCT116 cells (Sup. Figure S11D), 6 displayed NF-kB dependence in HCT116 cells (Sup. Figure S14). We also provide new protein data in this cell system for oxidative phosphorylation markers, which show, as with the siRNA depletion, rescue of repression of these proteins when NF-kB is inactivated.

      RNA-seq experiments are performed on n=2 data which means relatively low statistical power. How has the statistical analysis been performed on normalised counts (corresponding to 2 n- numbers) to yield statistical significance? I am not familiar with hypergeometric tests - please justify their use here.

      We use DESeq2 for differential expression analysis and filter for effect size (> -/+ 0.58 log2 fold change) and statistical significance (FDR).
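
      As a point of reference, a log2 fold change of 0.58 corresponds to roughly a 1.5-fold change in either direction. The sketch below shows how a filter of this kind could be applied to one row of a differential expression table. It does not call DESeq2, and the FDR cutoff is deliberately left as a required argument because its value is not stated in this response; the 0.05 used in the example call is a placeholder assumption, as are the function and variable names.

```python
def passes_filter(log2_fold_change, fdr, fdr_cutoff, lfc_cutoff=0.58):
    """True if a gene passes both the effect-size and the FDR filter.

    lfc_cutoff : absolute log2 fold change threshold (0.58 from the text).
    fdr_cutoff : adjusted p-value threshold; must be supplied by the user.
    """
    return abs(log2_fold_change) > lfc_cutoff and fdr < fdr_cutoff


# A log2 fold change of 0.58 expressed as a linear fold change.
linear_fold_change = 2 ** 0.58

# Hypothetical rows: (log2 fold change, FDR); 0.05 is a placeholder cutoff.
rows = [(1.2, 0.001), (-0.9, 0.02), (0.3, 0.0001), (2.0, 0.4)]
kept = [row for row in rows if passes_filter(row[0], row[1], fdr_cutoff=0.05)]
```

      Requiring both conditions keeps genes that change by a meaningful amount and are statistically supported, which matters when only two replicates per condition are available.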

      The hypergeometric test (equivalent to a one-sided Fisher's exact test) is routinely used to determine whether the observed overlap between two gene lists is statistically significant compared to what would be expected by chance. It is also the statistical test of choice for popular bioinformatics tools which perform over representation analysis (ORA) to see which gene sets/groups/pathways/ontologies are over-represented in a gene list, examples include Metascape, clusterProfiler, WebGestalt (used in this study), and gProfiler.
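
      For readers less familiar with the test, the calculation can be written out directly. The sketch below computes the upper-tail hypergeometric probability of seeing an overlap at least as large as the one observed between two gene lists drawn from a common background. It uses only the Python standard library; the function name and the example numbers are illustrative assumptions and do not come from the study.

```python
from math import comb


def overlap_pvalue(background, size_a, size_b, overlap):
    """Upper-tail hypergeometric p-value for the overlap of two gene lists.

    background : number of genes that could appear in either list.
    size_a     : size of the first list (e.g. a reference gene set).
    size_b     : size of the second list (e.g. genes of interest).
    overlap    : observed number of shared genes.

    Returns P(X >= overlap) when size_b genes are drawn without
    replacement from a background containing size_a marked genes.
    """
    largest_possible = min(size_a, size_b)
    total = comb(background, size_b)
    tail = sum(
        comb(size_a, k) * comb(background - size_a, size_b - k)
        for k in range(overlap, largest_possible + 1)
    )
    return tail / total


# Toy example: 10 background genes, lists of 4 and 3 genes sharing 2.
p_value = overlap_pvalue(background=10, size_a=4, size_b=3, overlap=2)
```

      The choice of background matters: restricting it to genes detected in the experiment, instead of the whole genome, gives a more conservative p-value.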

      P14 RelB is described as having the most widespread impact of hypoxia dependent gene changes across all cell systems tested. Could this be due to a more potent silencing of RelB and / or due to particularly high/ low expression of RelB in these cells in general?

      This is an excellent point. At the RNA level, the RelB depletion is slightly more efficient (Sup. Figure S1); at the protein level, silencing is highly potent with all 3 siRNAs (Sup. Figure S1). We looked at the RNA levels of RelA, RelB and cRel in HeLa cells at basal conditions, and RelA shows the highest abundance compared to RelB and cRel, while RelB and cRel have similar expression levels (see below). However, RelB is very dynamic in response to hypoxia, something we have observed but have not published yet.

      P18 For western blot analysis best practise is to have 2 MW markers per blot presented

      We have added the second MW markers as suggested.

      For quantification, I suggest avoiding performing statistical analysis on semi-quantitative data unless a dynamic range of detection (with standards) has been fully established.

      We agree this has many limitations; we have kept the quantification but moved it into the supplementary information.

      P19 There is clearly an effect of reciprocal silencing with the NFkB knockdown experiments ie. siRelA affects RelB levels in hypoxia and vice versa. The implications of this for data interpretation should be discussed.

      Indeed, it is well known that RelB and cRel are RelA targets. Less is known about RelA as it is not a known NF-κB target. We have added a discussion in the revised manuscript.

      P20 The literature can be better cited in relation to RelB and hypoxia. A brief search reveals a few papers that should be mentioned/discussed: Oliver et al. 2009; Patel et al. 2017; Riedl et al. 2021.

      We have looked into these suggestions. Oliver et al. refer to hypercapnia, not hypoxia, and the other two only briefly mention RelB, with no effects toward the goals of their studies. We have tried to incorporate what is currently known as much as possible.

      I suggest leaving out mention of IkBa sumoylation and supplementary figure 10. I'm not sure the data in the paper as a whole merits focus on this very specific point.

      We thank the reviewer for this suggestion and we have removed this aspect from the manuscript.

      There is a very strong reliance on mRNA and TPM data. Some additional protein data in support of key findings will enhance

      We have added additional protein level analysis where we could obtain antibodies, see Figures 6, 7 and Sup. Figures S17, S18, and S19 for our protein level analysis.

      A graphical abstract summarising key findings with exemplar genes highlighted will enhance.

      We have added a model to summarise our findings as suggested.

      Both HIF and NFKB are ancient evolutionarily conserved pathways. Can lessons be learned from evolutionary biology as to how NFkB regulation of hypoxia-induced genes occurred? Does the HIF pathway pre-date the NFkB pathway or vice versa? This approach could be valuable in supporting the findings from this study.

      We have investigated this. Unfortunately, there are very few data available on hypoxia gene expression in lower organisms. However, we have added a few sentences on the evolution of NF-κB and HIF.

      Minor comments P2 please briefly explain how 5 genes give rise to 7 proteins

      We have added this to the introduction as requested.

      P2 there seems to be some recency bias in the studies cited as being associated with NFkB activation in response to hypoxia. Mention of Koong et al (1994) and Taylor et al (1999) and other early papers in the field will enhance

      We have added these as suggested.

      P3 The role of PHD enzymes in the regulation of NFkB in hypoxia can be introduced and / or discussed

      We have added a reference to this aspect as suggested.

      P8 I suggest use of proportional Venn diagrams to demonstrate the patterns more clearly

      We have added these as suggested.

      P11 To what extent might NFkB and Rest co-operate/ co-regulate gene repression in hypoxia?

      This is a good question. We have overlapped our datasets with Rest-dependent hypoxia-regulated genes identified by Cavadas et al., (Figure below), and find that these appear to act independently of each other for the most part, with very few genes co-regulated by both.

      Reviewer #3 (Significance (Required)):

      Shakir et al. present a manuscript titled 'NFkB is a central regulator of hypoxia-induced gene expression'.

      The research group are experts in both NFkB and hypoxia signaling and are the ideal group to perform these studies.

      Hypoxia and inflammation are co-incident in many physiological and pathophysiological conditions, where the microenvironment affects disease severity and patient outcome. The cross talk between inflammatory and hypoxia signaling pathways is not fully described. Thus, this manuscript takes a novel approach to an established question and concludes clearly that NFkB is a central regulator of hypoxia-induced gene expression.

      We thank the reviewer for these positive comments.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) I miss some treatment of the lack of behavioural correlate. What does it mean that memantine benefits EEG classification accuracy without improving performance? One possibility here is that there is an improvement in response latency, rather than perceptual sensitivity. Is there any hint of that in the RT results? In some sort of combined measure of RT and accuracy?

      First, we would like to thank the reviewer for their positive assessment of our work and for their extremely helpful and constructive comments that helped to significantly improve the quality of our manuscript.  

      The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data, neither in the reported accuracy data nor in the RT data. We do not report RT results as participants were instructed to respond as accurately as possible, without speed pressure. We added a paragraph in the discussion section to point to possible reasons for this surprising finding:

      “There are several possible reasons for this lack of behavioral correlate.  For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that we found a tight link between these EEG decoding markers and behavioral performance in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine was just too subtle to show up in changes in overt behavior.”

      (2) An explanation is missing, about why memantine impacts the decoding of illusion but not collinearity. At a systems level, how would this work? How would NMDAR antagonism selectively impact long-range connectivity, but not lateral connectivity? Is this supported by our understanding of laminar connectivity and neurochemistry in the visual cortex?

      We have no straightforward or mechanistic explanation for this finding. In the revised discussion, we are highlighting this finding more clearly, and included some speculative explanations:

      “The present effect of memantine was largely specific to illusion decoding, our marker of feedback processing, while collinearity decoding, our marker of lateral processing, was not (experiment 1) or only weakly (experiment 2) affected by memantine. We have no straightforward explanation for why NMDA receptor blockade would impact inter-areal feedback connections more strongly than intra-areal lateral connections, considering their strong functional interdependency and interaction in grouping and segmentation processes (Liang et al., 2017). One possibility is that this finding reflects properties of our EEG decoding markers for feedback vs. lateral processing: for example, decoding of the Kanizsa illusion may have been more sensitive to the relatively subtle effect of our pharmacological manipulation, either because overall decoding was better than for collinearity or because NMDA receptor dependent recurrent processes more strongly contribute to illusion decoding than to collinearity decoding.”

      (3) The motivating idea for the paper is that the NMDAR antagonist might disrupt the modulation of the AMPA-mediated glu signal. This is in line with the motivating logic for Self et al., 2012, where NMDAR and AMPAR efficacy in macaque V1 was manipulated via microinfusion. But this logic seems to conflict with a broader understanding of NMDA antagonism. NMDA antagonism appears to generally have the net effect of increasing glu (and ACh) in the cortex through a selective effect on inhibitory GABAergic cells (eg. Olney, Newcomer, & Farber, 1999). Memantine, in particular, has a specific impact on extrasynaptic NMDARs (that is in contrast to ketamine; Milnerwood et al, 2010, Neuron), and this type of receptor is prominent in GABA cells (eg. Yao et al., 2022, JoN). The effect of NMDA antagonists on GABAergic cells generally appears to be much stronger than the effect on glutamatergic cells (at least in the hippocampus; eg. Grunze et al., 1996).

      This all means that it's reasonable to expect that memantine might have a benefit to visually evoked activity. This idea is raised in the GD of the paper, based on a separate literature from that I mentioned above. But all of this could be better spelled out earlier in the paper, so that the result observed in the paper can be interpreted by the reader in this broader context.

      To my mind, the challenging task is for the authors to explain why memantine causes an increase in EEG decoding, where microinfusion of an NMDA antagonist into V1 reduced the neural signal (Self et al., 2012). This might be as simple as the change in drug... memantine's specific efficacy on extrasynaptic NMDA receptors might not be shared with whatever NMDA antagonist was used in Self et al. 2012. Ketamine and memantine are already known to differ in this way.

      We addressed the reviewer’s comments in the following way. First, we bring up our (to us, surprising) result already at the end of the Introduction, pointing the reader to the explanation mentioned by the reviewer:

      “We hypothesized that disrupting the reentrant glutamate signal via blocking NMDA receptors by memantine would impair illusion and possibly collinearity decoding, as putative markers of feedback and lateral processing, but would spare the decoding of local contrast differences, our marker of feedforward processing. To foreshadow our results, memantine indeed specifically affected illusion decoding, but enhancing rather than impairing it. In the Discussion, we offer explanations for this surprising finding, including the effect of memantine on extrasynaptic NMDA receptors in GABAergic cells, which may have resulted in boosted visual activity.”

      Second, as outlined in the response to the first point by Reviewer #2, we are now clear throughout the title, abstract, and paper that memantine “improved” rather than “modulated” illusion decoding.

      Third, and most importantly, we restructured and expanded the Discussion section to include the reviewer’s proposed mechanisms and explanations for the effect. We would like to thank the reviewer for pointing us to this literature. We also discuss the results of Self et al. (2012), specifically the distinct effects of the two NMDAR antagonists used in this study, more extensively, and speculate that their effects may have been similar to ketamine and thus possibly opposite of memantine (for the feedback signal):

      “Although both drugs are known to inhibit NMDA receptors by occupying the receptor’s ion channel, thereby blocking current flow (Glasgow et al., 2017; Molina et al., 2020), the drugs have different actions at receptors other than NMDA, with ketamine acting on dopamine D2 and serotonin 5-HT2 receptors, and memantine inhibiting several subtypes of the acetylcholine (ACh) receptor as well as serotonin 5-HT3 receptors. Memantine and ketamine are also known to target different NMDA receptor subpopulations, with their inhibitory action displaying different time courses and intensity (Glasgow et al., 2017; Johnson et al., 2015). Blockade of different NMDA receptor subpopulations can result in markedly different and even opposite results. For example, Self and colleagues (2012) found overall reduced or elevated visual activity after microinfusion of two different selective NMDA receptor antagonists (2-amino-5-phosphonovalerate and ifenprodil) in macaque primary visual cortex. Although both drugs impaired the feedback-related response to figure vs. ground, similar to the effects of ketamine (Meuwese et al., 2013; van Loon et al., 2016), such opposite effects on overall activity demonstrate that the effects of NMDA antagonism strongly depend on the targeted receptor subpopulation, each with distinct functional properties.”

      Finally, we link these differences to the potential mechanism via GABAergic neurons:

      “As mentioned in the Introduction, this may be related to memantine modulating processing at other pre- or post-synaptic receptors present at NMDA-rich synapses, specifically affecting extrasynaptic NMDA receptors in GABAergic cells (Milnerwood et al, 2010; Yao et al., 2022). Memantine’s strong effect on extrasynaptic NMDA receptors in GABAergic cells leads to increases in ACh levels, which have been shown to increase firing rates and reduce firing rate variability in macaques (Herrero et al., 2013, 2008). This may represent a mechanism through which memantine (but not ketamine or the NMDA receptor antagonists used by Self and colleagues) could boost visually evoked activity.”

      (4) The paper's proposal is that the effect of memantine is mediated by an impact on the efficacy of reentrant signaling in visual cortex. But perhaps the best-known impact of NMDAR manipulation is on LTP, in the hippocampus particularly but also broadly.

      Perception and identification of the Kanizsa illusion may be sensitive to learning (eg. Maertens & Pollmann, 2005; Gellatly, 1982; Rubin, Nakayama, Shapley, 1997); what argues against an account of the results from an effect on perceptual learning? Generally, the paper proposes a very specific mechanism through which the drug influences perception. This is motivated by results from Self et al 2012 where an NMDA antagonist was infused into V1. But oral memantine will, of course, have a whole-brain effect, and some of these effects are well characterized and - on the surface - appear as potential sources of change in illusion perception. The paper needs some treatment of the known ancillary effects of diffuse NMDAR antagonism to convince the reader that the account provided is better than the other possibilities.

      We cannot fully exclude an effect based on perceptual learning but consider this possibility highly unlikely for several reasons. First, subjects have performed more than a thousand trials in a localizer session before starting the main task (in experiment 2 even more than two thousand) containing the drug manipulation. Therefore, a large part of putative perceptual learning would have already occurred before starting the main experiment. Second, the main experiment was counterbalanced across drug sessions, so half of the participants first performed the memantine session and then the placebo session, and the other half of the subjects the other way around. If memantine would have improved perceptual learning in our experiments, one may actually expect to observe improved decoding in the placebo session and not in the memantine session. If memantine would have facilitated perceptual learning during the memantine session, the effect of that facilitated perceptual learning would have been most visible in the placebo session following the memantine session. Because we observed improved decoding in the memantine session itself, perceptual learning is likely not the main explanation for these findings. Third, perceptual learning is known to occur for several stimulus dimensions (e.g., orientation, spatial frequency or contrast). If these findings would have been driven by perceptual learning one would have expected to see perceptual learning for all three features, whereas the memantine effects were specific to illusion decoding. Especially in experiment 2, all features were equally often task relevant and in such a situation one would’ve expected to observe perceptual learning effects on those other features as well.  

      To further investigate any potential role of perceptual learning, we analyzed participants’ performance in detecting the Kanizsa illusion over the course of the experiments. To do so, we divided the experiments’ trials into four time bins, from the beginning until the end of the experiment. For the first experiment’s first target (T1), there was no interaction between the factors bin and drug (memantine/placebo; F(3,84)=0.89, P=0.437; Figure S6A). For the second target (T2), we performed a repeated-measures ANOVA with the factors bin, drug, T1-T2 lag (short/long), and masks (present/absent). There was only a trend towards a bin by drug interaction (F(3,84)=2.57, P=0.064; Figure S6B), reflecting worse performance under memantine in the first three bins and slightly better performance in the fourth bin. The other interactions that included the bin and drug factors were not significant (all P>0.117). For the second experiment, we performed a repeated-measures ANOVA with the factors bin, drug, masks, and task-relevant feature (local contrast/collinearity/illusion). None of the interactions that included the bin and drug factors were significant (all P>0.219; Figure S6C). Taken together, memantine does not appear to affect Kanizsa illusion detection performance through perceptual learning. Finally, there was no interaction between the factors bin and task-relevant feature (F(6,150)=0.76, P=0.547; Figure S6D), implying there is no perceptual learning effect specific to Kanizsa illusion detection. We included these analyses in our revised Supplement as Fig. S6.
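      The binning step of this analysis can be sketched as follows. This is only an illustration of splitting chronologically ordered trials into four time bins and computing accuracy per bin; the function name and the toy trial vector are hypothetical, and the repeated-measures ANOVA that the authors ran on the binned data is not reproduced here.

```python
def accuracy_per_bin(correct, n_bins=4):
    """Split trial outcomes (1 = correct, 0 = error), given in chronological
    order, into n_bins near-equal consecutive bins and return the proportion
    correct in each bin."""
    n_trials = len(correct)
    if n_trials < n_bins:
        raise ValueError("need at least one trial per bin")
    # Integer bin edges; every bin is non-empty because n_trials >= n_bins.
    edges = [i * n_trials // n_bins for i in range(n_bins + 1)]
    return [
        sum(correct[start:stop]) / (stop - start)
        for start, stop in zip(edges, edges[1:])
    ]


# Toy session of eight trials (hypothetical data).
session = [1, 1, 0, 0, 1, 0, 1, 1]
print(accuracy_per_bin(session))
```

      Computing these per-bin accuracies separately for each participant and drug session would yield the cells that enter a bin by drug interaction test of the kind described above.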

      (5) The cross-decoding approach to data analysis concerns me a little. The approach adopted here is to train models on a localizer task, in this case, a task where participants matched a Kanizsa figure to a target template (E1) or discriminated one of the three relevant stimulus features (E2). The resulting model was subsequently employed to classify the stimuli seen during separate tasks - an AB task in E1, and a feature discrimination task in E2. This scheme makes the localizer task very important. If models built from this task have any bias, this will taint classifier accuracy in the analysis of experimental data. My concern is that the emergence of the Kanizsa illusion in the localizer task was probably quite salient, relative to changes in stimulus rotation or collinearity. If the model was better at detecting the illusion to begin with, the data pattern - where drug manipulation impacts classification in this condition but not other conditions - may simply reflect model insensitivity to non-illusion features.

      I am also vaguely worried by manipulations implemented in the main task that do not emerge in the localizer - the use of RSVP in E1 and manipulation of the base rate and staircasing in E2. This all starts to introduce the possibility that localizer and experimental data just don't correspond, that this generates low classification accuracy in the experimental results and ineffective classification in some conditions (ie. when stimuli are masked; would collinearity decoding in the unmasked condition potentially differ if classification accuracy were not at a floor? See Figure 3c upper, Figure 5c lower).

      What is the motivation for the use of localizer validation at all? The same hypotheses can be tested using within-experiment cross-validation, rather than validation from a model built on localizer data. The argument may be that this kind of modelling will necessarily employ a smaller dataset, but, while true, this effect can be minimized at the expense of computational cost - many-fold cross-validation will mean that the vast majority of data contributes to model building in each instance. 

      It would be compelling if results were to reproduce when classification was validated in this kind of way. This kind of analysis would fit very well into the supplementary material.

      We thank the reviewer for this excellent question. We used separate localizers for several reasons, exactly to circumvent the kind of biases in decoding that the reviewer alludes to. Below we have detailed our rationale, first focusing on our general rationale and then focusing on the decisions we made in designing the specific experiments.  

      Using a localizer task in the design of decoding analysis offers several key advantages over relying solely on k-fold cross-validation within the main task:

      (1) Feature selection independence and better generalization: A separate localizer task allows for independent feature selection, ensuring that the features used for decoding are chosen without bias from the main task data. Specifically, the use of a localizer task allows us to determine the time-windows of interest independently based on the peaks of the decoding in the localizer. This allows for a better direct comparison between the memantine and placebo conditions because we can isolate the relevant time windows outside a drug manipulation. Further, training a classifier on a localizer task and testing it on a separate experimental task assesses whether neural representations generalize across contexts, rather than simply distinguishing conditions within a single dataset. This supports claims about the robustness of the decoded information.

      (2) Increased sensitivity and interpretability: The localizer task can be designed specifically to elicit strong, reliable responses in the relevant neural patterns. This can improve signal-to-noise ratio and make it easier to interpret the features being used for decoding in the test set. We facilitate this by having many more trials in the localizer tasks (1280 in E1 and 5184 in E2) than in the separate conditions of the main task, in which we would have to do k-folding on very low trial numbers (e.g., the 2 (mask) x 2 (lag) design in E1 leaves fewer than 256 trials, due to preprocessing, for specific comparisons). The same holds for experiment 2, which has a 2x3 design but also included the base-rate manipulation. Finally, we further facilitate sensitivity of the model by having the stimuli presented at full contrast without any manipulations of attention or masking during the localizer, which allows us to extract the feature-specific EEG signals in the most optimal way.

      (3) Decoupling task-specific confounds: If decoding is performed within the main task using k-folding, there is a risk that task-related confounds (e.g., motor responses, attention shifts, drug) influence decoding performance. A localizer task allows us to separate the neural representation of interest from these task-related confounds.
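      The difference between the two validation schemes discussed above can be sketched in a few lines. This is an illustration only: a nearest-centroid classifier on toy feature vectors stands in for the actual EEG decoding pipeline, which is not specified in this response, and all function names and data are hypothetical.

```python
from statistics import mean


def fit_centroids(samples, labels):
    """Average the feature vectors of each class into one centroid per label."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def cross_decode(train_x, train_y, test_x, test_y):
    """Localizer-style scheme: train on one dataset, test on a separate one."""
    centroids = fit_centroids(train_x, train_y)
    hits = sum(predict(centroids, s) == l for s, l in zip(test_x, test_y))
    return hits / len(test_y)


def kfold_decode(x, y, k):
    """Within-task scheme: each fold is held out once, the rest is training data.
    Folds are interleaved (trial i goes to fold i % k)."""
    scores = []
    for fold in range(k):
        held_out = set(range(fold, len(x), k))
        train_x = [s for i, s in enumerate(x) if i not in held_out]
        train_y = [l for i, l in enumerate(y) if i not in held_out]
        test_x = [x[i] for i in sorted(held_out)]
        test_y = [y[i] for i in sorted(held_out)]
        scores.append(cross_decode(train_x, train_y, test_x, test_y))
    return mean(scores)
```

      In the localizer scheme, the training data never contain the drug, masking, or base-rate manipulations, so the classifier cannot learn them; in the k-fold scheme, training and test trials share every property of the main task, and each training set shrinks as conditions are split more finely.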

      Experiment 1 

      In experiment 1, the Kanizsa was always task relevant in the main experiment in which we employed the pharmacological manipulation. To make sure that the classifiers were not biased towards Kanizsa figures from the start (which would be the case if we would have done k-folding in the main task), we used a training set in which all features were equally relevant for task performance. As can be seen in figure 1E, which plots the decoding accuracies of the localizer task, illusion decoding as well as rotation decoding were equally strong, whereas collinearity decoding was weaker. It may be that the Kanizsa illusion was quite salient in the localizer task, which we can’t know at present, but it was at least less salient and relevant than in the main task (where it was the only task-relevant feature). Based on the localizer decoding results one could argue that the rotation dimension and illusion dimension were most salient, because the decoding was highest for these dimensions. Clearly the model was not insensitive to nonillusory features. The localizer task of experiment 2 reveals that collinearity decoding tends to be generally lower, even when that feature is task relevant.  

      Experiment 2 

      In experiment 2, the localizer task and main task were also similar, with three exceptions: during the localizer task no drug was active, and no masking and no base rate manipulation were employed. To make sure that the classifier was not biased towards a certain stimulus category (due to the bias manipulation), e.g. the stimulus that is presented most often, we used a localizer task without this manipulation. As can be seen in figure 4D decoding of all the features was highly robust, also for example for the collinearity condition. Therefore the low decoding that we observe in the main experiment cannot be due to poor classifier training or feature extraction in the localizer. We believe this is actually an advantage instead of a disadvantage of the current decoding protocol.

      Based on the rationale presented above we are uncomfortable performing the suggested analyses using a k-folding approach in the main task, because according to our standards the trial numbers are too low and the risk that these results are somehow influenced by task specific confounds cannot be ruled out.  

      Line 301 - 'Interestingly, in both experiments the effect of memantine... was specific to... stimuli presented without a backward mask.' This rubs a bit, given that the mask broadly disrupted classification. The absence of memantine results in masked results may simply be a product of the floor ... some care is needed in the interpretation of this pattern. 

      In the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P=0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      While floor is less likely to account for the absence of an effect in the masked condition in experiment 2, where illusion decoding in the masked condition was significantly above chance, it is still possible that to obtain an effect of memantine, decoding accuracy needed to be higher. We therefore also added here:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      In the discussion, we changed the sentence to read “…the effect of memantine on illusion decoding tended to be specific to attended, task-relevant stimuli presented without a backward mask.”

      Line 441 - What were the contraindications/exclusion parameters for the administration of memantine? 

      Thanks for spotting this. We have added the relevant exclusion criteria in the revised version of the supplement. See also below.

      – Allergy for memantine or one of the inactive ingredients of these products;

      – (History of) psychiatric treatment;

      – First-degree relative with (history of) schizophrenia or major depression;

      – (History of) clinically significant hepatic, cardiac, obstructive respiratory, renal, cerebrovascular, metabolic or pulmonary disease, including, but not limited to fibrotic disorders;

      – Claustrophobia;

      – Regular usage of medicines (antihistamines or occasional use of paracetamol);

      – (History of) neurological disease;

      – (History of) epilepsy;

      – Abnormal hearing or (uncorrected) vision;

      – Average use of more than 15 alcoholic beverages weekly;

      – Smoking;

      – History of drug (opiate, LSD, (meth)amphetamine, cocaine, solvents, cannabis, or barbiturate) or alcohol dependence;

      – Any known other serious health problem or mental/physical stress;

      – Used psychotropic medication, or recreational drugs over a period of 72 hours prior to each test session;

      – Used alcohol within the last 24 hours prior to each test session;

      – (History of) pheochromocytoma;

      – Narrow-angle glaucoma;

      – (History of) ulcer disease;

      – Galactose intolerance, Lapp lactase deficiency or glucose-galactose malabsorption;

      – (History of) convulsion.

      Line 587 - The localizer task used to train the classifier in E2 was collected in different sessions. Was the number of trials from separate sessions ultimately equal? The issue here is that the localizer might pick up on subtle differences in electrode placement. If the test session happens to have electrode placement that is similar to the electrode placement that existed for a majority of one condition of the localizer... this will create bias. This is likely to be minor, but machine classifiers really love this kind of minor confound.

      Indeed, the trial counts in the separate sessions for the localizer in E2 were equal. We have added that information to the methods section.  

      Experiment 1: 1280 trials were collected during the intake session.

      Experiment 2: 1728 trials were collected per session (intake and two drug sessions), giving 5184 trials across the three sessions.

      Reviewer #2:

      To start off, I think the reader is being a bit tricked when reading the paper. Perhaps my priors are too strong, but I assumed, just like the authors, that NMDA-receptors would disrupt recurrent processing, in line with previous work. However, due to the continuous use of the ambiguous word 'affected' rather than the more clear increased or perturbed recurrent processing, the reader is left guessing what is actually found. That's until they read the results and discussion finding that decoding is actually improved. This seems like a really big deal, and I strongly urge the authors to reword their title, abstract, and introduction to make clear they hypothesized a disruption in decoding in the illusion condition, but found the opposite, namely an increase in decoding. I want to encourage the authors that this is still a fascinating finding.

      We thank the reviewer for the positive assessment of our manuscript, and for many helpful comments and suggestions.  

      We changed the title, abstract, and introduction in accordance with the reviewer’s comment, highlighting that “memantine […] improves decoding” and “enhances recurrent processing” in all three sections. We also changed the heading of the corresponding results section to “Memantine selectively improves decoding of the Kanizsa illusion”.

      Apologies if I have missed it, but it is not clear to me whether participants were given the drug or placebo during the localiser task. If they are given the drug this makes me question the logic of their analysis approach. How can one study the presence of a process, if their very means of detecting that process (the localiser) was disrupted in the first place? If participants were not given a drug during the localiser task, please make that clear. I'll proceed with the rest of my comments assuming the latter is the case. But if the former, please note that I am not sure how to interpret their findings in this paper.

      Thanks for asking this; it was indeed unclear. In experiment 1, the localizer was performed in the intake session, in which no drugs were administered. In the second experiment, the localizer was performed in all three sessions with equal trial numbers. In the intake session no drugs were administered. In the other two sessions the localizer was performed directly after pill intake, and therefore the memantine was not (or barely) active yet. We started the main task four hours after pill intake because that is the approximate peak time of memantine. Note that all three localizer tasks were averaged before being used as the training set. We have clarified this in the revised manuscript.

      The main purpose of the paper is to study recurrent processing. The extent to which this study achieves this aim is completely dependent to what extent we can interpret decoding of illusory contours as uniquely capturing recurrent processing. While I am sure illusory contours rely on recurrent processing, it does not follow that decoding of illusory contours capture recurrent processing alone. Indeed, if the drug selectively manipulates recurrent processing, it's not obvious to me why the authors find the interaction with masking in experiment 2. Recurrent processing seems to still be happening in the masked condition, but is not affected by the NMDA-receptor here, so where does that leave us in interpreting the role of NMDA-receptors in recurrent processing? If the authors can not strengthen the claim that the effects are completely driven by affecting recurrent processing, I suggest that the paper will shift its focus to making claims about the encoding of illusory contours, rather than making primary claims about recurrent processing.

      We indeed used illusion decoding as a marker of recurrent processing. Clearly, such a marker based on a non-invasive and indirect method to record neural activity is not perfect. To directly and selectively manipulate recurrent processing, invasive methods and direct neural recordings would be required. However, as explained in the revised Introduction,

      “In recent work we have validated that the decoding profiles of these features of different complexities at different points in time, in combination with the associated topography, can indeed serve as EEG markers of feedforward, lateral and recurrent processes (Fahrenfort et al., 2017; Noorman et al., 2023).”  

      The timing and topography of the decoding results of the present study were consistent with our previous EEG decoding studies (Fahrenfort et al., 2017; Noorman et al., 2023). This validates the use of these EEG decoding signatures as (imperfect) markers of distinct neural processes, and we continue to use them as such. However, we expanded the discussion section to alert the reader to the indirect and imperfect nature of these EEG decoding signatures as markers of distinct neural processes: “Our approach relied on using EEG decoding of different stimulus features at different points in time, together with their topography, as markers of distinct neural processes. Although such non-invasive, indirect measures of neural activity cannot provide direct evidence for feedforward vs. recurrent processes, the timing, topography, and susceptibility to masking of the decoding signatures obtained in the present study are consistent with neurophysiology (e.g., Bosking et al., 1997; Kandel et al., 2000; Lamme & Roelfsema, 2000; Lee & Nguyen, 2001; Liang et al., 2017; Pak et al., 2020), as well as with our previous work (Fahrenfort et al., 2017; Noorman et al., 2023).” 

      The reviewer is also concerned about the lack of effect of memantine on illusion decoding in the masked condition in experiment 2. In our view, the strong effect of masking on illusion decoding (both in absolute terms, as well as when compared to its effect on local contrast decoding), provides strong support for our assumption that illusion decoding represents a marker of recurrent processing. Nevertheless, as the reviewer points out, weak but statistically significant illusion decoding was still possible in the masked condition, at least when the illusion was task-relevant. As the reviewer notes, this may reflect residual recurrent processing during masking, a conclusion consistent with the relatively high behavioral performance despite masking (d’ > 1). However, rather than invalidating the use of our EEG markers or challenging the role of NMDA-receptors in recurrent processing, this may simply reflect a floor effect. As outlined in our response to reviewer #1 (who was concerned about floor effects), in the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P = 0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      And for experiment 2:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      An additional claim is being made with regard to the effects of the drug manipulation. The authors state that this effect is only present when the stimulus is 1) consciously accessed, and 2) attended. Claim 1 is not supported by experiment 1, as the masking manipulation did not interact in the cluster-analyses, and the analyses focussing on the peak of the timing window do not show a significant effect either. There is evidence for this claim coming from experiment 2, as masking interacts with the drug condition. Evidence for the second claim (about task relevance) is not presented, as there is no interaction with the task condition. A classical error seems to be made here, where interactions are not properly tested. Instead, the presence of a significant effect in one condition but not the other is taken as sufficient evidence for an interaction, which is not appropriate. I therefore urge the authors to dampen the claim about the importance of attending to the decoded features. Alternatively, I suggest the authors run their interactions of interest on the time-courses and conduct the appropriate cluster-based analyses.

      We thank the reviewer for pointing out the importance of key interaction effects. Following the reviewer’s suggestion, we dampened our claims about the role of attention. For experiment 1, we changed the heading of the relevant results section from “Memantine’s effect on illusion decoding requires attention” to “The role of consciousness and attention in memantine’s effect on illusion decoding”, and we added the following in the results section:

      “Also our time window-based analyses showed a significant effect of memantine only when the illusion was both unmasked and presented outside the AB (t<sub>28</sub> = -2.76, P = 0.010, BF<sub>10</sub> = 4.53; Fig. 3F). Note, however, that although these post-hoc tests of the effect of memantine on illusion decoding were significant, for our time window-based analyses we did not obtain a statistically significant interaction between the AB and memantine, and the interaction between masking and memantine only approached significance (P = 0.068). Thus, although these memantine effects were slightly less robust than for T1, probably due to reduced trial counts, these results point to (but do not conclusively demonstrate) a selective effect of memantine on illusion-related feedback processing that depends on the availability of attention. In addition to the lack of the interaction effect, another potential concern…”

      For experiment 2, we added the following in the results section:

      “Note that, for our time window-based analyses of illusion decoding, although the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking, we did not obtain a statistically significant interaction between memantine and task-relevance. Thus, although the memantine effect was significant only when the illusion was unmasked and task-relevant, just like for the effect of temporal attention in experiment 1, these results do not conclusively demonstrate a selective effect of memantine that depends on attention (task-relevance).”

      In the discussion, we toned down claims about memantine’s effects being specific to attended conditions, we are highlighting the “preliminary” nature of these findings, and we are now alerting the reader explicitly to be careful with interpreting these effects, e.g.:

      “Although these results have to be interpreted with caution because the key interaction effects were not statistically significant, …”
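      The interaction issue discussed in this exchange can be made concrete with a small sketch. In a 2x2 within-subject design (here drug x masking), the interaction is tested on each participant's difference of differences, not by comparing which simple effects reach significance. The function names and input layout below are hypothetical and chosen for illustration only; they are not the analysis code used in the study, which relied on time window-based and cluster-based tests.

```python
import math
import statistics


def one_sample_t(values):
    """Return the t statistic (against zero) and degrees of freedom."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two participants")
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    return statistics.fmean(values) / (sd / math.sqrt(n)), n - 1


def interaction_t(drug_unmasked, placebo_unmasked, drug_masked, placebo_masked):
    """t statistic for the drug x masking interaction.

    Each argument holds one value per participant (for example a
    decoding score), in the same participant order. The interaction
    contrast is (drug - placebo | unmasked) - (drug - placebo | masked).
    """
    cells = (drug_unmasked, placebo_unmasked, drug_masked, placebo_masked)
    if len({len(cell) for cell in cells}) != 1:
        raise ValueError("all four cells need one value per participant")
    contrast = [
        (du - pu) - (dm - pm)
        for du, pu, dm, pm in zip(*cells)
    ]
    return one_sample_t(contrast)
```

      A significant drug effect in the unmasked cell together with a non-significant one in the masked cell says nothing by itself about this contrast; only the test on the contrast addresses whether the drug effect differs between masking conditions.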

      How were the length of the peak-timing windows established in Figure 1E? My understanding is that this forms the training-time window for the further decoding analyses, so it is important to justify why they have different lengths, and how they are determined. The same goes for the peak AUC time windows for the interaction analyses. A number of claims in the paper rely on the interactions found in these posthoc analyses, so the 223- to 323 time window needs justification.

      Thanks for this question. The length of these peak-timing windows is different because the decoding of rotation is temporally very precise and short-lived, whereas the decoding of the other features lasts much longer and is more temporally variable. In fact, we have followed the same procedure as in a previously published study (Noorman et al., eLife 2025) for defining the peak-timing and length of the windows. We followed the same procedure for both experiments reported in this paper, replicating the crucial findings and therefore excluding the possibility that these findings are in any way dependent on the time windows that are selected. We have added that information to the revised version of the manuscript.

      Reviewer #3:

      First, despite its clear pattern of neural effects, there is no corresponding perceptual effect. Although the manipulation fits neatly within the conceptual framework, and there are many reasons for not finding such an effect (floor and ceiling effects, narrow perceptual tasks, etc), this does leave open the possibility that the observation is entirely epiphenomenal, and that the mechanisms being recorded here are not actually causally involved in perception per se.

      We thank the reviewer for the positive assessment of our work. The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data. We agree with the possible reasons for the absence of such an effect highlighted by the reviewer, and expanded our discussion section accordingly:

      “There are several possible reasons for this lack of behavioral correlate. For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that in our previous work we found a tight link between these EEG decoding markers and behavioral performance (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

      Second, although it is clear that there is an effect on decoding in this particular condition, what that means is not entirely clear - particularly since performance improves, rather than decreases. It should be noted here that improvements in decoding performance do not necessarily need to map onto functional improvements, and we should all be careful to remain agnostic about what is driving classifier performance. Here too, the effect of memantine on decoding might be epiphenomenal - unrelated to the information carried in the neural population, but somehow changing the balance of how that is electrically aggregated on the surface of the skull. *Something* is changing, but that might be a neurochemical or electrical side-effect unrelated to actual processing (particularly since no corresponding behavioural impact is observed.)

      We would like to refer to our reply to the previous point, and we would like to add that in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023) similar EEG decoding markers were often tightly linked to changes in behavioral performance. This indicates that these particular EEG decoding markers do not simply reflect some side effect unrelated to neural processing. However, as stated in the revised discussion section, “it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      (…) In my view, the part about NF-YA1 is less strong - although I realize this is a compelling candidate to be a regulator of cell cycle progression, the experimental approaches used to address this question fall a bit short, in particular, compared to the very detailed approaches shown in the rest of the manuscript. The authors show that the transcription factor NF-YA1 regulates cell division in tobacco leaves; however, there is no experimental validation in the experimental system (nodules). All conclusions are based on a heterologous cell division system in tobacco leaves. The authors state that NF-YA1 has a nodule-specific role as a regulator of cell differentiation. I am concerned the tobacco system may not allow for adequate testing of this hypothesis.

      Reviewer #1 makes a valid point by asking to focus the manuscript more explicitly on the role of NF-YA1 as a differentiation factor in a symbiotic context. We have now addressed this formally and experimentally.

      The involvement of A-type NF-Y subunits in the transition to the early differentiation of nodule cells has been documented in model legumes through several publications that we refer to in the revised version of the discussion (lines 617/623). We fully agree that the CDEL system, because it is heterologous, does not allow us more than to propose a parallel explanation for these observations - i.e., that the Medicago NF-YA1 subunit presumably acts in post-replicative cell-cycle regulation at the G2/M transition. Considering your recommendations and those of reviewer #2, we sought to support this conclusion by testing the impact of localized over-expression of NF-YA1 on cortical cell division and infection competence at an early stage of root colonization. The results of these experiments are now presented in the new Figure 9 and Figure 9-figure supplement 1-5 and described from line 435 to 495.

      With the fluorescent tools the authors have at hand (in particular tools to detect G2/M transition, which the authors suggest is regulated by NF-YA1), it would be interesting to test what happens to cell division if NF-YA1 is over-expressed in Medicago roots?

      To limit pleiotropic effects of an ectopic over-expression, we used the symbiosis-induced, ENOD11 promoter to increase NF-YA1 expression levels more specifically along the trajectory of infected cells. We chose to remain in continuity with the experiments performed in the CDEL system by opting for a destabilized version of the KNOLLE transcriptional reporter to detect the G2/M transition. The results obtained are presented in Figure 9B (quantification of split infected cells), in Figure 9-figure supplement 1B (ENOD11 expression profile), in Figure 9-figure supplement 3B (representative confocal images) and Figure 9-figure supplement 4D (quantification of pKNOLLE reporter signal). There, we show that mitosis remains inhibited in cells accommodating infection threads, but is completed in a higher proportion of outer cortical cells positioned on the infection trajectory, where ENOD11 gene transcription is active before their physical colonization.

      Based on NF-YA1 expression data published previously and their results in tobacco epidermal cells, the authors hypothesize that NF-YA regulates the mitotic entry of nodule primordial cells. Given that much of the manuscript deals with earlier stages of the infection, I wonder if NF-YA1 could also have a role in regulating mitotic entry in cells adjacent to the infection thread?

      The expression profile of NF-YA1 at early stages of cortical infection (Laporte et al., 2014) is indeed similar to the one of ENOD11 (as shown in Figure 9-figure supplement 1C) in wild-type Medicago roots, with corresponding transcriptional reporters being both activated in cells adjacent to the infection thread. Under our experimental conditions, additional expression of NF-YA1 (driven by the ENOD11 promoter) in these neighbouring cells did not impact their propensity to enter mitosis and to complete cell division. These results are presented in Figure 9-figure supplement 4D (quantification of pKNOLLE reporter signal) and Figure 9-figure supplement 5 (quantification of split neighbouring cells).

      Reviewer #1 (Recommendations For The Authors):

      - In the first part, images show the qualitative presence/absence of H3.1 or H3.3 histones.

      Upon closer inspection, many cells seem to have both histones. In Fig1-S1 for example (root meristem), it is evident that there are many cells with low but clearly present H3.1 content in the green channel; however, in the overlay, the green is lost and H3.3 (pink) is mainly visible. What does this mean in terms of the cell cycle? 

      We fully agree with reviewer #1 on these points. Independent of whether they have low or high proliferation potential, most cells retain histone H3.1 particularly in silent regions of the genome, while H3.3 is constitutively produced and enriched at transcriptionally active regions. When channels are overlaid, cells in an active proliferation or endoreduplication state (in G1, S or G2, depending on the size of their nuclei) will appear mainly "green" (H3.1-eGFP positive). Cells with a low proliferation potential (e.g., in the QC), G2-arrested (e.g., IT-traversed) or terminally differentiating (e.g., containing symbiosomes or arbuscules) will appear mainly "magenta" (H3.1-low, medium to high H3.3-mCherry content).

      Furthermore, all nodule images only display the overlay image, and individual fluorescence channels are not shown. Does the same masking effect happen here? It may be helpful to quantify fluorescence intensity not only in green but also in red channels as done for other experiments.

      Quantifying fluorescence intensity in the mCherry channel may indeed help to highlight the likely replacement of H3.1-eGFP by H3.3-mCherry in infected cells, as described by Otero and colleagues (2016) at the onset of cellular differentiation. However, the quantification method as established (i.e., measuring the corrected total nuclear fluorescence at the equatorial plane) cannot be applied, most of the time, to infected cells' nuclei due to the overlapping presence of mCherry-producing S. meliloti in the same channel (e.g., in Figure 2B). Nevertheless, and to avoid this masking effect when the eGFP and mCherry channels are overlaid, we now present them as isolated channels in revised Figures 1-3 and associated figure supplements. As the cell-wall staining is regularly included and displayed in grayscale, we assigned to both of them the Green Fire Blue lookup table, which maps intensity values to a multiple-colour sequential scheme (with blue or yellow indicating low or high fluorescence levels, respectively). We hope that this will allow a better appreciation of the respective levels of H3.1- and H3.3-fusions in our confocal images.

      - Fig 1 B - it is hard to differentiate between S. meliloti-mCherry and H3.3-mCherry. Is there a way to label the different structures?

      In the revised version of Figure 1B, we used filled or empty arrowheads to point to histone H3-containing nuclei. To label rhizobia-associated structures, we used dashed lines to delineate nodule cells hosting symbiosomes and included the annotation “IT” for infection threads. We also indicated proliferating, endoreduplicating and differentiating tissues and cells using the following annotations: “CD” for cell division, “En” for endoreduplication and “TD” for terminal differentiation. All annotations are explained in the figure legend.

      - Fig 1 - supplement E and F - no statistics are shown.

      We performed non-parametric tests using the latest version of the GraphPad Prism software (version 10.4.1). Stars (Figure 1-figure supplement 1F) or different letters (Figure 1-figure supplement 1G) now indicate statistically significant differences. Results of the normality and non-parametric tests were included in the corresponding Source Data Files (Figure 1 – figure supplement 1 – source data 1 and 2). We have also updated the compact display of letters in other figures as indicated by the new software version. The raw data and the results of the statistical analyses remain unchanged and can be viewed in the corresponding source files.

      - Fig 2 A - overview and close-up image do not seem to be in the same focal plane. This is confusing because the nuclei position is different (so is the infection thread position).

      We fully agree that our former Figure may have confused reviewers #1 and #2 as well as readers. Figure 2A was designed to highlight, from the same nodule primordium, actively dividing cells of the inner cortex (optical section z 6-14) and cells of the outer cortex traversed, penetrated by or neighbouring an infection thread (optical section z 11-19). We initially wanted to show different magnification views of the same confocal image (i.e., a full view of the inner cortex and a zoomed view of the outer layers) to ensure that audiences can identify these details. In the revised version of Figure 2A, we displayed these full and zoomed views in upper and lower panels, respectively, and we removed the solid-line inset to avoid confusion.

      - Fig 1A and Fig 2E could be combined and shown at the beginning of the manuscript. Also, consider making the cell size increase more extreme, as it is important to differentiate G2 cells after H3.1 eviction and cells in G1. You have to look very closely at the graph to see the size differences.

      We have taken each of your suggestions into account. A combined version of our schematic representation with more pronounced nuclei size differences is now presented in Figure 1A.

      - Fig. 3 C is difficult to interpret. Can this be split into different panels?

      We realized that our previous choice of representation may have been confusing. Each value corresponds only to the H3.1-eGFP content, measured in an infected cell and reported to that of the neighbouring cell (IC / NC) within individual root samples. Therefore, we removed the green-magenta colour code and changed the legend accordingly. We hope that these slight modifications will facilitate the interpretation of the results - namely, that the relative level of H3.1 increases significantly in infected cells in the selected mutants compared to the wild-type. This mode of representation also highlights that in the mutants, there are more individual cases where the H3.1 content in an infected cell exceeds that of the neighbouring cell by more than two times. These cases would be masked if the couples of infected cells and associated neighbours would be split into different panels as in Figure 3B.
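      As a rough illustration of the IC / NC representation described above, the sketch below computes a background-corrected nuclear fluorescence value and the infected-to-neighbouring-cell ratio. It assumes that the H3.1-eGFP content is the corrected total nuclear fluorescence mentioned earlier in this response and that it follows the widely used definition (integrated density minus area times mean background); the response does not spell out the formula, so both the definition and the function names are assumptions made for illustration.

```python
def corrected_total_fluorescence(integrated_density, area, mean_background):
    """Background-corrected fluorescence of one nucleus.

    Assumed definition: integrated density of the nuclear region minus
    the region's area multiplied by the mean background intensity.
    """
    return integrated_density - area * mean_background


def infected_to_neighbour_ratio(infected_value, neighbour_value):
    """H3.1-eGFP content of an infected cell relative to its neighbour (IC / NC)."""
    if neighbour_value <= 0:
        raise ValueError("neighbouring-cell value must be positive")
    return infected_value / neighbour_value


def fraction_above(ratios, threshold=2.0):
    """Share of IC / NC pairs whose ratio exceeds the given threshold."""
    if not ratios:
        raise ValueError("no ratios supplied")
    return sum(ratio > threshold for ratio in ratios) / len(ratios)
```

      Keeping each infected cell paired with its own neighbour, as in the revised Figure 3C, is what makes pairs with a ratio above two visible; the last helper only counts such pairs and the threshold of two mirrors the "more than two times" wording of the response.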

      - Line 357/359. I assume you mean ...'through the G2 phase can commit to nuclear division'.

      We have edited this sentence according to your suggestion, which now appears in line 370. 

      Reviewer #2 (Recommendations For The Authors):

      Cell cycle control during the nitrogen-fixing symbiosis is an important question but only poorly understood. This manuscript uses largely cell biological methods, which are always of the highest quality - to investigate host cell cycle progression during the early stages of nodule formation, where cortical infection threads penetrate the nodule primordium. The experiments were carefully conducted, the observations were detail oriented, and the results were thought-provoking. The study should be supported by mechanistic insights. 

      (1) One thought provoked by the authors' work is that while the study was carried out at an unprecedented resolution, the relationship between control of the cell cycle and infection thread penetration remains correlative. Is this reduced replicative potential among cells in the infection thread trajectory a consequence of hosting an infection thread, or a prerequisite to do so?

      We understand and share the point of view of reviewer #2. At this stage, we believe that our data won’t enable us to fully answer the question, thus this relationship remains rather correlative. The reasons are that 1) the access to the status of cortical cells below C2 is restricted to fixed material and therefore only represents a snapshot of the situation, and 2) we are currently unable to significantly interfere with mechanisms as intertwined as cell cycle control and infection control. What we can reasonably suggest from our images is that the most favorable window of the cell cycle for cells about to be crossed by an infection thread is post-replicative, i.e., the G2 phase. Typical markers of the G2 phase were recurrently observed at the onset of physical colonization – enlarged nucleus, containing less histone H3.1 than neighbouring cells in S phase (e.g., in Figure 2A). Reaching the G2 phase could therefore be a prerequisite for infection (and associated cellular rearrangements), while prolonged arrest in this same phase is likely a consequence of transcellular passage towards a forming nodule primordium.

      More importantly, in either scenario, what is the functional significance of exiting the cell cycle or endocycle? By stating that "local control of mitotic activity could be especially important for rhizobia to timely cross the middle cortex, where sustained cellular proliferation gives rise to the nodule meristem" (Line 239), the authors seem to believe that cortical cells need to stop the cell cycle to prepare for rhizobia infection. This is certainly reasonable, but the current study provides no proof, yet. To test the functional importance of cell cycle exit, one would interfere with G2/M transition in nodule cells,  and examine the effect on infection.

      We fully agree with reviewer #2 that the functional importance of a cell-cycle arrest on the infection thread trajectory remains to be demonstrated. Interfering with cell-cycle progression in a system as complex and fine-tuned as infected legume roots certainly requires the right timing – at the level of the tissue and of individual cells; the right dose; and the right molecular player(s) (i.e., bona fide activators or repressors of the G2/M transition). Using the symbiosis-specific NPL promoter, activated in the direct vicinity of cortical infection threads (Figure 9-figure supplement 1B), we tried to force infectable cells to recruit the cell division program by ectopically over-expressing the Arabidopsis CYCD3.1, “mimicking” the CDEL system. So far, this strategy has not resulted in a significant increase in the number of uninfected nodules in transgenic hairy roots - though the effect on symbiosome release remains to be investigated. Provided that a suitable promoter-cell cycle regulator combination is identified, we hope to be able to answer this question in the future.

      Given that the authors have already identified a candidate, and showed it represses cell division in the CDEL system, not testing the same gene in a more relevant context seems a lost opportunity. If one ectopically expressed NF-YA1 in hairy roots, thus repressing mitosis in general, would more cells become competent to host infection threads? This seems a straightforward experiment and readily feasible with the constructs that the authors already have. If this view is too naive, the authors should explain why such a functional investigation does not belong in this manuscript.

      Reviewer #2's point is entirely valid, and we decided to address it through additional experiments. To avoid possible side effects on development by affecting cell division in general, we placed NF-YA1 under control of the symbiosis-induced ENOD11 promoter. Based on the results obtained in the CDEL system, the pENOD11::FLAG-NF-YA1 cassette was coupled to a destabilized version of the KNOLLE transcriptional reporter to detect the G2/M transition. Competence for transcellular infection was maintained upon local NF-YA1 overexpression, the latter leading to a slight (non-significant) increase in the number of infected cells per cortical layer. These results are presented in Figure 9-figure supplement 3A-B (representative confocal images) and in Figure 9-figure supplement 4A-G.

      (1b) A related comment: on Line 183, it was stated that "The H3.1-eGFP fusion protein was also visible in cells penetrated but not fully passed by an infection thread". Presumably, the authors were talking about the cell marked by the arrowhead. But its H3.1-GFP signal looks no different from the cell immediately to its left. It is hard to say which cells are ones "preparing for intracellular infection pass through S-phase", and which ones are just "regularly dividing cortical cells forming the nodule primordium". What can be concluded is that once a cell has been fully traversed by an infection thread, its H3.1 level is low. Whether this is the cause or consequence of infection cannot be resolved simply by timing the appearance or disappearance of H3.1-GFP.

      We basically agree with comment 1b. In an unsynchronized system such as infected hairy roots, it is challenging to detect the event where a cell is penetrated, but not yet completely crossed by an infection thread. What we wanted to emphasize in Figure 2A, is that host cells in the path of an infection thread re-enter the cell cycle and pass through S-phase just as their neighbours do (as pointed out by reviewer #2 in his summary). The larger nucleus with slightly lower H3.1-eGFP signal than the neighbouring cell (as indicated by the use of the Green Fire Blue lookup table) suggests that the infected cell marked by the arrowhead in Figure 2A is actually in the G2 phase. The main difference is indeed that cells allowing complete infection thread passage exit the cell cycle and largely evict H3.1 while their neighbours proceed to cell division (as exemplified by PlaCCI reporters in Figure 4CD and the new Figure 5-figure supplement 2). Whether cell-cycle exit in G2 is a cause, or a consequence of cortical infection is a question that cannot be easily answered from fixed samples, which is a limitation of our study.

      (2) The authors have convincingly demonstrated that cortical cells accommodating infection threads exit the cell cycle, inhibit cell division, and down-regulate KNOLLE expression. How do these observations reconcile with the feature called the pre-infection thread? The authors devoted one paragraph to this question in the Discussion, but this does not seem sufficient given that the pre-infection thread is a prominent concept. Is the resemblance to the cell division plane superficial, or does it reflect a co-option of the normal cytokinesis machinery for accommodating rhizobia?

      From our point of view, cortical cells forming pre-infection threads are likely in an intermediate state. PIT structures undoubtedly share many similarities with cells establishing a cell division plane. The recruitment of at least some of the players normally associated with cytokinesis has been demonstrated and is consistent with the maintenance of infectable cells in a pre-mitotic phase in Medicago, as discussed in lines 558 to 568. We nevertheless think that the arrest of the cell cycle in the G2 phase, presumably occurring in crossed cortical cells, constitutes an event of cellular differentiation and specialization in transcellular infection. 

      The following are mainly points of presentation and description: 

      (3) Line 158: I can't see "subnuclear foci" in Figure 1-figure supplement 1C-E. However, they are visible in Fig. 1C.

      We hope that presenting the eGFP and mCherry channels in separate panels and assigning them the Green Fire Blue colour scheme provides better visibility and contrast of these detailed structures. We now refer to Figure 1C in addition to Figure 1–figure supplement 1E in the main text (line 161). 

      (4) Line 160: The authors should outline a larger region containing multiple QC cells, rather than pointing to a single cell, as there are other areas in the image containing cells with the same pattern.

      We updated Figure 1-figure supplement 1E accordingly.

      (5) Fig. 1B should include single channels, since within a single plant cell, the nucleus, the infection thread, and sometimes symbiosomes all have the same color. This makes it hard to see whether the nuclei in these cells are less green, or are simply overwhelmed by the magenta color.

      To improve the readability of Figure 1B and to address suggestions from individual reviewers, we now include separate channels and have annotated the different structures labeled by mCherry.

      (6) Fig. 2A: the close-up does not match the boxed area in the left panel. Based on the labeling, it seems that the two panels are different optical sections. But why choose a different optical depth for the left panel? This can be disorienting to the reader, because one expects the close-up to be the same image, just under higher magnification.

      We fully agree that our previous choice of representation may have been confusing. As we also specified to reviewer #1, we wanted to show a full-view of proliferating cells in the inner cortex and a zoomed-view of infected cells in the outer layers of the same nodule primordium. In the revised version of Figure 2A, we displayed these full- and zoomed-views in separate panels and removed the boxed area to avoid confusion. 

      (7) Figure 2-figure supplement 1B: the cell indicated by the empty arrowhead has a striking pattern of H3.1 and H3.3 distribution on condensed chromosomes. Can you comment on that?

      Reviewer #2 may be referring to the apparent enrichment of H3.3 at telomeres, previously described in Arabidopsis, while pericentromeric regions are enriched in H3.1. This distribution is indeed visible on most of the condensed chromosomes shown in Figure 2-figure supplement 1B. We included this comment in the corresponding caption.

      (8) Fig. 4: It is not very easy to distinguish M phase. Can the authors describe how each phase is supposed to look like with the reporters?

      We agree with reviewer #2 and attempted to improve Figure 4, which is now dedicated to the Arabidopsis PlaCCI reporter. ECFP, mCherry, and YFP channels were presented separately and the corresponding cell-cycle phases (in interphase and mitosis) were annotated. The Green Fire Blue lookup table was assigned to each reporter to provide the best visibility of, for example, chromosomes in early prophase. We included a schematic representation corresponding to the distribution of each reporter, using the colors of the overlaid image to facilitate its interpretation.

      (9) Line 298: what is endopolyploid? This term is used at least three times throughout the manuscript. How is it different from polyploid?

      In the manuscript, we aimed to differentiate the (poly)ploidy of an organism (reflecting the number of copies of the basic genome and inherited through the germline) from endopolyploidy produced by individual somatic cells. As reviewed by Scholes and Paige, polyploidy and endopolyploidy differ in important ways, including allelic diversity and chromosome structural differences. In the Medicago truncatula root cortex for example, a tetraploid cell generated via endoreduplication from the diploid state would contain at most two alleles at any locus. The effects of endopolyploidy on cell size, gene expression, cell metabolism and the duration of the mitotic cell cycle are not shared among individual cells or organs, in contrast to a polyploid individual (Scholes and Paige, 2015).

      See Scholes, D. R., & Paige, K. N. (2015). Plasticity in ploidy: A generalized response to stress. Trends in Plant Science, 20(3), 165-175. https://doi.org/10.1016/j.tplants.2014.11.007

      (10) Line 332: "chromosomes on mitotic figures" - what does this mean?

      Reviewer #2 is right to point out this redundant wording. Mitotic “figures” are recognized, by definition, based on chromosome condensation. We now use the term "mitotic chromosomes" (line 344).

      (11) Fig. 6A: could the authors consider labeling the doublets, at least some of them? I understand that this nucleus contains many doublets. However, this is the first image where one is supposed to recognize these doublets, and pointing out these features can facilitate understanding. Otherwise, a reader might think the image is comparable to nuclei with no doublets in the rest of the figure.

      Following this suggestion, five of these doublets are now labeled in Figure 7A (formerly Figure 6A).

  3. May 2025
    1. Act I is the Introduction, also known as the exposition. Here we are introduced to the “normal world.” Now, the normal world may exist in a far future on an interstellar starship, or it may be set in a suburban ranch house with a swing set in the back yard, but the audience will give us great latitude as we establish the definition of “normal.” In this act, we learn the rules that govern this world, and something about the characters that inhabit it. In the Hegelian dialectic, this is the “thesis.” Act II is the Conflict. This conflict is introduced through an “inciting incident,” an act that disrupts the normal world outlined in act I. The tension introduced during this incident grows throughout the second act. In the Hegelian dialectic, the second act is the “antithesis.” Act III is the Resolution. The conflict is resolved, and the world and the characters in it are revealed to have been changed. In the Hegelian dialectic, the third act is the “synthesis.”

      This also fits the logic of essay/article writing. I think the author forgot to mention that logical structure also plays an important role in building tension in writing. You should write something understandable, with enough tension to attract the audience, keep them focused, and prompt them to ask questions spontaneously.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to Review

      Manuscript number: RC-2024-02391

      Corresponding author(s): John Varga

      Dibyendu Bhattacharyya

      1. General Statements [optional]

      Dear editor,

      We are pleased to submit a full revised version of the manuscript that addresses all the points raised by the reviewers. We have included new experiments and modified the text and figures based on the reviewers’ suggestions. We thank all the reviewers for their insightful feedback, which has significantly enhanced the quality of the manuscript. We are confident and optimistic that our improved manuscript will be accepted by the journal of our choice.

      This document is supposed to contain a few images, which were somehow missing after the processing through the manuscript submission path. For convenience we also included a PDF version of the response to reviewers.

      2. Point-by-point description of the revisions

      Reviewer #1

      • To reliably quantify the ciliary length in different cell types, an independent ciliary marker needs to be included for comparison and the ciliary base needs to be labeled (e.g., g-TUBULIN). This needs to be combined with a non-biased, high-throughput analysis, e.g., CiliaQ. Response: As suggested, we compared primary cilia length measurements using antibodies against Arl13b and γ-tubulin. The comparison between healthy controls (HC) and systemic sclerosis (SSc) is presented in Supplementary Figure S1. No significant differences in primary cilia length were observed compared to our previous measurements. Cilia length was quantified using ImageJ version 1.48v (http://imagej.nih.gov/ij) with the maximum intensity projection (MIP) method and visualized through 3D reconstruction using the ImageJ 3D Viewer.

      • As mentioned in the study, TGFb has been implicated to drive myofibroblast transition. Thus, does TGFb stimulate ciliary signaling in the presented primary cells? The authors should provide a read-out for TGFb signaling in the cilium (ICC for protein phosphorylation etc.). Furthermore, canonical ciliary signaling pathways have been suggested to act as fibrotic drivers, such as Hedgehog and Wnt signaling - does stimulation of these pathways evoke a similar effect? Response: Yes, TGF-β1 stimulates ciliary signaling in growth-arrested foreskin fibroblasts. Clement et al. (2013) showed that TGF-β1 induces p-SMAD2/3 at the ciliary base, followed by the nuclear translocation of p-SMAD2/3 after 90 minutes. To assess whether canonical ciliary signaling pathways influence primary cilia length, we treated foreskin fibroblasts with Wnt (#908-SH, R&D) and a Shh agonist (#5036-WN, R&D) at 100 ng/mL each for 24 hours. We did not observe any changes in primary cilia length under either condition. These data are shown here for reference but are not included in the manuscript.

      Clement, Christian Alexandro, et al. "TGF-β signaling is associated with endocytosis at the pocket region of the primary cilium." Cell reports 3.6 (2013): 1806-1814.

      • Does TGFb induce cell proliferation? If yes, this would force cilium disassembly and, thereby, reduce ciliary length, which is independent of a "shortening" mechanism proposed by the authors. Response: Yes, TGF-β induces cell proliferation in fibroblasts (Lee et al., 2013; Liu et al., 2016). However, we serum-starved the cells to stop proliferation. In our study, we observed a small percentage of Ki67-positive cells under TGF-β treatment at 24 hours (Supplementary Figure S2C), and proliferation had largely stopped after 48 hours. Proliferating cells typically display no PC or only very small puncta. In our case, we observed a significantly elongated PC structure (although shorter than that of untreated cells) under TGF-β-treated conditions. Our results show that the majority of cells are not proliferating but still display PC shortening under TGF-β treatment, suggesting that PC shortening is not due to cell division-induced PC disassembly. TGF-β-induced PC shortening has also been reported previously in another fibroblast type (Kawasaki et al., 2024).

      Kawasaki, Makiri, et al. "Primary cilia suppress the fibrotic activity of atrial fibroblasts from patients with atrial fibrillation in vitro." Scientific Reports 14.1 (2024): 12470.

      Lee, J., Choi, JH. & Joo, CK. TGF-β1 regulates cell fate during epithelial–mesenchymal transition by upregulating survivin. Cell Death Dis 4, e714 (2013). https://doi.org/10.1038/cddis.2013.244.

      Liu, Y. et al. TGF-β1 promotes scar fibroblasts proliferation and transdifferentiation via up-regulating MicroRNA-21. Sci. Rep. 6, 32231; doi: 10.1038/srep32231 (2016).

      • As PGE2 has been shown to signal through EP4 receptors in the cilium, is the restoration of primary cilia length due to ciliary signaling? Response: As per your suggestion, we measured cilia length in the presence and absence of the EP4 receptor antagonist (#EP4 Receptor Antagonist 1; #32722; Cayman Chemicals; 500 nM) with PGE2. Interestingly, we did not observe a change in cilia length between the PGE2 and TGFβ (with EP4 receptor antagonist) treatment groups, as shown in supplementary figure S3. We believe that PGE2 works with the EP2 receptor under our experimental conditions. Kolodsick et al., 2003, also observed that PGE2 inhibits myofibroblast differentiation via activation of EP2 receptors and elevations in cAMP levels in healthy lung fibroblasts.

      Kolodsick, Jill E., et al. "Prostaglandin E2 inhibits fibroblast to myofibroblast transition via E. prostanoid receptor 2 signaling and cyclic adenosine monophosphate elevation." American journal of respiratory cell and molecular biology 29.5 (2003): 537-544.

      • Primary cilia length is regulated by cAMP signaling in the cilium vs. cytoplasm - does cAMP signaling play a role in this context? PGE2 is a potent stimulator of cAMP synthesis - does this underlie the rescue of primary cilia length? Response: Yes, cAMP levels are important for both myofibroblast dedifferentiation and cilia length elongation. Kolodsick et al., 2003 observed that PGE2 inhibits myofibroblast differentiation via activation of EP2 receptors and elevations in cAMP levels in healthy lung fibroblasts. In a parallel set of experiments, treatment with forskolin (a cAMP activator) also reduced α-SMA protein levels by 40%. Forskolin is also known to increase PC length.

      Kolodsick, Jill E., et al. "Prostaglandin E2 inhibits fibroblast to myofibroblast transition via E. prostanoid receptor 2 signaling and cyclic adenosine monophosphate elevation." American journal of respiratory cell and molecular biology 29.5 (2003): 537-544.

      • The authors describe that they wanted to investigate how aSMA impacted primary cilia length. They only provide a knock-down experiment and measured ciliary length, but the mechanistic insight is missing. How does loss of aSMA expression control ciliary length? Response: We measured acetylated α-tubulin levels in ACTA2 siRNA-treated cells compared to control-treated cells. Acetylated α-tubulin levels increased under ACTA2 siRNA-treated conditions, as shown in Figure 4D, and TPPP3 levels were also elevated (Figure S8A). Interestingly, TPPP3 levels negatively correlated with disease severity in SSc fibroblasts (r = -0.2701, p = 0.0183), and TPPP3 expression was significantly reduced in SSc skin biopsies, as shown in Figures 6C and 6D. These results strengthen our hypothesis that microtubule polymerization and actin polymerization, while counterbalancing each other, also have opposing effects on PC length. We agree that a much more detailed study is needed to delineate the intricate homeostasis of the actin and microtubule networks in conjunction with fibrosis and primary cilia length. We have mentioned this in the discussion.

      • The authors used LiCl in their experiments, which supposedly control Hh signaling. Coming back to my second question, is this Hh-dependent? And what is the common denominator with respect to TGFb signaling? And how is this mechanistically connected to actin and microtubule polymerization? Response: We used a Shh inhibitor (Cyclopamine hydrate #C4116 Sigma-Aldrich) in both SSc and foreskin fibroblasts (with and without TGFβ). We found that PC length is significantly increased and αSMA intensity is reduced in the Shh inhibitor-treated group (data not included in the manuscript).

      • How was the aSMA Mean intensity determined? Response: We quantified aSMA mean intensity using ImageJ, and the procedure has been added to the respective figure legend and to the Materials and Methods section under ‘Quantification of immunofluorescence’ (each point represents the mean intensity from three randomly selected high-power fields (hpf) per slide, quantified using ImageJ).

      • Fig. 1D: Statistical test is missing in Figure Legend and presentation of the p-values for the left graph is confusing. Response: We added the statistical test information to the Figure Legend.

      • Some graphs are presented ± SD and some ± SEM, but this is not correctly stated in the Material & Methods Part. Response: We added this information to the figure legend as well as to the Material & Methods section.

      • 4D&E: Statistical test is missing in Figure Legend. Response: We have now added it.
      • In general, text should be checked again for spelling mistakes and sentences may be re-written to promote readability. In particular, this applies to the discussion. Response: We checked and corrected.

      • Figure Legends are not written consistently, information is missing (e.g., statistical tests, see above). Response: We carefully checked and added information accordingly.

      • Figures should be checked again, and all text should be the same size and alignment of images should be improved. Response: We checked and corrected.
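
      The correlation reported above between TPPP3 levels and disease severity (r = -0.2701) is a Pearson-type coefficient. As a minimal sketch of how such a coefficient and its associated t statistic can be computed, the following uses only the Python standard library. The function names and the example values are hypothetical and are not taken from the manuscript; the authors' actual software and data are not stated in this response.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    # Hypothetical illustration values, NOT data from the study.
    severity = [1.0, 2.0, 3.0, 4.0]
    tppp3 = [4.0, 3.0, 3.0, 1.0]
    r = pearson_r(severity, tppp3)
    print(round(r, 4), round(t_from_r(r, len(severity)), 4))
```

      The p-value follows from comparing the t statistic with a t distribution on n - 2 degrees of freedom, which would normally be delegated to a statistics package.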

      Significance

      The authors present a novel connection between the regulation of primary cilia length and fibrogenesis. However, the study generally lacks mechanistic insight, in particular on how TGFb signaling, aSMA expression, and ciliary length control are connected. The spatial organization of the proposed signaling components is also not clear - is this a ciliary signaling pathway? If so, how does it interact with cytoplasmic signaling and vice versa?

      Response: Thank you for your thoughtful and constructive feedback. We appreciate your recognition of the novelty of our study linking primary cilia length regulation to fibrogenesis. In our revised manuscript, we did provide a mechanistic insight, though. Our results suggest that during the fibrotic response, higher-order actin polymerization, along with microtubule destabilization resulting from tubulin deacetylation, drives the shortening of PC length. In contrast, PC length elongation via stabilization of microtubule polymerization mitigates the fibrotic phenotype in fibrotic fibroblasts. We agree that a deeper mechanistic understanding particularly regarding how TGFβ signaling, αSMA expression, and ciliary length control intersect is essential for fully elucidating the pathway. We also acknowledge the importance of clarifying the spatial organization of the signaling components and plan to incorporate such analyses in future studies.

      Reviewer #2

      *I found the paper to be rather muddled and its presentation made it somewhat difficult to follow. For example, the Figures are disorganised (Fig 1 is a great example of this) and there was reference to Sup data that appeared out of order (eg Sup Fig 2 appeared before Sup Fig 1 in the text). *

      Response: We carefully revised the manuscript and arranged the figures.

      *Images in a single figure should be the same size. Currently they are almost random and use different magnifications. Overall, the paper needs to be better organized. *

      Response: We carefully revised the manuscript and now provide the figures at the same magnification.

      *I have some significant concerns about how the PC length data was generated. To my mind the length may be hard to determine from the type of images shown in the paper (which may represent the best images?). Some of the images presented appear to show shorter, fatter PCs in the cells from fibrosis cases. Is this real or is it some kind of artefact? Would a shorter, fatter PCs have a similar or larger surface area? What would be the consequence of this? *

      Response: Primary cilia length was measured with ImageJ version 1.48v using the maximum intensity projection (MIP) method and visualized by 3D reconstruction with the ImageJ 3D Viewer. Each small dot represents the PC length from an individual cell, and each large dot represents the average of the small dots for one cell line.

      *I am confused as to exactly what is meant by matched healthy controls. Age, sex and ethnicity, where stated seem to be very variable? What are CCL210 fibroblasts? *

      Response: We appreciate this comment. This is correct. The age, sex, and ethnicity are not matched for the available healthy controls. We have corrected that in the text. CCL210 is a commercially available fibroblast cell line that was isolated from the lung of a normal White, 20-year-old, female patient.

      *What does a change in PC length signify? Do short PCs force a cellular transition or are they a consequence of it? What would happen if you targeted PCs with a drug and that influenced the length on all cell types? Is the effect on PC fibroblast specific? *

      Response: The significance and regulation of PC length are still widely debated and under investigation. PC length appears to signify different features in different cell types. Although these are very interesting questions, such experiments are beyond the scope of our present work.

      Minor concerns

      *Page 4 second paragraph. I think it should be clarified that it is this group who have suggested a link between PCs and myofibroblast transition? *

      Response: We agree with the reviewer and have clarified it.

      *Page 4 second paragraph. The use of the word "remarkably" is a bit subjective. *

      Response: We agree with the reviewer and have removed it.

      *Reference 27 is a paper on multiciliogenesis rather than primary ciliogenesis. *

      Response: We agree with the reviewer and have removed it.

      Figure 1 panel D. Make the image with the same sized vertical scale

      Response: We have replaced it with a new Figure 1.

      Significance

      Reviewer #2 (Significance (Required)):

      To my mind this is a novel paper and the data presented in it may be of interest to the cilia community as well as to the fibrosis field. This could be considered to be a significant advance and I am unaware that other groups are actively working in this area.

      Presentation of the data in the current form does not instil confidence in the work.

      Response: Thank you for recognizing the novelty and potential significance of our work. We appreciate your comments and fully acknowledge the concern regarding the presentation of the data. We have carefully revised the manuscript and reorganized the figures to improve clarity and overall presentation.

      Reviewer #3

      Major comments:

      • Need to demonstrate if the fibrotic phenotypes seen are produced through a ciliary-dependent mechanism. For example, to see if LiCl effects on Cgn1 are through ciliary expression or by other mechanisms. To achieve that objective, the authors should repeat the experiments in cells with a knockdown or knockout of ciliary proteins such as IFT20, IFT88, etc. The same approach should be applied to the tubacin experiments. Response: We silenced IFT88/IFT20 in foreskin fibroblasts, both in the presence and absence of TGF-β1, followed by treatment with LiCl and Tubacin. Both LiCl and Tubacin can rescue cilia length and mitigate the myofibroblast phenotype even when IFT88/IFT20 is silenced, as shown in Supplementary Figure S9. Our results suggest that the effects of LiCl and Tubacin are both independent of the IFT-mediated ciliary mechanism. The regulation of PC length remains an enigma and is highly debated. Moreover, PC length can be affected in multiple ways and is not solely dependent on IFTs (Avasthi and Marshall, 2012). One such mechanism is the direct modification of the axoneme by altering microtubule stability through the acetylation state (Avasthi and Marshall, 2012), which is most likely the route taken by Tubacin. Another mode of PC length regulation is through a change in actin polymerization. The remodeling of actin between contractile stress fibers and a cortical network alters conditions that are hospitable to basal body docking and maintenance at the cell surface (Avasthi and Marshall, 2012), causing PC length variation. Our results suggest that PC length functions as a sensor of the fibrotic state, as evidenced by the aSMA levels of the cells.

      Avasthi, P., and W.F. Marshall. 2012. Stages of ciliogenesis and regulation of ciliary length. Differentiation. 83:S30-42.

      • The use of LiCl to increase ciliary length is complicated. What are the molecular mechanisms underlying this effect? It is known that it may be affecting GSK-3b, which can have other ciliary-independent effects. Therefore, using ciliary KO/KD cells (IFT88 or IFT20) as controls may help assess the specificity of the proposed treatments. Response: As explained in the previous paragraph, PC length regulation depends on multiple factors, many of which are not IFT-dependent. One such mechanism is the direct modification of the axoneme by altering microtubule stability/polymerization through the acetylation state (Avasthi and Marshall, 2012), which is most likely the route taken by Tubacin. Another mode of PC length regulation is through a change in actin polymerization. The remodeling of actin between contractile stress fibers and a cortical network alters conditions that are hospitable to basal body docking and maintenance at the cell surface (Avasthi and Marshall, 2012), causing PC length variation. Higher-order microtubule polymerization inhibits actin polymerization. By interrogating RNA-seq data, we determined that several PC-disassembly related genes (KIF4A, KIF26A, KIF26B, KIF18A), as well as microtubule polymerization protein genes (TPPP, TPPP3, TUBB, TUBB2A etc), were differentially expressed in LiCl-treated SSc fibroblasts (Suppl. Fig. S6D). Altogether, these findings suggest that microtubule polymerization/depolymerization mechanisms may regulate PC elongation and attenuation of fibrotic responses after either LiCl or Tubacin treatment.

      • Also, assessing the frequency of ciliary-expressing cells is important. That may give another variable important to predict fibrotic phenotypes. Or do 100% of the cultured cells express cilia in those conditions? Response: We carefully checked and observed that almost 95% of cells express cilia under culture conditions.

      • Have the authors evaluated if TGF-b1 treatments induce cell cycle re-entry and proliferation in these experimental conditions? This is important to exclude ciliary resorption due to cell cycle re-entry instead of the myofibroblast activation process. Response: Yes, TGF-β induces cell proliferation in fibroblasts (Lee et al., 2013; Liu et al., 2016). However, we serum-starved the cells to stop proliferation. In our study, we observed a small percentage of Ki67-positive cells under TGF-β treatment at 24 hours (Supplementary Figure S2C), and proliferation had largely stopped after 48 hours. Proliferating cells typically display no PC or only very small puncta. In our case, we observed a significantly elongated PC structure (although shorter than that of untreated cells) under TGF-β-treated conditions. Our results show that the majority of cells are not proliferating but still display PC shortening under TGF-β treatment, suggesting that PC shortening is not due to cell division-induced PC disassembly. TGF-β-induced PC shortening has also been reported previously in another fibroblast type (Kawasaki et al., 2024).

      Kawasaki, Makiri, et al. "Primary cilia suppress the fibrotic activity of atrial fibroblasts from patients with atrial fibrillation in vitro." Scientific Reports 14.1 (2024): 12470.

      Lee, J., Choi, JH. & Joo, CK. TGF-β1 regulates cell fate during epithelial–mesenchymal transition by upregulating survivin. Cell Death Dis 4, e714 (2013). https://doi.org/10.1038/cddis.2013.244.

      Liu, Y. et al. TGF-β1 promotes scar fibroblasts proliferation and transdifferentiation via up-regulating MicroRNA-21. Sci. Rep. 6, 32231; doi: 10.1038/srep32231 (2016).

      • The authors described that they focused on the genes that are affected in opposite ways (supp table 4), but TEAD2, MICALL1, and HDAC6 are not listed in that table. Response: The list in Supplementary Table S3 includes common genes defined as differentially expressed based on a fold change >1 or

      Minor comments:

      • Figure 1A,B,C should also show lower magnification images where several cells/field are visualized. Response: We have replaced it with a new Figure 1.

      • The number of patients analyzed is not clear. For example, M&M describes 5 healthy and 8 SSc, but only 3 and 4 are shown in the figure. Furthermore, for orbital fibrosis, 2 healthy vs. 2 TAO are mentioned in the figure legend, but only one of each is shown. Finally, the healthy control for lung fibroblast seems to be 3 independent experiments of the CCL210 cell line; please show the three independent controls and clarify on the X-axis and in the figure legend that these are CCL210 cells. Response: A total of 5 healthy and 8 SSc skin explanted fibroblast cell lines were used, as described in the Materials and Methods. Since these are patient-derived skin fibroblasts, maintaining equal numbers in each experiment is challenging. Revised graphs for orbital fibroblasts and CCL210 have been added in the new Figures 1B and 1C.

      • For the same set of experiments, please clarify and consistently describe the conditions that promote PC: 12hs serum starvation as described in M&M? Or 24hs as described in the text? Or 16 as described in figure legend 1? Or 24hs as described in supp figure 2? Response: We serum-starved the cells overnight, and this is also mentioned in the manuscript.

      • Please confirm in figure legends and M&M that 100 cells per group were counted. Response: We measured only 100 cells per cell line in Supplementary Figure S1B. To eliminate any confusion, we have now created a superplot for cilia analysis. Each small dot represents the PC length from an individual cell, and each large dot represents the average of the small dots for one cell line. An unpaired two-tailed t-test was performed on the small dots (mean ± SD).

      • Figure 2 should also provide lower magnification to show several cells per field. Response: Foreskin fibroblasts treated with TGF-β1 are added in S2A.

      • How do you explain that the increase in length of primary cilia after siACTA2 doesn't change COL1A1? Wouldn't it be a good approach to also check by Western Blot? Response: We believe that depletion of aSMA was sufficient to increase the PC length for the reason described earlier (Avasthi and Marshall, 2012), but was not sufficient to change the COL1A1 level. We added the western blot in Supplementary Figure S8B.

      • Once more, figure 5 will benefit from low mag images. How consistent is the effect of LiCl in the cultured cells? What is the percentage of rescued cells? Response: LiCl treatment was consistent for almost all the cells (~95%) as shown below and added in S4A.

      • Figure 5, panels F and G need better explanation in the results text as well as in the figure legend. Response: We have now added this explanation.

      9) Some figures/supp figures are wrongly referenced in the text.

      Response: We carefully revised the manuscript and corrected the references.

      10) Figure 6, panel A is confusing. Is it a comparison between SSC skin fibroblasts and foreskin fibroblasts? Maybe show labels on the panel.

      Response: We updated the figure legend for Panel A in Figure 6.

      11) Where is Figure 8 mentioned in the text?

      Response: In the discussion section.

      12) The work will benefit from an initial paragraph in the discussion enumerating the findings and a summary of the conclusion at the end.

      Response: We agree and modified the discussion accordingly.

      13) The nintedanib experiments are not described in the results section at all.

      Response: All nintedanib experiments are now included in Figure S5C-F and are described in the Results section.

      Significance

      Reviewer #3 (Significance (Required)): Beyond the lack of in situ ciliary expression assessment, the work is exciting, and the potential implications of treating/preventing fibrosis with small molecules to modulate ciliary length could be transformative in the field. Furthermore, there are a few HDAC6 inhibitors already in clinical trials for different tumors, which increases the significance of the work.

      Response: Thank you for your encouraging comments regarding the potential impact of our findings. We agree that the therapeutic implications of modulating ciliary length, particularly using small molecules such as HDAC6 inhibitors already in clinical trials, could be transformative in the context of fibrosis. We also acknowledge the importance of in situ assessment of ciliary expression and plan to incorporate such analyses in future studies to further strengthen our findings.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors use microscopy experiments to track the gliding motion of filaments of the cyanobacteria Fluctiforma draycotensis. They find that filament motion consists of back-and-forth trajectories along a "track", interspersed with reversals of movement direction, with no clear dependence between filament speed and length. It is also observed that longer filaments can buckle and form plectonemes. A computational model is used to rationalise these findings.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      Much work in this field focuses on molecular mechanisms of motility; by tracking filament dynamics this work helps to connect molecular mechanisms to environmentally and industrially relevant ecological behavior such as aggregate formation.

      The observation that filaments move on tracks is interesting and potentially ecologically significant.

      The observation of rotating membrane-bound protein complexes and tubular arrangement of slime around the filament provides important clues to the mechanism of motion.

      The observation that long filaments buckle has the potential to shed light on the nature of mechanical forces in the filaments, e.g. through the study of the length dependence of buckling.

      We thank the reviewer for listing these positive aspects of the presented work.

      Weaknesses:

      The manuscript makes the interesting statement that the distribution of speed vs filament length is uniform, which would constrain the possibilities for mechanical coupling between the filaments. However, Figure 1C does not show a uniform distribution but rather an apparent lack of correlation between speed and filament length, while Figure S3 shows a dependence that is clearly increasing with filament length. Also, although it is claimed that the computational model reproduces the key features of the experiments, no data is shown for the dependence of speed on filament length in the computational model. The statement that is made about the model "all or most cells contribute to propulsive force generation, as seen from a uniform distribution of mean speed across different filament lengths", seems to be contradictory, since if each cell contributes to the force one might expect that speed would increase with filament length.

      We agree that the data shows in general a lack of correlation, rather than strictly being uniform. In the revised manuscript, we intend to collect more data from observations on glass to better understand the relation between filament length and speed.

      In considering longer filaments, one also needs to consider the increased drag created by each additional cell - in other words, overall friction will either increase or be constant as filament length increases. Therefore, if only one cell (or few cells) are generating motility forces, then adding more cells in longer filaments would decrease speed.

      Since the current data do not show any decrease in speed with increasing filament length, we stand by the argument that the data support the involvement of all (or most) cells in a filament in force generation for motility. We will revise the manuscript to make this point clear, along with our reasons for assuming that multiple or most cells in a filament contribute to motility.
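      The drag argument above can be illustrated with a minimal overdamped force balance. The sketch below is not the model used in the manuscript; it assumes, purely for illustration, that every cell adds the same drag coefficient and that every active cell contributes the same propulsive force, so that the steady-state speed is the total force divided by the total drag. The function name and the unit values are hypothetical.

```python
def filament_speed(n_cells, n_active, force_per_cell=1.0, drag_per_cell=1.0):
    """Steady-state speed of an overdamped filament, in arbitrary units.

    Assumption (illustrative only): each cell adds identical drag and each
    active cell adds identical thrust, so speed = total force / total drag.
    """
    if n_cells <= 0 or not 0 <= n_active <= n_cells:
        raise ValueError("need n_cells > 0 and 0 <= n_active <= n_cells")
    return (n_active * force_per_cell) / (n_cells * drag_per_cell)


lengths = [10, 50, 100, 500]  # filament lengths in number of cells

# Case 1: every cell generates force, so thrust and drag grow together.
all_active = [filament_speed(n, n) for n in lengths]

# Case 2: a single cell generates force while drag grows with length.
one_active = [filament_speed(n, 1) for n in lengths]

for n, v_all, v_one in zip(lengths, all_active, one_active):
    print(f"{n:4d} cells: all active -> {v_all:.3f}, one active -> {v_one:.3f}")
```

      Under these assumptions the speed is independent of length when all cells push and falls as 1/N when only one cell pushes, which is the contrast the argument relies on.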

      The computational model misses perhaps the most interesting aspect of the experimental results which is the coupling between rotation, slime generation, and motion. While the dependence of synchronization and reversal efficiency on internal model parameters are explored (Figure 2D), these model parameters cannot be connected with biological reality. The model predictions seem somewhat simplistic: that less coupling leads to more erratic reversal and that the number of reversals matches the expected number (which appears to be simply consistent with a filament moving backwards and forwards on a track at constant speed).

      We agree that the coupling between rotation, slime generation and motion is interesting and important when studying the specific mechanism leading to filament motion. However, we believe it is even more fundamental to consider the intercellular coordination that is needed to realise this motion. Individual filaments are a collection of independent cells. This raises the question of how they can coordinate their thrust generation in such a way that the whole filament can both move and reverse direction of motion as a single unit. With the presented model, we want to start addressing precisely this point.

      The model allows us to qualitatively understand the relation between coupling strength and reversals (erratic vs. coordinated motion of the filament). It also provides a hint about the possibility of de-coordination, which we then look for and identify in longer filaments.

      While the model’s results seem obvious in hindsight, the analysis of the model allows phrasing the question of cell-to-cell coordination, which so far has not been brought up when considering the inherently multi-cell process of filament motility.

      Filament buckling is not analysed in quantitative detail, which seems to be a missed opportunity to connect with the computational model, eg by predicting the length dependence of buckling.

      Please note that Figure S10 provides an analysis of filament length and number of buckling instances observed. This suggests that buckling happens only in filaments above a certain length.

      We do agree that further analyses of buckling, both experimentally and through modelling, would be interesting. This study, however, focussed on cell-to-cell coupling / coordination during filament motility. We have identified the possibility of de-coordination through the use of a simple 1D model of motion, and found evidence of such de-coordination in experiments. Note that the buckling we report does not depend on the filament hitting an external object. It is a direct result of filament activity which, in this context, serves as evidence of cellular de-coordination.

      Now that we have observed buckling and plectoneme formation, these processes need to be analysed with additional experiments and modelling. The appropriate model for this process needs to be 3D, and should ideally include torques arising from filament rotation. Experimentally, we need to identify means of influencing filament length and motion and see if we can measure buckling frequency and position across different filament lengths. These works are ongoing and will have to be summarised in a separate, future publication.

      Reviewer #2 (Public review):

      Summary:

      The authors combined time-lapse microscopy with biophysical modeling to study the mechanisms and timescales of gliding and reversals in filamentous cyanobacterium Fluctiforma draycotensis. They observed the highly coordinated behavior of protein complexes moving in a helical fashion on cells' surfaces and along individual filaments as well as their de-coordination, which induces buckling in long filaments.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The authors provided concrete experimental evidence of cellular coordination and de-coordination of motility between cells along individual filaments. The evidence is comprised of individual trajectories of filaments that glide and reverse on surfaces as well as the helical trajectories of membrane-bound protein complexes that move on individual filaments and are implicated in generating propulsive forces.

      We thank the reviewer for listing these positive aspects of the presented work.

      Limitations:

      The biophysical model is one-dimensional and thus does not capture the buckling observed in long filaments. I expect that the buckling contains useful information since it reflects the competition between bending rigidity, the speed at which cell synchronization occurs, and the strength of the propulsion forces.

      Cell-to-cell coordination is a more fundamental phenomenon than the buckling and twisting of longer filaments, in that the latter is a consequence of limits of the former. In this sense, we are focussing here on something that we think is the necessary first step to understand filament gliding. The 3D motion of filaments (bending, plectoneme formation) is fascinating and can have important consequences for collective behaviour and macroscopic structure formation. As a consequence of cellular coupling, however, it is beyond the scope of the present paper.

      Please also see our response above. We believe that the detailed analysis of buckling and plectoneme formation requires (and merits) dedicated experiments and modelling which go beyond the focus of the current study (on cellular coordination) and will constitute a separate analysis that stands on its own. We are currently working in that direction.

      Future directions:

      The study highlights the need to identify molecular and mechanical signaling pathways of cellular coordination. In analogy to the many works on the mechanisms and functions of multi-ciliary coordination, elucidating coordination in cyanobacteria may reveal a variety of dynamic strategies in different filamentous cyanobacteria.

      We thank the reviewer for highlighting this point again and seeing the value in combining molecular and dynamical approaches.

      Reviewer #3 (Public review):

      Summary:

      The authors present new observations related to the gliding motility of the multicellular filamentous cyanobacterium Fluctiforma draycotensis. The bacteria move forward by rotating about their long axis, which causes points on the cell surface to move along helical paths. As filaments glide forward they form visible tracks. Filaments preferentially move within the tracks. The authors devise a simple model in which each cell in a filament exerts a force that either pushes forward or backwards. Mechanical interactions between cells cause neighboring cells to align the forces they exert. The model qualitatively reproduces the tendency of filaments to move in a concerted direction and reverse at the end of tracks.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The observations of the helical motion of the filament are compelling. The biophysical model used to describe cell-cell coordination of locomotion is clear and reasonable. The qualitative consistency between theory and observation suggests that this model captures some essential qualities of the true system.

      The authors suggest that molecular studies should be directly coupled to the analysis and modeling of motion. I agree.

      We thank the reviewer for listing these positive aspects of the presented work and highlighting the need for combining molecular and biophysical approaches.

      Weaknesses:

      There is very little quantitative comparison between theory and experiment. It seems plausible that mechanisms other than mechano-sensing could lead to equations similar to those in the proposed model. As there is no comparison of model parameters to measurements or similar experiments, it is not certain that the mechanisms proposed here are an accurate description of reality. Rather the model appears to be a promising hypothesis.

      We agree with the referee that the model we put forward is one of several possible. We note, however, that the assumption of mechanosensing by each cell - as done in this model - results in capturing both the alignment of cells within a filament (with some flexibility) and reversal dynamics. We have explored an even more minimal 1D model, where the cell’s direction of force generation is treated as an Ising-like spin and coupled between nearest neighbours (without assuming any specific physico-chemical basis). We found that this model was not fully able to capture both phenomena. In that model, we found that alignment required high levels of coupling (which is hard to justify except for mechanical coupling) and reversals were not readily explainable (and required additional assumptions). These points led us to the current, mechanically motivated model.
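      To illustrate why alignment along a whole filament demands strong nearest-neighbour coupling in a spin-like description, the sketch below uses the textbook equilibrium result for an open 1D Ising chain in zero field, where the correlation between the two end spins of a chain of N spins is tanh(K)^(N-1). This is an assumption-laden stand-in and not the minimal model explored by the authors: K is a dimensionless coupling (coupling energy divided by an effective noise energy), and the function names and the 0.9 alignment target are hypothetical.

```python
import math


def end_to_end_correlation(n_cells, coupling):
    """<s_1 s_N> for an open 1D Ising chain in zero field.

    Exact equilibrium result: each bond contributes a factor tanh(K).
    """
    if n_cells < 2:
        raise ValueError("need at least two cells")
    return math.tanh(coupling) ** (n_cells - 1)


def coupling_for_alignment(n_cells, target=0.9):
    """Smallest dimensionless coupling K giving <s_1 s_N> >= target."""
    if n_cells < 2 or not 0.0 < target < 1.0:
        raise ValueError("need n_cells >= 2 and 0 < target < 1")
    return math.atanh(target ** (1.0 / (n_cells - 1)))


for n in (10, 100, 1000):
    k = coupling_for_alignment(n)
    print(f"N = {n:4d} cells: K >= {k:.2f} "
          f"(check: correlation = {end_to_end_correlation(n, k):.3f})")
```

      In this idealised picture the required coupling grows with filament length, and a chain with weak coupling loses end-to-end alignment quickly. Reversal dynamics are not represented at all here.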

      The parameterisation of the current model would require measuring cellular forces. To this end, a recent study has attempted to measure some of the physical parameters in a different filamentous cyanobacterium [1] and in our revision we will re-evaluate model parameters and dynamics in light of that study. We will also attempt to directly verify the presence of mechano-sensing by obstructing the movement of filaments.

      Summary from the Reviewing Editor:

      The authors present a simple one-dimensional biophysical model to describe the gliding motion and the observed statistics of trajectory reversals. However, the model does not capture some important experimental findings, such as the buckling occurring in long filaments, and the coupling between rotation, slime generation, and motion. More effort is recommended to integrate the information gathered on these different aspects to provide a more unified understanding of filament motility. In particular, the referees suggest performing a more quantitative analysis of the buckling in long filaments. Finally, it is also recommended to discuss the results in the context of previous literature, in order to better explain their relevance. Please find below the detailed individual recommendations of the three reviewers.

      We thank the editor for this accurate summary of the presented work and for highlighting the key points raised by the reviewers. We have provided below point-by-point replies to these.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The relevance of the study organism Fluctiforma draycotensis is not clearly explained, and the results are not discussed in the context of previous literature. The motivation would be clearer if the manuscript explained why this model organism was chosen and how the results compare with those previously observed for this or other organisms.

      We have extended the introduction and discussion sections to make it clearer why we have worked with this organism and how the findings from this work relate to previous ones. In brief, Fluctiforma draycotensis is a useful organism to work with as it not only displays significant motility but also displays intriguing collective behaviour at different scales. Previous works on gliding motility in filamentous cyanobacteria have mostly focussed on the model organism Nostoc punctiforme, which only displays motility after differentiation into hormogonia [1]. There have also been studies in a range of different filamentous species, including those of the non-monophyletic genus, Phormidium, but these studies mostly looked at effects of genetic deletions on motility [2] or utilised electron microscopy to identify proteins (or surface features) involved in motility [3-5]. It must be noted that motility is also described and studied in non-filamentous cyanobacteria, but the dynamics of motion and molecular mechanisms there are different to filamentous cyanobacteria [6,7]. These previous studies are now cited / summarised in the revised introduction and discussion sections.

      The inferred tracks, probably associated with secreted slime, play a key role since it is supposed that the tracks provide the external force that keeps the filaments straight. Movie S3, in phase contrast, provides convincing evidence for the tracks, but they cannot be seen in the fluorescence images presented in the main text. Clearer evidence of them should be shown in the main text. An especially important aspect of the tracks is where they start and end since the computational model assumes that reversal happens due to forces generated by reaching the end of a track. Therefore it seems important to comment on what produces the tracks, to check whether reversals actually happen at the end of a track, etc. Perhaps tracks could be stained with Concanavalin-A?

      To confirm that reversals happen at track ends, we have now performed an analysis on agar, where we can see tracks using phase microscopy. This analysis confirms that, on agar, reversals indeed happen at track ends. We added this analysis, along with images showing tracks clearly, as a new figure in the main text (see new Fig. 1).

      Further confirming the reversal at track ends, we note that filaments on circular tracks do not reverse over durations longer than the ‘expected reversal interval’ of a filament on a straight track (see details in response to Reviewer 2).

      Regarding what produces the tracks on agar, we are still analysing this using different methods and these results will be part of a future study. Fluorescent staining can be used to visualise slime tubes using TIRF microscopy, as shown in Fig. S8, however, visualising tracks on agar using low magnification microscopy has been difficult due to background fluorescence from agar.

      We would also like to clarify that the model does not incorporate any assumptions regarding the track-filament interaction, other than that the track ends behave akin to a physical boundary for the filament. The observed reversal at track ends and “what” produces the track are distinct aspects of filament motion. We do not think that the model’s assumption of filament reversal at the end of the track requires understanding of the mechanism of slime production.

      Reviewer #3 (Recommendations for the authors):

      The manuscript combines three distinct topics: (1) the difference in locomotion on glass vs agar, (2) the development of a biophysical model, and (3) the helical motion of filament. It is not clear what insight one can gain from any one of these topics about the two others. The manuscript would be strengthened by more clearly connecting these three aspects of the work. A stronger comparison of theory to observation would be very useful. Some suggestions:

      (1) The observation that it is only the longest filaments that buckle is interesting. It should be possible to predict the critical length from the biophysical model. Doing so could allow fits of some model parameters.

      (2) What model parameters change between glass and agar? Can you explain these qualitative differences in motility by changing one model parameter?

      (3) Is it possible to exert a force on one end of a filament to see if it is really mechano-sensing that couples their motion?

      We thank the reviewer for this comment and agree with them that a better connection between model and experiment should be sought. We believe that the new analyses, presented below in response to the 2nd suggestion of the reviewer, provide such a connection in the context of reversal frequency. As stated below, we think that the 1st suggestion falls outside of the scope of the current work, but should form the basis of a future study.

      Regarding suggestion (1) - addressing buckling:

      We agree with the reviewer that using a model to predict a critical buckling length would be useful. We note, however, that the presented study focussed on cell-to-cell coupling / coordination during filament motility using a 1D, beadchain model. The buckling observations served, in this context, as evidence of cellular de-coordination. Now that we have observed buckling (and plectoneme formation), these processes need to be analysed with further experiments and modelling. The appropriate model for studying buckling would have to be at least 2D (ideally 3D) and consider elastic forces and torques relating to filament bending, rotation, and twisting. Experimentally, we need to identify means of influencing filament length and motion and undertake further measurements of buckling frequency and position across different filament lengths. These investigations are ongoing and will be summarised in a separate, future publication.

      Regarding suggestion (2) - addressing differences in motility on agar vs. glass:

      We believe that the two key differences between agar and glass experiments are the occasional detachment of filaments from substrate on glass and the lack of confining tracks on glass. These differences might arise from the interactions between the filament, the slime, and the surface. As both slime and agar contain polysaccharides, the slime-agar interaction can be expected to be different from the slime-glass interaction. Additionally, in the agar experiments, the filaments are confined between the agar and a glass slide, while they are not confined on the glass, leaving them free to lift up from the glass surface. We expect these factors to alter reversal frequency between the two conditions. To explore this possibility, we have now extended the analysis of experimental data from glass and show that (see details below):

      (i) dwell times are similar between agar and glass, and

      (ii) reversal frequency distribution is different between glass and agar, and remains constant across filament length on glass.

      We were able to explore these experimental findings with new model simulations, by removing the assumption of an “external bounding frame”. We then analysed reversal frequency against model parameters, as detailed below.

      “The movement of the filaments on glass. We have extended our analysis of motility on glass resulting in the following noted features. Firstly, the median speed shows a weak positive correlation with filament length on glass (see original Fig S3B vs. updated Fig. S3A). This is slightly different to agar, where we do not observe any strong correlation in either direction (see original, Fig. 1 vs. updated Fig 2). Both cases, positive correlation and no correlation, support our original hypothesis that the propulsion force is generated by multiple cells within the filament.

      Secondly, the filaments on glass display ‘stopping’ events that are not followed by a reversal, but are instead followed by a continuation in the original direction of motion, which we term ‘stop-go’ events, in contrast to the reversals. The dwell times associated with reversals and ‘stop-go’ events are similarly distributed (see original Fig S3A vs. updated Fig S3B). Furthermore, the dwell time distributions are similar between agar and glass (compare old Fig. 1C vs. new Fig 2C and new Fig. S3B). This suggests that the reversal process is the same on both agar and glass.

      Thirdly, we find that the frequencies of both reversal and stop-go events on glass are uncorrelated with the filament length (see new Fig. S4A) and there are approximately twice as many reversals as stop-go events. In contrast, the filaments on agar reverse with a frequency that is inversely proportional to the filament length (which is in turn proportional to the track length) (see original Fig. S1). The distribution of reversal frequencies on agar is broader and flatter than the distribution on glass (see new Fig. S4B). These findings are in line with the idea that tracks on agar (which are defined by filament length) dictate reversal frequency, resulting in the strong correlations we observe between reversal frequency, track length, and filament length. On glass, filament movement is not constrained by tracks, and we have a specific reversal frequency independent of filament length.”

      “Model can capture movement of filaments on glass and provides hypotheses regarding constancy of reversal frequency with length. We believe the model parameters controlling cellular memory (ω<sub>max</sub>) and strength of cellular coupling (K<sub>ω</sub>) describe the internal behaviour of a filament and therefore should not change depending on the substrate. Thus, we expect the model to be able to capture movement on glass just by removal of any ‘confining tracks’, i.e. external forces, from the simulations. Indeed, we find that the model displays both stop-go and reversal events when simulated without any external force and can capture the dwell time distribution under this condition (compare new Figs. S12,S13 with S3).

      In terms of reversal frequency, however, the model shows a reduction in reversal frequency with filament length (see new Fig. S15). This is in contrast to the experimental data. We find, however, that model results also show a reduction in reversal frequency with increasing ω<sub>max</sub> and K<sub>ω</sub> (see new Fig. S14 and S15). This effect is stronger with ω<sub>max</sub>, while it quickly saturates with K<sub>ω</sub> (see new Fig. S14). Therefore, one possibility of reconciling the model and experiment results in terms of constant reversal frequency with filament length would be to assume that ω<sub>max</sub> is decreasing with filament length (see new Fig. S16). Testing this hypothesis - or adding additional mechanisms into the model - will constitute the basis of future studies.”

      Regarding suggestion (3) - role of mechanosensing:

      We have tried several experiments to evaluate mechanosensing. First, we used a micropipette or a thin wire placed on the agar to create a physical barrier in the way of the filaments. The micropipette approach was not quite feasible in our current setup. The wire approach was possible to implement, but the wire caused a significant undulation / perturbation on agar. Possibly relating to this, filaments tended to continue moving alongside the wire barrier. Therefore, these experiments were inconclusive at this stage with regard to mechanosensing a physical barrier. As an alternative, we have attempted trapping gliding filaments using an optical trap with a far-red laser that should not affect the physiology of the cells. This did not cause an immediate reversal in filament motion. However, this could be due to the optical trap strength being below the threshold value for mechanosensing. The force per unit length generated by filamentous cyanobacteria has been calculated via a model of self-buckling rods, giving a value of ≈1 nN/μm [8]. In comparison, the optical trap generates forces on the scale of pN. Thus, the trap force is several orders of magnitude lower than the propulsive force generated by a filament, given filament lengths in the range of ten to several hundred μm. We conclude that the lack of observed response may be due to the optical trap force being too weak.
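      The order-of-magnitude comparison above can be checked with a few lines of arithmetic. The sketch below takes the quoted force density of about 1 nN/μm and multiplies it by filament length, then compares the result with a trap force of 1 pN. The 1 pN figure is an assumed representative value (the text only states "on the scale of pN"), and the constant and function names are hypothetical.

```python
import math

FORCE_PER_LENGTH_NN_PER_UM = 1.0  # ~1 nN/um, the value quoted from ref. [8]
TRAP_FORCE_PN = 1.0               # assumed representative trap force (pN scale)


def propulsive_force_pn(length_um):
    """Total propulsive force in pN, assuming force scales with length."""
    return FORCE_PER_LENGTH_NN_PER_UM * length_um * 1000.0  # 1 nN = 1000 pN


def orders_of_magnitude_gap(length_um, trap_force_pn=TRAP_FORCE_PN):
    """log10 of the ratio between propulsive force and trap force."""
    return math.log10(propulsive_force_pn(length_um) / trap_force_pn)


for length_um in (10, 100, 300):  # ten to several hundred micrometres
    print(f"L = {length_um:3d} um: propulsive force ~ "
          f"{propulsive_force_pn(length_um):.0f} pN, "
          f"gap ~ {orders_of_magnitude_gap(length_um):.1f} orders of magnitude")
```

      With these inputs even the shortest filaments considered exceed the assumed trap force by roughly four orders of magnitude, consistent with the statement that the trap is likely too weak to elicit a response.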

      Thus, the experiments we can perform using our current available methods and equipment are not able to prove either the presence or the absence of mechanosensing in the filament. We plan to perform further experiments in this direction, involving new and/or improved experimental setups, such as use of Atomic Force Microscopy.

      We would like to note that there is an additional observation that supports the idea of reversals being mediated by mechanosensing at the end of a track, instead of the locations of the track ends being caused by the intrinsic reversal frequency of the filament. In a few instances (N = 4), filaments on agar ended up on a circular track (see Movie S4 for an example). These filaments did not reverse over durations a few times longer than the ‘expected reversal interval’ of a filament on a straight track.

      Should $N$ following eq 7 and in eq 9 be $N_f$?

      We have corrected this typo.

      It would be useful to include references to what is known about mechanosensing in cyanobacteria.

      We agree with the reviewer, and we have now updated the discussion section to include this information. Mechanosensing has not yet been shown directly in any cyanobacteria, but several species are shown to harbor genes that are implicated (by homology) in mechanosensing. In particular, analysis of cyanobacterial genomes predicts the presence of a significant number of homologues of the Escherichia coli mechanosensory ion channels MscS and MscL [9]. We have also identified similar MscS protein sequences in F. draycotensis. These channels open when the membrane tension increases, allowing the cell to protect itself from swelling and rupturing when subject to extreme osmotic shock [10,11].

      We also note that F. draycotensis, as with other filamentous cyanobacteria, has genes associated with the type IV pili, which may be involved in the surface-based motility [1]. Type IV pili have been shown to be mechanosensitive. For example, in cells of Pseudomonas aeruginosa that ‘twitch’ on a surface using type IV pili, application of mechanical shear stress results in increased production of an intracellular signalling molecule involved in promoting biofilm production. The pilus retraction motor has been shown to be involved in this shear-sensing response [12]. Additionally, twitching P. aeruginosa cells often reverse in response to collisions with other cells. Reversal is also caused by collisions with inert glass microfibres, which suggests that the pili-based motility can be affected by a mechanical stimulus [13].

      References

      (1) D. D. Risser, Hormogonium Development and Motility in Filamentous Cyanobacteria. Appl Environ Microbiol 89, e0039223 (2023).

      (2) T. Lamparter et al., The involvement of type IV pili and the phytochrome CphA in gliding motility, lateral motility and photophobotaxis of the cyanobacterium Phormidium lacuna. PLoS One 17, e0249509 (2022).

      (3) E. Hoiczyk, Gliding motility in cyanobacteria: observations and possible explanations. Arch Microbiol 174, 11-17 (2000).

      (4) D. G. Adams, D. Ashworth, B. Nelmes, Fibrillar Array in the Cell Wall of a Gliding Filamentous Cyanobacterium. Journal of Bacteriology 181 (1999).

      (5) L. N. Halfen, R. W. Castenholz, Gliding in a blue-green alga: a possible mechanism. Nature 225, 1163-1165 (1970).

      (6) S. N. Menon, P. Varuni, F. Bunbury, D. Bhaya, G. I. Menon, Phototaxis in Cyanobacteria: From Mutants to Models of Collective Behavior. mBio 12, e0239821 (2021).

      (7) F. D. Conradi, C. W. Mullineaux, A. Wilde, The Role of the Cyanobacterial Type IV Pilus Machinery in Finding and Maintaining a Favourable Environment. Life (Basel) 10 (2020).

      (8) M. Kurjahn, A. Deka, A. Girot, L. Abbaspour, S. Klumpp, M. Lorenz, O. Bäumchen, S. Karpitschka, Quantifying gliding forces of filamentous cyanobacteria by self-buckling. eLife 12:RP87450 (2024).

      (9) S.C. Johnson, J. Veres, H. R. Malcolm, Exploring the diversity of mechanosensitive channels in bacterial genomes. Eur Biophys J 50, 25–36 (2021).

      (10) S.I. Sukharev, W.J. Sigurdson, C. Kung, F. Sachs, Energetic and spatial parameters for gating of the bacterial large conductance mechanosensitive channel, MscL. Journal of General Physiology, 113(4), 525-540 (1999).

      (11) N. Levina, S. Tötemeyer, N.R. Stokes, P. Louis, M.A. Jones, I.R. Booth, Protection of Escherichia coli cells against extreme turgor by activation of MscS and MscL mechanosensitive channels: identification of genes required for MscS activity. The EMBO Journal (1999).

      (12) V.D. Gordon, L. Wang, Bacterial mechanosensing: the force will be with you, always. Journal of cell science 132(7):jcs227694 (2019).

      (13) M.J. Kühn, L. Talà, Y.F. Inclan, R. Patino, X. Pierrat, I. Vos, Z. Al-Mayyah, H. Macmillan, J. Negrete Jr, J.N. Engel, A. Persat, Mechanotaxis directs Pseudomonas aeruginosa twitching motility. Proceedings of the National Academy of Sciences. 118(30):e2101759118 (2021).

    1. Author response:

      The following is the authors’ response to the original reviews:

      We sincerely thank the reviewers for their thoughtful review and feedback. We believe that our work will provide valuable insights into how MRSA evolves under bacteriophage predation and stimulate efforts to use genetic trade-offs to combat drug resistance. We have substantially revised the paper and performed several additional experiments to address the reviewers' questions and concerns.

      Summary:

      (1) Testing for genetic trade-offs in additional S. aureus strains

      We obtained 30 clinical isolates of the S. aureus USA300 strain that were isolated between 2008 and 2011 (see Table S1). We first tested the ΦStaph1N, Evo2, and ΦNM1γ6 phages against this expanded strain panel and found that Evo2 showed strong activity against all 30 strains (Table S4). We tested whether Evo2 infection could elicit trade-offs in β-lactam resistance for a subset of these strains. We found that Evo2 infection caused a ~10-100-fold reduction in their MIC against oxacillin. These data are now incorporated into panel C of a revised Figure 2.

      (2) Testing additional staphylococcal phages

      We isolated from the environment a phage called SATA8505. Similar to ΦStaph1N and Evo2, SATA8505 belongs to the Kayvirus genus and infects the MRSA strains MRSA252, MW2, and LAC. Phage-resistant MRSA recovered following SATA8505 infection also showed a strong reduction in oxacillin resistance (Figure S5). Furthermore, we confirmed that resistance against ΦNM1γ6, which belongs to the Dubowvirus genus, does not elicit trade-offs in β-lactam resistance (Figure S4). Sequencing analysis of ΦNM1γ6-resistant LAC strains showed a different mutation, in fmhC, which was not observed with the ΦStaph1N and Evo2 phages (Table 1). We have added these new data to the main text and supplemental figures and tables. Future work will focus on obtaining a comprehensive analysis of a wide range of phage families.

      (3) Testing additional antibiotics

      We also expanded our trade-off analysis to include a wider range of antibiotic classes (Table S3). Overall, the loss of resistance appears to be confined to β-lactams.

      (4) Genetic analysis of ORF141

      In order to determine the function of ORF141, which is mutated in Evo2, we attempted to clone wild-type ORF141 into a staphylococcal plasmid and perform complementation assays with Evo2. Unfortunately, obtaining the plasmid-borne wild-type ORF141 has proven difficult, as all clones developed frameshifts or deletions in the open reading frame. We posit that the gene product of ORF141 is toxic to the bacteria. We are currently working on placing the gene under more stringent expression conditions but feel that these efforts fall outside the scope of this paper.

      (5) Testing the effect of single mutants  

      Our genomic analysis showed that phage-resistant MRSA evolved multiple mutations following phage infection, making it difficult to determine the effect of each mutation alone. For example, phage-resistant MW2 and LAC evolved nonsense mutations in the transcriptional regulators mgrA, arlR, and sarA. To test whether these mutations alone were sufficient to confer resistance, we obtained MRSA strains with single-gene knockouts of mgrA, arlR, and sarA and tested their ability to resist phage. We observed that deletion of mgrA in MW2 resulted in a modest reduction in phage sensitivity (Figure S7). However, we did not observe any changes in the other mutant strains. These results suggest that phage resistance in these strains is likely caused by a combination of mutations. Determining the mechanisms of these mutations is the focus of our future work.

      (6) Transcriptomics of phage-resistant MRSA strains

      To further assess the effects of the phage resistance mutations, we performed bulk RNA-seq on phage-resistant MW2 and LAC strains and compared their differential expression levels to the respective wild-type strains. We picked these strains because our genomic data showed that they had evolved mutations in known transcriptional regulators (e.g. mgrA). Our analysis shows that both strains significantly modulate their gene expression (Figure 4). Notably, both strains upregulate the cell wall-associated protein ebh, while downregulating several genes involved in quorum sensing, virulence, and secretion. We have included this new data in Figure 4 and Table S5 and added an entire section in the manuscript discussing these results and their implications.  

      (7) Co-treatment of MRSA with phage and β-lactam

      We performed checkerboard experiments on MRSA strains with phage and β-lactam gradients (Figure 6). We found that under most conditions, MRSA cells were only able to recover under low phage and β-lactam concentrations. Notably, these recovered cells were still phage resistant and β-lactam sensitive. However, under one condition where MW2 was treated with ΦStaph1N and β-lactam, we found that some recovered cells still had high levels of β-lactam resistance, showing a distinct mutational profile. We discuss these results in detail in the main text.

      Reviewer # 1:

      Strengths:

      Phage-mediated re-sensitization to antibiotics has been reported previously but the underlying mutational analyses have not been described. These studies suggest that phages and antibiotics may target similar pathways in bacteria.

      We thank Reviewer 1 for this assessment. We hope that the data provided in this work will help stimulate further inquiries into this area and help in the development of better phage-based therapies to combat MRSA.

      Weaknesses:

      One limitation is the lack of mechanistic investigations linking particular mutations to the phenotypes reported here. This limits the impact of the work.

      We acknowledge the limitations of our initial analysis. We note (and cite) that separate studies have already linked mutations in femA, mgrA, arlR, and sarA with reduced β-lactam resistance and virulence phenotypes in MRSA, but not with phage resistance. For the other mutations, we could not find literature linking them to our observed phenotypes. We analyzed the effects of single-gene knockouts of mgrA, arlR, and sarA on MRSA’s phage resistance. However, as shown above, the results only showed modest effects on phage resistance in the MW2 strain (see Figure S7 and lines 309-317). We therefore believe that mutations in single genes are not sufficient to cause the trade-offs in phage/β-lactam resistance. Because each MRSA strain evolved multiple mutations (e.g. MW2 evolved 6 or more mutations), we feel that determining the effects of all possible permutations of those mutations was beyond the scope of the paper.

      However, to bridge the mutational data with our phenotypic observations, we performed RNA-seq and compared the transcriptomes of untreated and phage-treated MRSA strains (see Figure 4, Table S5, and lines 337-391). Our results show that phage-treated MRSA strains significantly modulate their transcript levels. Indeed, some of the changes in gene expression can explain the phenotypic observations (e.g. overexpression of ebh can lead to reduced clumping). Further, the results showed some unexpected patterns, such as the downregulation of quorum sensing genes or genes involved in type VII secretion.

      Another limitation of this work is the use of lab strains and a single pair of phages. However, while incorporation of clinical isolates would increase the translational relevance of this work it is unlikely to change the conclusions.

      We thank the reviewer for this suggestion. We would like to clarify that MW2, MRSA252, and LAC are pathogenic clinical isolates that were isolated between 1997 and the 2000s. However, we acknowledge that, because these 3 strains have been propagated for many generations, they might have acquired laboratory adaptations. We therefore obtained 30 USA300 clinical strains that were isolated in more recent years (~2008-2011) and tested our phages against them. We note that these clinical isolates (generously provided by Dr. Petra Levin’s lab) were preserved with minimal passaging to reduce the effects of laboratory adaptation. We found that the Evo2 phage was able to elicit oxacillin trade-offs in those strains as well (see Table S1, Table S7, Fig 2C, and lines 210 – 225).

      For the phages, we had to work with phage(s) that could infect all three MRSA strains. That is why in our initial tests, we focused on ΦStaph1N and Evo2, both members of the Kayvirus genus. Now in our revised work, we extend our analysis to ΦNM1γ6, a member of the Dubowvirus genus, which infects the LAC strain, but not MW2 and MRSA252. We find that ΦNM1γ6 is unable to drive trade-offs in β-lactam resistance (see lines 229 – 238). Next, we analyzed the effects of SATA8505, also a member of the Kayvirus genus. Here, we observed that SATA8505 can elicit trade-offs in β-lactam resistance (see Figure S5 and lines 238 – 246). These results suggest that not all staphylococcal phages can elicit these trade-offs and call for more comprehensive analyses of different types of phages.

      Reviewer #1 (Recommendations for the authors):

      Specific questions:

      (1) The Evo2 isolate is an evolved version of phage Staph1N with more potent lytic activity. Is this reflected in more pronounced antibiotic sensitivity?

      We did not observe that Evo2-treated MRSA cells showed more sensitivity towards β-lactams. However, we did observe that Evo2 was able to elicit these trade-offs at lower multiplicities of infection (MOI) (see lines 173 – 176 and Figure S2). Further, we did observe that Evo2 caused a greater trade-off in virulence phenotypes (hemolysis and cell agglutination) (see lines 416 – 419, lines 433 – 435, and Figure 5).

      In our revisions, we also tested Evo2-treated MRSA against a wide range of antibiotics. We did not observe significant changes in MICs against those agents.   

      (2) Are there mutations in the SCCmec cassette or the MecA gene after selection against ΦStaph1N?

      We did not observe any mutations in known resistance genes SCCmec or blaZ. Furthermore, we did not see any differential expression of those genes in our transcriptomic data (see lines 344 and 346).  

      (3) The authors report that phage ΦNM1γ6 does not induce antibiotic sensitivity changes despite being effective against bacterial strain LAC. Were mutational sequencing studies performed with the resistant isolates that emerged against this strain? Can the authors hypothesize why these did not impact the virulence or resistance of LAC despite effective killing? How does this align with their models for ΦStaph1N?

      We thank the reviewer for that insightful question. In our revised manuscript, we found that ΦNM1γ6 elicits a point mutation in the fmhC gene, which is involved in cell wall maintenance (see lines 326 – 335). To our knowledge, this point mutation has not been linked to phage resistance or drug sensitivity in MRSA. Notably, this mutation was not observed with ΦStaph1N or Evo2. We therefore speculate that ΦNM1γ6 binds to a different receptor molecule on the MRSA cell wall.

      (4) If I understand correctly, the authors attribute these effects of phage predation on antibiotic sensitivity and virulence to orthogonal selection pressures. A good test of this model would be to examine the mutations that emerge in antibiotic/phage co-treatment. This should be done.

      We thank the reviewer for this suggestion. As described in the summary section above, we performed checkerboard experiments on MRSA strains with phage and β-lactam gradients (see lines 440 – 494 and Figure 6). We found that under most conditions, MRSA cells were only able to recover under low phage and β-lactam concentrations. Notably, these recovered cells were still phage resistant and β-lactam sensitive. However, under one condition where MW2 was treated with ΦStaph1N and β-lactam, we found that some recovered cells still had high levels of β-lactam resistance and only limited phage resistance, showing a distinct mutational profile (Figure S6). Under these conditions, we think that the selective pressure exerted by ΦStaph1N is “overcome” by the selective pressure of the high oxacillin concentration, a point that we discuss in the main text.

      Reviewer #2 (Public review):

      Summary:

      The work presented in the manuscript by Tran et al deals with bacterial evolution in the presence of bacteriophage. Here, the authors have taken three methicillin-resistant S. aureus strains that are also resistant to beta-lactams. Eventually, upon being exposed to phage, these strains develop beta-lactam sensitivity. Besides this, the strains also show other changes in their phenotype such as reduced binding to fibrinogen and hemolysis.

      Strengths:

      The experiments carried out are convincing to suggest such in vitro development of sensitivity to the antibiotics. Authors were also able to "evolve" phage in a similar fashion thus showing enhanced virulence against the bacterium. In the end, authors carry out DNA sequencing of both evolved bacteria and phage and show mutations occurring in various genes. Overall, the experiments that have been carried out are convincing.

      We thank Reviewer 2 for their positive comments.

      Weaknesses:

      Although more experiments are not needed, additional experiments could add more information. For example, the phage gene showing the HTH motif could be reintroduced in the bacterial genome and such a strain can then be assayed with wildtype phage infection to see enhanced virulence as suggested. At least one such experiment proves the discoveries regarding the identification of mutations and their outcome.

      We thank the reviewer for this suggestion. We attempted to clone ORF141 into an expression plasmid and perform complementation experiments with Evo2 phage; however, all transformants that were isolated had premature stop-codons and frameshifts in the wild-type ORF141 insert that would disrupt protein function. We therefore think that the gene product of ORF141 might be toxic to the cells. We are currently working on placing the gene under more stringent transcriptional control but feel that these efforts fall outside of the scope of this paper.  

      Secondly, I also feel that authors looked for beta-lactam sensitivity and they found it. I am sure that if they look for rifampicin resistance in these strains, they will find that too. In this case, I cannot say that the evolution was directed to beta-lactam sensitivity; this is perhaps just one trait that was observed. This is the only weakness I find in the work. Nevertheless, I find the experiments convincing enough; more experiments only add value to the work.  

      We thank the reviewer for their comments. Because both phages and β-lactams interface with the bacterial cell wall, we posited that phage resistance would reduce resistance in cell wall targeting antibiotics. In our revisions, we have expanded our analysis to include a much wider range of antibiotic classes, including rifampicin, mupirocin, erythromycin, and other cell wall disruptors, such as daptomycin and teicoplanin. We did not observe any significant changes to the MICs of these other antibiotics (see Table S3 and lines 191-199). It therefore appears that the effects of these trade-offs are confined to beta-lactams.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors describe a novel pattern of ncRNA processing by Pac1. Pac1 is an RNase III family member in S. pombe that has previously been shown to process pre-snoRNAs. Other RNase III family members, such as Rnt1 in S. cerevisiae and Drosha in humans, have similar roles in cleaving precursors to ncRNAs (including miRNA, snRNA, snoRNA, rRNA). All RNase III family members recognize and cleave dsRNA regions, but differ in their exact sequence and structure requirements. snoRNAs can be processed from their own precursor, a polycistronic precursor, or the intron of a snoRNA host gene. After the intron is spliced out, the snoRNA host gene can either encode a protein or be a non-functional by-product.

      In the current manuscript the authors show that in S. pombe snoRNA snR107 and U14 are processed from a common precursor in a way that has not previously been described. snR107 is encoded within an intron and processed from the spliced out intron, similar to a typical intron-encoded snoRNA. What is different is that upon splicing, the host gene can adopt a new secondary structure that requires base-pairing between exon 1 and exon2, generating a Pac1 recognition site. This site is recognized, resulting in cleaving of the RNA and further processing of the 3' cleavage product into U14 snoRNA. In addition, the 5' cleavage product is processed into a ncRNA named mamRNA. The experiments describing this processing are thorough and convincing, and include RNAseq, degradome sequencing, northern blotting, qRT-PCR and the analysis of mutations that disrupt various secondary structures in figures 1, 2, and 3. The authors thereby describe a previously unknown gene design where both the exon and the intron are processed into a snoRNA. They conclude that making the formation of the Pac1 binding site dependent on previous splicing ensures that both snoRNAs are produced in the correct order and amount. Some of the authors findings are further confirmed by a different pre-print (reference 19), but the other preprint did not reveal the involvement of Pac1.

      While the analysis of the mamRNA/snR107/U14 precursor is convincing, as a single example the impact of these findings is uncertain. In Figure 4 and supplemental table 1, the authors use bioinformatic searches and identify other candidate loci in plants and animals that may be processed similarly. Each of these loci encodes a putative precursor that results in one snoRNA processed from an intron, a different snoRNA processed from an exon, and a double-stranded structure that can only form after splicing. While this is potentially interesting, it is also the least developed and could be discussed and developed further as detailed below.

      Major comments:

      1. The proposal that plant and animal pre-snoRNA clusters are processed similarly is speculative. The authors provide no evidence that these precursors are processed by an RNase III enzyme cutting at the proposed splicing-dependent structure. This should not be expected for publication, but would greatly increase the interest.

      All three reviewers expressed a similar concern, and we now provide additional evidence supporting the conservation of the proposed mechanism. Specifically, we focused on the SNHG25 gene in H. sapiens, which hosts two snoRNAs—one intronic, as previously shown in Figure 4B, and one non-intronic. We substantiated our predictions through the re-analysis of multiple sequencing datasets in human cell lines, as outlined below:

      I. Analysis of CAGE-seq and nano-COP datasets indicates a single major transcription initiation site at the SNHG25 locus. Both the intronic and non-intronic snoRNAs are present within the same nascent precursor transcripts (Supplementary Figure 4D).

      II. Degradome-seq experiments in human cell lines reveal that the predicted splicing-dependent stem-loop structure within the SNHG25 gene is subject to endonucleolytic cleavage (Supplementary Figure 4D). The cleavage sites are located at the apical loop and flanking the stem, displaying a staggered symmetry characteristic of RNase III activity (Figure 4C). Importantly, the nucleotide sequence surrounding the 3' cleavage site and the 3' splice-site are conserved in other vertebrates (Supplementary Figure 4D).

      III. fCLIP experiments demonstrate that DROSHA associates with the spliced SNHG25 transcript (Supplementary Figure 4D).

      Together, these analyses support the generalizability of our model beyond fission yeast. They confirm the structure of the SNHG25 gene as a single non-coding RNA precursor hosting two snoRNAs, one of which is intronic. Importantly, these findings show that the predicted stem-loop structure contains conserved elements and is subject to endonucleolytic cleavage. Human DROSHA, an RNase III enzyme, could be responsible for this processing step.

      The authors provide examples of similarly organized snoRNA clusters from human, mouse and rat, but the examples are not homologous to each other. Does this mean these snoRNA clusters are not conserved, even between mammals? Are the examples identified in Arabidopsis conserved in other plants? If there is no conservation, wouldn't that indicate that this snoRNA cluster organization offers no benefit?

      We noticed during this revision that the human SNHG25 locus is actually very well conserved in mice at the GM36220 locus, where both snoRNAs (SNORD104 and SNORA50C/GM221711) are similarly arranged. Although the murine host gene, GM36220, also contains an intron in the UCSC annotation, it is intronless in the Ensembl annotation we used to screen for mixed snoRNA clusters, which explains why it was not part of our initial list of candidates (Supplementary Table 1). Importantly, sequence elements in SNHG25, close to the splice sites and cleavage sites in exon 2, are also well conserved in mice and other vertebrates (Supplementary Figure 4D). Therefore, it is reasonable to think that the mechanism described for SNHG25 in humans may also apply in mice and other vertebrates.

      That being said, snoRNAs are highly mobile genetic elements. For example, it is well established that even between relatively closely related species (e.g., mouse and human), the positions of intronic snoRNAs within their host genes are not strictly conserved, even when both the snoRNAs and their host genes are. In the constrained drift model of snoRNA evolution (Hoeppner et al., BMC Evolutionary Biology, 2012; doi: 10.1186/1471-2148-12-183), it is proposed that snoRNAs are mobile and “may occupy any genomic location from which expression satisfies phenotype.”

      Therefore, a low level of conservation in mixed snoRNA clusters is generally expected and does not necessarily imply that this organization offers no benefit. Despite the limited conservation of snoRNA identity across species, mixed snoRNA clusters consistently display two recurring features: (1) non-intronic snoRNAs often follow intronic snoRNAs, and (2) the predicted secondary structure tends to span the last exon–exon junction. These enriched features support the idea that enforcing sequential processing of mixed snoRNA clusters may confer a selective advantage. We now explicitly discuss these points in the revised manuscript.

      Supplemental Figure 4 shows some evidence that the S. pombe gene organization is conserved within the Schizosaccharomyces genus, but could be enhanced further by showing what sequences/features are conserved. Presumably the U14 sequence is conserved, but snR107 is not indicated. Is it not conserved? Is the stem-loop more conserved than neighboring sequences? Are there any compensatory mutations that change the sequence but maintain the structure? Is there evidence for conservation outside the Schizosaccharomyces genus?

      We thank the reviewer for these excellent suggestions, which helped us significantly improve Supplementary Figure 4. In the revised version, we now include an additional species—S. japonicus, which is more evolutionarily distant—and show that the intronic snR107 is conserved across the Schizosaccharomyces genus (Supplementary Figure 4A). The distance between conserved elements (splice sites, snoRNAs, and RNA structures) varies, indicating that surrounding sequences are less conserved than these functionally constrained features.

      We also performed a detailed alignment of the sequences corresponding to the predicted RNA secondary structures. This revealed that the apical regions are less conserved than the base, particularly near the splice and cleavage sites. In these regions, we observe compensatory or base-pair-neutral mutations (e.g., U-to-C or C-to-U, which both pair with G), suggesting structural conservation through evolutionary constraint (Supplementary Figures 4B–C). These observations are now described in greater detail in the revised manuscript, along with a discussion of the specific features likely to be under selective pressure at this locus.
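
      As an illustration of the distinction drawn above between compensatory and base-pair-neutral changes, the following is a minimal Python sketch, not the authors' analysis. The function names (`can_pair`, `classify_stem_change`) and the category labels are hypothetical, and the pairing rules assume standard Watson-Crick pairs plus the G-U wobble pair.

```python
# Watson-Crick pairs plus the G-U wobble pair commonly allowed in RNA stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if RNA bases a and b can form a canonical or wobble pair."""
    return (a.upper(), b.upper()) in PAIRS


def classify_stem_change(ref_pair, alt_pair):
    """Classify the change at one stem position between two species.

    ref_pair and alt_pair are (5' base, 3' base) tuples taken from aligned
    columns of the predicted stem (hypothetical input format).
    """
    if ref_pair == alt_pair:
        return "identical"
    if not can_pair(*ref_pair):
        return "unpaired in reference"
    if not can_pair(*alt_pair):
        return "disruptive"
    changed = sum(r != a for r, a in zip(ref_pair, alt_pair))
    # One side changed, pairing kept (e.g. G-U -> G-C): base-pair-neutral.
    # Both sides changed, pairing kept (e.g. A-U -> G-C): compensatory.
    return "base-pair-neutral" if changed == 1 else "compensatory"


print(classify_stem_change(("G", "U"), ("G", "C")))
print(classify_stem_change(("A", "U"), ("G", "C")))
print(classify_stem_change(("G", "C"), ("A", "C")))
```

      Under these assumptions, a U-to-C change opposite a G is classified as base-pair-neutral, a double change that restores pairing as compensatory, and a change that breaks pairing as disruptive.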

      Conservation outside the Schizosaccharomyces genus is less clear. As already noted in the manuscript, the S. cerevisiae locus retains synteny between snR107 and snoU14, but the polycistronic precursor encompassing both is intronless and processed by RNase III (Rnt1) between the cistrons. Similarly, in Ashbya gossypii and a few other fungal species, synteny is preserved, but no intron appears to be present in the presumed common precursor. Notably, secondary structure predictions for the A. gossypii locus (not shown) suggest the formation of a stable stem-loop encompassing the first snoRNA in a large apical loop. This could reflect a distinct mode of snoRNA maturation, possibly analogous to pri-miRNA processing, where cleavage by an RNase III enzyme contributes to both 5′ and 3′ end formation. In Candida albicans, snoU14 is annotated within an intron of a host gene, but no homolog of snR107 is annotated. Other cases either resemble one of the above scenarios or are inconclusive due to the lack of a clearly conserved snoRNA (or possibly due to incomplete annotation). Although these examples are potentially interesting, we have chosen not to elaborate on them in the manuscript in order to maintain focus and avoid speculative interpretation in the absence of stronger evidence.

      The authors suggest that snoRNAs can be processed from the exons of protein coding genes, but snoRNA processing would destroy the mRNA. Thus snoRNAs processing and mRNA function seem to be alternative outcomes that are mutually exclusive. Can the authors comment?

      In theory, we agree with the reviewer on the mutually exclusive nature of mRNA and snoRNA expression for putative snoRNAs hosted in exons of protein-coding genes. However, we want to clarify that the specific examples of snoRNA precursors (or hosts) developed in the manuscript (mamRNA-snoU14 in S. pombe and, in this resubmission, SNHG25 in H. sapiens) are non-coding. So although we do not exclude that our model of sequential processing through splicing and endonucleolytic cleavage could apply to coding snoRNA precursors, it is not something we want to insist on, especially given the lack of experimental evidence for these cases.

      It is possible that the use of the term "exonic snoRNA" in the first version of the manuscript led to the reviewer's impression that we explicitly meant that snoRNAs can be processed from the exons of protein-coding genes, which was not what we meant (although we do not exclude it). If that was the case, we apologize for the confusion. We have now clarified the issue (see next point).

      Minor comments:

      The term "exonic snoRNA" is confusing. Isn't any snoRNA by definition an exon?

      We agree that this term can be confusing, a sentiment that was also shared by reviewer 3. We replaced the problematic term with "non-intronic snoRNA", "snoRNA" or "snoRNA gene located in exon", depending on the context; these terms convey our intended meaning less ambiguously.

      The methods section does not include how similar snoRNA clusters were identified in other species

      We have now corrected this omission in the method section ('Identification of mixed snoRNA clusters' subsection): "To identify mixed snoRNA clusters, we downloaded the latest genome annotation from Ensembl and selected snoRNAs co-hosted within the same precursor, with at least one being intronic and at least one being non-intronic. We filtered out ambiguous cases where snoRNAs overlapped exons defined as 'retained introns', reasoning that in these cases the snoRNA is more likely to be intronic than not."
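
      The selection rule quoted above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields (`precursor_id`, `location`, `overlaps_retained_intron`) and the function name `find_mixed_clusters` are hypothetical, the toy records are invented for the example, and a real analysis would first derive these fields from an Ensembl annotation file.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SnoRNA:
    name: str
    precursor_id: str   # hypothetical ID of the host/precursor transcript
    location: str       # "intronic" or "non-intronic" (assumed pre-computed)
    overlaps_retained_intron: bool = False


def find_mixed_clusters(snornas):
    """Return {precursor_id: [snoRNA names]} for precursors that host at
    least one intronic and at least one non-intronic snoRNA."""
    by_precursor = defaultdict(list)
    for s in snornas:
        # Drop ambiguous cases: snoRNA overlaps an exon annotated as a
        # 'retained intron', so it is more likely intronic than not.
        if s.overlaps_retained_intron:
            continue
        by_precursor[s.precursor_id].append(s)

    mixed = {}
    for pid, members in by_precursor.items():
        locations = {m.location for m in members}
        if {"intronic", "non-intronic"} <= locations:
            mixed[pid] = sorted(m.name for m in members)
    return mixed


# Invented toy records, for illustration only.
toy = [
    SnoRNA("snoA", "host1", "intronic"),
    SnoRNA("snoB", "host1", "non-intronic"),
    SnoRNA("snoC", "host2", "intronic"),
    SnoRNA("snoD", "host2", "intronic"),
    SnoRNA("snoE", "host3", "intronic"),
    SnoRNA("snoF", "host3", "non-intronic", overlaps_retained_intron=True),
]
print(find_mixed_clusters(toy))
```

      In this toy input only `host1` qualifies: `host2` hosts two intronic snoRNAs, and `host3` loses its non-intronic member to the retained-intron filter.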

      In the discussion the authors argue that a previously published observation that S. pombe U14 does not complement a S. cerevisiae mutation can be explained because "was promoter elements... were simply not included in the transgene sequence". However, even if promoter elements were included, the dsRNA structure of S. pombe would not be cleaved by the S. cerevisiae RNase III. I doubt that missing promoter elements are the full explanation, and the authors provide insufficient data to support this conclusion.

      We agree with the reviewer that, given the substantial divergence in substrate specificity between Pac1 and Rnt1, it is unlikely that S. pombe snoU14 would be efficiently processed from its precursor in S. cerevisiae. We did not intend to suggest otherwise, and we have now removed this part of the discussion. As the experiment reported by Samarsky et al. did not detect expression of the S. pombe snoU14 precursor (even its unprocessed form), it remains inconclusive with respect to the conservation (or lack thereof) of snoU14 processing mechanisms.

      For the record, we had originally included this discussion to point out that the lack of cryptic promoter activity (or at least none that S. cerevisiae can use) within the S. pombe snoU14 precursor supports the idea that transcription initiates solely upstream of the mamRNA precursor. However, we recognize that this argument is speculative and potentially confusing. We have therefore removed it from the revised manuscript to maintain clarity and focus.

      **Referees cross-commenting**

      I agree with the other 2 reviewers but think the thiouracil pulse labeling reviewer 2 suggests would take considerable work and if snoRNA processing is very fast might not be as conclusive as the reviewer suggests.

      We are grateful to the reviewer for this comment, which helped us complete this revision in a timely manner.

      Reviewer #1 (Significance (Required)):

      In the current manuscript the authors show that in S. pombe snoRNA snR107 and U14 are processed from a common precursor in a way that has not previously been described. The experiments describing this processing are thorough and convincing, and include RNAseq, degradome sequencing, northern blotting, qRT-PCR and the analysis of mutations that disrupt various secondary structures in figures 1, 2, and 3. The authors thereby describe a previously unknown gene design where both the exon and the intron are processed into a snoRNA.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript presents a novel mode of processing for polycistronic snoRNAs in the yeast Schizosaccharomyces pombe. The authors demonstrate that the processing sequence of a transcription unit containing U14, intronic snR107, and an overlapping non-coding mamRNA is determined by secondary structures recognized by RNase III (Pac1). Specifically, the formation of a stem structure over the mamRNA exon-exon junction facilitates the processing of terminal exonic-encoded U14. Consequently, U14 maturation occurs only after the mamRNA intron (containing snR107) is spliced out. This mechanism prevents the accumulation of unspliced, truncated mamRNA.

      1. The first section describing the processing steps is challenging to follow due to the unusual organization of the locus and maturation pathway. If the manuscript is intended for a broad audience, I recommend simplifying this section and presenting it in a more accessible manner. A larger diagram illustrating the transcription unit and processing intermediates would be beneficial. Additionally, introducing snR107 earlier in the text would improve clarity.

      We thank the reviewer for these excellent suggestions. In the previous version of the manuscript, we were cautious in how we introduced the locus, as snR107 and the associated intron had not yet been published. This is no longer the case, as the locus is now described in Leroy et al. (2025). Accordingly, we now introduce the complete locus at the beginning of the manuscript and have improved the corresponding diagram (new Figure 1A). We believe these changes enhance clarity and make the section more accessible to a broader audience.

      2. Evaluation of some results is difficult due to the overexposure of Northern blot signals in Figures 1 and 2. The unspliced and spliced precursors appear as a single band, making it hard to distinguish processing intermediates. Would the authors consider presenting these results similarly to Figure 3, where bands are more clearly resolved? Or presenting both overexposed and underexposed blots?

      For all blots (probes A, B, and C), we selected an exposure level that allows detection of precursor forms under wild-type (WT) conditions. This necessarily results in some overexposure of the accumulating precursors in mutant conditions, due to their broad dynamic range of accumulation. To address this, we now provide an additional supplementary "source data" file containing all uncropped blots with both low and high exposures.

      For example, a lower exposure version of the blot in new Figure 1B (included in the source data file) confirms the consistent accumulation of the spliced precursor when Pac1 activity is compromised. The unspliced precursor also shows slight accumulation in the Pac1-ts mutant, although to a much lesser extent than the spliced precursor. This observation is consistent with our qPCR results (new Figure 1C).

      Importantly, because this effect is not observed in either the Pac1-AA or the stem-dead (SD) mutants, we interpret it as an indirect effect, possibly reflecting a mild growth defect in the Pac1-ts strain, even under growth-permissive conditions. We now explicitly address this point in the revised manuscript.

      3. Additionally, I noticed a discrepancy in U14 detection: Probe B gives a strong signal for U14 in Figure 3B, whereas in Figures 1 and 2, U14 appears as faint bands. Could the authors clarify this inconsistency?

      We thank the reviewer for pointing out this discrepancy. The variation in U14 signal intensity is most likely due to technical differences in UV crosslinking efficiency during the Northern blot procedure. This step can differentially affect the membrane retention of RNA species depending on their length, as previously reported (PMID: 17405769). Because U14 is a relatively abundant snoRNA, the fainter signal observed in Figure 1 (relative to the accumulating precursor) likely reflects suboptimal crosslinking of shorter RNAs in that particular blot.

      Importantly, this technical variability does not impact the conclusions of our study, as we do not compare RNA species of different lengths directly. To increase transparency, we now provide a supplementary "source data" file that includes all uncropped blots from our Northern blot experiments. These include examples—such as the uncropped blot for Figure 1B—where U14 retention is more consistent.

      4. Furthermore, ethidium bromide (EtBr) staining of rRNA is used as a loading control, but overexposed signals from the gel may not accurately reflect RNA amounts on the membrane. This could affect the interpretation of mature RNA species' relative abundance.

      We thank the reviewer for pointing this out and have now measured rRNA loading on the same northern blot membranes using probes complementary to the mature rRNAs. We updated new Figures 1B, 2B, 3B, S1B, and S3A accordingly.

      5. To further support the sequential processing model, the authors could use pulse-labeling thiouracil to test the accumulation of newly transcribed RNAs and accumulation of individual species. Additionally, it could help determine whether U14 can be processed through alternative, less efficient pathways. Would the authors consider incorporating this approach?

      We thank the reviewer for this pertinent suggestion. We actually plan to investigate the putative alternative U14 maturation pathway in future work, and the suggested approach will definitely be instrumental for that. However, to keep the present manuscript focused, and also to keep the review timely (successful pulse-chase experiments are likely to take time to optimize, as also suggested by the other reviewers in their cross-commenting section), we prefer not to perform this experiment for this revision.

      7. In the final section, the authors propose that this processing mechanism is conserved across species, identifying 12 similar genetic loci in different organisms. This is a very interesting finding. In my opinion, providing any experimental evidence would greatly strengthen this claim and the manuscript's significance. Even preliminary validation would add substantial value!

      We thank the reviewer for his/her enthusiasm and are glad to provide some preliminary validation to the final section of our manuscript. Specifically, we focused on the SNHG25 gene in H. sapiens, which hosts two snoRNAs—one intronic, as previously shown in Figure 4B, and one non-intronic. We substantiated our predictions through the re-analysis of multiple sequencing datasets in human cell lines, as outlined below:

      I. Analysis of CAGE-seq and nano-COP datasets indicates a single major transcription initiation site at the SNHG25 locus. Both the intronic and non-intronic snoRNAs are present within the same nascent precursor transcripts (Supplementary Figure 4D).

      II. Degradome-seq experiments in human cell lines reveal that the predicted splicing-dependent stem-loop structure within the SNHG25 gene is subject to endonucleolytic cleavage (Supplementary Figure 4D). The cleavage sites are located at the apical loop and flanking the stem, displaying a staggered symmetry characteristic of RNase III activity (Figure 4C). Importantly, the nucleotide sequence surrounding the 3' cleavage site and the 3' splice-site are conserved in other vertebrates (Supplementary Figure 4D).

      III. fCLIP experiments demonstrate that DROSHA associates with the spliced SNHG25 transcript (Supplementary Figure 4D).

      Together, these analyses support the generalizability of our model beyond fission yeast. They confirm the structure of the SNHG25 gene as a single non-coding RNA precursor hosting two snoRNAs, one of which is intronic. Importantly, these findings unambiguously show that the predicted stem-loop structure is subject to endonucleolytic cleavage, and they are consistent with DROSHA, an RNase III enzyme, being responsible for this processing step.

      **Referees cross-commenting**

      The other two reviewers' comments are justified.

      Reviewer #2 (Significance (Required)):

      The authors describe an interesting novel mode of snoRNA processing from the host transcript. The results appear sound and intriguing, especially if the proposed mechanism can be confirmed across different organisms. Including such validation would significantly enhance the impact and make this work of broad audience interest.

      My expertise: transcription, non-coding RNAs

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Migeot et al., focuses on a new Pac1-mediated snoRNA processing pathway for intron-encoded snoRNA pairs in the yeast Schizosaccharomyces pombe. The novelty of the findings described in the MS is the report of an unusual and relatively rare genomic organization and sequential processing of a few snoRNA genes in S. pombe and other eukaryotic organisms. It appears that in the case of snoRNA pairs, hosted in pre-mRNA in the intron and exon, respectively, the release of separate pre-snoRNAs from the host gene relies first on splicing to free the intron-encoded snoRNA, followed by endonucleolytic cleavage by RNase III (Pac1 in S. pombe) to produce snoRNA present in the mRNA exon. The sequential processing pathway, ensuring proper maturation of two snoRNAs, was demonstrated and argued in an elegant and clear way. The main message of the MS is straightforward, most experiments are properly conducted and specific conclusions based on the data are justified and valid. The text is clearly written and well-presented.

      But there are some shortcomings.

      1. First of all, the title of the MS and general conclusions regarding the Pac1-mediated sequential release of snoRNA pairs hosted within the intron are definitely an overstatement. Especially the title suggests that this genomic organization and unusual processing mode of these snoRNAs is widespread. Later in the discussion the authors themselves admit that such mixed exonic-intronic snoRNAs are rare, although their presence may be underestimated due to annotation problems. It is likely that such snoRNA arrangement and processing is conserved, but the evidence is missing and only unique cases were identified based on bioinformatics mining and their processing has not been assayed. This makes the generalization impossible based on a single documented mamRNA/snoU14 example, no matter how carefully examined.

      We thank the reviewer for clearly articulating this concern. In response, we now provide additional evidence supporting conservation of the proposed mechanism in other species:

      • Conservation within the Schizosaccharomyces genus (Figures S4A–C) has been further analyzed, as suggested by Reviewer 1. This expanded analysis highlights conserved features—such as splice sites and cleavage sites within the predicted stem-loop structure—indicating that these elements are under selective constraint.

      • Conservation in mammals is now supported by experimental data, as detailed in our responses to point #7 of Reviewer 2 and major comment #1 of Reviewer 1. Specifically, we show that for the SNHG25 gene in H. sapiens (Figure S4D):

      (1) nascent transcription gives rise to a single non-coding RNA precursor that hosts two snoRNAs, one of which is intronic;

      (2) the predicted stem-loop structure contains conserved elements and is subject to endonucleolytic cleavage;

      (3) the RNase III enzyme DROSHA associates with the spliced SNHG25 precursor.

      Together, these analyses strengthen the evidence for the evolutionary conservation of the mechanism and support the general conclusions and title of the manuscript.

      Another interesting observation is that, similarly to other intron-encoded snoRNA in other species, there is a redundant pathway to produce mature U14 in addition to Pac1-mediated cleavage. In the case of intronic snoRNAs in S. cerevisiae, their release could be performed either by splicing/debranching or Rnt1 cleavage, but there is also a third alternative option, that is processing following transcription termination downstream of the snoRNA gene, which at the same time interferes with the expression of the host gene. Is such a scenario possible as an alternative pathway for U14? Are there any putative, or even cryptic, terminators downstream of the U14 gene? The authors did not consider or attempt to inspect this possibility.

      We thank the reviewer for this interesting and thoughtful comment. First, we would like to clarify that snoU14 is not intron-encoded; rather, it is located on the exon downstream of the intron-encoded snR107.

      Regarding the possibility of transcription termination-based processing: downstream of snoU14, we identified a non-consensus polyadenylation signal (AUUAAA) preceded by a U-rich tract, followed by three consensus polyadenylation signals (AAUAAA) within a 500-nt window. These elements likely contribute to robust and redundant transcription termination at this highly expressed locus. However, since all these sites are located downstream of snoU14, they do not provide an alternative 5′-end processing mechanism for this snoRNA; they reflect normal termination.

      If we correctly understood the reviewer’s suggestion (apologies if not), they may have been referring to the possibility of a cryptic or alternative polyadenylation site between snR107 and snoU14 instead. If cleavage were to occur in this inter-snoRNA region while transcription continued past snoU14, it could, in principle, allow for alternative processing of snoU14. We have indeed considered this scenario. However, we currently do not find strong support for it: there are no identifiable polyadenylation signal motifs between the two snoRNAs, aside from a weakly conserved and questionable AAUAAU hexamer that does not appear to be used as a polyA site, at least in WT conditions (DOI: 10.4161/rna.25758). Given the lack of evidence, we chose not to explore this hypothesis further in the present manuscript, though it remains an interesting possibility for future investigation.
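      As an illustration of the kind of motif search described in this response, the following is a minimal sketch, not the authors' actual pipeline. It scans an RNA (or DNA) sequence for the two hexamers named above within a fixed window. The function name `find_polya_signals` and the toy sequence are hypothetical; the toy sequence is not the real S. pombe locus.

```python
import re

# Hexamers named in the response, written in the RNA alphabet.
POLYA_SIGNALS = {"AAUAAA": "consensus", "AUUAAA": "non-consensus"}


def find_polya_signals(seq, window=500):
    """Return sorted (position, hexamer, kind) tuples found in the first `window` nt.

    Positions are 0-based offsets from the start of `seq`.
    DNA input is accepted and converted to the RNA alphabet.
    """
    region = seq.upper().replace("T", "U")[:window]
    hits = []
    for hexamer, kind in POLYA_SIGNALS.items():
        # A lookahead pattern reports overlapping occurrences as well.
        for match in re.finditer(f"(?={hexamer})", region):
            hits.append((match.start(), hexamer, kind))
    return sorted(hits)


# Hypothetical toy sequence for illustration only (assumption, not real data):
# a U-rich tract, one non-consensus signal, then two consensus signals.
toy_downstream = "GGCUUUUUUAUUAAAGGCC" + "AAUAAA" + "GCGC" + "AAUAAA"
hits = find_polya_signals(toy_downstream)
for position, hexamer, kind in hits:
    print(position, hexamer, kind)
```

      Running the same scan on the region between two snoRNAs, rather than downstream of the second one, would be the analogous check for the cryptic-site scenario discussed above.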

      I also have some concerns or comments related to the presented research, which are not major and mainly relate to data quantification, but have to be addressed.

      • In Pac1-ts and Pac1-AA strains the level of mature U14 seems upregulated compared to respective WT (Figure 1A). At the same time mature 25S and 18S rRNAs are less abundant. But there is no quantification and it is not mentioned in the text. What could be the reason for these effects?

      We thank the reviewer for this observation. As reviewer 2 also noted, ethidium bromide staining of mature rRNAs is not a reliable quantitative loading control. In response to this concern, we have now reprobed all northern blots with radiolabeled rRNA probes. These provide a more accurate and consistent loading for our blots (new Figures 1B, 2B, 3B, S1B, S3A).

      Using these improved loading controls, it is evident that snoU14, snR107, and the unspliced precursor are all slightly upregulated in the Pac1-ts strain, although to a much lesser extent than the spliced precursor, which accumulates dramatically. We do not observe this effect in either the Pac1-AA or stem-dead (SD) mutants. We therefore interpret the modest upregulation as an indirect effect, possibly linked to the physiological state of the Pac1-ts mutant, which exhibits slower growth even at growth-permissive temperatures. We now explicitly discuss this in the revised manuscript.

      Regarding the suggestion to include quantification of the northern blot signal: we opted not to include this in the figures for the following reasons. First, the accumulation of the spliced precursor—the central focus of our analysis—is large and highly reproducible across all replicates and conditions. Second, northern blot quantification by pixel intensity remains semi-quantitative, particularly for comparisons across RNAs of highly different abundance. Finally, we support our conclusions with additional quantitative data from RT-qPCR and RNA-seq, which provide more robust measures of RNA accumulation.

      • Processing of the other snoRNA from the mamRNA/snoU14 precursor is largely overlooked in the MS. It is commented on only in the context of mutants expressing constitutive mamRNA-CS constructs (Figure 3B). Its level was checked in Pac1-ts and Pac1-AA (Supplementary Figure 1), but the authors conclude that "its expression remained largely unaffected by Pac1 inactivation", which is clearly not true. Similarly to U14, snR107 is also increased in Pac1-ts and Pac1-AA strains, at least judged "by eye" because the loading control or quantification is not provided. This matter should be clarified.

      We thank the reviewer for pointing this out. We have now included appropriate loading controls for Supplementary Figure 1 to clarify the interpretation. As discussed in our response to the previous comment, we observe a general upregulation of the mamRNA locus in the Pac1-ts strain, which likely contributes to the increased levels of both snR107 and snoU14. However, because this upregulation is not observed in the Pac1-AA or stem-dead (SD) mutants, we interpret it as an indirect effect, possibly related to the altered physiological state of the Pac1-ts strain (e.g., slightly reduced growth rate even at the permissive temperature). This interpretation has now been clearly explained in the revised manuscript.

      We also identified and corrected a labeling error in the previous version of Supplementary Figure 1, where the Pac1-ts and Pac1-AA strains were inadvertently swapped. We sincerely apologize for the confusion this may have caused and have now ensured that all figure panels are correctly labeled and consistent with the text.

      Other minor comments:

      Minor points:

      1. Page 1, Abstract. The sentence "The hairpin recruits the RNase III Pac1 that cleaves and destabilizes the precursor transcript while participating in the maturation of the downstream exonic snoRNA, but only after splicing and release of the intronic snoRNA" is not entirely clear and should be simplified, maybe split into two sentences. This message is clear after reading the MS and learning the data, but not in the abstract.

      We thank the reviewer for pointing this out and have now clarified the abstract following the suggestion to split and simplify the problematic sentence: "... the sequence surrounding an exon-exon junction within their precursor transcript folds into a hairpin after splicing of the intron. This hairpin recruits the RNase III ortholog Pac1, which participates in the maturation of the downstream snoRNA by cleaving the precursor."

      Page 1, Introduction. I am not convinced by the need to use the term "exonic snoRNA" for all snoRNA that are not intronic, which is misleading, and is rather associated per se with snoRNA encoded in the mRNA exon. It has been used before in the review about snoRNAs by Michelle Scott published in RNA Biol (2024), but it does not justify its common use.

      We thank the reviewer for raising this important point. We agree that the term “exonic snoRNA” can be misleading, as it was previously used to specifically refer to snoRNAs embedded within exons of mRNA transcripts, a rare and potentially artifactual scenario, as very cautiously discussed by Michelle Scott and colleagues in their review published in RNA Biol (2024).

      In the previous version of our manuscript, we actually used “exonic snoRNA” in a broader sense to denote any snoRNA not encoded within an intron, primarily for convenience in contrasting the processing of intronic snR107 with that of non-intronic/exonic snoU14. However, we recognize that this usage is non-standard and risks confusion due to the ambiguity surrounding the term’s definition in the literature.

      In light of this, and in agreement with reviewer 1 who raised a similar concern, we have revised the manuscript to remove the term “exonic snoRNA” entirely. Depending on the context, we now refer more precisely to “non-intronic snoRNA,” “snoRNA gene located in exon,” or simply “snoRNA.”

      Supplementary Figure 3. It is difficult to assess whether the level of mature rRNAs is unchanged in the mutants based on EtBr staining and without calculations. Northern blotting should be performed and the levels properly calculated.

      As suggested, we performed northern blotting on mature 18S and 25S, quantified the signal and observed no significant differences (new Supplementary Figure 3).

      **Referees cross-commenting**

      I also agree that 4sU labeling may require too much work with a questionable result.

      We are grateful to the reviewer for this comment, which helped us complete this revision in a timely manner.

      Reviewer #3 (Significance (Required)):

      Strengths: 1. Novelty of the described genomic arrangement of snoRNA/ncRNA genes and their processing in a sequential and regulated manner.

      Potential conservation of this pathway across eukaryotic organisms. Well designed and performed experiments followed by proper conclusions.

      Limitations: 1. Insufficient evidence to support generalization of the study results.

      Moderate overall impact of the study

      Advance: This research can be placed within publications describing specific processing pathways for various non-coding RNAs, including for example unusual chimeric species such as sno-lncRNAs. In this context, the presented results do advance the knowledge in the field by providing mechanistic evidence for a tightly controlled and coordinated maturation of selected ncRNAs.

      Audience: Basic research and specialized. The interest in this research will rather be limited to a specific field.

    1. Author response:

      The following is the authors’ response to the previous reviews

      General Response to Reviewers:

      We thank the Reviewers for their comments, which continue to substantially improve the quality and clarity of the manuscript, and therefore help us to strengthen its message while acknowledging alternative explanations.

      All three reviewers raised the concern that we have not proven that Rab3A is acting on a presynaptic mechanism to increase mEPSC amplitude after TTX treatment of mouse cortical cultures.  The reviewers’ main point is that we have not shown a lack of upregulation of postsynaptic receptors in mouse cortical cultures. We want to stress that we agree that postsynaptic receptors are upregulated after activity block in neuronal cultures.  However, the reviewers are not acknowledging that we have previously presented strong evidence at the mammalian NMJ that there is no increase in AChR after activity blockade, and therefore the requirement for Rab3A in the homeostatic increase in quantal amplitude points to a presynaptic contribution. We agree that we should restrict our firmest conclusions to the data in the current study, but in the Discussion we are proposing interpretations. We have added the following new text:

      “The impetus for our current study was two previous studies in which we examined homeostatic regulation of quantal amplitude at the NMJ.  An advantage of studying the NMJ is that synaptic ACh receptors are easily identified with fluorescently labeled alpha-bungarotoxin, which allows for very accurate quantification of postsynaptic receptor density. We were able to detect a known change due to mixing 2 colors of alpha-BTX to within 1% (Wang et al., 2005).  Using this model synapse, we showed that there was no increase in synaptic AChRs after TTX treatment, whereas miniature endplate current increased 35% (Wang et al., 2005). We further showed that the presynaptic protein Rab3A was necessary for full upregulation of mEPC amplitude (Wang et al., 2011). These data strongly suggested Rab3A contributed to homeostatic upregulation of quantal amplitude via a presynaptic mechanism.  With the current study showing that Rab3A is required for the homeostatic increase in mEPSC amplitude in cortical cultures, one interpretation is that in both situations, Rab3A is required for an increase in the presynaptic quantum.”

      The point we are making is that the current manuscript is an extension of that work, and that the interpretation of our findings regarding the variability of upregulation of postsynaptic receptors in our mouse cortical cultures further supports the idea that there is a Rab3A-dependent presynaptic contribution to homeostatic increases in quantal amplitude.

      Public Reviews:

      Reviewer #1 (Public review):

      Koesters and colleagues investigated the role of the small GTPase Rab3A in homeostatic scaling of miniature synaptic transmission in primary mouse cortical cultures using electrophysiology and immunohistochemistry. The major finding is that TTX incubation for 48 hours does not induce an increase in the amplitude of excitatory synaptic miniature events in neuronal cortical cultures derived from Rab3A KO and Rab3A Earlybird mutant mice. NASPM application had comparable effects on mEPSC amplitude in control and after TTX, implying that Ca2+-permeable glutamate receptors are unlikely modulated during synaptic scaling. Immunohistochemical analysis revealed no significant changes in GluA2 puncta size, intensity, and integral after TTX treatment in control and Rab3A KO cultures. Finally, they provide evidence that loss of Rab3A in neurons, but not astrocytes, blocks homeostatic scaling. Based on these data, the authors propose a model in which neuronal Rab3A is required for homeostatic scaling of synaptic transmission, potentially through GluA2-independent mechanisms.

      The major finding - impaired homeostatic up-scaling after TTX treatment in Rab3A KO and Rab3 earlybird mutant neurons - is supported by data of high quality. However, the paper falls short of providing any evidence or direction regarding potential mechanisms. The data on GluA2 modulation after TTX incubation are likely statistically underpowered, and do not allow drawing solid conclusions, such as GluA2-independent mechanisms of up-scaling.

      The study should be of interest to the field because it implicates a presynaptic molecule in homeostatic scaling, which is generally thought to involve postsynaptic neurotransmitter receptor modulation. However, it remains unclear how Rab3A participates in homeostatic plasticity.

      Major (remaining) point:

      (1) Direct quantitative comparison between electrophysiology and GluA2 imaging data is complicated by many factors, such as different signal-to-noise ratios. Hence, comparing the variability of the increase in mini amplitude vs. GluA2 fluorescence area is not valid. Thus, I recommend removing the sentence "We found that the increase in postsynaptic AMPAR levels was more variable than that of mEPSC amplitudes, suggesting other factors may contribute to the homeostatic increase in synaptic strength." from the abstract.

      We have not removed the statement, but altered it to soften the conclusion. It now reads, “We found that the increase in postsynaptic AMPAR levels in wild type cultures was more variable than that of mEPSC amplitudes, which might be explained by a presynaptic contribution, but we cannot rule out variability in the measurement.”

      Similarly, the data do not directly support the conclusion of GluA2-independent mechanisms of homeostatic scaling. Statements like "We conclude that these data support the idea that there is another contributor to the TTX- induced increase in quantal size." should be thus revised or removed.

      This particular statement appears only in the previous response to reviewers. We deleted the sentences that start, “The simplest explanation Rab3A regulates a presynaptic contributor…” and “Imaging of immunofluorescence more variable…”. We also deleted “our data suggest… consistently leads to an increase in mEPSC amplitude and sometimes leads to…”. We added “…the lack of a robust increase in receptor levels leaves open the possibility that there is a presynaptic contributor to quantal size in mouse cortical cultures. However, the variability could arise from technical factors associated with the immunofluorescence method, and the mechanism of Rab3A-dependent plasticity could be presynaptic for the NMJ and postsynaptic for cortical neurons.”

      Reviewer #2 (Public review):

      I thank the authors for their efforts in the revision. In general, I believe the main conclusion that Rab3A is required for TTX-induced homeostatic synaptic plasticity is well-supported by the data presented, and this is an important addition to the repertoire of molecular players involved in homeostatic compensations. I also acknowledge that the authors are more cautious in making conclusions based on the current evidence, and the structure and logic have been much improved.

      The only major concern I have still falls on the interpretation of the mismatch between GluA2 cluster size and mEPSC amplitude. The authors argue that they are only trying to say that changes in the cluster size are more variable than those in the mEPSC amplitude, and they provide multiple explanations for this mismatch. It seems incongruous to state that the simplest explanation is a presynaptic factor when you have all these alternative factors that very likely have contributed to the results. Further, the authors speculate in the discussion that Rab3A does not regulate postsynaptic GluA2 but instead regulates a presynaptic contributor. Do the authors mean that, in their model, the mEPSC amplitude increases can be attributed to two factors- postsynaptic GluA2 regulation and a presynaptic contribution (which is regulated by Rab3A)? If so, and Rab3A does not affect GluA2 whatsoever, shouldn't we see GluA2 increase even in the absence of Rab3A? The data in Table 1 seems to indicate otherwise.

      The main body of this comment is addressed in the General Response to Reviewers. In addition, we deleted the text “current data, coupled with our previous findings at the mouse neuromuscular junction, support the idea that there are additional sources contributing to the homeostatic increase in quantal size.” We added new text, so the sentence now reads: “Increased receptors likely contribute to increases in mEPSC amplitudes in mouse cortical cultures, but because we do not have a significant increase in GluA2 receptors in our experiments, it is impossible to conclude that the increase is lacking in cultures from Rab3A-/- neurons.”

      I also question the way the data are presented in Figure 5. The authors first compare 3 cultures and then 5 cultures altogether, if these experiments are all aimed to answer the same research question, then they should be pooled together. Interestingly, the additional two cultures both show increases in GluA2 clusters, which makes the decrease in culture #3 even more perplexing, for which the authors comment in line 261 that this is due to other factors. Shouldn't this be an indicator that something unusual has happened in this culture?

      Data in this figure is sufficient to support that GluA2 increases are variable across cultures, which hardly adds anything new to the paper or to the field. 

      A major goal of performing the immunofluorescence measurements in the same cultures for which we had electrophysiological results was to address the common impression that the homeostatic effect itself is highly variable, as the reviewer notes in the comment “…GluA2 increases are variable across cultures…” Presumably, if GluA2 increases are the mechanism of the mEPSC amplitude increases, then variable GluA2 increases should correlate with variable mEPSC amplitude increases, but that is not what we observed. We are left with the explanation that the immunofluorescence method itself is very variable. We have added the point to the Discussion, which reads, “the variability could arise from technical factors associated with the immunofluorescence method, and the mechanism of Rab3A-dependent homeostatic plasticity could be presynaptic for the NMJ and postsynaptic for cortical neurons.”

      Finally, the implication of “Shouldn’t this be an indicator that something unusual has happened in this culture?” if it is not due to culture to culture variability in the homeostatic response itself, is that there was a technical problem with accurately measuring receptor levels. We have no reason to suspect anything was amiss in this set of coverslips (the values for controls and for TTX-treated were not outside the range of values in other experiments). In any of the coverslips, there may be variability in the amount of primary anti-GluA2 antibody, as this was added directly to the culture rather than prepared as a diluted solution and added to all the coverslips. But to remove this one experiment because it did not give the expected result is to allow bias to direct our data selection.

      The authors further cite a study with comparable sample sizes, which shows a similar mismatch based on p values (Xu and Pozzo-Miller 2007), yet the effect sizes in this study actually match quite well (both ~160%). P values cannot be used to show whether two effects match, but effect sizes can. Therefore, the statement in lines 411-413 "... consistently leads to an increase in mEPSC amplitudes, and sometimes leads to an increase in synaptic GluA2 receptor cluster size" is not very convincing, and can hardly be used to support "the idea that there are additional sources contributing to the homeostatic increase in quantal size.”

      We have the same situation; our effect sizes match (19.7% increase for mEPSC amplitude; 18.1% increase for GluA2 receptor cluster size, see Table 1), but in our case, the p value for receptors does not reach statistical significance. Our point here is that there is published evidence that the variability in receptor measurements is greater than the variability in electrophysiological measurements. But we have softened this point, removing the sentences containing “…consistently leads and sometimes…” and “…additional sources contributing…”.
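
      As an illustration only, the sketch below shows how two comparisons with the same percent increase can give very different test statistics when one measurement is more variable. All values are invented for the example and are not taken from Table 1; the helper names `percent_increase` and `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def percent_increase(control, treated):
    """Effect size as percent change of the treated mean over the control mean."""
    return 100.0 * (mean(treated) - mean(control)) / mean(control)


def welch_t(control, treated):
    """Welch's t statistic (unequal variances); a larger |t| means a smaller p value."""
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


# Hypothetical values: both pairs have means 10 -> 12 (a 20% increase),
# but the second pair has a four-fold larger standard deviation.
low_var_con = [9.0, 9.5, 10.0, 10.5, 11.0]
low_var_ttx = [11.0, 11.5, 12.0, 12.5, 13.0]
high_var_con = [6.0, 8.0, 10.0, 12.0, 14.0]
high_var_ttx = [8.0, 10.0, 12.0, 14.0, 16.0]

for label, con, ttx in [
    ("low variability", low_var_con, low_var_ttx),
    ("high variability", high_var_con, high_var_ttx),
]:
    print(f"{label}: {percent_increase(con, ttx):.1f}% increase, "
          f"Welch t = {welch_t(con, ttx):.2f}")
```

      With equal sample sizes, the identical 20% increase yields a four-fold smaller t statistic in the more variable pair, which is why matching effect sizes and differing p values are compatible.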

      I would suggest simply showing mEPSC and immunostaining data from all cultures in this experiment as additional evidence for homeostatic synaptic plasticity in WT cultures, and leave out the argument for "mismatch". The presynaptic location of Rab3A is sufficient to speculate a presynaptic regulation of this form of homeostatic compensation.

      We have removed all uses of the word “mismatch,” but feel the presentation of the 3 matched experiments, 23-24 cells (Figure 5A, D), and the additional 2 experiments for a total of 5 cultures, 48-49 cells (Figure 5C, F), is important in order to demonstrate that the lack of statistically significant receptor response is due neither to a variable homeostatic response in the mEPSC amplitudes, nor to a small number of cultures.

      Minor concerns:

      (1) Line 214, I see the authors cite literature to argue that GluA2 can form homomers and can conduct currents. While GluA2 subunits edited at the Q/R site (they are in nature) can form homomers with very low efficiency in exogenous systems such as HEK293 cells (as done in the cited studies), it's unlikely for this to happen in neurons (they can hardly traffic to synapses if possible at all).

      We were unable to identify a key reference that characterized GluA2 homomers vs. heteromers in native cortical neurons, but we have rewritten the section in the manuscript to acknowledge the low conductance of homomers:

      “…to assess whether GluA2 receptor expression, which will identify GluA2 homomers and GluA2 heteromers (the former unlikely to contribute to mEPSCs given their low conductance relative to heteromers (Swanson et al., 1997; Mansour et al., 2001)…”

      (2) Lines 221-222, the authors may have misinterpreted the results in Turrigiano 1998. This study does not show that the increase in receptors is most dramatic in the apical dendrite, in fact, this is the only region they have tested. The results in Figures 3b-c show that the effect size is independent of the distance from soma.

      Figure 3 in Turrigiano et al., shows that the increase in glutamate responsiveness is higher at the cell body than along the primary dendrite. We have revised our description to indicate that an increase in responsiveness on the primary dendrite has been demonstrated in Turrigiano et al. 1998.

      “We focused on the primary dendrite of pyramidal neurons as a way to reduce variability that might arise from being at widely ranging distances from the cell body, or, from inadvertently sampling dendritic regions arising from inhibitory neurons. In addition, it has been shown that there is a clear increase in response to glutamate in this region (Turrigiano et al., 1998).”

      “…synaptic receptors on the primary dendrite, where a clear increase in sensitivity to exogenously applied glutamate was demonstrated (see Figure 3 in (Turrigiano et al., 1998)).”

      (3) Lines 309-310 (and other places mentioning TNFa), the addition of TNFa to this experiment seems out of place. The authors have not performed any experiment to validate the presence/absence of TNFa in their system (citing only 1 study from another lab is insufficient). Although it's convincing that glia Rab3A is not required for homeostatic plasticity here, the data does not suggest Rab3A's role (or the lack of) for TNFa in this process.

      We have modified the paragraph in the Discussion that addresses the glial results, to describe more clearly the data that supported an astrocytic TNF-alpha mechanism: “TNF-alpha accumulates after activity blockade, and directly applied to neuronal cultures, can cause an increase in GluA1 receptors, providing a potential mechanism by which activity blockade leads to the homeostatic upregulation of postsynaptic receptors (Beattie et al., 2002; Stellwagen et al., 2005; Stellwagen and Malenka, 2006).”

      We have also acknowledged that we cannot rule out TNF-alpha coming from neurons in the cortical cultures: “…suggesting the possibility that neuronal Rab3A can act via a non-TNF-alpha mechanism to contribute to homeostatic regulation of quantal amplitude, although we have not ruled out a neuronal Rab3A-mediated TNF-alpha pathway in cortical cultures.”

      Reviewer #3 (Public review):

      This manuscript presents a number of interesting findings that have the potential to increase our understanding of the mechanism underlying homeostatic synaptic plasticity (HSP). The data broadly support that Rab3A plays a role in HSP, although the site and mechanism of action remain uncertain.

      The authors clearly demonstrate that Rab3A plays a role in HSP at excitatory synapses, with substantially less plasticity occurring in the Rab3A KO neurons. There is also no apparent HSP in the Earlybird Rab3A mutation, although baseline synaptic strength is already elevated. In this context, it is unclear if the plasticity is absent, already induced by this mutation, or just occluded by a ceiling effect due to the synapses already being strengthened. Occlusion may also occur in the mixed cultures when Rab3A is missing from neurons but not astrocytes. The authors do appropriately discuss these options. The authors have solid data showing that Rab3A is unlikely to be active in astrocytes. Finally, they attempt to study the linkage between changes in synaptic strength and AMPA receptor trafficking during HSP, and conclude that trafficking may not be solely responsible for the changes in synaptic strength during HSP.

      Strengths:

      This work adds another player into the mechanisms underlying an important form of synaptic plasticity. The plasticity is likely only reduced, suggesting Rab3A is only partially required and perhaps multiple mechanisms contribute. The authors speculate about some possible novel mechanisms, including whether Rab3A is active pre-synaptically to regulate quantal amplitude.

      As Rab3A is primarily known as a pre-synaptic molecule, this possibility is intriguing. However, it is based on the partial dissociation of AMPAR trafficking and synaptic response and lacks strong support. On average, they saw a similar magnitude of change in mEPSC amplitude and GluA2 cluster area and integral, but the GluA2 data was not significant due to higher variability. It is difficult to determine if this is due to biology or methodology - the imaging method involves assessing puncta pairs (GluA2/VGlut1) clearly associated with a MAP2 labeled dendrite. This is a small subset of synapses, with usually less than 20 synapses per neuron analyzed, which would be expected to be more variable than mEPSC recordings averaged across several hundred events. However, when they reduce the mEPSC number of events to similar numbers as the imaging, the mEPSC amplitudes are still less variable than the imaging data. The reason for this remains unclear. The pool of sampled synapses is still different between the methods and recent data has shown that synapses have variable responses during HSP. Further, there could be variability in the subunit composition of newly inserted AMPARs, and only assessing GluA2 could mask this (see below). It is intriguing that pre-synaptic changes might contribute to HSP, especially given the likely localization of Rab3A. But it remains difficult to distinguish if the apparent difference in imaging and electrophysiology is a methodological issue rather than a biological one. Stronger data, especially positive data on changes in release, will be necessary to conclude that pre-synaptic factors are required for HSP, beyond the established changes in post-synaptic receptor trafficking.

      Regarding the concern that the lack of increase in receptors is due to a technical issue, please see General Response to Reviewers, above. We have also softened our conclusions throughout, acknowledging we cannot rule out a technical issue.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a strong frequency effect that is unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. But the change in frequency seems to argue (as the authors do) that some synapses only have CP-AMPARs, while the rest of the synapses have few or none. Another possibility is that there are pre-synaptic NASPM-sensitive receptors that influence release probability. Further, the amplitude data show a strong trend towards smaller amplitude following NASPM treatment (Fig 3B). The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. The decrease on average is larger in the TTX neurons, and some cells show a strong effect. It is possible there is some heterogeneity between neurons on whether GluA1/A2 heteromers or GluA1 homomers are added during HSP. This would impact the conclusions about the GluA2 imaging as compared to the mEPSC amplitude data.

      The key finding in Figure 3 is that NASPM did not eliminate the statistically significant increase in mEPSC amplitude after TTX treatment (Fig 3A). Whether or not NASPM-sensitive receptors contribute to mEPSC amplitude is a separate question (Fig 3B). We are open to the possibility that NASPM reduces mEPSC amplitude in both control and TTX-treated cells (p = 0.08 for both), but that does not change our conclusion that NASPM has no effect on the TTX-induced increase in mEPSC amplitude. The mechanism underlying the decrease in mEPSC frequency following NASPM is interesting, but does not alter our conclusions regarding the role of Rab3A in homeostatic synaptic plasticity of mEPSC amplitude. In addition, the Reviewer does not acknowledge Supplemental Figure #1, which shows a similar lack of correspondence between homeostatic increases in mEPSC amplitude and GluA1 receptors in two cultures where matched data were obtained. Therefore, we do not think our lack of a robust increase in receptors can be explained by our failing to look at the relevant receptor.

      To understand the role of Rab3A in HSP will require addressing two main issues:

      (1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role. The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. More concrete support for the authors' suggestion of a pre-synaptic site of control would be helpful.

      We agree that definitive evidence for a presynaptic role of Rab3A in homeostatic plasticity of mEPSC amplitudes in mouse cortical cultures requires demonstrating that loss of Rab3A in postsynaptic neurons does not disrupt the plasticity, whereas loss in presynaptic neurons does. Without these data, we can only speculate that the Rab3A-dependence of homeostatic plasticity of quantal size in cortical neurons may be similar to that of the neuromuscular junction, where it cannot be receptors. We have added to the Discussion that the mechanism of Rab3A regulation of homeostatic plasticity of quantal amplitude could differ between cortical neurons and the neuromuscular junction (lines 448-450 in markup). Establishing a way to co-culture Rab3A-/- and Rab3A+/+ neurons in ratios that would allow us to record from a Rab3A-/- neuron that has mainly Rab3A+/+ inputs (or vice versa) is not impossible, but requires either transfection or transgenic expression with markers that identify the relevant genotype, and will be the subject of future experiments.

      (2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs or a decrease in GABA release (i.e., the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at those synapses.

      We agree with the Reviewer, that it is important to determine the generality of Rab3A function in homeostatic plasticity. Establishing the homeostatic effect on mIPSCs and then examining them in Rab3A-/- cultures is a large undertaking and will be the subject of future experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor (remaining) points:

      (1) The figure referenced in the first response to the reviewers (Figure 5G) does not exist.

      We meant Figure 5F, which has been corrected in the current response.

      (2) I recommend showing the data without binning (despite some overlap).

      The box plot in Origin will not allow not binning, but we can make the bin size so small that for all intents and purposes, there is close to 1 sample in each bin. When we do this, the majority of data are overlapped in a straight vertical line. Previously described concerns were regarding the gaps in the data, but it should be noted that these are cell means and we are not depicting the distributions of mEPSC amplitudes within a recording or across multiple recordings.

      (3) Please auto-scale all axes from 0 (e.g., Fig 1E, F).

      We have rescaled all mEPSC amplitude axes in box plots to go from 0 (Figures 1, 2 and 6).

      (4) Typo in Figure legend 3: "NASPM (20 um)" => uM

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 140, frequencies are reported in Hz while other places are in sec-1, while these are essentially the same, they should be kept consistent in writing.

      All mEPSC frequencies have been changed to sec<sup>-1</sup>, except we have left “Hz” for repetitive stimulation and filtering.

      (2) Paragraph starting from line 163 (as well as other places where multiple groups are compared, such as the occlusion discussion): the authors assessed whether there was a change in baseline between the WT and mutant groups by doing pairwise tests; this is not the right test. A two-way ANOVA, or at least a multivariate test, would be more appropriate.

      We have performed a two-way ANOVA, with genotype as one factor and treatment as the other factor. The p values in Figures 1 and 2 have been revised to reflect p values from the post-hoc Tukey test on the specific interactions (for each particular genotype, TTX vs CON effects). The difference between the two untreated WT strains was not significant in the post-hoc Tukey test, and we have revised the text. The difference between the untreated WT from the Rab3A+/Ebd colony and the untreated Rab3AEbd/Ebd mutant was still significant in the post-hoc Tukey test, and this has replaced the Kruskal-Wallis test. The two-way ANOVA was also applied to the neuron-glia experiments, and the p values in Figure 6 were adjusted accordingly.
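
      For readers unfamiliar with the analysis, a minimal sketch of a balanced two-way ANOVA (genotype x treatment) follows. It is not the software or the data used for the figures: the function name `two_way_anova`, the cell labels, and all numbers are hypothetical, the design is assumed to be balanced, and the post-hoc Tukey step (which needs the studentized range distribution) is not implemented here.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    `cells` maps (level_a, level_b) -> list of observations, with the same
    number of observations in every cell. Returns the F ratios for factor A,
    factor B, and the A x B interaction, plus the degrees of freedom.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles balanced designs")

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for the main effects, the interaction, and the residual.
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values())
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_a": (ss_a / df_a) / ms_err,
        "F_b": (ss_b / df_b) / ms_err,
        "F_ab": (ss_ab / df_ab) / ms_err,
        "df": (df_a, df_b, df_ab, df_err),
    }


# Hypothetical cell means (arbitrary units): TTX raises the WT values only.
example = {
    ("WT", "CON"): [12.0, 13.0, 14.0],
    ("WT", "TTX"): [15.0, 16.0, 17.0],
    ("KO", "CON"): [12.0, 13.0, 14.0],
    ("KO", "TTX"): [12.0, 13.0, 14.0],
}
result = two_way_anova(example)
print(result)
```

      A large interaction F ratio in such a table is what indicates that the TTX effect differs between genotypes; the pairwise post-hoc comparisons then locate which groups differ.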

      (3) Relevant to the second point under minor concerns, I suggest this sentence be removed, as reducing variability and avoiding inhibitory projects are reasons good enough to restrict the analysis to the apical dendrites.

      We have revised the description of the Turrigiano et al., 1998 finding from their Figure 3 and feel it still strengthens the justification for choosing to analyze only synapses on the apical dendrite.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      The comments on lines 256-7 could seem misleading - the NASPM results wouldn't rule out contribution of those other subunits, only non-GluA2 containing combinations of those subunits. I would suggest revising this statement. Also, NASPM does likely have an effect, just not one that changes much with TTX treatment.

      At new line 213 (markup) we have added the modifier “homomeric” to clarify our point that the lack of NASPM effect on the increase in mEPSC amplitude after TTX indicates that the increase is not due to more homomeric Ca<sup>2+</sup>-permeable receptors. We have always stated that NASPM reduces mEPSC amplitude, but it is in both control and treated cultures.

      Strong conclusions based on a single culture (lines 314-5) seem unwarranted.

      We have softened this statement with a “suggesting that” substituted for the previous “Therefore,” but stand by our point that the mEPSC amplitude data support a homeostatic effect of TTX in Culture #3, so the lack of increase in GluA2 cluster size needs an explanation other than variability in the homeostatic effect itself.

      Saying (line 554) something is 'the only remaining possibility' also seems unwarranted.

      We have softened this statement to read, “A remaining possibility…”.

      Beattie EC, Stellwagen D, Morishita W, Bresnahan JC, Ha BK, Von Zastrow M, Beattie MS, Malenka RC (2002) Control of synaptic strength by glial TNFalpha. Science 295:2282-2285.

      Mansour M, Nagarajan N, Nehring RB, Clements JD, Rosenmund C (2001) Heteromeric AMPA receptors assemble with a preferred subunit stoichiometry and spatial arrangement. Neuron 32:841-853.

      Stellwagen D, Malenka RC (2006) Synaptic scaling mediated by glial TNF-alpha. Nature 440:1054-1059.

      Stellwagen D, Beattie EC, Seo JY, Malenka RC (2005) Differential regulation of AMPA receptor and GABA receptor trafficking by tumor necrosis factor-alpha. J Neurosci 25:3219-3228.

      Swanson GT, Kamboj SK, Cull-Candy SG (1997) Single-channel properties of recombinant AMPA receptors depend on RNA editing, splice variation, and subunit composition. J Neurosci 17:5869.

      Turrigiano GG, Leslie KR, Desai NS, Rutherford LC, Nelson SB (1998) Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391:892-896.

      Wang X, Wang Q, Yang S, Bucan M, Rich MM, Engisch KL (2011) Impaired activity-dependent plasticity of quantal amplitude at the neuromuscular junction of Rab3A deletion and Rab3A earlybird mutant mice. J Neurosci 31:3580-3588.

      Wang X, Li Y, Engisch KL, Nakanishi ST, Dodson SE, Miller GW, Cope TC, Pinter MJ, Rich MM (2005) Activity-dependent presynaptic regulation of quantal size at the mammalian neuromuscular junction in vivo. J Neurosci 25:343-351.

    1. In short, the argument for a nonzero risk of a paperclip maximizer scenario rests on assumptions that may or may not be true, and it is reasonable to think that research can give us a better idea of whether these assumptions hold true for the kinds of AI systems that are being built or envisioned. For these reasons, we call it a ‘speculative’ risk, and examine the policy implications of this view in Part IV.

      This isn't a real objection

    2. rutiny, and it remains to be seen how much its safety attitude will cost the company.53 53. Jonathan Stempel. 2024. Tesla must face vehicle owners’ lawsuit over self-driving claims. Reuters (May 2024). https://www.reuters.com/legal/tesla-must-face-vehicle-owners-lawsuit-over-self-driving-claims-2024-05-15/. We think that these correlations are causal. Cruise’s license being revoked was a big part of the reason that it fell behind Waymo, and safety was also a factor in Uber’s self-driving failure.54

      I feel like this paper might just be a series of bad analogies

    3. articularly Graphics Processing Units. Computational and cost limits continue to be relevant to new paradigms, including inference-time scaling. New slowdowns may emerge: Recent signs point to a shift away from the culture of open knowledge sharing in the industry.

      Argument: we might get bottlenecked on tech. I don't think so but idk. This isn't really a probability estimate, it's just a vague phrase. I guess the paper isn't really trying to do much realistic forecasting though

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Pakula et al. explore the impact of reactive oxygen species (ROS) on neonatal cerebellar regeneration, providing evidence that ROS activates regeneration through Nestin-expressing progenitors (NEPs). Using scRNA-seq analysis of FACS-isolated NEPs, the authors characterize injury-induced changes, including an enrichment in ROS metabolic processes within the cerebellar microenvironment. Biochemical analyses confirm a rapid increase in ROS levels following irradiation, and forced catalase expression, which reduces ROS levels, impairs external granule layer (EGL) replenishment post-injury.

      Strengths:

      Overall, the study robustly supports its main conclusion and provides valuable insights into ROS as a regenerative signal in the neonatal cerebellum.

      Comments on revisions:

      The authors have addressed most of the previous comments. However, they should clarify the following response:

      *"For reasons we have not explored, the phenotype is most prominent in these lobules, that is why they were originally chosen. We edited the following sentence (lines 578-579):

      First, we analyzed the replenishment of the EGL by BgL-NEPs in vermis lobules 3-5, since our previous work showed that these lobules have a prominent defect."*

      It has been reported that the anterior part of the cerebellum may have a lower regenerative capacity compared to the posterior lobe. To avoid potential ambiguity, the authors should clarify that "the phenotype" and "prominent defect" refer to more severe EGL depletion at an earlier stage after IR rather than a poorer regenerative outcome. Additionally, they should provide a reference to support their statement or indicate if it is based on unpublished observations.

      Our comment does not refer to a more severe EGL depletion at an earlier stage. There is instead poorer regeneration of the anterior region. The irradiation approach used provides consistent cell killing of GCPs across the cerebellum. This can be seen in Fig. 1c, e, g, i in our previous publication (Wojcinski et al. (2017) Cerebellar granule cell replenishment post-injury by adaptive reprogramming of Nestin+ progenitors. Nature Neuroscience 20:1361-1370). Also, Fig. 2e, g, k, m in that paper shows that by P5 and P8, posterior lobule 8 recovers better than anterior lobules 1-5.

      Reviewer #2 (Public review):

      Summary:

      The authors have previously shown that the mouse neonatal cerebellum can regenerate damage to granule cell progenitors in the external granular layer, through reprogramming of gliogenic nestin-expressing progenitors (NEPs). The mechanisms of this reprogramming remain largely unknown. Here the authors used scRNAseq and ATACseq of purified neonatal NEPs from P1-P5 and showed that ROS signatures were transiently upregulated in gliogenic NEPs vs. neurogenic NEPs 24 hours post injury (P2). To assess the role of ROS, mice transgenic for global catalase activity, which reduces ROS, were assessed. Inhibition of ROS significantly decreased gliogenic NEP reprogramming and diminished cerebellar growth post-injury. Further, inhibition of microglia across this same time period prevented one of the first steps of repair - the migration of NEPs into the external granule layer. This work is the first demonstration that the tissue microenvironment of the damaged neonatal cerebellum is a major regulator of neonatal cerebellar regeneration. Increased ROS is seen in other CNS damage models, including adults, thus there may be some shared mechanisms across age and regions, although interestingly neonatal cerebellar astrocytes do not upregulate GFAP as seen in adult CNS damage models. Another intriguing finding is that global inhibition of ROS did not alter normal cerebellar development.

      Strengths:

      This paper presents a beautiful example of using single cell data to generate biologically relevant, testable hypotheses of mechanisms driving important biological processes. The scRNAseq and ATACseq analyses are rigorously conducted and conclusive. Data is very clearly presented and easily interpreted, supporting the hypothesis next tested by reducing ROS in irradiated brains.

      Analyses of whole tissue and FACS-sorted NEPs in transgenic mice where human catalase was globally expressed in mitochondria were rigorously controlled and conclusively show that ROS upregulation was indeed decreased post injury and that the regenerative response was very clearly inhibited. The authors are to be commended on the very careful analyses, which are very well presented and, again, easy to follow, with all appropriate data shown to support their conclusions.

      Weaknesses:

      The authors also present data to show that microglia are required for an early step of mobilizing gliogenic NEPs into the damaged EGL. While the data that PLX5622 administration from P0-P5 or even P0-P8 clearly shows that there is an immediate reduction of NEPs mobilized to the damaged EGL, there is no subsequent reduction of cerebellar growth such that by P30, the treated and untreated irradiated cerebella are equivalent in size. There is speculation in the discussion about why this might be the case. Additional experiments and tools are required to assess mechanisms. Regardless, the data still implicate microglia in the neonatal regenerative response, and this finding remains an important advance.

      As stated previously, the suggested follow-up experiments, while relevant, are extensive and considered beyond the scope of the current paper.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Pakula et al. explore the impact of reactive oxygen species (ROS) on neonatal cerebellar regeneration, providing evidence that ROS activates regeneration through Nestin-expressing progenitors (NEPs). Using scRNA-seq analysis of FACS-isolated NEPs, the authors characterize injury-induced changes, including an enrichment in ROS metabolic processes within the cerebellar microenvironment. Biochemical analyses confirm a rapid increase in ROS levels following irradiation, and forced catalase expression, which reduces ROS levels, impairs external granule layer (EGL) replenishment post-injury.

      Strengths:

      Overall, the study robustly supports its main conclusion and provides valuable insights into ROS as a regenerative signal in the neonatal cerebellum.

      Weaknesses:

      (1) The diversity of cell types recovered from scRNA-seq libraries of sorted Nes-CFP cells is unexpected, especially the inclusion of minor types such as microglia, meninges, and ependymal cells. The authors should validate whether Nes and CFP mRNAs are enriched in the sorted cells; if not, they should discuss the potential pitfalls in sampling bias or artifacts that may have affected the dataset, impacting interpretation.

      In our previous work, we thoroughly assessed the transgene using RNA in situ hybridization for Cfp, immunofluorescent analysis for CFP and scRNA-seq analysis for Cfp transcripts (Bayin et al., Science Adv. 2021, Fig. S1-2)(1), and characterized the diversity within the NEP populations of the cerebellum. Our present scRNA-seq data also confirms that Nes transcripts are expressed in all the NEP subtypes. A feature plot for Nes expression has been added to the revised manuscript (Fig 1E), as well as a sentence explaining the results. Of note, since this data was generated from FACS-isolated CFP+ cells, the perdurance of the protein allows for the detection of immediate progeny of Nes-expressing cells, even in cells where Nes is not expressed once cells are differentiated. Finally, oligodendrocyte progenitors, perivascular cells, some rare microglia and ependymal cells have been demonstrated to express Nes in the central nervous system; therefore, detecting small groups of these cells is expected (2-4). We have added the following sentence (lines 391-394):

      “Detection of Nes mRNA confirmed that the transgene reflects endogenous Nes expression in progenitors of many lineages, and also that the perdurance of CFP protein in immediate progeny of Nes-expressing cells allowed the isolation of these cells by FACS (Figure 1E)”.

      (2) The authors should de-emphasize the claim that ROS signaling and related gene upregulation occur exclusively in gliogenic NEPs. Genes such as Cdkn1a, Phlda3, Ass1, and Bax are identified as differentially expressed in neurogenic NEPs and granule cell progenitors (GCPs), with Ass1 absent in GCPs. According to Table S4, gene ontology (GO) terms related to ROS metabolic processes are also enriched in gliogenic NEPs, neurogenic NEPs, and GCPs.

      As the reviewer requested, we have de-emphasized that ROS signaling is preferentially upregulated in gliogenic NEPs, since we agree with the reviewer that there is some evidence for similar transcriptional signatures in neurogenic NEPs and GCPs. We added the following (lines 429-531):

      “Some of the DNA damage and apoptosis related genes that were upregulated in IR gliogenic-NEPs (Cdkn1a, Phlda3, Bax) were also upregulated in the IR neurogenic-NEPs and GCPs at P2 (Supplementary Figure 2B-E).”

      And we edited the last few sentences of the section to state (lines 453-459):

      “Interestingly, we did not observe significant enrichment for GO terms associated with cellular stress response in the GCPs that survived the irradiation compared to controls, despite significant enrichment for ROS signaling related GO-terms (Table S4). Collectively, these results indicate that injury induces significant and overlapping transcriptional changes in NEPs and GCPs. The gliogenic- and neurogenic-NEP subtypes transiently upregulate stress response genes upon GCP death, and an overall increase in ROS signaling is observed in the injured cerebella.”

      (3) The authors need to justify the selection of only the anterior lobe for EGL replenishment and microglia quantification.

      We thank the reviewers for asking for this clarification. Our previous publications on regeneration of the EGL by NEPs have all involved quantification of these lobules, so we think it is important to stay with the same lobules. For reasons we have not explored, the phenotype is most prominent in these lobules, which is why they were originally chosen. We edited the following sentence (lines 578-579):

      “First, we analyzed the replenishment of the EGL by BgL-NEPs in vermis lobules 3-5, since our previous work showed that these lobules have a prominent defect.”

      (4) Figure 1K: The figure presents linkages between genes and GO terms as a network but does not depict a gene network. The terminology should be corrected accordingly.

      We have corrected the terminology and added the following (lines 487-489):

      “Finally, linkages between the genes in differentially open regions identified by ATAC-seq and the associated GO-terms revealed an active transcriptional network involved in regulating cell death and apoptosis (Figure 1K).”

      (5) Figure 1H and S2: The x-axis appears to display raw p-values rather than log10(p.value) as indicated. The x-axis should ideally show -log10(p.adjust), beginning at zero. The current format may misleadingly suggest that the ROS GO term has the lowest p-values.

      Apologies for the mistake. The data represents raw p-values and the x-axis has been corrected.
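
      The axis transformation the reviewer asks for can be sketched as follows. This is a minimal illustration only: the response does not say which multiple-testing correction was used, so Benjamini-Hochberg adjustment is an assumption, and the function names and p-values are hypothetical.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10(values):
    """-log10 transform: starts at zero, larger means more significant."""
    return [-math.log10(v) for v in values]


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
for p, score in zip(raw, neg_log10(bh_adjust(raw))):
    print(f"raw p = {p:<6} -log10(p.adjust) = {score:.3f}")
```

      Plotted this way, longer bars correspond to stronger enrichment, which avoids the reading the reviewer warns about.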

      (6) Genes such as Ppara, Egln3, Foxo3, Jun, and Nos1ap were identified by bulk ATAC-seq based on proximity to peaks, not by scRNA-seq. Without additional expression data, caution is needed when presenting these genes as direct evidence of ROS involvement in NEPs.

      We modified the text to discuss the discrepancies between the analyses. While some of this could be due to the lower detection limits in the scRNA-seq, it also highlights that chromatin accessibility is not a direct readout for expression levels and further analysis is needed. Nevertheless, both scRNA-seq and ATAC-seq have identified similar mechanisms, and our mutant analysis confirmed our hypothesis that an increase in ROS levels underlies repair, further increasing the confidence in our analyses. Further investigation is needed to understand the downstream mechanisms. We added the following sentence (lines 478-481):

      “However, not all genes in the accessible areas were differentially expressed in the scRNA-seq data. While some of this could be due to the detection limits of scRNA-seq, further analysis is required to assess the mechanisms of how the differentially accessible chromatin affects transcription.”

      (7) The authors should annotate cell identities for the different clusters in Table S2.

      All cell types have been annotated in Table S2.

      (8) Reiterative clustering analysis reveals distinct subpopulations among gliogenic and neurogenic NEPs. Could the authors clarify the identities of these subclusters? Can we distinguish the gliogenic NEPs in the Bergmann glia layer from those in the white matter?

      Thank you for this clarification. As shown in our previous studies, we cannot distinguish between the gliogenic NEPs in the Bergmann glia layer and the white matter based on scRNA-seq, but expression of the Bergmann glia marker Gdf10 suggests that a large proportion of the cells in the Hopx+ clusters are in the Bergmann glia layer. The distinctions within the major subpopulations that we characterized (Hopx-, Ascl1-expressing NEPs and GCPs) are driven by their proliferative/maturation status, as we previously observed. We have included a detailed annotation of all the clusters in Table S2, as requested, and a UMAP for Mki67 expression in Fig 1E. We have clarified this in the following sentence (lines 383-385):

      “These groups of cells were further subdivided into molecularly distinct clusters based on marker genes and their cell cycle profiles or developmental stages (Figure 1D, Table S2).”

      (9) In the Methods section, the authors mention filtering out genes with fewer than 10 counts. They should specify if these genes were used as background for enrichment analysis. Background gene selection is critical, as it influences the functional enrichment of gene sets in the list.

      As requested, the approach used has been added to the Methods section of the revised paper. Briefly, the background genes used by the goseq function are the same genes used for the probability weight function (nullp). The mm8 genome annotation was used in the nullp function, and all annotated genes were used as background genes to compute GO term enrichment. The following was added (lines 307-308):

      “The background genes used to compute the GO term enrichment includes all genes with gene symbol annotations within mm8.”
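
      To illustrate why the reviewer stresses background selection, the sketch below computes an enrichment p-value for the same gene list against two different backgrounds. It uses a plain hypergeometric test, not the length-bias-corrected test that goseq performs, and all gene names and set sizes are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, de_genes, term_genes):
    """P(overlap >= observed) when drawing the DE genes from the background."""
    bg = set(background)
    de = set(de_genes) & bg        # only genes present in the background count
    term = set(term_genes) & bg
    N, K, n = len(bg), len(term), len(de)
    k = len(de & term)
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical example: 4 DE genes, 3 of which carry the GO term.
all_annotated = [f"g{i}" for i in range(20)]      # broad background
expressed_only = [f"g{i}" for i in range(11)]     # restricted background
term = ["g0", "g1", "g2", "g3", "g4"]
de = ["g0", "g1", "g2", "g10"]

p_broad = hypergeom_enrichment_p(all_annotated, de, term)
p_restricted = hypergeom_enrichment_p(expressed_only, de, term)
print(f"broad background:      p = {p_broad:.4f}")
print(f"restricted background: p = {p_restricted:.4f}")
```

      In this toy case the same overlap looks more significant against the broader background, which is why stating the background explicitly matters for interpreting the enrichment results.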

      (10) Figure S1C: The authors could consider using bar plots to better illustrate cell composition differences across conditions and replicates.

      As suggested, we have included bar plots in Fig. S1D-F.

      (11) Figures 4-6: It remains unclear how the white matter microglia contribute to the recruitment of BgL-NEPs to the EGL, as the mCAT-mediated microglia loss data are all confined to the white matter.

      We have thought about the question and had initially quantified the microglia in the white matter and the rest of the lobules (excluding the EGL) separately. However, there are very few microglia outside the white matter in each section, thus it is not possible to obtain reliable statistical data on such a small population. We therefore did not include the cells in the analysis. We have added this point in the main text (line 548).

      “As a possible explanation for how white matter microglia could influence NEP behaviors, given the small size of the lobules and how the cytoarchitecture is disrupted after injury, we think it is possible that secreted factors from the white matter microglia could reach the BgL NEPs. Alternatively, there could be a relay system through an intermediate cell type closer to the microglia.” We have added these ideas to the Discussion of the revised paper (lines 735-738).

      Reviewer #2 (Public review):

      Summary:

      The authors have previously shown that the mouse neonatal cerebellum can regenerate damage to granule cell progenitors in the external granular layer, through reprogramming of gliogenic nestin-expressing progenitors (NEPs). The mechanisms of this reprogramming remain largely unknown. Here the authors used scRNAseq and ATACseq of purified neonatal NEPs from P1-P5 and showed that ROS signatures were transiently upregulated in gliogenic NEPs vs. neurogenic NEPs 24 hours post injury (P2). To assess the role of ROS, mice transgenic for global catalase activity were assessed to reduce ROS. Inhibition of ROS significantly decreased gliogenic NEP reprogramming and diminished cerebellar growth post-injury. Further, inhibition of microglia across this same time period prevented one of the first steps of repair - the migration of NEPs into the external granule layer. This work is the first demonstration that the tissue microenvironment of the damaged neonatal cerebellum is a major regulator of neonatal cerebellar regeneration. Increased ROS is seen in other CNS damage models including adults, thus there may be some shared mechanisms across age and regions, although interestingly neonatal cerebellar astrocytes do not upregulate GFAP as seen in adult CNS damage models. Another intriguing finding is that global inhibition of ROS did not alter normal cerebellar development.

      Strengths:

      This paper presents a beautiful example of using single cell data to generate biologically relevant, testable hypotheses of mechanisms driving important biological processes. The scRNAseq and ATACseq analyses are rigorously conducted and conclusive. The data are very clearly presented and easily interpreted, supporting the hypothesis that was next tested by reducing ROS in irradiated brains.

      Analyses of whole tissue and FACS-sorted NEPs in transgenic mice where human catalase was globally expressed in mitochondria were rigorously controlled and conclusively show that ROS upregulation was indeed decreased post injury and that the regenerative response was very clearly inhibited. The authors are to be commended on the very careful analyses, which are very well presented and again, easy to follow, with all appropriate data shown to support their conclusions.

      Weaknesses:

      The authors also present data to show that microglia are required for an early step of mobilizing gliogenic NEPs into the damaged EGL. While the data clearly show that PLX5622 administration from P0-P5 or even P0-P8 causes an immediate reduction of NEPs mobilized to the damaged EGL, there is no subsequent reduction of cerebellar growth, such that by P30 the treated and untreated irradiated cerebella are equivalent in size. There is speculation in the discussion about why this might be the case, but there is no explanation for why further, longer treatment was not attempted, nor were there any additional analyses of other regenerative steps in the treated animals. The data still implicate microglia in the neonatal regenerative response, but how remains uncertain.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      This is an exemplary manuscript.

      The methods and data are very well described and presented.

      I actually have very little to ask the authors except for an explanation of why PLX treatment was discontinued after P5 or P8 and what other steps of NEP reprogramming were assessed in these animals? Was NEP expansion still decreased at P8 even in the presence of PLX at this stage? Also - was there any analysis attempted combining mCAT and PLX?

      We agree with the reviewer that a follow-up study that goes into a deeper analysis of the role of microglia in GCP regeneration and any interaction with ROS signaling would be interesting. However, it would require a set of tools that we do not currently have. We did not have enough PLX5622 to perform additional experiments or extend the length of treatment. Plexxikon informed us in 2021 that they were no longer manufacturing PLX5622 because they were focusing on new analogs for in vivo use, and thus we had to use what we had left over from a completed preclinical cancer study. We nevertheless think it is important to publish our preliminary results to spark further experiments by other groups.

      References

      (1) Bayin N. S., Mizrak D., Stephen N. D., Lao Z., Sims P. A., Joyner A. L. Injury induced ASCL1 expression orchestrates a transitory cell state required for repair of the neonatal cerebellum. Sci Adv. 2021;7(50):eabj1598.

      (2) Cawsey T, Duflou J, Weickert CS, Gorrie CA. Nestin-Positive Ependymal Cells Are Increased in the Human Spinal Cord after Traumatic Central Nervous System Injury. J Neurotrauma. 2015;32(18):1393-402.

      (3) Gallo V, Armstrong RC. Developmental and growth factor-induced regulation of nestin in oligodendrocyte lineage cells. The Journal of neuroscience : the official journal of the Society for Neuroscience. 1995;15(1 Pt 1):394-406.

      (4) Huang Y, Xu Z, Xiong S, Sun F, Qin G, Hu G, et al. Repopulated microglia are solely derived from the proliferation of residual microglia after acute depletion. Nat Neurosci. 2018;21(4):530-40.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Jin et al. investigated how the bacterial DNA damage (SOS) response and its regulator protein RecA affect the development of drug resistance under short-term exposure to beta-lactam antibiotics. Canonically, the SOS response is triggered by DNA damage, which results in the induction of error-prone DNA repair mechanisms. These error-prone repair pathways can increase mutagenesis in the cell, leading to the evolution of drug resistance. Thus, inhibiting the SOS regulator RecA has been proposed as a means to delay the rise of resistance.

      In this paper, the authors deleted the RecA protein from E. coli and exposed this ∆recA strain to selective levels of the beta-lactam antibiotic, ampicillin. After an 8 h treatment, they washed the antibiotic away and allowed the surviving cells to recover in regular media. They then measured the minimum inhibitory concentration (MIC) of ampicillin against these treated strains. They note that after just 8 h treatment with ampicillin, the ∆recA strain had developed higher MICs towards ampicillin, while by contrast, wild-type cells exhibited unchanged MICs. This MIC increase was also observed in subsequent generations of bacteria, suggesting that the phenotype is driven by a genetic change.

      The authors then used whole genome sequencing (WGS) to identify mutations that accounted for the resistance phenotype. Within resistant populations, they discovered key mutations in the promoter region of the beta-lactamase gene, ampC; in the penicillin-binding protein PBP3, which is the target of ampicillin; and in the AcrB subunit of the AcrAB-TolC efflux machinery. Importantly, mutations in the efflux machinery can impact resistance towards other antibiotics, not just beta-lactams. To test this, they repeated the MIC experiments with other classes of antibiotics, including kanamycin, chloramphenicol, and rifampicin. Interestingly, they observed that the ∆recA strains pre-treated with ampicillin showed higher MICs towards all other antibiotics tested. This suggests that the mutations conferring resistance to ampicillin are also increasing resistance to other antibiotics.

      The authors then performed an impressive series of genetic, microscopy, and transcriptomic experiments to show that this increase in resistance is not driven by the SOS response, but by independent DNA repair and stress response pathways. Specifically, they show that deletion of recA reduces the bacterium's ability to process reactive oxygen species (ROS) and repair its DNA. These factors drive accumulation of mutations that can confer resistance towards different classes of antibiotics. The conclusions are reasonably well-supported by the data, but some aspects of the data and the model need to be clarified and extended.

      Strengths:

      A major strength of the paper is the detailed bacterial genetics and transcriptomics that the authors performed to elucidate the molecular pathways responsible for this increased resistance. They systematically deleted or inactivated genes involved in the SOS response in E. coli. They then subjected these mutants to the same MIC assays as described previously. Surprisingly, none of the other SOS gene deletions resulted in an increase in drug resistance, suggesting that the SOS response is not involved in this phenotype. This led the authors to focus on the localization of DNA PolI, which also participates in DNA damage repair. Using microscopy, they discovered that in the RecA deletion background, PolI co-localizes with the bacterial chromosome at much lower rates than wild-type. This led the authors to conclude that deletion of RecA hinders PolI and DNA repair. Although the authors do not provide a mechanism, this observation is nonetheless valuable for the field and can stimulate further investigations in the future.

      In order to understand how RecA deletion affects cellular physiology, the authors performed RNA-seq on ampicillin-treated strains. Crucially, they discovered that in the RecA deletion strain, genes associated with antioxidative activity (cysJ, cysI, cysH, sodA, sufD) and Base Excision Repair (mutH, mutY, mutM), which repairs oxidized forms of guanine, were all downregulated. The authors conclude that down-regulation of these genes might result in elevated levels of reactive oxygen species in the cells, which in turn, might drive the rise of resistance. Experimentally, they further demonstrated that treating the ∆recA strain with an antioxidant GSH prevents the rise of MICs. These observations will be useful for more detailed mechanistic follow-ups in the future.

      Weaknesses:

      Throughout the paper, the authors use language suggesting that ampicillin treatment of the ∆recA strain induces higher levels of mutagenesis inside the cells, leading to the rapid rise of resistance mutations. However, as the authors note, the mutants enriched by ampicillin selection can play a role in efflux and can thus change a bacterium's sensitivity to a wide range of antibiotics, in what is known as cross-resistance. The current data is not clear on whether the elevated "mutagenesis" is driven by ampicillin selection or by a bona fide increase in mutation rate.

      Furthermore, on a technical level, the authors employed WGS to identify resistance mutations in the ampicillin-treated wild-type and ∆recA strains. However, the WGS methodology described in the paper is inconsistent. Notably, wild-type WGS samples were picked from non-selective plates, while ΔrecA WGS isolates were picked from selective plates with 50 μg/mL ampicillin. Such an approach biases the frequency and identity of the mutations seen in the WGS and cannot be used to support the idea that ampicillin treatment induces higher levels of mutagenesis.

      Finally, it is important to establish what the basal mutation rates of both the WT and ∆recA strains are. Currently, only the ampicillin-treated populations were reported. It is possible that the ∆recA strain has inherently higher mutagenesis than WT, with a larger subpopulation of resistant clones. Thus, ampicillin treatment might not in fact induce higher mutagenesis in ∆recA.

      Comments on revisions:

      Thank you for responding to the concerns raised previously. The manuscript overall has improved.

      We sincerely thank the reviewer for raising this important point. In our initial submission, we acknowledge that our mutation analysis was based on a limited number of replicates (n=6), which may not have been sufficient to robustly distinguish between mutation induction and selection. In response to this concern, we have substantially expanded our experimental dataset. Specifically, we redesigned the mutation rate validation experiment by increasing the number of biological replicates in each condition to 96 independent parallel cultures. This enabled us to systematically assess mutation frequency distributions under four conditions (WT, WT+ampicillin, ΔrecA, ΔrecA+ampicillin), using both maximum likelihood estimation (MLE) and distribution-based fluctuation analysis (new Figure 1F, 1G, and Figure S5).

      These expanded datasets revealed that:

      (1) While the estimated mutation rate was significantly elevated in ΔrecA+ampicillin compared to ΔrecA alone (Fig. 1G),

      (2) The distribution of mutation frequencies in ΔrecA+ampicillin was highly skewed with evident jackpot cultures (Fig. 1F), and

      (3) The observed pattern significantly deviated from Poisson expectations, which is inconsistent with uniform mutagenesis and instead supports clonal selection from an early-arising mutational pool (Fig. S5).

      Importantly, these new results do not contradict our original conclusions but rather extend and refine them. The previous evidence for ROS-mediated mutagenesis remains valid and is supported by our GSH experiments, transcriptomic analysis of oxidative stress genes, and DNA repair pathway repression. However, the additional data now indicate that ROS-induced variants are not uniformly induced after antibiotic exposure but are instead generated stochastically under the stress-prone ΔrecA background and then selectively enriched upon ampicillin treatment.
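
      The reasoning behind the fluctuation analysis can be illustrated with a short sketch. The response reports maximum likelihood estimation; the code below instead uses the simpler Luria-Delbrück p0 method and a variance-to-mean ratio as a rough check against Poisson expectations. The function names and colony counts are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance


def p0_mutations_per_culture(mutant_counts):
    """p0 estimate: m = -ln(fraction of parallel cultures with zero mutants)."""
    n = len(mutant_counts)
    zeros = sum(1 for c in mutant_counts if c == 0)
    if zeros == 0 or zeros == n:
        raise ValueError("p0 method needs some, but not all, cultures without mutants")
    return -math.log(zeros / n)


def variance_to_mean(mutant_counts):
    """About 1 for Poisson-distributed counts; far larger when jackpots occur."""
    return variance(mutant_counts) / mean(mutant_counts)


# Hypothetical resistant-colony counts from parallel cultures.
jackpot_like = [0, 0, 0, 0, 1, 0, 2, 0, 0, 150, 0, 1]
poisson_like = [1, 0, 2, 1, 1, 0, 1, 2, 1, 1]

print("m (p0 method):", round(p0_mutations_per_culture(jackpot_like), 3))
print("variance/mean, jackpot-like:", round(variance_to_mean(jackpot_like), 1))
print("variance/mean, Poisson-like:", round(variance_to_mean(poisson_like), 2))
```

      A variance-to-mean ratio far above 1, driven by a few jackpot cultures, is the pattern expected when mutants arise early and expand clonally before selection, rather than being induced uniformly after exposure.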

      Taken together, we now propose a two-step model of resistance evolution in ΔrecA cells (new Figure 5):

      Step i: RecA deficiency creates a hypermutable state through impaired repair and elevated ROS, increasing the probability of resistance-conferring mutations.

      Step ii: β-lactam exposure acts as a selective bottleneck, enriching early-arising mutants that confer resistance.

      We have revised both the Results and Discussion sections to clearly articulate this complementary relationship between mutational supply and selection, and we believe this integrated model better explains the observed phenotypes and mechanistic outcomes.

      Reviewer #2 (Public review):

      This study aims to demonstrate that E. coli can acquire rapid antibiotic resistance mutations in the absence of a DNA damage response. The authors employed a modified Adaptive Laboratory Evolution (ALE) workflow to investigate this, initiating the process by diluting an overnight culture 50-fold into an ampicillin selection medium. They present evidence that a recA- strain develops ampicillin resistance mutations more rapidly than the wild-type, as indicated by the Minimum Inhibitory Concentration (MIC) and mutation frequency. Whole-genome sequencing of recA- colonies resistant to ampicillin showed predominant inactivation of genes involved in the multi-drug efflux pump system, contrasting with wild-type mutations that seem to activate the chromosomal ampC cryptic promoter. Further analysis of mutants, including a lexA3 mutant incapable of inducing the SOS response, led the authors to conclude that the rapid evolution of antibiotic resistance occurs via an SOS-independent mechanism in the absence of recA. RNA sequencing suggests that antioxidative response genes drive the rapid evolution of antibiotic resistance in the recA- strain. They assert that rapid evolution is facilitated by compromised DNA repair, transcriptional repression of antioxidative stress genes, and excessive ROS accumulation.

      Strengths:

      The experiments are well-executed and the data appear reliable. It is evident that the inactivation of recA promotes faster evolutionary responses, although the exact mechanisms driving this acceleration remain elusive and deserve further investigation.

      Weaknesses:

      Some conclusions are overstated. For instance, the conclusion regarding the LexA3 allele, indicating that rapid evolution occurs in an SOS-independent manner (line 217), contradicts the introductory statement that attributes evolution to compromised DNA repair.

      We thank the reviewer for this insightful observation, which highlights a central conceptual advance of our study. Our data indeed indicate that resistance evolution in ΔrecA occurs independently of canonical SOS induction (as shown by the lack of resistance in lexA3, dpiBA, and translesion polymerase mutants), yet is clearly associated with impaired DNA repair capacity (e.g., downregulation of polA, mutH, mutY).

      This apparent “contradiction” reflects the dual role of RecA: it functions both as the master activator of the SOS response and as a key factor in SOS-independent repair processes. Thus, the rapid resistance evolution in ΔrecA is not due to loss of SOS, but rather due to the broader suppression of DNA repair pathways that RecA coordinates, which elevates mutational load under stress (This point is discussed in further detail in our response to Reviewer 1).

      The claim made in the discussion of Figure 3 that the hindrance of DNA repair in recA- is crucial for rapid evolution is at best suggestive, not demonstrative. Additionally, the interpretation of the PolI data implies its role, yet it remains speculative.

      We appreciate this comment and would like to respectfully clarify that our conclusion regarding the role of DNA repair impairment is supported by several independent lines of mechanistic evidence.

      First, our RNA-seq analysis revealed transcriptional suppression of multiple DNA repair genes in ΔrecA cells following ampicillin treatment, including polA (DNA Pol I) and the base excision repair genes mutH, mutY, and mutM (Fig. 4K). This indicates that multiple repair pathways, including those responsible for correcting oxidative DNA lesions, are downregulated under these conditions.

      Second, we observed a significant reduction in DNA Pol I protein expression as well as reduced colocalization with chromosomal DNA in ΔrecA cells, suggesting impaired engagement of repair machinery (Fig. 3C-E). These phenotypes are not limited to transcriptional signatures but extend to functional protein localization.

      Third, and most importantly, resistance evolution was fully suppressed in ΔrecA cells upon co-treatment with glutathione (GSH), which reduces ROS levels. As GSH did not affect ampicillin killing (Fig. 4J), these findings suggest that mutagenesis and thus the emergence of resistance requires both ROS accumulation and the absence of efficient repair.

      Therefore, we believe these data go beyond correlation and demonstrate a mechanistic role for DNA repair impairment in driving stress-associated resistance evolution in ΔrecA. We have revised the Discussion to emphasize the strength of this evidence while avoiding overstatement.

      In Figure 2A table, mutations in amp promoters are leading to amino acid changes.

      We thank the reviewer for spotting this inconsistency. Indeed, the ampC promoter mutations we identified reside in non-coding regulatory regions and do not result in amino acid substitutions. We have corrected the annotation in Fig. 2A and clarified in the main text that these mutations likely affect gene expression through transcriptional regulation, rather than protein sequence alteration.

      The authors' assertion that ampicillin significantly influences persistence pathways in the wild-type strain, affecting quorum sensing, flagellar assembly, biofilm formation, and bacterial chemotaxis, lacks empirical validation.

      We thank the reviewer for pointing this out. In the original version, we acknowledged transcriptional enrichment of genes related to quorum sensing, flagellar assembly, and chemotaxis in the wild-type strain upon ampicillin treatment. However, as we did not directly assess persistence phenotypes (e.g., biofilm formation or persister levels), we agree that such functional inferences were not fully supported. We have revised the relevant statements to focus solely on transcriptomic changes and have removed language suggesting direct effects on persistence pathways.

      Figure 1G suggests that recA cells treated with ampicillin exhibit a strong mutator phenotype; however, it remains unclear if this can be linked to the mutations identified in Figure 2's sequencing analysis.

      We appreciate the reviewer’s comment. This point is discussed in further detail in our response to Reviewer 1.

      Reviewer #3 (Public review):

      In the present work, Zhang et al investigate the involvement of the bacterial DNA damage repair SOS response in the evolution of beta-lactam drug resistance in Escherichia coli. Using a combination of microbiological, bacterial genetics, laboratory evolution, next-generation sequencing, and live-cell imaging approaches, the authors propose short-term (transient) drug resistance evolution can take place in RecA-deficient cells in an SOS response-independent manner. They propose the evolvability of drug resistance is alternatively driven by the oxidative stress imposed by accumulation of reactive oxygen species and compromised DNA repair. Overall, this is a nice study that addresses a growing and fundamental global health challenge (antimicrobial resistance).

      Strengths:

      The authors introduce new concepts to antimicrobial resistance evolution mechanisms. They show short-term exposure to beta-lactams can induce durably fixed antimicrobial resistance mutations. They propose this is due to compromised DNA repair and oxidative stress. Antibiotic resistance evolution under transient stress is poorly studied, so the authors' work is a nice mechanistic contribution to this field.

      Weaknesses:

      The authors do not show any direct evidence of altered mutation rate or accumulated DNA damage in their model.

      We appreciate the reviewer’s comment. This point is discussed in further detail in our response to Reviewer 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to suggest two minor changes to the text.

      (1) Re. WGS data.

      The authors write in their response: "We appreciate your concern regarding potential inconsistencies in the WGS methodology. However, we would like to clarify that the primary aim of the WGS experiment was to identify the types of mutations present in the wild type and ΔrecA strains after treatment of ampicillin, rather than to quantify or compare mutation frequencies. This purpose was explicitly stated in the manuscript."

      I think the source of my confusion stemmed from this part in the text:

      "In bacteria, resistance to most antibiotics requires the accumulation of drug resistance associated DNA mutations developed over time to provide high levels of resistance (29). To verify whether drug resistance associated DNA mutations have led to the rapid development of antibiotic resistance in recA mutant strain, we..."

      I would change the phrase "verify whether drug resistance associated DNA mutations have led to the rapid development of antibiotic resistance in recA mutant strain" to "identify the types of mutations present in the wild type and ΔrecA strains after treatment of ampicillin." This would explicitly state what the sequencing was for (ie. ID-ing mutations). The current phrase can give the impression that WGS was used to validate rapid or high mutagenesis.

      Thanks for this suggestion. We have revised this description to “In bacteria, resistance to most antibiotics requires the accumulation of drug resistance associated DNA mutations that can arise stochastically and, under stress conditions, become enriched through selection over time to confer high levels of resistance (33). Having observed a non-random and right-skewed distribution of mutation frequencies in ΔrecA isolates following ampicillin exposure, we next sought to determine whether specific resistance-conferring mutations were enriched in ΔrecA isolates following antibiotic exposure.”

      (2) Re. whether the mutations are "induced" or "pre-existing."

      The authors write:

      "We appreciate your detailed feedback on the language used to describe our data. We understand the concern regarding the use of the term "induced" in relation to beta-lactam exposure. To clarify, we employed not only beta-lactam antibiotics but also other antibiotics, such as ciprofloxacin and chloramphenicol, in our experiments (data not shown). However, we observed that beta-lactam antibiotics specifically induced the emergence of resistance or altered the MIC in our bacterial populations. If resistance had pre-existed before antibiotic exposure, we would expect other antibiotics to exhibit a similar selective effect, particularly given the potential for cross-resistance to multiple antibiotics."

      I think it is important to discuss the negative data for the other antibiotics (along with the other points made in your Reviewer response) in the main text.

      This point is discussed in further detail in our response to Reviewer 1 (Public Review).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors used a leucine/pantothenate auxotrophic strain of Mtb to screen a library of FDA-approved compounds for their antimycobacterial activity and found significant antibacterial activity of the inhibitor semapimod. In addition to alterations in pathways, including amino acid and lipid metabolism and transcriptional machinery, the authors demonstrate that semapimod treatment targets leucine uptake in Mtb. The work presents an interesting connection between nutrient uptake and cell wall composition in mycobacteria.

      Strengths:

      The link between the leucine uptake pathway and PDIM is interesting but has not been characterized mechanistically. The authors discuss that PDIM presents a barrier to the uptake of nutrients and shows binding of the drug with PpsB. However it is unclear why only the leucine uptake pathway was affected.

      We observe interference with L-leucine uptake, but not pantothenate uptake, in the mc2 6206 strain upon semapimod treatment. At present, we do not know whether PDIM presents a barrier exclusively to the uptake of L-leucine. Further studies may shed light on the underlying mechanism(s) by which L-leucine uptake is modulated by this small molecule.

      We still do not know what PpsB actually does for amino acid uptake - is it a transporter?

      By BLI-Octet we do not find any interaction between L-leucine and PpsB. Therefore, we doubt that PpsB is a transporter of L-leucine.

      Does semapimod binding affect its activity?

      Our study suggests that semapimod treatment alters PDIM architecture which becomes restrictive to L-leucine. However, at present the exact mechanism is not clear. Further studies are required to thoroughly examine the effect of semapimod on Mtb PpsB activity and alterations in PDIM by mass spectrometry.

      Does the auxotrophic Mtb have lower PDIM levels compared to wild-type Mtb?

      As per the published report by Mulholland et al, and by vancomycin susceptibility phenotype in our study, both the strains appear to have comparable PDIM levels.

      The authors show an interesting result where they observed antibacterial activity of semapimod against H37Rv only in vivo and not in vitro. Why do the authors think this is the basis of this observation? It is possible semapimod has an immunomodulatory effect on the host since leucine is an essential amino acid in mice. The authors could check pro-inflammatory cytokine levels in infected mouse lungs with and without drug treatment.

      Semapimod inhibits production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would indeed help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment, despite inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth.

      The authors show that the semapimod-resistant auxotroph lacks PDIM. The conclusions would be further strengthened by including validations using PDIM mutants, including del-ppsB Mtb and other genes of the PDIM locus, whether in vivo this mutant would be more susceptible (or resistant) to semapimod treatment.

      PDIM is a virulence factor, and plays an important role in the intracellular survival of the TB pathogen. Mtb strains lacking PDIM are expected to show attenuated growth during infection, even without semapimod treatment. In such a case, it might be difficult to draw any conclusions about the effect of semapimod against PDIM(-) strains in vivo.

      Prolonged subculturing can introduce mutations in PDIM, which can be overcome by supplementing with propionate (Mulholland et al, Nat Microbiol, 2024). Did the authors also supplement their cultures with propionate? It would be interesting to see what mutations would result in Semr strains with propionate supplementation along with prolonged semapimod treatment.

      Considering that extensive subculturing may result in loss of PDIM, we avoided prolonged subculturing of bacteria. As presented in Fig. 6b, the WT bacteria retain PDIM. While performing the initial screening of drugs, we did not anticipate such a phenotype, and hence bacteria were cultured in regular 7H9-OADS medium without propionate supplementation.

      A comprehensive future study would help examine the effect of propionate on the generation of semapimod-resistant mutants in Mtb mc2 6206.

      Weaknesses:

      I have summarized the limitations above in my comments. Overall, it would be helpful to provide more mechanistic details to study the connection between leucine uptake and PDIM.

      Reviewer #2 (Public review):

      Summary

      This important study uncovers a novel mechanism for L-leucine uptake by M. tuberculosis and shows that targeting this pathway with 'Semapimod' interferes with bacterial metabolism and virulence. These results identify the leucine uptake pathway as a potential target to design new anti-tubercular therapy.

      Strengths

      The authors took numerous approaches to prove that L-leucine uptake of M. tuberculosis is an important physiological phenomenon and may be effectively targeted by 'Semapimod'. This study utilizes a series of experiments using a broad set of tools to justify how the leucine uptake pathway of M. tuberculosis may be targeted to design new anti-tubercular therapy.

      Weaknesses

      The study does not explain how L-leucine is taken up by M. tuberculosis, leaving the mechanism unclear. Even though 'Semapimod' binds to the PpsB protein, the relevant connection between changes in PDIM and amino acid transport remains incomplete.

      While leucine uptake involves specific transporters in other bacteria, no such transport system is known in Mtb. By screening small molecule inhibitors, we came across a molecule, semapimod, which selectively kills the leucine auxotroph (mc2 6206), but not the WT Mtb. To understand the underlying mechanism of differential susceptibility of the WT and auxotrophic strains to this molecule, we evaluated the effect of restoration of leuCD and panCD expression on susceptibility of the auxotrophic strain to semapimod. Interestingly, our results demonstrated that upon endogenous expression of leuCD genes, the mc2 6206 strain becomes resistant to killing by semapimod. In contrast, no effect of panCD expression was observed on semapimod susceptibility of mc2 6206. These findings were further substantiated by gene expression analysis of semapimod-treated mc2 6206, which exhibits differential regulation of a set of genes that are altered upon leucine depletion in Mtb as well as in other bacteria. Overall, the results thus provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph.

      To gain further mechanistic insights into the effect of semapimod on leucine uptake in Mtb, we generated a semapimod-resistant strain, which exhibits point mutations in 4 genes including ppsB. Interestingly, overexpression of wild-type ppsB, but not of the other genes, restored susceptibility of the resistant bacteria to semapimod. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in regulation of L-leucine transport in Mtb.

      As mentioned above, we anticipate that semapimod treatment brings about certain modifications in PDIM which becomes more restrictive to L-leucine. A comprehensive future study will be helpful to examine the effect of semapimod on Mtb physiology.

      Also, the fact that the drug does not function on WT bacteria makes it a weak candidate to consider its usefulness for a therapeutic option.

      We agree that semapimod is not an appropriate drug candidate against TB owing to its inhibitory effect on production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, an effect that helps the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment, despite inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth. Therefore, targeting L-leucine uptake can be a novel therapeutic strategy against TB.

      Reviewer #3 (Public review):

      Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine.

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      As mentioned above, the overall results from this study provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in regulation of L-leucine transport in Mtb.

      As mentioned above, it appears that semapimod treatment brings about certain modifications in PDIM, which becomes restrictive to L-leucine. Future studies are required to gain detailed mechanistic insights into the effect of semapimod on Mtb physiology.

      Since leucine uptake and PDIM synthesis are important concepts of the manuscript, experiments would benefit from exploring other BCAAs to know if the phenotypes observed are specific to leucine, and adding additional strains to the 2D TLC experiments to provide confidence in the absence of the PDIM band.

      We thank the peer reviewer for this suggestion. We would be happy to analyse the effect of semapimod on the level of other amino acids including BCAA by mass spectrometry.

      The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or disrupted cell wall (PDIM synthesis), testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. H37Rv is still able to synthesize endogenous leucine and is able to circumvent the effect of semapimod.

      We thank the peer reviewer for this suggestion. We would explore the possibility of analysing the effect of increasing concentrations of BCAAs on mc2 6206 susceptibility to semapimod.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      This study assesses eRNA activity as a classifier of different subtypes of breast cancer and as a prognosis tool. The authors take advantage of previously published RNA-seq data from human breast cancer samples and assess it more deeply, considering the cancer subtype of the patient. They then apply two machine learning approaches to find which eRNAs can classify the different breast cancer subtypes. While they do not find any eRNA that helps distinguish ductal vs. lobular breast cancers, their approach helps identify eRNAs that distinguish luminal A, B, basal and Her2+ cancers. They also use motif enrichment analysis and ChIP-seq datasets to characterize the eRNA regions further. Through this analysis, they observe that those eRNAs where ER binds strongest are associated with a poor patient prognosis.

      Major comments

      • Part of the rationale for this study is the previous observation that eRNAs are less associated with the prognosis of breast cancer patients in comparison to mRNAs and they claim that the high heterogeneity between breast cancer subtypes would mask the importance of eRNAs. In this study, the authors solely focus on eRNAs as a classification of breast cancer subtypes and prognostic tool and do not answer whether eRNAs or mRNAs are a better predictor of cancer subtypes and of prognosis. Since the answer and the tools are already in their hands, it would be important to also see a comparative analysis where they assess which of the two (mRNAs or eRNAs) is a better predictor.
      • The authors run the umaps of Fig. 1C only taking the predictor eRNAs. It is then somewhat expected to observe a separation. Coming from a single-cell omics field, what I would suggest is to take the eRNA loci and compute a umap with the highly variable regions, perform clustering on it and assess how the cancer subtypes are structured within the data. This would give a first overview of how much segregation and structure one can have with this data. Having a first step of data exploration would also strengthen the paper. If the authors have tried it, could the authors comment on it?
      • 'neither measures could classify any distinct eRNAs for invasive ductal vs lobular cancer samples' S1B. Just by eye, I can see a potential enrichment of ductal on the left and on the right while lobular stays in the center. This suggests to me that, while perhaps each eRNA alone does not have the power to classify the lobular vs ductal subtype, perhaps there is a difference - which could result from a cooperative model of eRNA influence - that would need further exploration. Would a PCA also show enrichments of ductal vs. lobular in specific parts of the plot? It may be worth exploring the PC loadings to see which eRNAs could play an influence. In this regard, a more unbiased visual examination, as suggested in my previous point, could help clarify whether there could be an association of certain eRNAs that cannot be captured by ML.
      • "we employed machine learning approaches on 302,951 eRNA loci identified from RNA-seq datasets from 1,095 breast cancer patient samples from previous studies" - the previous studies from which the authors take the data [11,12] highlight the presence of ~60K enhancers in the human genome and they use less than that in their analysis. Could the authors please clarify the differences in numbers with previous studies and give a reasoning? Also, from the methods section, they discard many patient samples due to low QC, so, from what I understand, the number of samples analyzed in the end is 975 and not 1,095.

      Minor comments

      • Can the authors please state the parameters of the umap in methods? Although it could be intrinsic to the dataset, data points are grouped in a way that makes me think that the granularity is too forced. Could the authors please show how the umap would behave with more lenient parameters? Or even with PCA?
      • 'Majority of the basal' -> The majority of the basal.

      Significance

      This is a paper relevant in the cancer field, particularly for breast cancer research. The significance of the paper lies in digging into the breast cancer samples, taking the different existing subtypes into account to assess the contribution of eRNAs as a classifier and as a prognostic tool. The data is already available but it has not been studied to this degree of detail. It highlights the importance of characterizing cancer samples in more depth, considering its intrinsic heterogeneity, as averaging across different subtypes would mask biology. My expertise lies in gene regulation and single-cell omics. My contribution will therefore be more focused on the analysis and extraction of biological information. The extent of its specific relevance in cancer research falls beyond my expertise.

    1. Before we talk about public criticism and shaming and adults, let’s look at the role of shame in childhood. In at least some views about shame and childhood[1], shame and guilt hold different roles in childhood development [r1]: Shame is the feeling that “I am bad,” and the natural response to shame is for the individual to hide, or the community to ostracize the person. Guilt is the feeling that “This specific action I did was bad.” The natural response to feeling guilt is for the guilty person to want to repair the harm of their action. In this view [r1], a good parent might see their child doing something bad or dangerous, and tell them to stop. The child may feel shame (they might not be developmentally able to separate their identity from the momentary rejection). The parent may then comfort the child to let the child know that they are not being rejected as a person, it was just their action that was a problem. The child’s relationship with the parent is repaired, and over time the child will learn to feel guilt instead of shame and seek to repair harm instead of hide.

      I find the contrast between shame and guilt to be particularly illuminating, especially in the context of parenting. It made me think about how my own parents handled discipline. When I was younger and did something wrong, I recall them emphasizing what I did rather than characterizing me as a "bad kid", which corresponds to the concept of encouraging guilt over shame. That type of response taught me to accept responsibility and correct my actions rather than feeling useless. I'm curious, though, how this strategy would change across cultures where shame is employed more intentionally as a weapon for social conformity.

    2. 18.1. Shame vs. Guilt in childhood development. Before we talk about public criticism and shaming and adults, let’s look at the role of shame in childhood. In at least some views about shame and childhood[1], shame and guilt hold different roles in childhood development [r1]: Shame is the feeling that “I am bad,” and the natural response to shame is for the individual to hide, or the community to ostracize the person. Guilt is the feeling that “This specific action I did was bad.” The natural response to feeling guilt is for the guilty person to want to repair the harm of their action. In this view [r1], a good parent might see their child doing something bad or dangerous, and tell them to stop. The child may feel shame (they might not be developmentally able to separate their identity from the momentary rejection). The parent may then comfort the child to let the child know that they are not being rejected as a person, it was just their action that was a problem. The child’s relationship with the parent is repaired, and over time the child will learn to feel guilt instead of shame and seek to repair harm instead of hide.

      When parents criticize their children's bad actions but still show love and patience, their kids can learn to fix mistakes instead of feeling worthless. Moreover, I believe shame can make kids hide or feel negative, which is bad for their development, but guilt can teach them to take responsibility in their future life. Hence, I think a good parenting style should focus on shaping kids' behavior instead of only blaming them, which can help their children build confidence and kindness.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript the authors have done cryo-electron tomography of the manchette, a microtubule-based structure important for proper sperm head formation during spermatogenesis. They also did mass-spectrometry of the isolated structures. Vesicles, actin and their linkers to microtubules within the structure are shown.

      We thank the reviewer for the critical reading of our manuscript; we have implemented the suggestions as detailed below, which we believe indeed improved the manuscript.

      Major:

      The data the conclusions are based on seem very limited and sometimes overinterpreted. For example, only one connection between actin and microtubules was observed, and this is thought to be MACF1 simply based on its presence in the MS.

      We regret giving the impression that the data is limited. We in fact collected >100 tilt series from 3 biological replicas for the isolated manchette.

      In the revised version, we added data from in-situ studies showing vesicles interacting with the manchette (as requested below, new Fig. 1).

      Specifically, for the interaction of actin with microtubule we added more examples (Revised Fig. 6) and we toned down the discussion related to the relevance of this interaction (lines 193-194, 253-255). MACF1 is mentioned only as a possible candidate in the discussion (line 254).

      Another, and larger concern, is that the authors do a structural study on something that has been purified out of the cell, a process which is extremely disruptive. Vesicles, actin and other cellular components could easily be trapped in this cytoskeletal sieve during the purification process and as such, not be bona fide manchette components. This could create both misleading proteomics and imaging. Therefore, an approach not requiring extraction such as high-pressure freezing, sectioning and room-temperature electron tomography and/or immunoEM on sections to set aside this concern is strongly recommended. As an additional bonus, it would show if the vesicles containing ATP synthase are deformed mitochondria.

      We recognise the concern raised by the reviewer.

      To alleviate this concern, we added imaging data of manchettes in-situ that show vesicles, mitochondria and filaments interacting with the manchette (new Fig. 1), essentially confirming the observations that were made on the isolated manchette.

      The benefits of imaging the isolated manchette were better throughput (being able to collect more data) and higher resolution, which allowed us to resolve the dynein/dynactin and actin filaments unequivocally.

      Minor: Line 99: "to study IMT with cryo-ET, manchettes were isolated ...(insert from which organism)..."

      Added in line 102 in the revised version.

      Line 102 "...demonstrating that they can be used to study IMT".. can the authors please clarify?

      This paragraph was revised (lines 131-137), we hope it is now more clear.

      Line 111 "densities face towards the MT plus-end" How can a density "face" anywhere? For this, it needs to have a defined front and back.

      Microtubule motor proteins (kinesin and dynein) are often attached to the microtubules at an angle, with dynactin and cargo on one side (plus end). We rephrased this part and removed the word “face” in the revised version to make it clearer (lines 161-162).

      Line 137: is the "perinuclear ring" the same as the manchette?

      The perinuclear ring is the apical part of the manchette that connects it to the nucleus. We added to the revised version imaging of the perinuclear ring with observations on how it changes when the manchette elongates (new Fig. 2).

      Figure 2B: How did the authors decide not to model the electron density found between the vesicle and the MT at 3 O'clock? Is there no other proteins with a similar lollipop structure as ATP synthase, so that this can be said to be this protein with such certainty?

      The densities connecting the vesicles to the microtubules shown in (now) Fig. 4D are not consistent enough to be averaged.

      The densities resembling ATP synthase are inside the vesicles. Nevertheless, we have decided to remove the averaging of the ATP synthases from the revised manuscript as they are not of great importance for this manuscript. Instead, the new in-situ data clearly show mitochondria (with their characteristic double membrane and cristae) interacting with manchette microtubules (new Fig 1C).

      Line 189: "F-actin formed organized bundles running parallel to mMTs" - this observation needs confirming in a less disrupted sample.

      Phalloidin (an actin marker) was previously shown to stain the manchette (PMID: 36734600). As actin filaments are very thin (7 nm), they are very hard to observe in plastic-embedded EM.

      In the in-situ data we added to the revised manuscript (new Fig 1D), we observe filaments with a diameter corresponding to actin. In addition, we added more examples of microtubules interacting with actin in isolated manchette (new Fig. 6 E-K).

      Line 242 remove first comma sign.

      Removed.

      Line 363 "a total of 2 datasets" - is this manuscript based on only two tilt-series? Or two datasets from each of the 4 grids? In any case, this is very limited data.

      We apologise for not clearly providing the information about the data size in the original manuscript. The data is based on three biological replicas (3 animals). We collected more than 100 tomograms of different regions of the manchettes. As such, we would argue that the data is not limited per se.

      Reviewer #1 (Significance (Required)):

      The article is very interesting, and if presented together with the suggested controls, would be informative to both microtubule/motor protein researchers and those studying spermatogenesis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manchette appears as a shield-like structure surrounding the flagellar basal body upon spermiogenesis. It consists of a number of microtubules like a comb, but actin (Mochida et al. 1998 Dev. Biol. 200, 46) and myosin (Hayasaka et al. 2008 Asian J. Androl. 10, 561) were found, suggesting transportation inside the manchette. Detailed structural information and functional insight into the manchette was still awaited. There is a hypothesis called IMT (intra-manchette transport) based on the fact that manchette and IFT (intraflagellar transport) share common components (or homologues) and on their transition along the stages of spermiogenesis. While IMT is considered as a potential hypothesis to explain delivery of centrosomal and flagellar components, no one has witnessed IMT at the same level as IFT. IMT has never been purified, visualized in motion or at high resolution. This study for the first time visualized manchette using high-end cryo-electron tomography of isolated manchettes, addressing structural characterization of IMT. The authors successfully visualized microtubular bundles, vesicles located between microtubules and a linker-like structure connecting the vesicle and the microtubule. On multilamellar membranes in the vesicles they found particles and assigned them to ATPase complexes, based on intermediate (~60A) resolution structure. They further identified interesting structures, such as (1) particles on microtubules, which resemble dynein and (2) filaments which show symmetry of F-actin. All the molecular assignments are consistent with their proteomics of manchettes.

      We thank the reviewer for highlighting the novelty of our study.

      Their assignment of ATPase will be strengthened by MS data, if it proves absence of other possible proteins forming such a membrane protein complex.

      All the ATPase components were indeed found in our proteomics data. Nevertheless, we have decided to remove the averaging of the ATPase as it does not directly relate to IMT, the focus of this manuscript.

      They discussed possible role of various motor proteins based on their abundance (Line 134-151, Line 200). This makes sense only with a control. Absolute abundance of proteins would not necessarily present their local importance or roles. This reviewer would suggest quantitative proteomics of other organelles, or whole cells, or other fractions obtained during manchette isolation, to demonstrate unique abundance of KIF27 and other proteins of their interest.

      We agree with the reviewer that absolute abundance does not necessarily indicate importance or a role. As such, we removed this part of the discussion from the revised manuscript.

      A single image from a tomogram, Fig.6B, is not enough to prove actin-MT interaction. A gallery and a number (how many such junctions were found from how many MTs) will be necessary.

      We agree that one example is not enough. In the new Fig. 6E-K, we provide a gallery of more examples. We have revised the text to reflect that these observations are still rare and that more data will be needed to quantify this interaction (Lines 253-254).

      Minor points: Their manchette purification is based on Mochida et al., which showed (their Fig.2) similarity to the in vivo structure (for example, Fig.1 of Kierszenbaum 2001 Mol. Reproduc. Dev. 59, 347). Nevertheless, since this is not a very common prep, it is helpful to show the isolated manchette’s wide view (low mag cryo-EM or ET) to prove its intactness.

      We thank the reviewer for this suggestion. In the revised version, new Fig. 2 provides a cryo-EM overview of purified manchette from different developmental stages.

      Line 81: Myosin -> myosin (to be consistent with other protein names)

      Corrected.

      Reviewer #2 (Significance (Required)):

      This work is a significant step toward the understanding of manchettes. While the molecular assignment of dynein and ATPase is not fully decisive, due to limitation of resolution (this reviewer thinks the assignment of actin filament is convincing, based on its helical symmetry), their speculative model still deserves publication.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      ->Summary:

      The manchette is a temporary microtubule (MT)-based structure essential for the development of the highly polarised sperm cell. In this study, the authors employed cryo-electron tomography (cryo-ET) and proteomics to investigate the intra-manchette transport system. Cryo-EM analysis of purified rat manchette revealed a high density of MTs interspersed with actin filaments, which appeared either bundled or as single filaments. Vesicles were observed among the MTs, connected by stick-like densities that, based on their orientation relative to MT polarity, were inferred to be kinesins. Subtomogram averaging (STA) confirmed the presence of dynein motor proteins. Proteomic analysis further validated the presence of dynein and kinesins and showed the presence of actin crosslinkers that could bundle actin filaments. Proteomics data also indicated the involvement of actin-based transport mediated by myosin. Importantly, the data indicated that the intraflagellar transport (IFT) system is not part of the intra-manchette transport mechanism. The visualisation of motor proteins directly from a biological sample represents a notable technical advancement, providing new insights into the organisation of the intra-manchette transport system in developing sperm.

      We thank the reviewer for summarising the novelty of our observations.

      -> Are the key conclusions convincing? Below we comment on three main conclusions. MT and F-actin bundles are both constituents of the manchette: While the data convincingly shows that MT and F-actin are part of the manchette, one cannot conclude from it that F-actin is an integral part of the manchette. The authors would need to rephrase so that it is clear that they are speculating.

      We have rephrased our statements and replaced “integral” with “actin filaments are associated”. Of note, previous studies, including some using phalloidin staining, suggested that actin is part of the manchette (PMID: 36734600, PMID: 9698455, PMID: 18478159), and here we visualised the actin at high resolution.

      The transport system employs different transport machinery on these MTs Proteomics data indicates the presence of multiple motor proteins in the manchette, while cryo-EM data corroborates this by revealing morphologically distinct densities associated with the MTs. However, the nature of only one of these MT-associated densities has been confirmed-specifically, dynein, as identified through STA. The presence of kinesin or myosin in the EM data remains unconfirmed based on just the cryo-ET density, and therefore it is unclear whether these proteins are actively involved in cargo transport, as this cannot be supported by just the proteomics data. In summary, we recommend that the authors rephrase this conclusion and avoid using the term "employ".

      We agree that our cryo-ET only confirmed the motor protein dynein. As such, we removed the term "employ", rephrased our claims regarding active transport, and changed the title accordingly.

      Dynein mediated transport (Line 225-227) The data shows that dynein is present in the manchette; however, whether it plays an active role in transport cannot be determined from the cryo-ET data provided in the manuscript, as it does not clearly display a dynein-dynactin complex attached to cargo. The attachment to cargo is also not revealed via proteomics as no adaptor proteins that link dynein-dynactin to its cargo have been shown.

      A list of cargo adaptor proteins was found in our proteomics data, but we agree that cryo-ET and proteomics alone cannot prove active transport. As such, we toned down the discussion about active transport (lines 212-220).

      -> Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      F-actin • In the abstract, the authors state that F-actin provides tracks for transport as well as having structural and mechanical roles. However, the manuscript does not include experiments demonstrating a mechanical role. The authors appear to base this statement on literature where actin bundles have been shown to play a mechanical role in other model systems. We suggest they clarify that the mechanical role the authors suggest is speculative and add references if appropriate.

      We removed the claim about the mechanical role of the actin from the abstract and rephrased this in the discussion to suggest this role for the F-actin (lines 242-243).

      • Lines 15,92, 180 and 255: The statement "Filamentous actin is an integral part of the manchette" is misleading. While the authors show that F-actin is present in their purified manchette structures, whether it is integral has not been tested. Authors should rephrase the sentence.

      We removed the word integral.

      • To support the claim that F-actin plays a role in transport within the manchette, the authors present only one instance where an unidentified density is attached to an actin filament. This is insufficient evidence to claim that it is myosin actively transporting cargo. Although the proteomics data show the presence of myosin, we suggest the authors exercise more caution with this claim.

      We agree that our data do not demonstrate active transport; as such, we removed that claim. We mention the possibility of cargo transport in the discussion (lines 250-255).

      • The authors mention the presence of F-actin bundles but do not show direct crosslinking between the F-actin filaments. They could in principle just be closely packed F-actin filaments that are not necessarily linked, so the term "bundle" should be used more cautiously.

      We do not assume that a bundle means that the F-actin filaments are crosslinked. A bundle simply indicates the presence of multiple F-actin filaments together. We rephrased it to call them actin clusters.

      Observations of dynein • Relating to Figure 2B: From the provided image it is not clear whether the density corresponds to a dynein complex, as it does not exhibit the characteristic morphological features of dynein or dynactin molecules.

      We indeed do not claim that the densities in this figure are dynein or dynactin. We revised this paragraph and hope that it is now clearer (lines 135-137).

      • Lines 171-172 and Figure 4: It is well established that dynein is a dimer and should always possess two motor domains. The authors have incorrectly assumed they observed single motor heads, except possibly in Figure 4A (marked by an arrow). In all other instances, the dynein complexes show two motor domains in proximity, but these have not been segmented accurately. Furthermore, the "cargos" shown in grey are more likely to represent dynein tails or the dynactin molecule, based on comparisons with in vitro structures of these complexes (see references 1-3).

      We thank the reviewer for this correction. We improved the annotations in the figure and revised the text to clarify that we identified dimers of dynein motor heads (lines 140-144). We further added a projection of a dynein dynactin complex to compare to the observation on the manchette (new Fig. 5E). We further changed claims on the presence of protein cargo to the presence of dynein/dynactin that allows cargo tethering based on the presence of cargo adaptors in the proteomics data.

      • Lines 21, 173, and 233 mention cargos, but as noted above, it seems to be parts of the dynein complex the authors are referring to.

      This was corrected as mentioned above.

      • Panel 4B appears to show a dynein-dynactin complex, but whether there is a cargo is unclear and, if there is, it should be labelled accordingly. To assess whether there is any cargo bound to the dynein-dynactin complex, a larger crop of the panel would be helpful. In summary, we recommend that the authors revisit their segmentations in Figures 2B and 4, revise their text based on these observations, and perform quantification of the data (as suggested in the next section).

      We thank the reviewers for sharing their expertise on dynein-dynactin complexes. We have revised the text as detailed above and excluded the assignment of any cargo, as we cannot (even from larger panels) see a clear association of cargo. We have made clear that we only refer to dynein-dynactin with the capability of linking cargo, based on the presence of cargo adaptors in the proteomics data. We have removed claims on active transport with dynein.

      Dynein versus kinesin-based transport The calculation presented in lines 147-151 does not account for the fact that both the dynein-dynactin complex and kinesin proteins require cargo adaptors to transport cargo. Additionally, the authors overlook the possibility that multiple motors could be attached to a single cargo. If the authors did not observe this, they should explicitly mention it to support their argument. In short, the calculations are based on an incorrect premise, rendering the comparison inaccurate. Unless the authors have identified any dynein-dynactin or kinesin cargo adaptors in their proteomics data which could be used for such a comparison, we believe the authors lack sufficient data to accurately estimate the "active transport ratio" between dynein and kinesin.

      Even though we detect cargo adaptors in our proteomics, we agree that calculating relative transport based only on the proteomics can be inaccurate. As such, we removed the absolute quantification and the comparison between dynein- and kinesin-based IMT.

      • Would additional experiments be essential to support the claims of the paper?

      F-actin distance and length distribution • To support the claim that F-actin is bundled (line 189), could the authors provide the distance between each F-actin filament and its neighbours? Additionally, could they compare the average distance to the length of actin crosslinkers found in their proteomics data, or compare it to the distances between crosslinked F-actin observed in other research studies?

      We measured distances between the actin filaments and added a plot to new Fig 6.
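      As an illustration of the kind of measurement requested here, the sketch below computes nearest-neighbour distances between filaments. It is a minimal example under stated assumptions, not the procedure used for Fig. 6: it assumes each filament has been reduced to a single 2D point (for instance, its position in a cross-section perpendicular to the bundle axis), and the function name `nearest_neighbour_distances` and the example coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_neighbour_distances(points: Sequence[Point]) -> List[float]:
    """Return, for each point, the distance to its closest other point.

    Assumption: each filament is represented by one (x, y) coordinate in a
    cross-sectional plane; units are whatever the input coordinates use.
    """
    if len(points) < 2:
        raise ValueError("need at least two filaments to measure a spacing")
    distances = []
    for i, p in enumerate(points):
        # Brute-force search over all other points; adequate for small sets.
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        distances.append(nearest)
    return distances


# Hypothetical filament positions (arbitrary units).
example = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
spacings = nearest_neighbour_distances(example)
mean_spacing = sum(spacings) / len(spacings)
```

      The resulting distribution (or its mean) could then be compared with the known lengths of candidate actin crosslinkers, as the reviewer suggests.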

      • While showing that F-actin is important for the manchette would require cellular experiments, authors could provide quantification of how frequently these actin structures are observed in comparison to MTs to support their claims that these actin filaments could be important for the manchette structure.

      We agree that claims on the role and function of actin in the manchette require cellular experiments that are beyond the scope of this study. Absolute quantification of the ratio between MTs and actin from cryoET is very hard and will be inaccurate as the manchette cannot be imaged as a whole due to its size and thickness. The ratio we have is based on the relative abundance provided by the proteomics (Fig. 5F).

      • In line 193, the authors claim that the F-actin in bundles appears too short for transport. Could they provide length distributions for these filaments? This might provide further support to their claim that individual F-actin filaments can serve as transport tracks (line 266).

      In addition to the limitation mentioned in the previous point, quantification of length from high-magnification imaging will likely be inaccurate, as the length of the actin in most cases exceeds the field of view that is captured. Nevertheless, we removed the claim about the actin being too short for transport.

      • Could the authors also quantify the abundance of individual F-actin filaments observed, compared to MTs and F-actin bundles, to support the idea that they could play a role in transport?

      As explained for the above points, absolute quantification of the ratio between MTs and actin is not feasible from cryo-ET data, which cannot capture the whole manchette at a resolution high enough to resolve the actin.

      • In the discussion, the authors mention "interactions between F-actin singlets and mMTs" (line 269), yet they report observing only one instance of this interaction (lines 210 and 211). Given the limited data, they should refer to this as a single interaction in the discussion. The scarcity of data raises questions about how representative this event truly is.

      We agree that one example is not enough. In the new Fig. 6E-K, we provide a gallery of more examples as also requested by reviewers 1 and 2. We have also revised the text to reflect the point that these observations are still rare (Lines 190-194).

      Quantifications for judgement of representativity The authors should quantify how often they observed vesicles with a stick-like connection to MTs (lines 106-107); this would strengthen the interpretation of the density, as currently only one example is shown in the manuscript (Figure 4A). If possible, they could show how many of them are facing towards the MT plus end.

      As mentioned in the text (lines 135-137), the linkers connecting vesicles to MTs were irregular, so we could not interpret them further. This is in contrast to dyneins, which were easily recognisable but were not associated with vesicles.

      Dynein quantifications • The authors are recommended to quantify how many dynein molecules per micron of MT they observe and how often they are angled with their MT binding domain towards the minus-end.

      As the manchette is large and highly dense any quantification will likely be biased towards parts of the manchette that are easier to image, for example the periphery. As such we do not think quantifying the dynein density will yield meaningful insight.

      • Could the authors quantify how many dynein densities they found to be attached to a (vesicle) cargo, if any (line 175)? They could show these observations in a supplementary figure.

      We did not observe any case of a connection between a vesicle and dynein motors. We edited this sentence to make this clearer.

      • For densities that match the size and location of dynein but lack clear dynein morphology (as seen in Figure 2B), could the authors quantify how many are oriented towards the MT minus end?

      We had many cases where the connection did not have a clear dynein morphology, and as the morphology is not clear, it is impossible to make a claim about whether they are oriented towards the minus end.

      Artefacts due to purification: Authors should discuss if the purification could have effects on visualizing components of the manchette. For example, if it has effect on the MTs and actin structure or the abundance/structure of the motor protein complexes (bound to cargo or isolated).

      We followed a previously published protocol that showed the overall integrity of the manchette. Nevertheless, losing connections between the manchette and other cellular organelles is expected. To address this point, we added in-situ data (new Fig 1) showing manchettes in intact spermatids interacting with vesicles and mitochondria, as well as overviews of manchettes (new Fig 2). The text was revised accordingly.

      • Are the experiments adequately replicated and statistical analysis adequate? The cryo-ET data presented in the manuscript is collected using two separate sample preparations. Along with the quantifications of the different observations suggested above which will help the reader assess how abundant and representative these observations are, the authors could further strengthen their claims by acquiring data from a third sample preparation and then analysing how consistent their observations are between different purifications. This however could be time consuming so it is not a major requirement but recommended if possible within a short time frame.

      We regret not explicitly mentioning our data set size; it has now been added to the revised version. In essence, the data are based on three biological replicates (3 animals). We collected more than 100 tomograms of different regions of the manchettes. In the revised version we provide more observations (new Fig 1, 2, 4B-C and 6E-K).

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Most of the comments deal with either modifying the text or analysing the data already presented, so the revision could be done within 1-3 months.


      Minor comments: - Specific experimental issues that are easily addressable. 1) Could the authors state how many tilt series were collected for each dataset/independent sample preparation? We recommend that they upload their raw data or tomograms to EMPIAR.

      We added this information in the material and methods.

      2) It is not clear to me if the same sample was used for cryo-ET and proteomics. Could the authors clarify how comparable the sample preparation for the cryo-ET and proteomics data is or if the same sample was used for both. If there is a discrepancy between these preparations, they would need to discuss how this can affect comparing observations from cryo-ET and mass spectrometry. Ideally both samples should be the same.

      After sample preparation, the manchettes were directly frozen on grids. The rest of the sample was used for proteomics. Consequently, EM and MS data were acquired on the same samples. We clarified this in the text (lines 327-328).

      • Are prior studies referenced appropriately? We recommend including additional references to support the claim that F-actin has a mechanical role (line 242). Could the authors compare their proteomics data to other mass spectrometry studies conducted on the Manchette (for example, see reference 4)?

      We added the comparison but it is important to point out that in reference 4 the manchettes were isolated from mice testes.

      • Are the text and figures clear and accurate? Text: We do not see the necessity of specifying the microtubules (MTs) in the data as "manchette MTs" or "mMTs" rather than simply "MTs". However, we recommend that the authors use either "MT" or "mMT" consistently throughout the manuscript.

      We changed to only MTs.

      The authors appear to refer to both dynein-1 (cytoplasmic dynein) and dynein-2 (axonemal dynein or IFT dynein). To avoid confusion, it is important that the authors clearly specify which dynein they are referring to throughout the text. This is particularly relevant as the study aims to demonstrate that IFT is not part of the manchette transport system.

      • Introduction: In the third paragraph (lines 59-75), the authors should specify that they are referring to dynein-2, which is distinct from cytoplasmic dynein discussed in the previous paragraph (lines 44-58).

      We specify the respective dyneins in the text (lines 66, 140-141, 145).

      • Figure 4D: The authors could fit a dynein-1 motor domain instead of a dynein-2 into the density to stay consistent with the fact that the density belongs to cytoplasmic dynein-1.

      We changed the figure and fitted a cytosolic dynein-1 structure (5nvu) instead.

      Figures: • Figure 2B: The legend mentions a large linker complex; however, this may correspond to two or three separate densities.

      We have addressed this and changed the wording.

      • Figure 4: please revisit the segmentation of this whole figure based on previous comments.

      We revised as suggested.

      • Figures 1, 2, 4, 5, and 6: It would be helpful to state in the legends that the tomograms are denoised. There are stripe-like densities visible in the images (e.g., in the vesicle in Figure 2B). Do these artefacts also appear in the raw data?

      As stated in the Methods section, tomograms were generally denoised with CryoCare for visualisation purposes. The “stripe-like densities” are artefacts of the gold fiducials used for tomogram alignment and appear in the raw data (before denoising).

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? We suggest revising the paragraph title "Dynein-mediated cargo along the manchette" (line 165) to "Dynein-mediated cargo transport along the manchette".

      We have changed this in the revised version.

      We recommend that the authors provide additional evidence to support the interpretation that the observed EM densities correspond to motor proteins. Specifically: • Include scale bars or reference lines indicating the known dimensions of motor proteins, based on previous data, to demonstrate that the observed densities match the expected size.

      The dynein structure is provided for reference. We also added the cytosolic dynein–dynactin as a reference (Fig 5E).

      • Make direct comparisons to existing EM data and highlight morphological similarities.

      We have added a comparison to existing data (Fig 5E).

      In the discussion (lines 249-254), the authors could speculate on alternative roles for the IFT components in the manchette, particularly if they are not part of the IFT trains. We also suggest rephrasing the claim in line 266 to make it more speculative in tone.

      We have addressed this in the revised version (lines 221-230).

      Finally, a schematic overview of the manchette ultrastructure in a spermatid would greatly aid the reader in understanding the material presented.

      We now include a graphical abstract and overviews of isolated manchettes on cryo-EM grids.

      References: 1. Chowdhury, S., Ketcham, S., Schroer, T. et al. Structural organization of the dynein-dynactin complex bound to microtubules. Nat Struct Mol Biol 22, 345-347 (2015). https://doi.org/10.1038/nsmb.2996

      2. Grotjahn, D.A., Chowdhury, S., Xu, Y. et al. Cryo-electron tomography reveals that dynactin recruits a team of dyneins for processive motility. Nat Struct Mol Biol 25, 203-207 (2018). https://doi.org/10.1038/s41594-018-0027-7

      3. Chaaban, S., Carter, A.P. Structure of dynein-dynactin on microtubules shows tandem adaptor binding. Nature 610, 212-216 (2022). https://doi.org/10.1038/s41586-022-05186-y

      4. W. Hu, R. Zhang, H. Xu, Y. Li, X. Yang, Z. Zhou, X. Huang, Y. Wang, W. Ji, F. Gao, W. Meng, CAMSAP1 role in orchestrating structure and dynamics of manchette microtubule minus-ends impacts male fertility during spermiogenesis, Proc. Natl. Acad. Sci. U.S.A. 120 (45) e2313787120, https://doi.org/10.1073/pnas.2313787120 (2023).

      Reviewer #3 (Significance (Required)):

      This study employs cryo-electron tomography (cryo-ET) and proteomics to elucidate the architecture of the manchette. It advances our understanding of the components involved in intracellular transport within the manchette and introduces the following technical and conceptual innovations:

      a) Technical Advances: The authors have visualized the manchette at high resolution using cryo-ET. They optimized a purification pipeline capable of retaining, at least partially, the transport machinery of the manchette. Notably, they observed dynein and putative kinesin motors attached to microtubules-a significant achievement that, to our knowledge, has not been reported previously.

      b) Conceptual Advances: This study provides novel insights into spermatogenesis. The findings suggest that intraflagellar transport (IFT) is unlikely to play a role at this stage of sperm development while shedding light on alternative transport systems. Importantly, the authors demonstrate that actin filaments organize in two distinct ways: clustering parallel to microtubules or forming single filaments.

      This work is likely to be of considerable interest to researchers in sperm development and structural biology. Additionally, it may appeal to scientists studying motor proteins and the cytoskeleton.

      We thank the reviewers for appreciating the significance and novelty of our study.

      The reviewers possess extensive expertise in in situ cryo-electron tomography and single-particle microscopy, including work on dynein-based complexes. Collectively, they have significant experience in the field of cytoskeleton-based transport.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work studies representations in a network with one recurrent layer and one output layer that needs to path-integrate so that its position can be accurately decoded from its output. To formalise this problem, the authors define a cost function consisting of the decoding error and a regularisation term. They specify a decoding procedure that at a given time averages the output unit center locations, weighted by the activity of the unit at that time. The network is initialised without position information, and only receives a velocity signal (and a context signal to index the environment) at each timestep, so to achieve low decoding error it needs to infer its position and keep it updated with respect to its velocity by path integration.
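      The decoding step described above, an activity-weighted average of output unit center locations, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the names `decode_position` and `decoding_error`, the restriction to 2D centers, and the assumption of non-negative activities are all hypothetical choices made for the example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def decode_position(activity: Sequence[float], centers: Sequence[Point]) -> Point:
    """Decode a 2D position as the activity-weighted mean of unit centers.

    Assumptions: activity[i] >= 0 is the output of unit i at one timestep,
    and centers[i] is that unit's (x, y) center location.
    """
    if len(activity) != len(centers):
        raise ValueError("one activity value is required per center")
    total = sum(activity)
    if total <= 0.0:
        raise ValueError("at least one unit must be active to decode")
    x = sum(a * c[0] for a, c in zip(activity, centers)) / total
    y = sum(a * c[1] for a, c in zip(activity, centers)) / total
    return (x, y)


def decoding_error(decoded: Point, true: Point) -> float:
    """Euclidean distance between the decoded and the true position."""
    return math.hypot(decoded[0] - true[0], decoded[1] - true[1])


# Two units with centers on the x-axis; the first is three times as active,
# so the decoded position is pulled towards its center.
estimate = decode_position([3.0, 1.0], [(0.0, 0.0), (4.0, 0.0)])
error = decoding_error(estimate, (1.0, 0.0))
```

      Under this scheme, a unit contributes to the decoded position only in proportion to its activity, which is one reason a decoder of this form may favour localised firing fields.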

      The authors take the trained network and let it explore a series of environments with different geometries while collecting unit activities to probe learned representations. They find localised responses in the output units (resembling place fields) and border responses in the recurrent units. Across environments, the output units show global remapping and the recurrent units show rate remapping. Stretching the environment generally produces stretched responses in output and recurrent units. Ratemaps remain stable within environments and stabilise after noise injection. Low-dimensional projections of the recurrent population activity form environment-specific clusters that reflect the environment's geometry, which suggests independent rather than generalised representations. Finally, the authors discover that the centers of the output unit ratemaps cluster together on a triangular lattice (like the receptive fields of a single grid cell), and find significant clustering of place cell centers in empirical data as well.

      The model setup and simulations are clearly described, and are an interesting exploration of the consequences of a particular set of training requirements - here: path integration and decodability. But it is not obvious to what extent the modelling choices are a realistic reflection of how the brain solves navigation. Therefore it is not clear whether the results generalize beyond the specifics of the setup here.

      Strengths:

      The authors introduce a very minimal set of model requirements, assumptions, and constraints. In that sense, the model can function as a useful 'baseline', that shows how spatial representations and remapping properties can emerge from the requirement of path integration and decodability alone. Moreover, the authors use the same formalism to relate their setup to existing spatial navigation models, which is informative.

      The global remapping that the authors show is convincing and well-supported by their analyses. The geometric manipulations and the resulting stretching of place responses, without additional training, are interesting. They seem to suggest that the recurrent network may scale the velocity input by the environment dimensions so that the exact same path integrator-output mappings remain valid (but maybe there are other mechanisms too that achieve the same).

      The clustering of place cell peaks on a triangular lattice is intriguing, given there is no grid cell input. It could have something to do with the fact that a triangular lattice provides optimal coverage of 2d space? The included comparison with empirical data is valuable, although the authors only show significant clustering - there is no analysis of its grid-like regularity.

      First of all, we would like to thank the reviewer for their comprehensive feedback, and their insightful comments. Importantly, as you point out, our goal with this model was to build a minimal model of place cell representations, where representations were encouraged to be place-like, but free to vary in tuning and firing locations. By doing so, we could explore what upstream representations facilitate place-like representations, and even remapping (as it turned out) with minimal assumptions. However, we agree that our task does not capture some of the nuances of real-world navigation, such as sensory observations, which could be useful extensions in future work. Then again, the simplicity of our setup makes it easier to interpret the model, and makes it all the more surprising that it learns many behaviors exhibited by real world place cells.

      As to the distribution of phases - we also agree that a hexagonal arrangement likely reflects some optimal configuration for decoding of location.

      And we agree that the symmetry within the experimental data is important; we have revised analyses on experimental phase distributions, and included an analysis of ensemble grid score, to quantify any hexagonal symmetries within the data.

      Weaknesses:

      The navigation problem that needs to be solved by the model is a bit of an odd one. Without any initial position information, the network needs to figure out where it is, and then path-integrate with respect to a velocity signal. As the authors remark in Methods 4.2, without additional input, the only way to infer location is from border interactions. It is like navigating in absolute darkness. Therefore, it seems likely that the salient wall representations found in the recurrent units are just a consequence of the specific navigation task here; it is unclear if the same would apply in natural navigation. In natural navigation, there are many more sensory cues that help inferring location, most importantly vision, but also smell and whiskers/touch (which provides a more direct wall interaction; here, wall interactions are indirect by constraining velocity vectors). There is a similar but weaker concern about whether the (place cell like) localised firing fields of the output units are a direct consequence of the decoding procedure that only considers activity center locations.

      Thank you for raising this point; we absolutely agree that the navigation task is somewhat niche. However, this was a conscious decision, made to minimize any possible confounding from alternate input sources, such as observations. In part, this experimental design was inspired by the suggestion that grid cells support navigation/path integration in open-field environments with minimal sensory input (as they could conceivably do so with no external input). This also pertains to your other point, that boundary interactions are necessary for navigation. In our model, using boundaries is one solution, but there is another way around this problem, which is conceivably better: to path integrate in an egocentric frame, starting from your initial position. Since the locations of place fields are inferred only after a trajectory has been traversed, the network is free to create a new or shifted representation every time, independently of the arena. In this case, one might have expected generalized solutions, such as grid cells, to emerge. That this is not the case seems to suggest that grid cells may somehow not be optimal for pure path integration, or at the very least, are hard to learn (but may still play a part, as alluded to by place field locations). We have tried to make these points more evident in the revised manuscript.

      As for the point that the decoding may lead to place-like representations, this is a fair point. Indeed, we did choose this form of decoding, inspired by the localized firing of place cells, in the hope that it would encourage minimally constrained, place-like solutions. However, compared to other works (Sorscher and Xu) that hand-tune the functional form of their place cells, our approach (although biased towards centralized tuning curves) allows for flexible functional forms, including the position of the place cell centers, their tuning width, whether or not the activity is center-surround, and how they tune to different environments/rooms. This allows us to study several features of the place cell system, such as remapping and field formation. We have revised the model description to make this clearer.

      The conclusion that 'contexts are attractive' (heading of section 2) is not well-supported. The authors show 'attractor-like behaviour' within a single context, but there could be alternative explanations for the recovery of stable ratemaps after noise injection. For example, the noise injection could scramble the network's currently inferred position, so that it would need to re-infer its position from boundary interactions along the trajectory. In that case the stabilisation would be driven by the input, not just internal attractor dynamics. Moreover, the authors show that different contexts occupy different regions in the space of low-dimensional projections of recurrent activity, but not that these regions are attractive.

      We agree that boundary interactions could facilitate the convergence of representations after noise injection. We did try to moderate this claim by the wording “attractor-like”, but we agree that boundaries could confound this result. We have therefore performed a modified noise injection experiment, where we let the network run for an extended period of time before noise injection (and with no velocity signal); see Appendix Velocity Ablation in the revised text. Notably, representations converge to their pre-scrambled state after noise injection, even without a velocity signal. However, place-like representations do not converge for all noise levels in this case, possibly indicating that boundary interactions also serve an error-correcting function. Thank you for pointing this out.

      As for the attractiveness of contexts, we agree that more analyses were required to demonstrate this. We have therefore conducted a supplementary analysis where we run the trained network with a mismatch in context/geometry, and demonstrate that the context signal fixes the representation, up to geometric distortions.

      The authors report empirical data that shows clustering of place cell centers like they find for their output units. They report that 'there appears to be a tendency for the clusters to arrange in hexagonal fashion, similar to our computational findings'. They only quantify the clustering, but not the arrangement. Moreover, in Figure 7e they only plot data from a single animal, then plot all other animals in the supplementary. Does the analysis of Fig 7f include all animals, or just the one for which the data is plotted in 7e? If so, why that animal? As Appendix C mentions that the ratemap for the plotted animal 'has a hexagonal resemblance' whereas other have 'no clear pattern in their center arrangements', it feels like cherrypicking to only analyse one animal without further justification.

      Thank you for pointing this out; we agree that this is not sufficiently explained and explored in the current version. We have therefore conducted a grid score analysis of the experimental place center distributions, to uncover possible hexagonal symmetries. We chose this particular animal in part because it featured the largest number of included cells and showed the most striking phase distribution; the distributions for all other animals were included in the supplementary material. Originally, this was only intended as a preliminary analysis, suggesting non-uniformity in experimental place field distributions, but we realize that these may all provide interesting insight into the distributional properties of place cells.

      We have explained these choices in the revised text, and expanded analyses on all animals to showcase these results more clearly.

      Reviewer #2 (Public Review):

      Summary:

      The authors proposed a neural network model to explore the spatial representations of the hippocampal CA1 and entorhinal cortex (EC) and the remapping of these representations when multiple environments are learned. The model consists of a recurrent network and output units (a decoder) mimicking the EC and CA1, respectively. The major results of this study are: the EC network generates cells with their receptive fields tuned to a border of the arena; decoder develops neuron clusters arranged in a hexagonal lattice. Thus, the model accounts for entorhinal border cells and CA1 place cells. The authors also suggested the remapping of place cells occurs between different environments through state transitions corresponding to unstable dynamical modes in the recurrent network.

      Strengths:

      The authors found a spatial arrangement of receptive fields similar to their model's prediction in experimental data recorded from CA1. Thus, the model proposes a plausible mechanism to generate hippocampal spatial representations without relying on grid cells. This result is consistent with the observation that grid cells are unnecessary to generate CA1 place cells.

      The suggestion about the remapping mechanism shows an interesting theoretical possibility.

      We thank the reviewer for their kind feedback.

      Weaknesses:

      The explicit mechanisms of generating border cells and place cells and those underlying remapping were not clarified at a satisfactory level.

      The model cannot generate entorhinal grid cells. Therefore, how the proposed model is integrated into the entire picture of the hippocampal mechanism of memory processing remains elusive.

      We appreciate this point, and hope to clarify: From a purely architectural perspective, place-like representations are generated by linear combinations of recurrent unit representations, which, after training, appear border-like. During remapping, the network is simply evaluated/run in different geometries/contexts, which, it turns out, causes the network to exhibit different representations, likely as solutions to optimally encoding position in the different environments. We have attempted to revise the text to make some of these interpretations more clear. We have also conducted a supplementary analysis to demonstrate how representations are determined by the context signal directly, which helps to explain how recurrent and output units form their representations.

      We also agree that our model does not capture the full complexity of the hippocampal formation. However, we would argue that its simplicity (focusing on a single cell type and a pure path integration task) makes it a useful baseline for studying the role of place cells during spatial navigation. The fact that our model captures a range of place cell behaviors (field formation, remapping and geometric deformation) without grid cells also points to several interesting possibilities, such as that grid cells may not be strictly necessary for place cell formation and remapping, or that border cells may account for many of the peculiar behaviors of place cells. However, we wholeheartedly agree that including e.g. sensory information and memory storage/retrieval tasks would prove a very interesting extension of our model to more naturalistic tasks and settings. In fact, our framework could easily accommodate this, e.g. by decoding contexts/observations/memories from the network state, alongside location.

      Reviewer #3 (Public Review):

      Summary:

      The authors used recurrent neural network modelling of spatial navigation tasks to investigate border and place cell behaviour during remapping phenomena.

      Strengths:

      The neural network training seemed for the most part (see comments later) well-performed, and the analyses used to make the points were thorough.

      The paper and ideas were well explained.

      Figure 4 contained some interesting and strong evidence for map-like generalisation as environmental geometry was warped.

      Figure 7 was striking, and potentially very interesting.

      It was impressive that the RNN path-integration error stayed low for so long (Fig A1), given that normally networks that only work with dead-reckoning have errors that compound. I would have loved to know how the network was doing this, given that borders did not provide sensory input to the network. I could not think of many other plausible explanations... It would be even more impressive if it was preserved when the network was slightly noisy.

      Thank you for your insightful comments! Regarding the low path integration error, there is a slight statistical signal from the boundaries, as trajectories tend to turn away from arena boundaries. However, we agree that studying path integration performance in the face of noise would make for a very interesting future development.

      Weaknesses:

      I felt that the stated neuroscience interpretations were not well supported by the presented evidence, for a few reasons I'll now detail.

      First, I was unconvinced by the interpretation of the reported recurrent cells as border cells. An equally likely hypothesis seemed to be that they were position cells that are linearly encoding the x and y position, which when your environment only contains external linear boundaries, look the same. As in figure 4, in environments with internal boundaries the cells do not encode them, they encode (x,y) position. Further, if I'm not misunderstanding, there is, throughout, a confusing case of broken symmetry. The cells appear to code not for any random linear direction, but for either the x or y axis (i.e. there are x cells and y cells). These look like border cells in environments in which the boundaries are external only, and align with the axes (like square and rectangular ones), but the same also appears to be true in the rotationally symmetric circular environment, which strikes me as very odd. I can't think of a good reason why the cells in circular environments should care about the particular choice of (x,y) axes... unless the choice of position encoding scheme is leaking influence throughout. A good test of these would be differently oriented (45 degree rotated square) or more geometrically complicated (two diamonds connected) environments in which the difference between a pure (x,y) code and a border code are more obvious.

      Thank you for pointing this out. This is an excellent point that we agree could be addressed more rigorously. Note that there is no position encoding in our model; the initial state of the network is a vector of zeros, and the network must infer its location from boundary interactions and context information alone. So there is no way for positional information to leak through to the recurrent layer directly. However, one possible reason for the observed symmetry breaking is the fact that the velocity input signal is aligned with the cardinal directions. To investigate this, we trained a new model, wherein input velocities are rotated 45 degrees relative to the horizontal, as you suggest. The results, shown and discussed in appendix E (Learned recurrent representations align with environment boundaries), do indicate that representations are tuned to environment boundaries, and not the cardinal directions, which hopefully addresses this point.

      Next, the decoding mechanism used seems to have forced the representation to learn place cells (no other cell type is going to be usefully decodable?). That is, in itself, not a problem. It just changes the interpretation of the results. To be a normative interpretation for place cells you need to show some evidence that this decoding mechanism is relevant for the brain, since this seems to be where they are coming from in this model. Instead, this is a model with place cells built into it, which can then be used for studying things like remapping, which is a reasonable stance.

      This is a great point, and we agree. We do write that we perform this encoding to encourage minimally constrained place-like representations (to study their properties), but we have revised to make this more evident.

      However, the remapping results were also puzzling. The authors present convincing evidence that the recurrent units effectively form 6 different maps of the 6 different environments (e.g. the sparsity of the code, or fig 6a), with the place cells remapping between environments. Yet, as the authors point out, in neural data the finding is that some cells generalise their co-firing patterns across environments (e.g. grid cells, border cells), while place cells remap, making it unclear what correspondence to make between the authors network and the brain. There are existing normative models that capture both entorhinal's consistent and hippocampus' less consistent neural remapping behaviour (Whittington et al. and probably others), what have we then learnt from this exercise?

      Thanks for raising this point! We agree that this finding is surprising, but we hold that it actually shows something quite important: that border-type units are sufficient to create place-like representations, and that the model learns several of the behaviors associated with place cells and remapping (including global remapping and field stretching). In other words, a single cell type known to exist upstream of place cells is sufficient to explain a surprising range of phenomena, demonstrating that other cell types are not strictly necessary. However, we agree that understanding why the boundary-type units sometimes rate remap, and whether that can be true for some border-type cells in the brain (either directly, or through gating mechanisms), would be important future developments. Related to this point, we also expanded upon the influence of the context signal for representation selection (appendix F).

      Concerning the relationship to other models, we would argue that the simplicity of our model is one of its core strengths, making it possible to disentangle what different cell types are doing. While other models, including TEM, are highly important for understanding how different cell types and brain regions interact to solve complex problems, we believe there is a need for minimal, understandable models that allow us to investigate what each cell type is doing, and this is where we believe our work is important. As an example, our model not only highlights the sufficiency of boundary-type cells as generators of place cells; its lack of e.g. grid cells also suggests that grid cells may not be strictly necessary for e.g. open-field/sensory-deprived navigation, as is often claimed.

      One striking result was figure 7, the hexagonal arrangement of place cell centres. I had one question that I couldn't find the answer to in the paper, which would change my interpretation. Are place cell centres within a single cluster of points in figure 7a, for example, from one cell across the 100 trajectories, or from many? If each cluster belongs to a different place cell then the interpretation seems like some kind of optimal packing/coding of 2D space by a set of place cells, an interesting prediction. If multiple place cells fall within a single cluster then that's a very puzzling suggestion about the grouping of place cells into these discrete clusters. From figure 7c I guess that the former is the likely interpretation, from the fact that clusters appear to maintain the same colour, and are unlikely to be co-remapping place cells, but I would like to know for sure!

      This is a good point, and you are correct: one cluster tends to correspond to one unit. To make this clearer, we have revised Fig. 7 so that each decoded center is shaded by unit identity. And yes, this is seemingly in line with some form of optimal packing/encoding of space!

      I felt that the neural data analysis was unconvincing. Most notably, the statistical effect was found in only one of seven animals. Random noise is likely to pass statistical tests 1 in 20 times (at 0.05 p value), this seems like it could have been something similar? Further, the data was compared to a null model in which place cell fields were randomly distributed. The authors claim place cell fields have two properties that the random model doesn't (1) clustering to edges (as experimentally reported) and (2) much more provocatively, a hexagonal lattice arrangement. The test seems to conflate the two; I think that nearby ball radii could be overrepresented, as in figure 7f, due to either effect. I would have liked to see a computation of the statistic for a null model in which place cells were random but with a bias towards the boundaries of the environment that matches the observed changing density, to distinguish these two hypotheses.

      Thanks for raising this point. We agree that we were not clear enough in our original manuscript. We included additional analyses for one animal only, to showcase one preliminary case of non-uniform phases. To address this, we have performed the same analyses for all animals, and included a longer discussion of these results (included in the supplementary material). We have also moderated the discussion on Ripley’s H to encompass only non-uniformity, and added a grid score analysis to showcase possible rotational symmetries in the data. We hope this gets our findings across more clearly.

      Some smaller weaknesses:

      - Had the models trained to convergence? From the loss plot it seemed like not, and when including regularisors recent work (grokking phenomena, e.g. Nanda et al. 2023) has shown the importance of letting the regularisor minimise completely to see the resulting effect. Else you are interpreting representations that are likely still being learnt, a dangerous business.

      Longer training time did not seem to affect representations. However, due to the long trajectories and statefulness involved, training was time-intensive and could become unstable for very long training. We therefore stopped training at the indicated time.

      - Since RNNs are nonlinear it seems that eigenvalues larger than 1 don't necessarily mean unstable?

      This is a good point; stability is not guaranteed. We have updated the text to reflect this.
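
      The reviewer's point can be illustrated with a toy example. The sketch below is not the network studied in the paper; the single-unit weight `W` is a hypothetical value chosen so that the linearized map has an eigenvalue of magnitude 2, while the ReLU nonlinearity still drives the state to a fixed point.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical 1-unit recurrent weight, chosen only for illustration.
W = np.array([[-2.0]])
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))  # magnitude 2 > 1

def run(h0, steps, nonlinear=True):
    """Iterate h <- relu(W h) (or h <- W h when nonlinear=False)."""
    h = np.asarray(h0, dtype=float)
    for _ in range(steps):
        pre = W @ h
        h = relu(pre) if nonlinear else pre
    return h

h_linear = run([1.0], steps=10, nonlinear=False)  # magnitude doubles every step
h_relu = run([1.0], steps=10, nonlinear=True)     # clipped to zero after one step

print(spectral_radius, h_linear, h_relu)
```

      In the linear system the state grows without bound, whereas the rectified system settles at zero, so an eigenvalue of magnitude above 1 alone does not establish instability of the nonlinear dynamics.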

      - Why do you not include a bias in the networks? ReLU networks without bias are not universal function approximators, so it is a real change in architecture that doesn't seem to have any positives?

      We found that bias tended to have a detrimental effect on training, possibly related to the identity initialization used (see e.g. Le et al. 2015), and found that training improved when biases were fixed to zero.
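
      For context on the reviewer's remark about universality: a ReLU network without biases is positively homogeneous, meaning that scaling the input by a non-negative factor scales the output by the same factor, and the output at the origin is always zero. The sketch below checks this numerically for a small, randomly initialized two-layer network; the layer sizes and weights are arbitrary assumptions and do not correspond to the architecture in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary illustrative weights: 2 inputs -> 16 hidden ReLU units -> 1 output.
W1 = rng.normal(size=(16, 2))
W2 = rng.normal(size=(1, 16))

def relu_net_no_bias(x):
    """Two-layer ReLU network with all biases fixed to zero."""
    return W2 @ np.maximum(W1 @ x, 0.0)

x = np.array([0.3, -1.2])
scale = 5.0

scaled_input_output = relu_net_no_bias(scale * x)
scaled_output = scale * relu_net_no_bias(x)
origin_output = relu_net_no_bias(np.zeros(2))

print(np.allclose(scaled_input_output, scaled_output), origin_output)
```

      Any target function that is not positively homogeneous (for example, one with a nonzero value at the origin) therefore cannot be represented exactly by such a network, which is the restriction the reviewer refers to.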

      - The claim that this work provided a mathematical formalism of the intuitive idea of a cognitive map seems strange, given that upwards of 10 of the works this paper cite also mathematically formalise a cognitive map into a similar integration loss for a neural network.

      We agree that other works also provide ways of formalizing this concept. However, our goal in doing so was to elucidate common features across these seemingly disparate models. We also found that the concept of a learned and target map made it easier to come up with novel models, such as one wherein place cells are constructed to match a grid cell label.

      Aim Achieved? Impact/Utility/Context of Work

      Given the listed weaknesses, I think this was a thorough exploration of how this network with these losses is able to path-integrate its position and remap. This is useful, it is good to know how another neural network with slightly different constraints learns to perform these behaviours. That said, I do not think the link to neuroscience was convincing, and as such, it has not achieved its stated aim of explaining these phenomena in biology. The mechanism for remapping in the entorhinal module seemed fundamentally different to the brain's, instead using completely disjoint maps; the recurrent cell types described seemed to match no described cell type (no bad thing in itself, but it does limit the permissible neuroscience claims) either in tuning or remapping properties, with a potentially worrying link between an arbitrary encoding choice and the responses; and the striking place cell prediction was unconvincingly matched by neural data. Further, this is a busy field in which many remapping results have been shown before by similar models, limiting the impact of this work. For example, George et al. and Whittington et al. show remapping of place cells across environments; Whittington et al. study remapping of entorhinal codes; and Rajkumar Vasudeva et al. 2022 show similar place cell stretching results under environmental shifts. As such, this paper's contribution is muddied significantly.

      Thank you for this perspective; we agree that all of these are important works that arrive at complementary findings. We hold that the importance of our paper lies in its minimal nature, and its focus on place cells, via a purpose-built decoding that enables place-like representations. In doing so, we can point to possibly underexplored relationships between cell types, in particular place cells and border cells, while challenging the necessity of other cell types for open-field navigation (i.e. grid cells). In addition, our work points to a novel connection between grid cells, place cells and even border cells, by way of the hexagonal arrangement of place unit centers. However, we agree that expanding our model to include more biologically plausible architectures and constraints would make for a very interesting extension in the future.

      Thank you again for your time, as well as insightful comments.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Even after reading Methods 5.3, I found it hard to understand how the ratemap population vectors that produce Fig 3e and Fig 5 are calculated. It's unclear to me how there can be a ratemap at a single timestep, because calculating a ratemap involves averaging the activity in each location, which would take a whole trajectory and not a single timestep. But I think I've understood from Methods 5.1 that instead the ratemap is calculated by running multiple 'simultaneous' trajectories, so that there are many visited locations at each timestep. That's a bit confusing because as far as I know it's not a common way to calculate ratemaps in rodent experiments (probably because it would be hard to repeat the same task 500 times, while the representations remain the same), so it might be worth explaining more in Methods 5.3.

      We understand the confusion, and have attempted to make this more clear in the revised manuscript. We did indeed create ratemaps over many trajectories for time-dependent plots, for the reasons you mentioned. We also agree that this would be difficult to do experimentally, but found it an interesting way to observe convergence of representations in our simulated scenario.
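
      The idea of a ratemap at a single timestep can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `ratemap_at_timestep`, the bin count, the unit-square arena, and the Gaussian test unit are all assumptions. Each of many simultaneous trajectories contributes one position and one activity value at the chosen timestep, and activity is averaged within spatial bins.

```python
import numpy as np

def ratemap_at_timestep(positions, activity, bins=8, extent=(0.0, 1.0)):
    """Average one unit's activity per spatial bin at a single timestep.

    positions: (n_trajectories, 2) locations of all trajectories at that timestep.
    activity:  (n_trajectories,) activity of one unit at that timestep.
    Returns a (bins, bins) array; unvisited bins are NaN.
    """
    edges = np.linspace(extent[0], extent[1], bins + 1)
    summed, _, _ = np.histogram2d(
        positions[:, 0], positions[:, 1], bins=[edges, edges], weights=activity
    )
    counts, _, _ = np.histogram2d(
        positions[:, 0], positions[:, 1], bins=[edges, edges]
    )
    return np.where(counts > 0, summed / np.maximum(counts, 1), np.nan)

# Illustrative data: 500 simultaneous trajectories and a Gaussian-tuned unit.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=(500, 2))
activity = np.exp(-np.sum((positions - 0.5) ** 2, axis=1) / (2 * 0.1 ** 2))
ratemap = ratemap_at_timestep(positions, activity)
print(ratemap.shape)
```

      Repeating this for every timestep gives a sequence of ratemaps, which is what makes it possible to follow the convergence of representations over time in simulation.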

      Fig 3b-d shows multiple analyses to support output unit global remapping, but no analysis to support the claim that recurrent units remap by rate changes. The examples in Fig 3ai look pretty convincing, but it would be useful to also have a more quantitative result.

      We agree; we previously showed only that units turn off/become silent, using ratemaps. We have therefore added an explicit analysis showcasing rate remapping in recurrent units (see appendix G; Recurrent units rate remap).

      Reviewer #2 (Recommendations For The Authors):

      Some parts of the current manuscript are hard to follow. Particularly, the model description is not transparent enough. See below for the details.

      Major comments:

      (1) Mathematical models should be explained more explicitly and carefully. I had to guess or desperately search for the definitions of parameters. For instance, define the loss function L in eq.(1). Though I can assume L represents the least square error (in A.8), I could not find the definition in Model & Objective. N should also be defined explicitly in equation (3). Is this the number of output cells?

      Thank you for pointing this out, we have revised to make it more clear.

      (2) In Fig. 1d, how were the velocity and context inputs given to individual neurons in the network? The information may be described in the Methods, but I could not identify it.

      This was described in the methods section (Neural Network Architecture and Training), but we realize that we used confusing notation, when comparing with Fig. 1d. We have therefore changed the notation, and it should hopefully be clearer now. Thanks for pointing out this discrepancy.

      (3) I took a while to understand equations (3) and (4) (for instance, t is not defined here). The manuscript would be easier to read if equations (5) and (6) are explained in the main text but not on page 18 (indeed, these equations are just copies of equations 3 and 4). Otherwise, the authors may replace equations (3) and (4) with verbal explanations similar to figure legend for Fig. 1b.

      (4) Is there any experimental evidence for uniformly strong EC-to-CA1 projections assumed in the non-trainable decoder? This point should be briefly mentioned.

      Thank you for raising this point. The decoding from EC (the RNN) to CA1 (the output layer) consists of a trainable weight matrix, and may thus be non-uniform in magnitude. The non-trainable decoding acts on the resulting “CA1” representation only. We hope that improvements to the model description also make this more evident.

      (5) The explanation of Fig. 3 in the main text is difficult to follow because subpanels are explained in separate paragraphs, some of which are very short, as short as just a few lines.

      This presentation style makes it difficult to follow the logical relationships between the subpanels. This writing style is obeyed throughout the manuscript but is not popular in neuroscience.

      Thanks for pointing this out, we have revised to accommodate this.

      (6) Why do field centers cluster near boundaries? No underlying mechanisms are discussed in the manuscript.

      This is a good point; we have added a note on this; it likely reflects the border tuning of upstream units.

      (7) In Fig. 4, the authors presented how cognitive maps may vary when the shape and size of open arenas are modified. The results would be more interesting if the authors explained the remapping mechanism. For instance, on page 8, the authors mentioned that output units exhibit global remapping between contexts, whereas recurrent units mainly rate remapping.

      Why do such representational differences emerge?

      We agree! Thanks for raising this point. We have therefore expanded upon this discussion in section 2.4.

      (8) In the first paragraph of page 10, the authors stated ".. some output units display distinct field doubling (see both Fig. 4c), bottom right, and Fig. 4d), middle row)". I could not understand how Fig. 4d, middle row supports the argument. Similarly, they stated "..some output units reflect their main boundary input (with greater activity near one boundary)." I can neither understand what the authors mean to say nor which figures support the statement. Please clarify.

      This is a good point, there was an identifier missing; we have updated to refer to the correct “magnification”. Thanks!

      (9) The underlying mechanism of generating the hexagonal representation of output cells remains unclear. The decoder network uses a non-trainable decoding scheme based on localized firing patterns of output units. To what extent does the hexagonal representation depend on the particular decoding scheme? Similarly, how does the emergence of the hexagonal representation rely on the border representation in the upstream recurrent network? Showing several snapshots of the two place representations during learning may answer these questions.

      This is an interesting point, and we have added some discussion on this matter. In particular, we speculate whether it’s an optimal configuration for position reconstruction, which is demanded by the task and thus highly likely dependent on the decoding scheme. We have not reached a conclusive method to determine the explicit dependence of the hexagonal arrangement on the choice of decoding scheme. Still, it seems this would require comparison with other schemes. In our framework, this would require changing the fundamental operation of the model, which we leave as inspiration for future work. We have also added additional discussion concerning the relationship between place units, border units, and remapping in our model. As for exploring different training snapshots, the model is randomly initialized, which suggests that earlier training steps should tend to reveal unorganized/uninformative phase arrangements, as phases are learned as a way of optimizing position reconstruction. However, we do call for more analysis of experimental data to determine whether this is true in animals, which would strongly support this observation. We also hope that our work inspires other models studying the formation and remapping of place cells, which could serve as a starting point for answering this question in the future.

      (10) Figure 7 requires a title including the word "hexagonal" to make it easier to find the results demonstrating the hexagonal representations. In addition, please clarify which networks, p or g, gave the results shown here.

      We agree, and have added it!

      Minor comments:

      (11) In many paragraphs, conclusions appear near their ends. Stating the conclusion at the beginning of each paragraph whenever possible will improve the readability.

      We have made several rewrites to the manuscript, and hope this improves readability.

      (12) Figure A4 is important as it shows evidence of the CA1 spatial representation predicted by the model. However, I could not find where the figure is cited in the manuscript. The authors can consider showing this figure in the main text.

      We agree, and we have added more references to the experimental data analyses in the main text, as well as expanded this analysis.

      (13) The main text cites figures in the following format: "... rate mapping of Fig. 3a), i), boundary ...." The parentheses make reading difficult.

      We have removed the overly stringent use of double parentheses, thanks for letting us know.

      (14) It would be nice if the authors briefly explained the concept of Ripley's H function on page 14.

      Yes, we have added a brief descriptor.
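
      As a brief illustration of the statistic, the sketch below computes Ripley's H from its standard definition: K(r) is the area-normalized fraction of ordered point pairs within distance r, L(r) = sqrt(K(r)/π), and H(r) = L(r) − r. Under complete spatial randomness H(r) is close to zero; positive values indicate clustering and negative values indicate dispersion at scale r. This version applies no edge correction and is an assumption-laden sketch, not necessarily the estimator used in the paper; the function name `ripley_h` is hypothetical.

```python
import numpy as np

def ripley_h(points, radii, area):
    """Ripley's H(r) = sqrt(K(r) / pi) - r, without edge correction.

    points: (n, 2) array of field centers; radii: 1D array of distances r;
    area: area of the observation window.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    pair_dists = dists[~np.eye(n, dtype=bool)]  # all ordered pairs with i != j
    counts = (pair_dists[None, :] <= radii[:, None]).sum(axis=1)
    k = area * counts / (n * (n - 1))
    return np.sqrt(k / np.pi) - radii

# Two points 0.1 apart in a unit-area window.
h_values = ripley_h([[0.4, 0.5], [0.5, 0.5]], radii=[0.05, 0.2], area=1.0)
print(h_values)
```

      In practice, observed H values are compared against the range obtained from many simulated null patterns, so the choice of null model (uniform versus boundary-biased) determines what a significant deviation means.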

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Review 1:

      Weaknesses:

      The weaknesses of the study also stem from the methodological approach, particularly the use of whole-brain Calcium imaging as a measure of brain activity. While epilepsy and seizures involve network interactions, they typically do not originate across the entire brain simultaneously. Seizures often begin in specific regions or even within specific populations of neurons within those regions. Therefore, a whole-brain approach, especially with Calcium imaging with inherited limitations, may not fully capture the localized nature of seizure initiation and propagation, potentially limiting the understanding of Galanin's role in epilepsy.

      We agree with the reviewers that the whole-brain imaging approach is both a strength and a weakness. This manuscript and our previously published paper (Hotz et al., 2022) indeed show that the seizures have an initiation point and spread throughout the brain, interestingly affecting the telencephalon last. Localized seizure initiation was not within the scope of this manuscript; however, studying it would also require imaging techniques. Using cell-type-specific drivers for specific neuronal subpopulations is an interesting approach, but outside the scope of this study. An interesting approach would also include a more detailed analysis of glia in the context of epilepsy.

      Furthermore, Galanin's effects may vary across different brain areas, likely influenced by the predominant receptor types expressed in those regions. Additionally, the use of PTZ as a "stressor" is questionable since PTZ induces seizures rather than conventional stress. Referring to seizures induced by PTZ as "stress" might be a misinterpretation intended to fit the proposed model of stress regulation by receptors other than Galanin receptor 1 (GalR1).

      We also agree that a more regional approach, informed by more reliable information on the expression domains of the different galanin receptors and on their respective roles, is an important future research direction.

      The description of the EAAT2 mutants is missing crucial details. EAAT2 plays a significant role in the uptake of glutamate from the synaptic cleft, thereby regulating excitatory neurotransmission and preventing excitotoxicity. Authors suggest that in EAAT2 knockout (KO) mice galanin expression is upregulated 15-fold compared to wild-type (WT) mice, which could be interpreted as galanin playing a role in the hypoactivity observed in these animals.

      However, the study does not explore the misregulation of other genes that could be contributing to the observed phenotype. For instance, if AMPA receptors are significantly downregulated, or if there are alterations in other genes critical for brain activity, these changes could be more important than the upregulation of galanin. The lack of wider gene expression analysis leaves open the possibility that the observed hypoactivity could be due to factors other than, or in addition to, galanin upregulation.

      We are in the process of preparing a manuscript describing a more detailed gene expression study of this and a chemically induced seizure model. Surprisingly, we did not observe strong effects on glutamate receptor-related genes. This does not preclude additional factors, such as other neuropeptides, from playing a role; indeed, we deem it likely that they do.

      Moreover, the observation that in double KO mice for both EAAT2 and galanin there was little difference in seizure susceptibility compared to EAAT2 KO mice alone further supports the idea that galanin upregulation might not be the reason for the observed phenotype. This indicates that other regulatory mechanisms or gene expressions might be playing a more pivotal role in the manifestation of hypoactivity in EAAT2 mutants.

      Yes, we agree that galanin is likely not the only player. This warrants further investigations.

      These methodological shortcomings and conceptual inconsistencies undermine the perceived strengths of the study, and hinder understanding of Galanin's role in epilepsy and stress regulation.

      Review 2:

      Previous concerns about sex or developmental biological variables were addressed, as their model's seizure phenotype emerges rapidly and long prior to the establishment of zebrafish sexual maturity. However, in the course of re-review, some additional concerns (below) were detected that, if addressed, could further improve the manuscript. These concerns relate to how seizures were defined from the measurement of fluorescent calcium imaging data. Overall, this study is important and convincing, and carries clear value for understanding the multifaceted functions that neuronal galanin can perform under homeostatic and disease conditions.

      We are pleased that we could dispel the initial concerns.

      Additional Concerns:

      - The authors have validated their ability to measure behavioral seizures quantitatively in their 2022 Glia paper but the information provided on defining behavioral seizures was limited. The definition of behavioral seizure activity is not expanded upon in this paper, but could provide detail about how the behavioral seizures relate to a seizure detected via calcium imaging.

      In this paper we indeed do not address behavioral seizures but focus completely on neuronal seizures as defined in the Materials and Methods section (“seizures were defined as calcium fluctuations reaching at least 100% of ΔF/F0 in the whole brain.”). Epileptic seizures in zebrafish, whether evoked by pharmacological means or resulting from genetic mutations, produce stereotyped locomotor behavior, as described in multiple publications (e.g. Baraban et al., 2005, Berghmans et al., 2007, Baxendale et al., 2012 and references therein).
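      The working definition quoted above (whole-brain ΔF/F0 reaching at least 100%) can be illustrated with a minimal sketch. This is not the authors' analysis code: the baseline estimate F0 (median of the first few frames), the function names, and the example trace are all assumptions made for illustration.

```python
from statistics import median

def delta_f_over_f0(trace, baseline_frames):
    """Return dF/F0 as a fraction for a whole-brain fluorescence trace.

    Assumption: F0 is the median of the first `baseline_frames` samples.
    The manuscript may define its baseline differently.
    """
    f0 = median(trace[:baseline_frames])
    return [(f - f0) / f0 for f in trace]

def seizure_epochs(dff, threshold=1.0):
    """Return (start, end) frame pairs where dF/F0 >= threshold.

    A threshold of 1.0 corresponds to 100% dF/F0; `end` is exclusive.
    """
    epochs = []
    start = None
    for i, value in enumerate(dff):
        if value >= threshold and start is None:
            start = i                      # epoch begins
        elif value < threshold and start is not None:
            epochs.append((start, i))      # epoch ends
            start = None
    if start is not None:                  # trace ended while above threshold
        epochs.append((start, len(dff)))
    return epochs

# Hypothetical whole-brain mean fluorescence, arbitrary units.
trace = [100, 102, 98, 100, 150, 210, 260, 190, 120, 100]
dff = delta_f_over_f0(trace, baseline_frames=4)
print(seizure_epochs(dff))
```

      Because the threshold is applied to normalized data, regions with a constantly higher raw fluorescence do not by themselves trigger a detection.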

      - Related to the previous point, for the calcium imaging, the difference between an increase in fluorescence that the authors think reflects increased neuronal activity and the fluorescence that corresponds to seizures is not very clear. This detail is necessary because exactly when the term "seizure" describes a degree of increased activity can be difficult to distinguish objectively.

      In our material and methods section, we describe our working definition of a seizure. Seizures are easily distinguished from increased activity by being synchronized.

      - The supplementary movies that were added were very useful, but raised some questions. For example, what brain regions were pulsating? What areas seemed to constantly exhibit strong fluorescence and was this an artifact? It seemed that sometimes there was background fluorescence in the body. Perhaps an anatomical diagram could be provided for the readers. In addition, there were some movies with much greater fluorescence changes - are these the seizures? These are some reasons for our request for clarified definitions of the term "seizure".

      The “pulsating” (or “flickering”) brain activity is spontaneous neuronal activity. Some areas may appear to be more active, probably because of a denser packing of neurons and intrinsically higher spontaneous neuronal activity. However, since we only use normalized data, this does not affect our measurements.

      - While it is not critical to change, I will again note the possible confusion that the use of the word "sedative" in this context may cause. However, I do understand this is a stylistic choice.

      - Supplementary Figure 1B: the N values along the x-axis appear to have been duplicated and the duplications are offset and overlapping with one another by mistake.

      Thank you for pointing this out. We have corrected the figure accordingly.

      Review 3:

      (1) Although the relationship between galanin and brain activity during interictal or seizure-free periods was clear, the revised manuscript still lacks mechanistic insight in the role of galanin during seizure-like activity induced by PTZ.

      We agree that the mechanistic role of galanin still needs to be defined. The role is more complex than we expected, mainly due to its negative feedback properties. A complete mechanistic understanding will require a number of additional studies and is unfortunately outside the scope of this manuscript.

      (2) The revised manuscript continues to heavily rely on calcium imaging of different mutant lines. Confirmation of knockouts has been provided with immunostaining in a new supplementary figure. Additional methods could strengthen the data, translational relevance, and interpretation (e.g., acute pharmacology using galanin agonists or antagonists, brain or cell recordings, biochemistry, etc).

      Cell recordings and biochemistry are challenging in the small larval zebrafish brain. We deem the genetic manipulations that we describe to be more informative than pharmacological experiments due to specificity issues.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank all the reviewers for their time and valuable feedback, which helped us improve our manuscript. Based on the comments, we have made several critical changes to the revised manuscript.

      (1) We have changed our threshold for detecting freezing epochs from 1 cm/s to 0 cm/s in this revised manuscript. This change allows us to capture periods when animals are completely still on the treadmill, better matching the "true freezing" behavior seen in freely moving set-ups. We have added a new supplementary video (Supplementary Video 2) that better demonstrates the freezing response we observe. All results and figures in the revised manuscript reflect this updated threshold (Figures 2-6, Supplementary Figures 1-6, Tables 1-6). Our main findings remain robust, demonstrating that freezing serves as a reliable conditioned response in our paradigms, comparable to freely moving animals. Specifically, freezing behavior increased reliably in the fear-conditioned environment following CFC across all paradigms. We have also added data from a no-shock control group (Supplementary Figure 2) which, when compared to the conditioned group, shows that freezing responses in the conditioned group result from fear conditioning rather than non-specific immobility. We do observe other avoidance behaviors unique to our treadmill-based task, such as hesitation, backward movement, and slow crawls. These conditioned behaviors are captured through a separate metric: the time taken to complete a lap.
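      To make the effect of the threshold change concrete, the sketch below computes the fraction of treadmill samples classified as freezing under each threshold. It is an illustration under stated assumptions, not the authors' pipeline: the function name, the use of absolute speed (so that backward movement is not counted as freezing), the inclusive comparison, and the example speeds are all hypothetical.

```python
def freezing_fraction(speeds_cm_s, threshold_cm_s=0.0):
    """Fraction of samples whose absolute speed is at or below the threshold.

    Assumption: speed is signed (negative = backward movement), so the
    absolute value is used. With threshold 0.0 only complete stillness counts.
    """
    if not speeds_cm_s:
        raise ValueError("speed trace is empty")
    frozen = sum(1 for s in speeds_cm_s if abs(s) <= threshold_cm_s)
    return frozen / len(speeds_cm_s)

# Hypothetical treadmill speeds (cm/s) sampled at a fixed rate.
speeds = [12.0, 5.0, 0.5, 0.0, 0.0, 0.0, -2.0, 8.0]

print(freezing_fraction(speeds))        # revised threshold: 0 cm/s
print(freezing_fraction(speeds, 1.0))   # original threshold: 1 cm/s
```

      In this toy trace the slow crawl at 0.5 cm/s counts as freezing only under the original 1 cm/s threshold, which is the kind of behavior the revised threshold excludes and the lap-time metric is meant to capture instead.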

      (2) As suggested by the reviewers, we have separately analyzed fear discrimination and extinction dynamics across recall days (Supplementary Figures 2, 5 and 6, Tables 1-6). To assess fear discrimination, we use within-group comparisons to evaluate how well animals differentiate between the two VRs across days. For extinction, we use within-VR comparisons to examine freezing dynamics over time. Freezing across recall days is compared to baseline freezing (pre-conditioning) using a Linear Mixed Effects model (Tables 1-6), with recall days as fixed effects and mouse as a random effect, using baseline freezing as the reference.
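      One way to write down a model of the kind described above is given below. It assumes a random intercept per mouse and treatment (dummy) coding of recall days against baseline; the response does not state the exact random-effects structure, so this is a sketch of a plausible specification rather than the authors' fitted model. The symbols are introduced here for illustration.

```latex
% y_{md}: freezing of mouse m in session d (d = 0 is baseline, d = 1..K are recall days)
% \beta_0: mean baseline freezing; \beta_k: change on recall day k relative to baseline
% u_m: random intercept for mouse m (assumed); \varepsilon_{md}: residual
\[
  y_{md} = \beta_0 + \sum_{k=1}^{K} \beta_k \,\mathbf{1}[d = k] + u_m + \varepsilon_{md},
  \qquad u_m \sim \mathcal{N}(0, \sigma_u^2),
  \quad \varepsilon_{md} \sim \mathcal{N}(0, \sigma^2)
\]
```

      Under this coding, testing whether freezing on recall day k differs from baseline amounts to testing whether the coefficient for that day is zero, and extinction within a VR corresponds to those coefficients returning toward zero on later days.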

      (3) We have expanded the behavioral dataset in Paradigm 1 to investigate the effect of shock amplitude on the conditioned fear response (Supplementary Figure 2 C-E). Consistent with findings in freely moving animals, our data show that increasing shock intensity from 0.6 mA to 1.0 mA leads to stronger freezing. For the revised manuscript, we specifically increased the sample size in the 0.6 mA group (n = 8) in Paradigm 1, as this intensity is used in Paradigm 3. These additional data demonstrate that combining a lower shock amplitude with shorter inter-shock intervals and retaining the tail-coat during recall can enhance freezing, suggesting that these parameters help compensate for lower shock intensity.

      (4) We have increased the sample size of the imaging dataset (now n = 8, Figures 7-8).

      Finally, we acknowledge that many aspects of this paradigm still require optimization. The head-fixed CFC paradigm is in its early stages compared to the decades of research dedicated to understanding fear learning parameters in freely moving CFC paradigms. While there are numerous parameters that could be tested (both those identified through our own discussions and those raised by the reviewers), it is not feasible for a single lab to conduct a full evaluation of all the possible factors that could influence CFC in the head-fixed prep. A key limitation is that our approach requires robust navigation behavior in the VR without rewards, which requires weeks of training per mouse. It also necessitates larger sample sizes at the outset, as not all animals will meet the behavioral criteria required for CFC. Another important consideration is scalability. Unlike freely moving CFC paradigms, which allow parallel testing of many animals with minimal pre-training, the VR-CFC setup requires several weeks of behavior training and involves a more complex integration of hardware and software to accurately track behavior in virtual space. The number of VR rigs that can be operated simultaneously in a single lab is often limited, making high-throughput testing more challenging. These factors mean that the testing of a single parameter in a group of animals requires approximately 3–4 months to complete. Despite these constraints, we are committed to continuing to refine this paradigm over time. With this manuscript, our main aim was to provide a detailed framework, initial parameters, and evidence for conditioned behavior in the head-fixed preparation. By doing so, we hope to facilitate the adoption of this paradigm by researchers interested in studying the neural correlates of learning and memory using multiphoton imaging and stimulation techniques. This approach enables investigations that are not possible in freely moving animals, while the presence of freezing as a conditioned response allows for direct comparisons to the extensive body of work done in freely moving paradigms. Moving forward, we anticipate that optimizing this paradigm and identifying the key parameters that drive learning will be a collaborative, community-led effort.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors set out to develop a contextual fear learning (CFC) paradigm in head-fixed mice that would produce freezing as the conditioned response. Typically, lick suppression is the conditioned response in such designs, but this (1) introduces a potential confounding influence of reward learning on neural assessments of aversion learning and (2) does not easily allow comparison of head-fixed studies with extensive previous work in freely moving animals, which use freezing as the primary conditioned response.

      The first part of this study is a report on the development and outcomes of 3 variations of the CFC paradigm in a virtual reality environment. The fundamental design is strong, with head-fixed mice required to run down a linear virtual track to obtain a water reward. Once trained, the water reward is no longer necessary and mice will navigate virtual reality environments. There are rigorous performance criteria to ensure that mice that make it to the experimental stage show very low levels of inactivity prior to fear conditioning. These criteria do result in only 40% of the mice making it to the experimental stage, but high rates of activity in the VR environment are crucial for detecting learning-related freezing. It is possible that further adjustments to the procedure could improve attrition rates.

      We acknowledge that further adjustments to the procedure could improve attrition rates, and we will continue to work on improving the paradigm.

      Paradigm versions 1 and 2 vary the familiarity of the control context while paradigm versions 2 and 3 vary the inter-shock interval. Paradigm version 1 is the most promising, showing the greatest increase in conditioned freezing (~40%) and good discrimination between contexts (delta ~15-20%). Paradigm version 2 showed no clear evidence of learning - average freezing at recall day 1 was not different than pre-shock freezing. First-lap freezing showed a difference, but this single-lap effect is not useful for many of the neural circuit questions for which this paradigm is meant to facilitate. Also, the claim that mice extinguished first-lap freezing after 1 day is weak. Extinction is determined here by the loss of context discrimination, but this was not strong to begin with. First-lap freezing does not appear to be different between Recall Day 1 and 2, but this analysis was not done.

      This is an important point. Following reviewer suggestions, we have replotted our figures for all paradigms to show within-VR freezing (see Supplementary Figures 2, 5 and 6) as the appropriate method for quantifying fear extinction across days. Using an LME model (Tables 1-6), we quantify freezing during recall days against baseline freezing levels measured before fear conditioning within each VR. In Paradigm 2, while some fear discrimination persists across days, extinction does occur rapidly. After the first lap in the CFC VR, we observed no significant differences in freezing compared to the baseline. These results are shown in the revised Supplementary Figure 5, and the revised text is in lines 393-399.

      Paradigm version 3 has some promise, but the magnitude of the context discrimination is modest (~10% difference in freezing). Thus, further optimization of the VR CFC will be needed to achieve robust learning and extinction. This could include factors not thoroughly tested in this study, including context pre-exposure timing and duration and shock intensity and frequency.

      We acknowledge that many aspects of this paradigm still need optimization, as virtual reality CFC is in its early stages, and we have not explored all of the parameter space. We describe above the reasoning for this. However, for this revised version of the paper we have added new behavioral data (Supplementary Figure 2 C-E) showing that increasing shock intensity from 0.6 mA to 1 mA enhances freezing, both in the first lap and on average. There are of course many other parameters that are likely important, like the ones pointed out here by the reviewer, but exploring the entire parameter space will take many years and will likely require many labs. The purpose of this paper is to show that VR-CFC fundamentally works and is a starting point on which the field can build. We have now pointed out in the introduction (lines 54-58) and discussion (lines 730-737, 810-814) that there remains significant scope for improving this paradigm and optimizing parameters in the future.

      The second part of the study is a validation of the head-fixed CFC VR protocol through the demonstration that fear conditioning leads to the remapping of dorsal CA1 place fields, similar to that observed in freely moving subjects. The results support this aim and largely replicate previous findings in freely moving subjects. One difference from previous work of note is that VR CFC led to the remapping of the control environment, not just the conditioning context. The authors present several possible explanations for this lack of specificity to the shock context, further underscoring the need for further refinement of the CFC protocol before it can be widely applied. While this experiment examined place cell remapping after fear conditioning, it did not attempt to link neural activity to the learned association or freezing behavior.

      This is an interesting observation. We think that the remapping observed in the control context likely occurred due to the absence of reward in a previously rewarded environment. Our prior work has demonstrated that removal of reward causes increased remapping (Krishnan et al., 2022, Krishnan and Sheffield, 2023). In other words, the continued presence of reward within an environment stabilizes CA1 place fields. The Moita et al. (2004) paper, which showed remapping only in the fear-conditioned context and not in the control context, provided rats with food pellets throughout the experimental session in both the control and conditioned contexts, likely to increase the exploration necessary for identifying place cells. The presence of reward in the Moita et al. experiment could explain the minimal remapping observed in their control context compared to our control context, which lacked reward. Another possibility lies in the different intervals between place cell activity recordings in our study and that of Moita et al. While Moita et al. separated their recordings by just one hour, our recordings were separated by a full day, with a sleep period in between. The absence of sleep and the shorter time interval between conditioning and retrieval sessions in their study could explain the minimal remapping observed by Moita et al. compared to our findings. We have now addressed this discrepancy explicitly in lines 596-606.

      Although we agree with the reviewer that it would be informative to perform analysis of how neural activity correlates with freezing responses, we think this warrants its own stand-alone manuscript as the neural dynamics and methods to appropriately analyze them are complicated. We are in the midst of analyzing this data further and will present these findings in a separate publication.

      In summary, this is an important study that sets the initial parameters and neuronal validation needed to establish a head-fixed CFC paradigm that produces freezing behaviors. In the discussion, the authors note the limitations of this study, suggest the next steps in refinement, and point to several future directions using this protocol to significantly advance our understanding of the neural circuits of threat-related learning and behavior.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Krishnan et al devised three paradigms to perform contextual fear conditioning in head-fixed mice. Each of the paradigms relied on head-fixed mice running on a treadmill through virtual reality arenas. The authors tested the validity of three versions of the paradigms by using various parameters. As described below, I think there are several issues with the way the paradigms are designed and how the data are interpreted. Moreover, as Paradigm 3 was published previously in a study by the same group, it is unclear to me what this manuscript offers beyond the validations of parameters used for the previous publication. Below, I list my concerns point-by-point, which I believe need to be addressed to strengthen the manuscript.

      Major comments

      (1) In the analysis using the LME model (Tables 1 and 2), I am left wondering why the mice had increased freezing across recall days as well as increased generalization (increased freezing to the familiar context, where shock was never delivered). Would the authors expect freezing to decrease across recall days, since repeated exposure to the shock context should drive some extinction? This is complicated by the analysis showing that freezing was increased only on retrieval day 1 when analyzing data from the first lap only. Since reward (e.g., motivation to run) is removed during the conditioning and retrieval tests, I wonder if what the authors are observing is related to decreased motivation to perform the task (mice will just sit, immobile, not necessarily freezing per se). I think that these aspects need to be teased out.

      This is an important point and we agree teasing out a lack of motivation versus fearful freezing would be useful. To address the possibility that reduced motivation to run without reward could contribute to the observed freezing behavior, we have now included a no-shock control group in the revised manuscript (n = 7; Supplementary Figure 2A-B, H–I). These control mice experienced the same protocol, including the wearing of a tail coat, but did not receive any shocks. We observed no increases in freezing across days in these controls, confirming that the increased freezing in the Familiar context of our experimental group stems from fear conditioning rather than the removal of reward from a previously rewarded context. If reduced motivation from reward removal were the primary driver, similar freezing patterns would have emerged in the no-shock controls. We have added lines 248-261 in the revised manuscript, discussing this point, and we thank the reviewer for motivating us to do this experiment and analysis.

      That said, the precise mechanisms underlying the fear generalization observed in the nonconditioned context—particularly its emergence during later recall days—remain unclear. Studies in freely moving animals have shown that fear memories initially specific to the conditioned context can become generalized with repeated exposures, which may be occurring here (Biedenkapp & Rudy, 2007; Wiltgen & Silva, 2007). Alternatively, it is possible that the combination of fear conditioning and the removal of expected reward contributes to a delayed generalization effect. This may reflect a limitation of our approach, which relies on reward to motivate initial training. As noted by another reviewer, we have now addressed this potential drawback of reward-based training in the discussion (see lines 809-817). Clearly, unique factors specific to the head-fixed VR paradigm may contribute to this phenomenon. Understanding the mechanisms underlying fear generalization in the head-fixed VR CFC paradigm will be a valuable direction for future research.

      (2) Related to point 1, the authors actually point out that these changes could be due to the loss of the water reward. So, in line 304, is it appropriate to call this freezing? I think it will be very important for the authors to exactly define and delineate what they consider as freezing in this task, versus mice just simply sitting around, immobile, and taking a break from performing the task when they realize there is no reward at the end.

      As noted in point 1 above, we have added a no-shock control group (n = 7; Supplementary Figure 2A-B, H–I) to determine whether the observed freezing was driven by fear conditioning or by reduced motivation to run in the absence of reward. The absence of increased freezing in these controls supports the interpretation that the behavior in the conditioned group is fear-related. In future studies, incorporating additional physiological measures, such as heart rate monitoring, could further help distinguish fear-related freezing from other forms of immobility.

      (3) In the second paradigm, mice are exposed to both novel and (at the time before conditioning) neutral environments just before fear conditioning. There is a big chance that the mice are 'linking' the memories (Cai et al 2016) of the two contexts such that there is no difference in freezing in the shock context compared to the neutral context, which is what the authors observe (Lines 333-335). The experiment should be repeated such that exposure to the contexts does not occur on the conditioning day.

      This is an interesting idea. However, if memory linking were driving the observed freezing patterns, we would expect to see similarly reduced fear discrimination across all three paradigms, as mice experience both contexts sequentially in each case. Instead, this effect appears to be specific to Paradigm 2, suggesting that it may be due to other factors. We agree it would be informative to eliminate pre-conditioning exposure to both environments to assess whether this improves fear discrimination and helps clarify the potential contribution of memory linking. This is something we plan to do in future studies that are beyond the scope of this initial paper on VR-CFC.

      (4) On lines 360-361, the authors conclude that extinction happens rapidly, within the first lap of the VR trial. To my understanding, that would mean that extinction would happen within the first 5-10 seconds of the test (according to Figure S1E). That seems far too fast for extinction to occur, as this never occurs in freely behaving mice this quickly.

      We agree with the reviewer that extinction in Paradigm 2 appears to occur relatively rapidly. However, the average time to complete the first lap in the fear-conditioned context in Paradigm 2 is 25.68 ± 5.55 seconds (as stated in line 384), indicating that extinction occurs within approximately the first 30 seconds of context exposure, not within 5–10 seconds. This is specific to Paradigm 2 and does not happen in either of the other paradigms, as shown in Supplementary Figure 4. For clarification, Figure S1E pertains to baseline running in Paradigm 1 and does not apply to Paradigm 2.

      As the reviewer points out, even at 30 seconds, extinction seems to be happening more quickly in Paradigm 2 than seen in freely moving setups. This may be due to a key structural difference in our setup. The VR-CFC task is organized into discrete trials, with mice being teleported back to the start after reaching the end of the virtual track. Completing a full lap without receiving a shock could serve as a clear signal that the threat is no longer present within the environment as the completion of a lap means that the animals have surveyed all locations within the environment. This structure could accelerate extinction compared to freely moving setups, where animals take longer to explore their complete environment due to the lack of discrete trials. Although this is true for all our paradigms, the accelerated extinction seen in paradigm 2 versus 1 and 3 may be driven by other factors. As noted by the reviewers, other task parameters—such as context pre-exposure timing, shock intensity, and conditioning duration— are likely to play a role in shaping extinction dynamics. These factors warrant further investigation, and we plan to explore them in future studies to better understand the conditions influencing extinction in the VR-CFC paradigm.

      (5) Throughout the different paradigms, the authors are using different shock intensities. This can lead to differences in fear memory encoding as well as in levels of fear memory generalization. I don't think that comparisons can be made across the different paradigms as too many variables (including shock intensity - 0.5/0.6mA can be very different from 1.0 mA) are different. How can the authors pinpoint which works best? Indeed, they find Paradigm 3 'works' better than Paradigm 2 because mice discriminate better between the neutral and shock contexts. This can definitely be driven by decreased generalization from using a 0.6mA shock in Paradigm 3 compared to 1.0 mA shock in Paradigm 2.

      The reviewer brings up important points here. We have now added new data evaluating 0.6 mA shocks in Paradigm 1 (Supplementary Figure 2A–E, n=8). These data show that 1.0 mA shocks produced stronger conditioned responses and greater fear discrimination compared to 0.6 mA. Our goal in Paradigm 3 was to begin with a lower shock intensity and assess whether additional modifications—specifically the shorter ISI and retention of the tail-coat during recall—could enhance fear conditioning. Surprisingly, despite the weaker shock intensity, Paradigm 3 resulted in improved discrimination and freezing behavior relative to Paradigm 2. We have now clarified this point in the manuscript (lines 466-470), and we interpret this outcome as evidence that the shorter ISIs and contextual cue continuity (tail-coat) likely play a more significant role in enhancing learning and recall. However, as noted in the text (lines 511-514), further testing is needed to determine the individual contributions of each parameter to successful VR-CFC. Fully optimizing the parameter settings will take additional time and resources, and we aim to continually refine the parameter space in the future, as has been done over the years for freely moving animals.

      (6) There are some differences in the calcium imaging dataset compared to other studies, and the authors should perform additional testing to determine why. This will be integral to validating their head-fixed paradigm(s) and showing they are useful for modeling circuit dynamics/behaviors observed in freely behaving mice. Moreover, the sample size (number of mice) seems low.

      The one notable difference between our imaging study and that done in freely moving animals is that we observed remapping of place cells in the control context. In contrast, Moita et al. (2004) reported more stable place fields in the control context. A key distinction is that their study included rewards in the control context, which may have contributed to the spatial stability. We now discuss this difference in the manuscript (lines 599-605).

      It should be noted that there are many key distinctions among paradigms that study neural activity during fear conditioning in freely moving animals. These include varying exposure times to environments (1–6 days), the time interval between neural activity recordings, and the use of food rewards during the experiment stages in freely moving animals to encourage exploration for place cell identification. Although freely moving paradigms that investigate fear conditioning and place cells are heterogeneous, we were encouraged by the replication of several key findings. This validates VR-based CFC as a viable tool for neural circuit investigations. While future work will include more thorough analyses, our current findings demonstrate the paradigm's effectiveness for modeling circuit dynamics and behavior. We have now expanded our dataset, which includes four additional mice, further corroborating these original findings.

      (7) It appears that the authors have already published a paper using Paradigm 3 (Ratigan et al 2023). If they already found a paradigm that is published and works, it is unclear to me what the current manuscript offers beyond that initial manuscript.

      The reviewer is correct that we have published a paper using Paradigm 3. However, this manuscript goes beyond that one and provides a much more comprehensive description and fundamental analysis of the behavior and experimental parameters regarding VR-CFC, allowing the research community to adapt our paradigm reproducibly. While Ratigan et al. (2023) offered only a minimal description of behavior and included just Paradigm 3, we present two additional paradigms along with neuronal validation using hippocampal place cells. We have now explicitly stated this in the introduction (lines 50-55).

      (8) As written, the manuscript is really difficult to follow with the averages and standard error reported throughout the text. This reporting in the text occurred heterogeneously throughout the text, as sometimes it was reported and other times it was not. Cleaning this reporting up throughout the paper would greatly improve the flow of the text and qualitative description of the results.

      We completely agree with this point and have now cleaned up the text, leaving details only in a few places we felt were important.

      Reviewer #3 (Public review):

      Summary:

      Krishnan et al. present a novel contextual fear conditioning (CFC) paradigm using a virtual reality (VR) apparatus to evaluate whether conditioned context-induced freezing can be elicited in head-fixed mice. By combining this approach with two-photon imaging, the authors aim to provide high-resolution insights into the neural mechanisms underlying learning, memory, and fear. Their experiments demonstrate that head-fixed mice can discriminate between threat and non-threat contexts, exhibit fear-related behavior in VR, and show context-dependent variability during extinction. Supplemental analyses further explore alternative behaviors and the influence of experimental parameters, while hippocampal neuron remapping is tracked throughout the experiments, showcasing the paradigm's potential for studying memory formation and extinction processes.

      Strengths:

      Methodological Innovation: The integration of a VR-based CFC paradigm with real-time two-photon imaging offers a powerful, high-resolution tool for investigating the neural circuits underlying fear, learning, and memory.

      Versatility and Utility: The paradigm provides a controlled and reproducible environment for studying contextual fear learning, addressing challenges associated with freely moving paradigms.

      Potential for Broader Applications: By demonstrating hippocampal neuron remapping during fear learning and extinction, the study highlights the paradigm's utility for exploring memory dynamics, providing a strong foundation for future studies in behavioral neuroscience.

      Comprehensive Data Presentation: The inclusion of supplemental figures and behavioral analyses (e.g., licking behaviors and variability in extinction) strengthens the manuscript by addressing additional dimensions of the experimental outcomes.

      Weaknesses:

      Characterization of Freezing Behavior: The evidence supporting freezing behavior as the primary defensive response in VR is unclear. Supplementary videos suggest the observed behaviors may include avoidance-like actions (e.g., backing away or stopping locomotion) rather than true freezing. Additional physiological measurements, such as EMG or heart rate, are necessary to substantiate the claim that freezing is elicited in the paradigm.

      To strengthen our claim that freezing is a conditioned response in this task, we have taken three key steps:

      (1) We adjusted our freezing detection threshold from 1 cm/s to near 0 cm/s to capture only periods where the animal is virtually motionless on the treadmill. We validated this approach in Figure 2, particularly in the zoomed-in track position trace in Figure 2A, which clearly shows that the identified freezing epochs correspond to no change in track position. All analyses and figures have been updated to reflect this more stringent threshold.

      (2) We have added a no-shock control group in the revised manuscript (n = 7; Supplementary Figure 2A-B, H–I) where mice experienced the same protocol, including wearing a tail-coat, but received no shocks. These mice showed no increases in freezing behavior, which further demonstrates that the increased freezing we observe is a result of fear conditioning.

      (3) We have added a new supplementary video (Supplementary Video 2) that better illustrates the freezing behavior in our task.

      That said, we fully agree with the reviewer that freezing is not the only defensive response observed. Other behaviors—such as hesitation, backward movement, and slowing down—also emerge that are unique to our treadmill-based paradigm. We chose to focus on freezing in this manuscript to align with convention in freely moving fear conditioning studies and to facilitate direct comparisons. We agree that additional physiological measurements (e.g., EMG or heart rate) would provide further validation and could help distinguish between different forms of defensive responses. We view this as an important future direction and plan to incorporate such measures in upcoming studies. We highlight this in the results section (lines 175-179, 262-268) and in the discussion (lines 739-750).
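
      The speed-threshold criterion described in step (1) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function name `freezing_epochs`, the 0.1 cm/s threshold standing in for "near 0 cm/s", the 1 s minimum duration, and the sampling rate are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def freezing_epochs(
    position_cm: List[float],
    sample_rate_hz: float,
    speed_threshold_cm_s: float = 0.1,  # assumed stand-in for "near 0 cm/s"
    min_duration_s: float = 1.0,        # assumed minimum epoch length
) -> List[Tuple[int, int]]:
    """Return (start, stop) sample indices of near-motionless epochs.

    Indices refer to the speed trace (one sample shorter than the
    position trace); stop is exclusive.
    """
    dt = 1.0 / sample_rate_hz
    # Absolute speed between consecutive position samples, in cm/s.
    speed = [abs(b - a) / dt for a, b in zip(position_cm, position_cm[1:])]
    min_samples = int(round(min_duration_s * sample_rate_hz))

    epochs: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(speed):
        if s <= speed_threshold_cm_s:
            if start is None:
                start = i  # a candidate epoch begins here
        else:
            # Movement resumed: keep the epoch only if it lasted long enough.
            if start is not None and i - start >= min_samples:
                epochs.append((start, i))
            start = None
    # Close an epoch that runs to the end of the recording.
    if start is not None and len(speed) - start >= min_samples:
        epochs.append((start, len(speed)))
    return epochs
```

      With a threshold this close to zero, an epoch corresponds to a flat segment of the track-position trace, which is the property the zoomed-in panel of Figure 2A is said to show. A jump in position at a track reset would register as movement in this sketch and would need separate handling.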

      Analysis of Extinction: Extinction dynamics are only analyzed through between-group comparisons within each Recall day, without addressing within-group changes in behavior across days. Statistical comparisons within groups would provide a more robust demonstration of extinction processes.

      This is an important distinction and we have now added figures (Supplementary Figures 2H-I, 5C-D, 6C-D) showing within-VR behavior across Recall days, along with statistical comparisons and a description of the extinction process based on these results.

      Low Sample Sizes: Paradigm 1 includes conditions with very low sample sizes (N=1-3), limiting the reliability of statistical comparisons regarding the effects of shock number and intensity.

      Increasing sample sizes or excluding data from mice that do not match the conditions used in Paradigms 2 and 3 would improve the rigor of the analysis.

      While we included all conditions in Figure 2 for completeness, we have separated these conditions in Supplementary Figure 2 to ensure clarity. This allows researchers interested in this paradigm to see the approximate range of conditioned responses observed across different parameters. When comparing Paradigm 1 with Paradigms 2 and 3, we used only data from the 1 mA, six-shock condition.

      Potential Confound of Water Reward: The authors critique the use of reward in conjunction with fear conditioning in prior studies but do not fully address the potential confound introduced by using water reward during the training phase in their own paradigm.

      We agree this is a point that needs discussion. We have now noted the limitation of using water rewards during training in the discussion section, particularly its effect on the animal’s motivation in the long term and on place cell activity (lines 814-820).

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      I suggest changing "3 paradigms" to "3 versions of a CFC paradigm," as the paradigm is fundamentally the same, but parameters were adjusted towards finding an optimal protocol.

      We have changed this phrasing where applicable.

      Figure S2: There appear to be different sets of shock parameters for different mice, most with an n of 1 or 2. This is not reliable for making a decision for optimal shock parameters and should not be discussed in that way until a full-powered comparison is completed. Also, the N adds up to 19, yet only 18 are described as being included in the study.

      We thank the reviewer for this important point. We agree that the current study is not powered to definitively identify optimal parameter settings. We have been careful not to interpret it in that way in the text. Rather, we adopted a commonly used starting point from the freely moving literature—1 mA with six shocks—as our initial condition (lines 196-199). To provide context for others interested in pursuing this work, we have presented a range of conditioned responses from different parameter combinations to illustrate potential variability. In most cases, these data are intended for illustrative purposes only and are not meant to support firm conclusions. We agree that a systematic and fully powered investigation of each parameter would be highly valuable, and we plan to pursue this in future work (and hope other labs contribute to this goal, too), much like the iterative optimizations performed in freely moving paradigms over time.

      We thank the reviewer for catching the sample size discrepancy and have now corrected it.

      The number of animals for the no-shock condition should be included.

      Thank you. We have now included this.

      A possible explanation for the lower fear and poorer discrimination in versions 2 and 3 could be that 10 min pre-exposure to the CFC context on day -1 led to latent inhibition. Shorter (or eliminated) pre-exposure may improve outcomes.

      We agree that the exposure time is a parameter that we should explore. We have highlighted this in the discussion (lines 729-736) as a parameter that is worth testing in the future.

      For analysis of extinction, it is best to establish this within condition - is freezing to the CFC context significantly reduced compared with initial recall and similar to pre-training freezing? By using discrimination as your index of extinction, increases in control context freezing/inactivity can eliminate context discrimination without the conditioned response of freezing actually undergoing extinction.

      This is a good point, and we have now included analysis and conclusions based on a within-VR comparison for the analysis of fear extinction (Supplementary Figures 2H-I, 5C-D, 6C-D).

      Reviewer #3 (Recommendations for the authors):

      Clarification of Treadmill Shape: The manuscript describes the treadmill as "spherical" throughout. However, based on representative images and videos, the treadmill appears cylindrical. This discrepancy should be clarified to ensure consistency between the text and visuals.

      The reviewer is correct that the treadmill is cylindrical, and this was an error on our part. We have corrected it throughout.

      Figure and Legend Labeling: To improve clarity, all figures and their legends should be explicitly labeled with the corresponding paradigm (1, 2, or 3) to facilitate interpretation.

      We have now added a label on all figures that clarifies which Paradigm the figures are referring to. We have also explicitly added this to the figure legends.

      Objective Language: Subjective language, such as "since we wanted animals to" (Line 850), should be revised to reflect an objective tone (e.g., "to allow animals to"). Similarly, phrases like "We believe" (Line 896) should be avoided to maintain an unbiased presentation.

      We have removed subjective language from our text.

      Placement of Future Directions: Speculations on future experimental plans, such as the use of sex as a biological variable (Lines 895-903), should be included in the Discussion section rather than the Methods. Additionally, remarks about the responsiveness of female mice to tail shocks should be moved to the main text for proper contextualization.

      We have moved these lines as suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthen the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail, and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      The authors have made some changes in the revised version. However, many of the changes were superficial, and some concerns still need to be addressed. Important details are still missing from the description of some experiments. Authors should carefully revise the manuscript to ascertain that all details that could affect interpretation of their results are presented clearly. For instance, authors still need to include details of how the metabolomics analyses were performed. Just stating that samples were "frozen for metabolomics analyses" is not enough. Was this mass-spec or NMR-based metabolomics? Assuming it was mass-spec, what kind? How was metabolite identity assigned, etc? These are important details, which need to be included. Even in cases where additional information was included, the authors did not discuss how the specific way in which certain experiments were performed could affect interpretation of their results. One example is the potential for compound carryover in their experiments. Another important one is the fact that CAPE affects bacterial growth and sporulation. Therefore, it is critical that authors acknowledge that they cannot discard the possibility that other factors besides compound interactions with the toxin are involved in their phenotypes. As stated previously, authors should also be careful when drawing conclusions from the analysis of microbiota composition data, and changes to the manuscript should be made to reflect this. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Again, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

      Thanks for your constructive suggestion. We have carefully revised the manuscript according to your suggestions.

      Reviewer #2 (Public review):

      I appreciate the authors' responses to my original review. This is a comprehensive analysis of CAPE on C. difficile activity. It seems like this compound affects all aspects of C. difficile, which could make it effective during infection but also make it difficult to understand the mechanism. Even considering the authors' responses, I think it is critical for the authors to work on the conclusions regarding the infection model. There is some protection from disease by CAPE but some parameters are not substantially changed. For instance, weight loss is not significantly different in the C. difficile only group versus the C. difficile + CAPE group. Histology analysis still shows a substantial amount of pathology in the C. difficile + CAPE group. This should be discussed more thoroughly using precise language.

      Thanks for your constructive suggestion. We have carefully revised the manuscript according to your suggestions.

      Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      Results are really good, and the CAPE shows a good and promising alternative for treating CDI.

      Weaknesses:

      Some references are too old or missing.

      Comments on revisions:

      I have read your study after comments made by all referees, and I noticed that all questions and suggestions addressed to the authors were answered and well explained. Some of the minor and major issues related to the article were also solved. I am satisfied with all the effort given by the authors to improve their manuscript.

      Thanks again for your review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The legend of Figure 3SB is incorrect. It should read "Growth curves of C. difficile BAA-1870 in the presence of varying concentrations of CAPE (0-64 µg/mL)". Also, there is something wrong with the symbols in this figure. I suspect what is happening is that the symbols for the concentrations of 32 and 64 µg/mL are superimposing, but this is a problem because the lower line looks like a closed circle, which is supposed to represent the condition where no CAPE was added. The authors should change the symbols to allow clear distinction between each of the conditions.

      Thanks for your constructive suggestion. We have modified the panel and figure legend in Figure 3SB. The concentrations of 32 μg/mL and 64 μg/mL are quite similar, which makes it challenging to differentiate between the corresponding data points on the graph. To enhance clarity, we have utilized distinct colors to help distinguish these closely valued lines as effectively as possible.

      Since the authors observed a significant effect of CAPE on both bacterial growth and spore production, their discussion and conclusions need to reflect the fact that the effects observed can no longer be attributed solely to toxin inhibition.

      Thanks for your comments. We have modified the corresponding description according to your suggestions.

      In lines 43-45, authors state that "CAPE treatment of C. difficile-challenged mice induces a remarkable increase in the diversity and composition of the gut microbiota (e.g., Bacteroides spp.)". It is still unclear to this reviewer why mention Bacteroides between parentheses. Does this mean that there was an increase in the abundance of Bacteroides? If that is the case this needs to be stated more clearly.

      Thanks for your comments. Treatment with CAPE indeed significantly increased the abundance of Bacteroides spp. in the gut microbiota (Figure 7H-J). However, to avoid ambiguity in the abstract, we have chosen to delete the specific mention of Bacteroides spp. within the parentheses.

      The modifications made to lines 132-135 still do not address my concern. Authors stated in the manuscript that "compounds that were not bound to TcdB were removed". But how was this done? This needs to be clearly explained in the manuscript. In the response to reviewers document, authors state that this was done through centrifugation. But given that the goal here is to separate excess of small molecule from a protein target, just stating that centrifugation was used is not enough. Did the authors use ultracentrifugation? What were the conditions employed? This is critical so that the reader can assess the degree of compound carryover that may have occurred. Also, authors need to clearly acknowledge the caveats of their experimental design by stating that they cannot rule out the contribution of compound carryover to their results.

      Thanks for your comments. We used centrifugal ultrafiltration to remove the unbound small-molecule compounds. Because TcdB has a large molecular weight (approximately 270 kDa), we selected an ultrafiltration membrane with a 100 kDa molecular weight cutoff. Centrifugation was performed at 4000 × g for 5 min to eliminate the compounds that did not bind to TcdB. We have incorporated the relevant methods into the manuscript and discussed the potential impact on our results in the respective sections.

      In line 142, authors added the molar concentration of caffeic acid, as requested. Although this helps, it is even more important that molar concentrations are added every time a compound concentration is mentioned. For instance, just 2 lines down there is another mention of a compound concentration. It would be informative if authors also added molar concentrations here and throughout the manuscript.

      Thanks for your comments. In our initial experimental design, we used μg/mL as the concentration unit. However, when the concentrations used in the dilution method are converted to μM, some values are not round numbers. For instance, 32 μg/mL of caffeic acid phenethyl ester corresponds to 112.55 μM, which is awkward to report in this form.
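
      The conversion the reviewer asks for is a single division by molar mass, sketched below in Python. The helper name `ug_per_ml_to_um` is hypothetical, and the molar mass of 284.31 g/mol for caffeic acid phenethyl ester is an assumption taken from its molecular formula (C17H16O4) rather than from the manuscript.

```python
def ug_per_ml_to_um(conc_ug_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration (ug/mL) to a molar concentration (uM)."""
    # ug/mL is numerically equal to mg/L; mg/L divided by g/mol gives mmol/L,
    # and multiplying by 1000 gives umol/L (uM).
    return conc_ug_per_ml / molar_mass_g_per_mol * 1000.0


# Assumed molar mass of CAPE (C17H16O4), in g/mol.
CAPE_MOLAR_MASS = 284.31

for conc in (32, 64):
    print(f"{conc} ug/mL = {ug_per_ml_to_um(conc, CAPE_MOLAR_MASS):.2f} uM")
```

      Under that assumed molar mass, the sketch reproduces the 112.55 μM figure quoted in the response for 32 μg/mL, so reporting both units side by side would be a mechanical step.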

      Line 277. For the sake of clarity, I would strongly suggest that authors use the term "control mice" instead of "model mice".

      Thanks for your comments. We have modified “model mice” to “control mice” throughout the manuscript.

      In line 302, the word taxa should not be capitalized. I capitalized it in my original comments simply to draw attention to it.

      Thanks for your comments. We have modified this word.

      In the section starting in line 318, authors still need to include details of how the metabolomics analyses were performed. Just stating that samples were "frozen for metabolomics analyses" is not enough. Was this mass-spec or NMR-based metabolomics? Assuming it was mass-spec, what kind? How was metabolite identity assigned? Etc, etc. These are important details, which need to be included.

      Thanks for your comments. We have added some metabolomics methods in the corresponding section.

      In line 338, the authors misunderstood my original comment. This sentence should read "...the final product of purine degradation, were markedly decreased in mice after...".

      Thanks for your comments. We have modified this sentence.

      Panels of figure 3 are still incorrectly labeled. The secondary structure predictions are shown in A and C, not A and B as is currently stated in the legend.

      Thanks for your comments. We have modified the figure legend in Figure 3.

      About Figure 5C, I thank the authors for the clarification, but this explanation should be included in the figure legend.

      Thanks for your comments. We have added the relevant information to the figure legend.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility, and clarity

      The work by Pinon et al describes the generation of a microvascular model to study Neisseria meningitidis interactions with blood vessels. The model uses a novel and relatively high throughput fabrication method that allows full control over the geometry of the vessels. The model is well characterized. The authors then study different aspects of Neisseria-endothelial interactions and benchmark the bacterial infection model against the best disease model available, a human skin xenograft mouse model, which is one of the great strengths of the paper. The authors show that Neisseria binds to the 3D model in a geometry similar to that in the animal xenograft model, induces an increase in permeability shortly after bacterial perfusion, and induces endothelial cytoskeleton rearrangements. Finally, the authors show neutrophil recruitment to bacterial microcolonies and phagocytosis of Neisseria. The article is overall well written, and it is a great advancement in the bioengineering and sepsis infection field, and I only have a few major comments and some minor ones.

      Major comments:

      Infection-on-chip. I would recommend the authors to change the terminology of "infection on chip" to better reflect their work. The term is vague and it decreases novelty, as there are multiple infection-on-chip models that recapitulate other infections (recently reviewed in https://doi.org/10.1038/s41564-024-01645-6) including Ebola, SARS-CoV-2, Plasmodium and Candida. Maybe the term "sepsis on chip" would be more specific and exemplify better the work and novelty. Also, I would suggest that the authors carefully take a look at the text and consider when they use VoC or the current term IoC, as of now sometimes they are used interchangeably, with VoC being used occasionally in bacteria perfused experiments.

      We thank Reviewer #1 for this suggestion. Indeed, we have chosen to replace the term "Infection-on-Chip" by "infected Vessel-on-chip" to avoid any confusion in the title and the text. Also, we have removed all the terms "IoC" which referred to "Infection-on-Chip" and replaced with "VoC" for "Vessel-on-Chip". We think these terms will improve the clarity of the main text.

      Fig 3 and Supplementary 3: Permeability. The authors suggest that early 3h infection with Neisseria does not show an increase in vascular permeability in the animal model, contrary to their findings in the 3D in vitro model. However, they show a non-significant increase in permeability of 70 KDa Dextran in the animal xenograft early infection. This suggests that if the experiment had been done with a lower molecular weight tracer, significant increases in permeability could have been detected. I would suggest to do this experiment that could capture early events in vascular disruption.

      Comparing permeability under healthy and infected conditions using Dextran smaller than 70 kDa is challenging. Previous research [1] has shown that molecules below 70 kDa already diffuse freely in healthy tissue. Given this high baseline diffusion, we believe that no significant difference would be observed before and after N. meningitidis infection, and these experiments were not carried out. As discussed in the manuscript, bacteria-induced permeability in mice occurs at later time points, 16h post infection, as shown previously [2]. This difference between the xenograft model and the chip likely reflects the absence in the chip of various cell types present in the tissue parenchyma.

      The authors show the formation of an actin honeycomb structure beneath the bacterial microcolonies. This only occurred in 65% of the microcolonies. Is this result similar to in vitro 2D endothelial cultures in static and under flow? Also, the group has shown in the past positive staining of other cytoskeletal proteins, such as ezrin in the ERM complex. Does this also occur in the 3D system?

      We thank Reviewer #1 for this suggestion. - According to this recommendation, we imaged monolayers of endothelial cells in the flat regions of the chip (the two lateral channels) using the same microscopy conditions (i.e., Obj. 40X N.A. 1.05) that have been used to detect honeycomb structures in the 3D vessels in vitro. We showed that more than 56% of infected cells present these honeycomb structures in 2D, which is 13% less than in 3D, and is not significant due to the distributions of both populations. Thus, we conclude that under both in vitro conditions, 2D and 3D, the proportion of infected cells exhibiting cortical plaques is similar. We have added the graph and the confocal images in Figure S4B and lines 418-419 of the revised manuscript. - We recently performed staining of ezrin in the chip and imaged both the 3D and 2D regions. Although ezrin staining was visible in 3D (Fig. 1 of this response), it was not as obvious as other markers under these infected conditions and we did not include it in the main text. Interpretation of this result is not straightforward because, for instance, the substrate of the cells is different, and it would require further studies on the behaviour of ERM proteins in these different contexts.

      One of the most novel things of the manuscript is the use of a relatively quick photoablation system. I would suggest that the authors add a more extensive description of the protocol in methods. Could this technique be applied in other laboratories? If this is a major limitation, it should be listed in the discussion.

      Following the Reviewer's comment, we introduced more detailed explanations regarding the photoablation: - L157-163 (Results): "Briefly, the chosen design is digitalized into a list of positions to ablate. A pulsed UV-LASER beam is injected into the microscope and shaped to cover the back aperture of the objective. The laser is then focused on each position that needs ablation. After introducing endothelial cells (HUVEC) in the carved regions,.." - L512-516 (Discussion): "The speed capabilities drastically improve with the pulsing repetition rate. Given that our laser source emits pulses at 10kHz, as compared to other photoablation lasers with repetitions around 100 Hz, our solution could potentially gain a factor of 100. Also,..." - L1082-1087 (Materials and Methods): "…, and imported in a python code. The control of the various elements is embedded and checked for this specific set of hardware. The code is available upon request."

      Adding these three paragraphs gives more details on how photoablation works thus improving the manuscript.
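
      The quoted description, a design digitalized into a list of positions to ablate and a speed that scales with the pulse repetition rate, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' control code (which the response says is available upon request): the names `mask_to_positions` and `ablation_time_s`, the binary-mask input, the serpentine scan order, the grid pitch, and the pulses-per-position value are all hypothetical.

```python
from typing import List, Sequence, Tuple


def mask_to_positions(
    mask: Sequence[Sequence[int]], pitch_um: float
) -> List[Tuple[float, float]]:
    """Turn a binary design mask into an ordered list of (x, y) positions in um.

    Rows are visited in serpentine order (an assumed choice) so that
    consecutive positions stay close together.
    """
    positions: List[Tuple[float, float]] = []
    for row_idx, row in enumerate(mask):
        cols = range(len(row))
        if row_idx % 2 == 1:
            cols = reversed(cols)  # scan odd rows right-to-left
        for col_idx in cols:
            if row[col_idx]:
                positions.append((col_idx * pitch_um, row_idx * pitch_um))
    return positions


def ablation_time_s(
    n_positions: int, pulses_per_position: int, repetition_rate_hz: float
) -> float:
    """Lower bound on exposure time, ignoring stage travel and settling."""
    return n_positions * pulses_per_position / repetition_rate_hz
```

      Because the exposure time in this sketch is inversely proportional to the repetition rate, a 10 kHz source needs one hundredth of the exposure time of a 100 Hz source for the same design, which matches the factor of 100 stated in the quoted discussion text.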

      Minor comments:

      Supplementary Fig 2. The reference to subpanels H and I is swapped.

      The references to subpanels H and I have been corrected in the revised version.

      Line 203: I would suggest to delete this sentence. Although a strength of the submitted paper is the direct comparison of the VoC model with the animal model to better replicate Neisseria infection, a direct comparison with animal permeability is not needed in all vascular engineering papers, as vascular permeability measurements in animals have been well established in the past.

      The sentence "While previously developed VoC platforms aimed at replicating physiological permeability properties, they often lack direct comparisons with in vivo values." has been removed from the revised text.

      Fig 3: Bacteria binding experiments. I would suggest the addition of more methodological information in the main results text to guarantee a good interpretation of the experiment. First, it would be better that wall shear stress rather than flow rate is described in the main text, as flow rate is dependent on the geometry of the vessel being used. Second, how long was the perfusion of Neisseria in the binding experiment performed to quantify colony doubling or elongation? As per figure 1C, I would guess than 100 min, but it would be better if this information is directly given to the readers.

      We thank Reviewer #1 for these two suggestions that will improve the text clarity (e.g., L316). (i) Indeed, we now express the flow rate in terms of shear stress. (ii) Also, we have normalized the quantification of the colony doubling time according to the first time-point where a single bacterium is attached to the vessel wall. Thus, early adhesion bacteria will be defined by a longer curve while late adhesion bacteria by a shorter curve. In total, the experiment lasted for 3 hours (modifications appear in L318 and L321-326).
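
      The reviewer's point that flow rate depends on vessel geometry while wall shear stress does not can be made concrete with the Poiseuille relation for a rigid cylindrical tube, τ = 4μQ / (πr³). The Python sketch below assumes laminar flow of a Newtonian fluid in a circular cross-section; the function name, the 0.7 mPa·s viscosity, and the 25 µm radius are placeholder assumptions, not values from the manuscript.

```python
import math


def wall_shear_stress_pa(
    flow_ul_per_min: float, radius_um: float, viscosity_pa_s: float
) -> float:
    """Wall shear stress (Pa) for Poiseuille flow in a cylindrical vessel."""
    flow_m3_per_s = flow_ul_per_min * 1e-9 / 60.0  # 1 uL = 1e-9 m^3
    radius_m = radius_um * 1e-6
    # tau = 4 * mu * Q / (pi * r^3)
    return 4.0 * viscosity_pa_s * flow_m3_per_s / (math.pi * radius_m ** 3)


# Placeholder inputs: assumed medium viscosity and a hypothetical vessel radius.
tau = wall_shear_stress_pa(flow_ul_per_min=1.0, radius_um=25.0, viscosity_pa_s=0.0007)
print(f"{tau:.3f} Pa = {tau * 10:.2f} dyn/cm^2")  # 1 Pa = 10 dyn/cm^2
```

      The cubic dependence on radius is why the same flow rate produces very different shear stresses in vessels of different sizes: doubling the radius at a fixed flow rate lowers the wall shear stress eightfold.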

      Fig 4: The honeycomb structure is not visible in the 3D rendering of panel D. I would recommend to show the actin staining in the absence of Neisseria staining as well.

      According to this suggestion, a zoom of the 3D rendering of the cortical plaque without colony has been added to Figure 4 of the revised manuscript.

      Line 421: E-selectin is referred as CD62E in this sentence. I would suggest to use the same terminology everywhere.

      We have replaced the "CD62E" term with "E-selectin" to improve clarity.

      Line 508: "This difference is most likely associated with the presence of other cell types in the in vivo tissues and the onset of intravascular coagulation". Do the authors refer to the presence of perivascular cells, pericytes or fibroblasts? If so, it could be good to mention them, as well as those future iterations of the model could include the presence of these cell types.

      By "other cell types", we refer to pericytes [3], fibroblasts [4], and perivascular macrophages [5], which surround endothelial cells and contribute to vessel stability. The main text was modified to include this information (Lines 548 and 555-570), and their potential roles during infection are discussed.

      Discussion: The discussion covers very well the advantages of the model over in vitro 2D endothelial models and the animal xenograft but fails to include limitations. This would include the choice of HUVEC cells, an umbilical vein cell line to study microcirculation, the lack of perivascular cells or limitations on the fabrication technique regarding application in other labs (if any).

      We thank Reviewer #1 for this suggestion. Our manuscript did not sufficiently explain the limitations of the model, and adding them improves the text:

      - The perspectives of our model include introducing perivascular cells surrounding the vessel and fibroblasts into the collagen gel, as discussed previously and added to the discussion (L555-570).

      - Our choice of HUVECs focused on recapitulating the characteristics of venules, which display key features such as the overexpression of CD62E and the adhesion of neutrophils during inflammation. Using microvascular endothelial cells originating from different tissues would be very interesting. This possibility is now mentioned in the discussion (lines 567-568).

      - Photoablation is a homemade fabrication technique that can be implemented in any lab equipped with an epifluorescence microscope. This method is described in more detail in the revised manuscript (L1085-1087).

      Line 576: The authors state that the model could be applied to other systemic infections but failed to mention that some infections have already been modelled in 3D bioengineered vascular models (examples found in https://doi.org/10.1038/s41564-024-01645-6). This includes a capillary photoablated vascular model to study malaria (DOI: 10.1126/sciadv.aay724).

      These two important references have been introduced in the main text (L84, 647, 648).

      Line 1213: Are the 6M neutrophil solution in 10ul under flow. Also, I would suggest to rewrite this sentence in the following line "After, the flow has been then added to the system at 0.7-1 μl/min."

      We now specify that neutrophils are circulated in the chip under flow conditions (lines 1321-1322).

      Significance

      The manuscript is comprehensive, complete and represents the first bioengineered model of sepsis. One of the major strengths is the careful characterization and benchmarking against the animal xenograft model. Its main limitations are the brief description of the photoablation methodology and the need for more clarity in the description of the bacteria perfusion experiments, given their complexity. The manuscript will be of interest for the general infection community and to the tissue engineering community if more details on fabrication methods are included. My expertise is on infection bioengineered models.

      Reviewer #2

      Evidence, reproducibility, and clarity

      Summary: The authors develop a Vessel-on-Chip model, which has geometrical and physical properties similar to the murine vessels used in the study of systemic infections. The vessel was created via highly controllable laser photoablation in a collagen matrix, subsequent seeding of human endothelial cells and flow perfusion to induce mechanical cues. This vessel could be infected with Neisseria meningitidis, as a model of systemic infection. In this model, microcolony formation and dynamics, and effects on the host were very similar to those described for the human skin xenograft mouse, which is the current gold standard for these studies, and were consistent with observations made in patients. The model could also recapitulate the neutrophil response upon N. meningitidis systemic infection.

      Major comments:

      I have no major comments. The claims and the conclusions are supported by the data, the methods are properly presented and the data is analyzed adequately. Furthermore, I would like to propose an optional experiment that could improve the manuscript. In the discussion it is stated that the vascular geometry might contribute to bacterial colonization in areas of lower velocity. It would be interesting to recapitulate this experimentally. It is of course optional but it would be of great interest, since this is something that can only be proven in the organ-on-chip (where flow speed can be tuned) and not as much in animal models. Besides, it would increase impact, demonstrating the superiority of the chip in this area rather than proving to be equal to current models.

      We have conducted additional infection experiments in different vascular geometries and have added the results to Figures 3 and S3 and to lines 288-305. We compared shear stress levels, as determined by COMSOL simulation, with experimentally determined bacterial adhesion sites. Under the conditions used, the range of shear stress generated by the tested geometries does not appear to change the efficiency of bacterial adhesion. These results are consistent with a previous study from our group, which showed that in this range of shear stresses the effect on adhesion is limited [6]. Furthermore, qualitative observations in the animal model indicate that bacteria do not have an obvious preference in terms of binding site.

      Minor comments:

      I have a series of suggestions which, in my opinion, would improve the discussion. They are further elaborated in the following section, in the context of the limitations.

      • How to recapitulate the vessels in the context of a specific organ or tissue? If the pathogen is often found in the luminal space of other organs after disseminating from the blood, how can this process be recapitulated with this model, if at all?

      • For reasons that are not fully understood, postmortem histological studies reveal bacteria only inside blood vessels and rarely, if ever, in the organ parenchyma. The presence of intravascular bacteria could nevertheless alter cells in the tissue parenchyma. The notable exception is the brain, where bacteria exit the vessel lumen to access the cerebrospinal fluid. The chip we describe is fully suited to the development of a blood-brain barrier model and of more specific organ environments. This implies the addition of more cell types to the hydrogel. A paragraph on this topic has been added (Lines 548 and 552-570).

      • Similarly, could other immune responses related to systemic infection be recapitulated? The authors could discuss the potential of including other immune cells that might be found in the interstitial space, for example.

      • This important discussion point has been added to the manuscript (L623-636). As suggested by Reviewer #2, other immune cells respond to N. meningitidis and can be explored using our model. For instance, macrophages and dendritic cells are activated upon N. meningitidis infection, eliminate the bacteria through phagocytosis, and produce pro-inflammatory cytokines and chemokines that potentially activate lymphocytes [7]. Such an immune response, although complex, would be interesting to study in our model, as skin-xenograft mice are deprived of B and T lymphocytes to ensure acceptance of human skin grafts.

      • A minor correction: in line 467 it should probably be "aspects" instead of "aspect", and the authors could consider rephrasing that sentence slightly for increased clarity.

      • We have revised the sentence, which now reads: "we demonstrated that our VoC strongly replicates key aspects of the in vivo human skin xenograft mouse model, the gold standard for studying meningococcal disease under physiological conditions" (lines 499-503).

      Strengths and limitations

      The most important strength of this manuscript is the technology they developed to build this model, which is impressive and very innovative. The Vessel-on-Chip can be tuned to acquire complex shapes and, according to the authors, the process has been optimized to produce models very quickly. This is a great advancement compared with the technologies used to produce other equivalent models. This model proves to be equivalent to the most advanced model used to date, but allows to perform microscopy with higher resolution and ease, which can in turn allow more complex and precise image-based analysis. However, the authors do not seem to present any new mechanistic insights obtained using this model. All the findings obtained in the infection-on-chip demonstrate that the model is equivalent to the human skin xenograft mouse model, and can offer superior resolution for microscopy. However, the advantages of the model do not seem to be exploited to obtain more insights on the pathogenicity mechanisms of N. meningitidis, host-pathogen interactions or potential applications in the discovery of potential treatments. For example, experiments to elucidate the role of certain N. meningitidis genes on infection could enrich the manuscript and prove the superiority of the model. However, I understand these experiments are time-consuming and out of the scope of the current manuscript. In addition, the model lacks the multicellularity that characterizes other similar models. The authors mention that the pathogen can be found in the luminal space of several organs, however, this luminal space has not been recapitulated in the model. Even though this would be a new project, it would be interesting that the authors hypothesize about the possibilities of combining this model with other organ models. The inclusion of circulating neutrophils is a great asset; however it would also be interesting to hypothesize about how to recapitulate other immune responses related to systemic infection.

      We thank Reviewer #2 for his/her comment on the strengths and limitations of our work. Our study opens many future research directions and applications, and we hope that it will serve as the basis for many future studies, but only a limited set of experiments can be addressed in a single manuscript.

      - Experiments investigating the role of N. meningitidis genes require significant optimization of the system. Multiplexing is a potential avenue for future development, which would allow the testing of many mutants. The fast photoablation approach is particularly amenable to such adaptation.

      - Cells and bacteria inside the chambers could be isolated and analyzed at the transcriptomic level or by flow cytometry. This would require optimizing a protocol for collecting cells from the device, for instance via collagenase digestion. This type of approach would also benefit from multiplexing to increase the number of cells.

      - As mentioned above, the revised manuscript discusses the multicellular capabilities of our model, including the integration of additional immune cells and potential connections to other organ systems. We believe that these approaches are feasible and valuable for studying various aspects of N. meningitidis infection.

      Advance

      The most important advance of this manuscript is technical: the development of a model that proves to be equivalent to the most complex model used to date to study meningococcal systemic infections. The human skin xenograft mouse model requires complex surgical techniques and has the practical and ethical limitations associated with the use of animals. However, the Infection-on-chip model is completely in vitro, can be produced quickly, and allows to precisely tune the vessel's geometry and to perform higher resolution microscopy. Both models were comparable in terms of the hallmarks defining the disease, suggesting that the presented model can be an effective replacement of the animal use in this area.

      Other vessel-on-chip models can recapitulate an endothelial barrier in a tube-like morphology, but do not recapitulate other complex geometries, that are more physiologically relevant and could impact infection (in addition to other non-infectious diseases). However, in the manuscript it is not clear whether the different morphologies are necessary to study or recapitulate N. meningitidis infection, or if the tubular morphologies achieved in other similar models would suffice.

      We thank Reviewer #2 for his/her comment, which was also raised by Reviewer #1. To answer this question, we have now infected vessels-on-chip of different geometries to dissect the impact of flow distribution on N. meningitidis infection (Figures 3 and S3, explained in lines 288-307). In this range of shear stress, we show that bacterial infection is not strongly affected by geometry-induced shear stress variation. These results are consistent with observations in flow chambers and with qualitative observations of human cases and of the xenograft model [6].

      Audience

      This manuscript might be of interest for a specialized audience focusing on the development of microphysiological models. The technology presented here can be of great interest to researchers whose main area of interest is the endothelium and the blood vessels, for example, researchers on the study of systemic infections, atherosclerosis, angiogenesis, etc. Thus, the tool presented (vessel-on-chip) can have great applications for a broad audience. However, even when the method might be faster and easier to use than other equivalent methods, it could still be difficult to implement in another laboratory, especially if it lacks expertise in bioengineering. Therefore, the method could be more of interest for laboratories with expertise in bioengineering looking to expand or optimize their toolbox. Alternatively, this paper presents itself as an opportunity to begin collaborations, since the model could be used to test other pathogens or conditions.

      Field of expertise: Infection biology, organ-on-chip, fungal pathogens.

      I lack the expertise to evaluate the image-based analysis.

      References:

      1. Gyohei Egawa, Satoshi Nakamizo, Yohei Natsuaki, Hiromi Doi, Yoshiki Miyachi, and Kenji Kabashima. Intravital analysis of vascular permeability in mice using two-photon microscopy. Scientific Reports, 3(1):1932, Jun 2013. ISSN 2045-2322. doi: 10.1038/srep01932.

      2. Valeria Manriquez, Pierre Nivoit, Tomas Urbina, Hebert Echenique-Rivera, Keira Melican, Marie-Paule Fernandez-Gerlinger, Patricia Flamant, Taliah Schmitt, Patrick Bruneval, Dorian Obino, and Guillaume Duménil. Colonization of dermal arterioles by Neisseria meningitidis provides a safe haven from neutrophils. Nature Communications, 12(1):4547, Jul 2021. ISSN 2041-1723. doi: 10.1038/s41467-021-24797-z.

      3. Mats Hellström, Holger Gerhardt, Mattias Kalén, Xuri Li, Ulf Eriksson, Hartwig Wolburg, and Christer Betsholtz. Lack of pericytes leads to endothelial hyperplasia and abnormal vascular morphogenesis. Journal of Cell Biology, 153(3):543–554, Apr 2001. ISSN 0021-9525. doi: 10.1083/jcb.153.3.543.

      4. Arsheen M. Rajan, Roger C. Ma, Katrinka M. Kocha, Dan J. Zhang, and Peng Huang. Dual function of perivascular fibroblasts in vascular stabilization in zebrafish. PLOS Genetics, 16(10):1–31, 10 2020. doi: 10.1371/journal.pgen.1008800.

      5. Huanhuan He, Julia J. Mack, Esra Güç, Carmen M. Warren, Mario Leonardo Squadrito, Witold W. Kilarski, Caroline Baer, Ryan D. Freshman, Austin I. McDonald, Safiyyah Ziyad, Melody A. Swartz, Michele De Palma, and M. Luisa Iruela-Arispe. Perivascular macrophages limit permeability. Arteriosclerosis, Thrombosis, and Vascular Biology, 36(11):2203–2212, 2016. doi: 10.1161/ATVBAHA.116.307592.

      6. Emilie Mairey, Auguste Genovesio, Emmanuel Donnadieu, Christine Bernard, Francis Jaubert, Elisabeth Pinard, Jacques Seylaz, Jean-Christophe Olivo-Marin, Xavier Nassif, and Guillaume Dumenil. Cerebral microcirculation shear stress levels determine Neisseria meningitidis attachment sites along the blood–brain barrier. Journal of Experimental Medicine, 203(8):1939–1950, 07 2006. ISSN 0022-1007. doi: 10.1084/jem.20060482.

      7. Riya Joshi and Sunil D. Saroj. Survival and evasion of Neisseria meningitidis from macrophages. Medicine in Microecology, 17:100087, 2023. ISSN 2590-0978. doi: 10.1016/j.medmic.2023.100087.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      (1) “It is likely that metabolism changes ex vivo vs in vivo, and therefore stable isotope tracing experiments in the explants may not reflect in vivo metabolism.”

      We agree with the reviewer that metabolic changes may differ ex vivo versus in vivo. We now state: “Lastly, an important caveat to our study is that metabolism changes ex vivo versus in vivo, and thus, in the future, in vivo studies can be performed to assess metabolic changes.” (lines 591-593).

      (2) “The retina at P0 is composed of both progenitors and differentiated cells. It is not clear if the results of the RNA-seq and metabolic analysis reflect changes in the metabolism of progenitors, or of mature cells, or changes in cell type composition rather than direct metabolic changes in a specific cell type.”

      We have clarified that the metabolic changes may be in RPCs or in other retinal cell types on lines 149-152: “Since these measurements were performed in bulk, and the ratio of RPCs to differentiated cells declines as development proceeds, it is not clear whether glycolytic activity is temporally regulated within RPCs or in other retinal cell types.”

      However, since we mined a single cell (sc) RNA-seq dataset, we are able to attribute gene expression specifically within RPCs (Figure 1).

      (3) “The biochemical links between elevated glycolysis and pH and beta-catenin stability are unclear. White et al found that higher pH decreased beta-catenin stability (JCB 217: 3965) in contrast to the results here. Oginuma et al found that inhibition of glycolysis or beta-catenin acetylation does not affect beta-catenin stability (Nature 584:98), again in contrast to these results. Another paper showed that acidification inhibits Wnt signaling by promoting the expression of a transcriptional repressor and not via beta-catenin stability (Cell Discovery 4:37). There are also additional papers showing increased pH can promote cell proliferation via other mechanisms (e.g. Nat Metab 2:1212). It is possible that there is organ-specificity in these signaling pathways however some clarification of these divergent results is warranted.”

      We have added the information and references brought up by the reviewer in our discussion (lines 529-549 and 570-574). We have also suggested future experiments to further analyse our system in line with the studies now referenced (lines 580-589).

      (4) The gene expression analysis is not completely convincing. E.g. the expression of additional glycolytic genes should be shown in Figure 1. It is not clear why Hk1 and Pgk1 are specifically shown, and conclusions about changes in glycolysis are difficult to draw from the expression of these two genes. The increase in glycolytic gene expression in the Pten-deficient retina is generally small.

      We have expanded the list of glycolytic genes analysed, in modified Figure 1B, and expanded the description of these results on lines 156-166.

      (5) Is it possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation?

      We added a comment to this effect to the discussion: “It is possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation, which we could assess in the future.” (lines 600-603).

      (6) “Likewise the result that an increase in pH from 7.4 to 8.0 is sufficient to increase proliferation implies that pH regulation may have instructive roles in setting the tempo of retinal development and embryonic cell proliferation. Similarly, the results show that acetate supplementation increases proliferation (I think this result should be moved to the main figures).”

      We have added the acetate data to main Figure 7E.

      We added a supplemental data table (Figure 2–Data supplement 1) that was inadvertently omitted from our last submission.

      Reviewer #2 (Recommendations for the authors):

      Major points

      (1) Assuming that increased glycolysis gets RPCs to exit from the proliferative stage earlier, the total number of retinal cells, notably that of the rod photoreceptors, should be reduced since the pool of proliferating cells is depleted earlier. Is that really the case for a mature retina? To address this question, the authors should perform quantifications of photoreceptors at a stage where most developmental cell death has concluded (i.e. at P14 or later; Young, J. Comp. Neurol. 229:362-373, 1984) and check whether or not there are more or less photoreceptors present.

      We have previously quantified the numbers of each cell type in Pten RPC-cKO retinas, and as suggested by the reviewer, there are fewer rod photoreceptors at P7 (Tachibana et al. 2016. J Neurosci 36 (36) 9454-9471) and P21 (Hanna et al. 2025. IOVS. Mar 3;66(3):45). We have edited the following sentence: “Using cellular birthdating, we previously showed that Pten-cKO RPCs are hyperproliferative and differentiate on an accelerated schedule between E12.5 and E18.5, yet fewer rod photoreceptors are ultimately present in P7 (Tachibana et al., 2016) and P21 (Hanna et al., 2025) retinas, suggestive of a developmental defect.” (lines 184-187).

      (2) Figure 1B, 1H: On what data are these two figures based? The plots suggest that a high-density time series of gene expression and rod photoreceptor birth was performed, yet it is not clear where and how this was done. The authors should provide the data, plot individual data points, and, if applicable perform a statistical analysis to support their idea that glycolytic gene expression (as a surrogate for glycolysis) overlaps in time with rod photoreceptor birth (Figure 1B) and that in Pten KO the glycolytic gene expression is shifted forward in time (Figure 1H). If the data required to construct these plots (min. 5 data points, min 3 repeats each) does not exist or cannot be generated (e.g. from reanalysis of previously published datasets), then these graphs should be removed.

      We have removed the previous Figure 1B and Figure 1H.

      (3) Figure 2E: Which PKM isozyme was analyzed here? Does the genetic analysis allow us to distinguish between PKM1 and PKM2? Since PKM governs the key rate-limiting step of glycolysis but was not significantly upregulated, does this not contradict the authors' main hypothesis? If PKM at some point was inhibited (see also below comment to Figure 5) one would expect an accumulation of glycolytic intermediates, including phosphoenolpyruvate. Was such an effect observed?

      The data in Figure 2E are bulk RNA-seq data. Since there is only a single Pkm gene that is alternatively spliced, the RNA-sequencing data cannot distinguish between the four PK isozymes that arise from alternative splicing. Specifically, we used an Illumina NextSeq 500 to sequence 75 bp single-end reads, which cover transcripts of the alternatively spliced Pkm1 and Pkm2 mRNAs that carry a common 3’ end. We added a statement to this effect: “However, since we employed 75 bp single-end sequencing, we could not distinguish between alternatively spliced Pkm1 and Pkm2 mRNAs.” (lines 215-216).

      We have not performed metabolic analyses of glycolytic intermediates, but we have proposed such a strategy as an important avenue of investigation for future studies in the Discussion: “Lastly, an important caveat to our study is that metabolism changes ex vivo versus in vivo, and thus, in the future, in vivo studies can be performed to assess metabolic changes.” (lines 591-593).

      (4) Figure 3 and materials & methods: For the retinal explant cultures, was the RPE included in the cultured explants? If so, how can the authors distinguish drug effects on neuroretina and RPE? If the RPE was not included, then the authors should discuss how the missing RPE - neuroretina interaction could have influenced their results.

      We removed the RPE from the retinal explants, as indicated in the Methods section. The RPE is a metabolic hub that transports nutrients to the retina, so in the absence of the RPE, the retina has no immediate source of energy, such as glucose. However, the media (DMEM) contains 25 mM glucose to replace the RPE as an energy source, and we now show that RPCs express GLUT1, which allows uptake of glucose (see new Figure 3A).

      We added the following sentence: “P0 explants were mounted on Nucleopore membranes and cultured on top of retinal explant media, providing a source of nutrients, growth factors and glucose.” (lines 241-243).

      (5) Figure 3: It seems rather odd that, if glycolysis was so important for retinal proliferation, differentiation, and metabolism in general, the inhibition of glycolysis with 2DG should not produce a strong degeneration. However, since 2DG competes with glucose, and must be used at nearly equimolar concentration to block glycolysis in a meaningful way, it is possible that the 2DG concentration used simply was not high enough to substantially inhibit glycolysis. Since the inhibitory effect of 2DG depends on the glucose concentration, the authors should measure and provide the concentration of glucose in the explant culture medium. This value should be given either in results or materials and methods.

      We recently published a manuscript showing that 2DG treatments at the same concentrations employed in this study are effective at reducing lactate production in the developing retina in vivo, which is the expected effect of reduced glycolysis (Hanna et al. 2025. IOVS). However, in this study, we did not observe an impact on cell survival.

      We do not agree that it is necessary to measure glucose in the media since the anti-proliferative effect of 2DG is well known, and we are working in the effective range established by multiple groups. We have clarified that we are in the effective range by adding the following sentences: “2DG is typically used in the range of 5-10 mM in cell culture studies and in general, has anti-proliferative effects. To test whether 2DG treatment was in the effective range, explants were exposed to BrdU, which is incorporated into S-phase cells, for 30 minutes prior to harvesting. 2DG treatment resulted in a dose-dependent inhibition of RPC proliferation as evidenced by a reduction in BrdU+ cells (Figure 3D), indicating that our treatment was in the effective range.” (lines 246-251).

      (6) Figure 3F: The authors use immunostaining for cleaved, activated caspase-3 to assess the amount of apoptotic cell death. However, there are many different possible mechanisms for neuronal cells to die, the majority of which are caspase-independent. To assess the amount of cell death occurring, the authors should perform a TUNEL assay (which labels apoptotic and non-apoptotic forms of cell death; Grasl-Kraupp et al., Hepatology 21:1465-8, 1995), quantify the numbers of TUNEL-positive cells in the retina, and compare this to the numbers of cells positive for activated caspase-3.

      We agree with the reviewer that there are more ways for a cell to die than apoptosis alone, and that TUNEL would detect dying cells undergoing either apoptosis or necrosis, for example. Nevertheless, our data with cleaved caspase-3, an executioner protease for apoptosis, provide clear evidence of cell death in our different conditions. Since this manuscript is not focused on cell death pathways, we have not performed the additional TUNEL assay.

      (7) Figure 4F and 4I: At post-natal day P7 the rod outer segments (OSs) only just start to grow out and the characteristic, rhodopsin-filled disk stacks are not yet formed. To test whether the PFKB3 gain-of function or the Pten KO has a marked effect on OS formation and length, the authors should perform the same tests on older, more mature retina at a time when rod OS show their characteristic disk structures (e.g. somewhere between P14 to P30). The same applies to the 2DG inhibition on the Pten KO retina.

      The precocious differentiation of rod outer segments observed in P7 Pten-cKO retinas does not persist in adulthood, and instead reflects a developmental acceleration. Indeed, we found that in Pten cKO retinas at 3-, 6- and 12-months of age, rod and cone photoreceptors degenerate, and cone outer segments are shorter (Hanna et al., 2025; Tachibana et al., 2016). These data demonstrate that Pten is required to support rod and cone survival.

      (8) Figure 5: Lowering media pH is a rather coarse and untargeted intervention that will have multiple metabolic consequences independent of PKM2. It is thus hardly possible to attribute the effects of pH manipulation to any specific enzyme. To assess this and possibly confirm the results obtained with low pH, the authors should perform a targeted inhibition experiment, for instance using Shikonin (Chen et al., Oncogene 30:4297-306, 2011), to selectively inhibit PKM2. If the retinal explant cultures contained the RPE, an additional question would be how the changes in RPE would alter lactate flux and metabolization between RPE and neuroretina (see also question 4 above).

      We have reframed the rationale for the pH manipulation experiments, highlighting the importance of pH in cell fate specification, and indicating that the aggregation of PKM2 is only one possible effect of lower pH.

      We wrote: “Given that altered glycolysis influences intracellular pH, which in turn controls cell fate decisions, we set out to assess the impact of manipulating pH on cell fate selection in the retina. One of the expected impacts of lowering pH was the aggregation of PKM2, a rate-limiting enzyme for glycolysis, which aggregates in reversible, inactive amyloids (Cereghetti et al., 2024).” (lines 362-366). 

      We have also added a discussion point: “Whether pH manipulations also impact the stability of other retinal proteins, such as PKM2, can be further investigated in the future using specific PKM2 inhibitors, such as Shikonin (Chen et al., 2011).” (lines 545-547).

      (9) Figure 5G: As for Figure 3F, the authors should perform TUNEL assays to assess the number of cells dying independent of caspase-3.

      Please see response to point 6.

      (10) Figure 7E: In the figure legend "K" should read "E". From the figure and the legend, it is not clear to which cell type this diagram should refer. This must be specified. Importantly, the insulin-dependent glucose-transporter 4 (GLUT4) highlighted in Figure 7E, while expressed on inner retinal vasculature endothelial cells, is not expressed in retinal neurons. What GLUTs exactly are expressed in what retinal neurons may still be to some extent contentious (cf. Chen et al., eLife, https://doi.org/10.7554/eLife.91141.3 ; and reviewer comments therein), yet RPE cells clearly express GLUT1, photoreceptors likely express GLUT3, Müller glia cells may express GLUT1, while horizontal cells likely express GLUT2 (Yang et al., J Neurochem. 160:283-296, 2022).

      We have removed this summary schematic for simplicity.

      (11) Materials and methods: The retinal explant culture system must be described in more detail. Important questions concern the use of medium and serum for which the providers, order numbers, and batch/lot numbers (whichever is applicable) must be given. The glucose concentration in the medium (including the serum content) should be measured. A key concern is whether the explants were cultivated submerged into the medium - this would prevent sufficient oxygenation and drive metabolism towards glycolysis (i.e. the Pasteur effect) - or whether they were cultivated on top of the liquid medium, at the interface between air and liquid (i.e. a situation that would favor OXPHOS).

      We have added further detail to the methods section for the explant assay (lines 686-689). We cultured the retinal explants on membranes on top of the media, which is the standard methodology in the field and in our laboratory (Cantrup et al., 2012; Tachibana et al., 2016; Touahri et al., 2024). Typically, RPCs undergo aerobic glycolysis, meaning that even in the presence of oxygen, they still prefer glycolysis rather than OXPHOS. We demonstrated that 2DG treatment blocks RPC proliferation, indicating that RPCs are indeed favoring glycolysis in our assay system.

      (12) A point the authors may want to discuss additionally is the potential relevance of their data for the pathogenesis of human diseases, especially early developmental defects such as they occur in oxygen-induced retinopathy of prematurity.

      We would like to thank the reviewer for their valuable comment. Given that retinopathy of prematurity (ROP) is primarily vascular in nature, and we have not investigated vascular defects in this study, we have elected not to add a discussion of ROP to our manuscript.

      Minor points

      (1) Please add a label indicating the ages of the retina to images showing the entire retina (i.e. "P7"; e.g. in Figures 1F, 3, 4D, 5, etc.).

      Figure 1:

      1D: E18.5 indicated at the bottom of the two panels

      1F – P0 is indicated at the bottom of the two panels.

      Figure 3C-H: P0 explant stage and days of culture indicated

      Figure 4D: E12.5 BrdU and P7 harvest date indicated

      Figure 5C-H: P0 explant stage and days of culture indicated

      Figure 7A-E: P0 explant stage and days of culture indicated

      (2) The term Ctnnb1 should be introduced also in the abstract.

      We now state in the abstract that Ctnnb1 encodes β-catenin.

      (3) Line 249: "...remaining..." should probably read "...remained...".

      Changed (now line 260).

      (4) Line 381: The sentence "...correlating with the propensity of some RPCs to continue to proliferate while others to differentiate.", should probably be rewritten to something like "...correlating with the propensity of some RPCs to continue to proliferate while others differentiate.".

      We have corrected this sentence.

      (5) The structure of the discussion might benefit from the introduction of subheadings.

      We have introduced subheadings.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1H shows the kinetics of rod photoreceptor production as accelerated, but does not represent the fact that fewer rods are ultimately produced, which appears to be the case from the data. If so, the Pten cKO curve should probably be lower than WT to reflect that difference.

      We have removed this graph (as per Reviewer #2, point 2).

      (2) KEGG analysis also showed that the HIF-1 signaling pathway is altered in the Pten cKO retina. What is the significance of that, and is it related to metabolic dysregulation? It has been shown that lactate can promote vessel growth, which initiates at birth in the mouse retina.

      We have added some information on HIF-1 to the Discussion. “The increased glycolytic gene expression in Pten-cKO retinas is likely tied to the increased expression of hypoxia-inducible factor 1-alpha (Hif1a), a known target of mTOR signaling that transcriptionally activates Slc2a1 (GLUT1) and glycolytic genes (Hanna et al., 2022). Indeed, mTOR signaling is hyperactive in Pten-cKO retinas (Cantrup et al., 2012; Tachibana et al., 2016; Tachibana et al., 2018; Touahri et al., 2024), and likewise, in Tsc1-cKO retinas, which also increase glycolysis via HIF-1A (Lim et al., 2021).” (lines 489-494).

      Cantrup, R., Dixit, R., Palmesino, E., Bonfield, S., Shaker, T., Tachibana, N., Zinyk, D., Dalesman, S., Yamakawa, K., Stell, W. K., Wong, R. O., Reese, B. E., Kania, A., Sauve, Y., & Schuurmans, C. (2012). Cell-type specific roles for PTEN in establishing a functional retinal architecture. PLoS One, 7(3), e32795. https://doi.org/10.1371/journal.pone.0032795

      Cereghetti, G., Kissling, V. M., Koch, L. M., Arm, A., Schmidt, C. C., Thüringer, Y., Zamboni, N., Afanasyev, P., Linsenmeier, M., Eichmann, C., Kroschwald, S., Zhou, J., Cao, Y., Pfizenmaier, D. M., Wiegand, T., Cadalbert, R., Gupta, G., Boehringer, D., Knowles, T. P. J., Mezzenga, R., Arosio, P., Riek, R., & Peter, M. (2024). An evolutionarily conserved mechanism controls reversible amyloids of pyruvate kinase via pH-sensing regions. Dev Cell. https://doi.org/10.1016/j.devcel.2024.04.018

      Chen, J., Xie, J., Jiang, Z., Wang, B., Wang, Y., & Hu, X. (2011). Shikonin and its analogs inhibit cancer cell glycolysis by targeting tumor pyruvate kinase-M2. Oncogene, 30(42), 4297-4306. https://doi.org/10.1038/onc.2011.137

      Hanna, J., Touahri, Y., Pak, A., David, L. A., van Oosten, E., Dixit, R., Vecchio, L. M., Mehta, D. N., Minamisono, R., Aubert, I., & Schuurmans, C. (2025). Pten Loss Triggers Progressive Photoreceptor Degeneration in an mTORC1-Independent Manner. Invest Ophthalmol Vis Sci, 66(3), 45. https://doi.org/10.1167/iovs.66.3.45

      Tachibana, N., Cantrup, R., Dixit, R., Touahri, Y., Kaushik, G., Zinyk, D., Daftarian, N., Biernaskie, J., McFarlane, S., & Schuurmans, C. (2016). Pten Regulates Retinal Amacrine Cell Number by Modulating Akt, Tgfbeta, and Erk Signaling. J Neurosci, 36(36), 9454-9471. https://doi.org/10.1523/JNEUROSCI.0936-16.2016

      Touahri, Y., Hanna, J., Tachibana, N., Okawa, S., Liu, H., David, L. A., Olender, T., Vasan, L., Pak, A., Mehta, D. N., Chinchalongporn, V., Balakrishnan, A., Cantrup, R., Dixit, R., Mattar, P., Saleh, F., Ilnytskyy, Y., Murshed, M., Mains, P. E., Kovalchuk, I., Lefebvre, J. L., Leong, H. S., Cayouette, M., Wang, C., Sol, A. D., Brand, M., Reese, B. E., & Schuurmans, C. (2024). Pten regulates endocytic trafficking of cell adhesion and Wnt signaling molecules to pattern the retina. Cell Rep, 43(4), 114005. https://doi.org/10.1016/j.celrep.2024.114005

    1. (cis-normativity, or the assumption that all people have a gender identity that is consistent with the sex they were assigned at birth) that has been built into the scanner, through the combination of user interface (UI) design, scanning technology, binary-gendered body-shape data constructs, and risk detection algorithms, as well as the socialization, training, and experience of the TSA agen

      I agree with this passage's point that the design excludes specific groups. But from the designer's side, it is hard to include every user group, and determining whether someone's gender identity is consistent with the sex they were assigned at birth might be hard for management. Still, I agree that design should try to be as inclusive as possible. Even if in reality it is hard to design for everyone, we should still try to reach that goal.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewer for their constructive comments and the fair and interesting discussion between reviewers.

      Reviewer #1

      We are delighted to read that the reviewer finds the manuscript “very clear and of immediate impact […] and ready for publication” regarding this aspect. We have toned down the conclusion, proposing rather than concluding that “the incapacitation of Cmg2[KO] intestinal stem cells to function properly […] is due to their inability to transduce Wnt signals”.

      We have addressed the 3 points that were raised as well as the minor comments.

      Point #1

      The mouse mutant is just described as 'KO', referring to the previous work by the authors. The cited work simply states that this is a zygotic deletion of exon 3, which somehow leads to a decrease in protein abundance that is almost total in the lung but not so clear in the uterus. Exon 3 happens to be 72 bp long [https://www.ncbi.nlm.nih.gov/nuccore/NM_133738], so its deletion (assuming there are no cryptic splicing sites used) leads to an internal in-frame deletion of 24 amino acids. So, at best, this 'KO' is not a null, but a hypomorphic allele of context-dependent strength.

      Unfortunately, neither the previous work nor this paper (unless I have missed it!) contains information about the expression levels of Cmg2 in the intestine of KO mice - nor which cell types usually express it (see below). I think that using anti-Cmg2 in WB and in immunohistofluorescence with ISC markers, on intestine homogenate/sections of wild-type and mutant mice, would be necessary to set the stage for the rest of the work.

      We now provide an explanation and characterization of the Cmg2KO mice. Exon 3 indeed only encodes a short 24 amino acid sequence. This exon however encodes a β-strand that is central to the vWA domain of CMG2, and therefore critical for the folding of this domain. As now shown in Fig. S1c, CMG2Δexon3 is produced in cells but cleared by the ER-associated degradation pathway. It is therefore only detectable in cells treated with the proteasome inhibitor MG132, at a slightly lower molecular weight than the full-length protein. This is consistent with, and was inspired by, the fact that multiple Hyaline Fibromatosis missense mutations that map to the vWA domain lead to defective folding of CMG2, further illustrating that this domain is very vulnerable to modifications. In Fig. S1c, we moreover now show immunoprecipitation of Cmg2 from colonic tissue of wild-type (WT) and knockout (KO) mice, which confirms the absence of Cmg2 protein in Cmg2KO samples.
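The reading-frame arithmetic behind this point can be sketched as follows. This is an illustrative sketch only: the function name `deletion_effect` is hypothetical, and it assumes the exon is skipped cleanly, with no cryptic splice sites, as the reviewer also assumes.

```python
def deletion_effect(exon_length_bp: int) -> dict:
    """Classify a clean exon deletion by its effect on the reading frame.

    Assumption (not from the manuscript): the flanking exons are joined
    exactly, so only the deleted length decides whether the frame shifts.
    """
    codons, remainder = divmod(exon_length_bp, 3)
    in_frame = remainder == 0
    return {
        "in_frame": in_frame,
        # The residue count is only meaningful when the frame is preserved.
        "residues_removed": codons if in_frame else None,
    }


# Exon 3 is 72 bp long, as stated by the reviewer.
exon3 = deletion_effect(72)
print(exon3)
```

An in-frame result only says that the downstream sequence is still read correctly. It does not say whether the shortened protein folds, which is the separate question addressed with the MG132 experiment.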

      Point #2

      Connected to the previous point, the expression pattern of Cmg2 in the intestine is not described. Maybe this is already established in the literature, but the authors do not refer to the data. This is important when considering that the previous work of the authors suggests that Cmg2 might contribute to Wnt signalling transduction through physical, cis interactions with the Wnt co-receptor LRP6. Therefore, one would expect that Cmg2 would be cell-autonomously required in the intestinal stem cells.

      The expression pattern of Cmg2 in the gut has not been characterized and is indeed essential to understanding its function. To address this gap, we have now added a figure (Fig. 1) providing data from publicly available RNA-seq datasets and from our RNAscope experiments on Cmg2WT mice. Of note, we unfortunately have never managed to detect Cmg2 protein expression by immunohistochemistry of mouse tissue with any of the antibodies available, commercial or generated in the lab.

      In the RESULTS section we now mention:

      To investigate Cmg2 expression in the gut, we first analyzed publicly available spatial and scRNA-seq datasets to identify which cell types express Cmg2 across different gut regions. Spatial transcriptomic data from the mouse small intestine and colon revealed that Cmg2 is broadly expressed throughout the gut, including in the muscular, crypt, and epithelial layers (Fig. 1A–C). To validate these findings, we performed RNAscope in situ hybridization targeting Cmg2 in the duodenum and colon of wild-type mice. The expression pattern observed was consistent with the spatial transcriptomics data (Fig. 1D–E). We then analyzed scRNA-seq data from the same dataset to assess cell-type-specific expression in the mouse colon. Cmg2 was detected at varying levels across multiple cell types, including enterocytes and intestinal stem cells, as well as mesenchymal cells, notably fibroblasts.

      Of note for the reviewer, and not mentioned in the manuscript, this widespread distribution of Cmg2 across different cell types does not hold for all organs. We have recently investigated the expression of Cmg2 in muscle and found that it is almost exclusively expressed in fibroblasts (so-called fibro-adipocyte progenitors) and very little in any other muscle cells, in particular fibers.

      Interestingly also, as now mentioned in the manuscript and shown in Fig. S1, the ANTXR1 protein, which is highly homologous to Cmg2 at the protein level and shares its function as an anthrax toxin receptor, displayed a much more restricted expression pattern, being confined primarily to fibroblasts and mural cells, and notably absent from epithelial cells. This differential expression highlights a potentially unique and epithelial-specific role for Cmg2 in maintaining intestinal homeostasis.

      Point #3

      The authors establish that the regenerating crypts of Cmg2[KO] mice are unable to transduce Wnt signalling, but it is not clear whether this situation is provoked by the DSS-induced injury or existed all along. Can Cmg2[KO] intestinal stem cells transduce Wnt signalling before the DSS challenge? If they were, it might suggest that the 'context-dependence' of the Cmg2 role in Wnt signalling is contextual not only because of the tissue, but because of the history of the tissue or its present structure. It would also suggest that Cmg2 mutant mice, unless reared in a germ-free facility for life, would eventually lose intestinal homeostasis, and maybe suggest the level of intervention/monitoring that HFS patients would require. It might also provide an explanation in case Cmg2 was not expressed in ISCs - if the state of the tissue was as important as the presence of the protein, then the effect on Wnt transduction could be indirect and therefore it might not be required cell-autonomously.

      We agree that understanding whether Cmg2KO intestinal stem cells are intrinsically unable to transduce Wnt signals, or whether this defect is contextually induced following injury (such as DSS treatment), is a critical point.

      As a first line of evidence, we show that under homeostatic conditions, Wnt signaling appears largely intact in Cmg2KO crypts, with levels of β-catenin and expression levels of canonical Wnt target genes (e.g., Axin2, Lgr5) comparable to those observed in WT animals (Figs. S1j-l and S3d-e). This indicates that Cmg2 is not essential for basal Wnt signaling under steady-state conditions.

      These findings thus support the idea that the requirement for Cmg2 in Wnt signal transduction is context-dependent—not only at the tissue level but also temporally, being specifically required during regenerative processes or in altered microenvironments such as during inflammation or epithelial damage. This context-dependence may reflect changes in the composition or accessibility of Wnt ligands, receptors, or matrix components during repair, where Cmg2 could play a scaffolding or stabilizing role.

      These aspects are now discussed in the text.

      I think points 1 and 2 are absolutely fundamental in a reverse genetics investigation. Point 3 would be nice to know but the outcome would not change the tenet of the paper. I believe that the work needed to deal with these points can be performed on archival material. I do not think the mechanism proposed can be taken from 'plausible' to 'proven' without proposing substantial additional investigation, so I will not suggest any of it, as it could well be another paper.

      We have addressed points 1 and 2, and provided evidence and discussion for Point 3.

      Minor points

      1 - Figure 1 legend says "In (c), results are mean ± SEM" - this seems applicable to (d) as (c) does not show error whiskers.

      We thank the reviewer for picking up this error. We modified the legend to: “In (c), results are median” and “In (d, f and g), results are mean ± SEM.”

      2 - Figure 1 legend says "(d) Body weight loss, (f) the aspect of the feces and presence of occult blood were monitored and used for the (e) DAI. Results are mean ± SEM. Each dot represents the mean of n = 12 mice per genotype". This part looks like it has suffered some rearrangement of words. The first instance of (f) should be (e), I guess, and I am not sure what "(e) DAI" means. And for (e), "mean ± SEM" does not seem applicable. This needs some light revision.

      The legend was clarified as follows: “(d) Body weight loss, and (e) aspect of the feces and presence of occult blood were monitored and used to evaluate the disease activity index in (f).”

      3 - Figure 1H legend does not say which statistical test was made in the survival experiment in (h) - presumably log-rank? A further comment on the survival statistics: euthanised animals should not be counted towards true mortality when that is what is recorded as an 'event'. They should be right-censored. However, in this case, reaching the euthanasia criterion is just as good an indicator of health as mortality itself. So, simply by changing the Y axis from 'survival' to 'event-free survival' (or something to that effect), where 'events' are either death or reaching the euthanasia criterion, leaves the analysis as it is, and authors do not need to clarify that figure 1H shows "apparent mortality", as it is straightforward "complication-free survival" (just not entirely orthogonal to weight loss).

      The Y axis was changed from 'survival' to “percentage of mice not reaching the euthanasia criterion”.
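The distinction raised in this comment, right-censoring euthanised animals versus counting euthanasia as an event, can be illustrated with a small Kaplan-Meier sketch. Everything here is hypothetical: the function `kaplan_meier` and the six-animal data set are invented for illustration and are not the data from the manuscript.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate, returned as [(time, survival)] at each event time.

    events[i] is True when animal i had an event at times[i] and False when
    it was right-censored (left the study without the event).
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Animals with an event and censored animals both leave the risk set.
        at_risk -= n_leaving
    return curve


# Hypothetical cohort: day of last observation and what happened on that day.
days = [8, 9, 9, 10, 14, 14]
died = [False, False, True, False, False, False]
euthanised = [True, True, False, True, False, False]

# True mortality: euthanised animals are right-censored.
mortality_curve = kaplan_meier(days, died)

# Event-free survival: death or reaching the euthanasia criterion is an event.
composite = [d or e for d, e in zip(died, euthanised)]
event_free_curve = kaplan_meier(days, composite)

print(mortality_curve)
print(event_free_curve)
```

With the composite definition, every animal removed from the study contributes to the curve, which is why relabelling the Y axis leaves the analysis itself unchanged.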

      4 - Some density measurements are made unnecessarily on arbitrary units (per field of view) - this should be simple to report in absolute measures (i.e. area of tissue screened or, better still, length of epithelium screened).

      Because the area of tissue can vary significantly between damaged, regenerating and undamaged tissue, we reported the length of epithelium screened as suggested: “per 800 µm tissue screened” in Fig S1c and Fig 2b.

      5 - Figure 2E should read "percent involvement"

      This has been corrected.

      6 - Figure 2J should read "lipocalin..."

      This has been corrected.

      7 - In section "CMG2 Is Dispensable for YAP/TAZ-Mediated Reprogramming to Fetal-Like Stem Cells", the authors write ""We measured the mRNA levels of two additional YAP target genes, Cyr61 and CTGF...". I presume the "additional" is because Ly6a is also a target of YAP/TAZ, but if the reader does not know, it is puzzling. I would suggest to make this link explicit.

      We added: “In addition to the fetal-like stem cell marker Ly6a, which is a YAP/TAZ target gene, we measured the mRNA levels of two other YAP target genes, Cyr61 and CTGF”

      8 - In Figures S2, 3 and S3, I think that the measures expressed as "% of homeostatic X in WT" really mean "% of average homeostatic X in WT". This should be made clear somewhere.

      We added: “Dotted line represents the average homeostatic levels of Cmg2 WT” in figure legends

      9 - In panel C, the nature of the data is not entirely clear. First, the corresponding part of the legend says "Representative images of n=4 mice per genotype" which I presume should refer to panel B. Then, the graph plots 4 data points, which suggests that they correspond to 4 mice - but how many fields of view? Also, the violin plot outline is not described - I presume it captures all the data points from the coarse-grained pixel analysis, but it should be clarified.

      It was modified as suggested: “(c) Results are presented as violin plot of the Ly6a mean intensity of all data points from the coarse-grain analysis. Each symbol represents the mean per mouse of n=4 mice per condition. Results are mean ± SEM. Dotted line represents the average homeostatic levels of Cmg2WT. P values obtained by two-tailed unpaired t test.”

      10 - In Figure 3H and 3I, I would suggest to add the 7+3 timepoint where the data come from.

      We unfortunately do not understand the suggestion of the reviewer, given that these panels show the 7+3 time point.

      11 - In section "CMG2 Is Critical for Restoring the Lgr5+ Intestinal Stem Cell Pool", the authors say "...The mRNA levels of ... LRP6, β-catenin (Fig. S3a-b), and Wnt ligands (Wnt5a, 5b, and 2b) were comparable between the colons of Cmg2WT and Cmg2KO mice (Fig. S3c)..." without clarifying in which context - one needs to read the figure legend to realise this is "timepoint 7+3". I suggest to add "in the recovery phase" or "in regenerating colons" or something shorter, just to guide the reader.

      We added: “Initially, we quantified the expression of key molecular components involved in Wnt signaling in mouse colon 3 days after DSS withdrawal using qPCR.”

      12 - Like with the previous point, it is not clear when the immunohistofluorescence of B-catenin is made - not even in the legend, as far as I could see. The only hint is that authors say "the nuclei of cells in the atrophic crypts of Cmg2KO..." with 'atrophic' probably indicating again the 7+3 timepoint.

      We have changed the text and now mention “Next, we analyzed β-catenin activation in the colon of Cmg2WT and Cmg2KO mice during the recovery phase.”

      13 - A typo in the discussion: tunning for tuning.

      This has been corrected.

      14 - In the discussion, the authors talk about the 'CMG2' protein (all caps - formatting convention for human proteins) but before they were referring to 'Cmg2' (formatting convention for mouse proteins). That is fine but some of the statements where "CMG2" is used clearly refer to observations made in the mouse.

      We have now used Cmg2, whenever referring to the mouse protein.

      15 - Typos in methods: "antigen retrieval by treating [with] Proteinase K"; "Image acquisition and analyze [analysis]"; "All details regarding code used for immunofluorescence analysis".

      This has been corrected.

      Reviewer #2

      We are very pleased to read that the reviewer found the study “overall well designed, meticulously carried out, and with clear and convincing results that are most reasonably and thoughtfully interpreted”.

      For this reader, one additional thought comes to mind. If I understand the field correctly it would be informative to know with greater confidence where - in what cell type, epithelial or mesenchymal - the CMG2-LRP6-WNT interaction occurs.

      This point was also raised by Reviewer 1, and we have now added a new Figure 1 that describes Cmg2 expression in the gut, based on both publicly available RNA-seq datasets and our RNAscope experiments on Cmg2WT mice. Of note, we unfortunately have never managed to detect Cmg2 protein expression by immunohistochemistry of mouse tissue with any of the antibodies available, commercial or generated in the lab.

      After injury the CMG2-KO mouse epithelium exhibits defective WNT signal transduction - as evidenced by failure of β-catenin to translocate into the nucleus. At first glance, this result is a disconnect with the paper by van Rijin that claims the defect in Hyaline Fibromatosis Syndrome cannot be due to loss of CMG2 expression/function in the barrier epithelial cell - a claim based on the mostly normal phenotypes of human CMG2 KO duodenal organoids. But the human organoids studied in the van Rijin paper, like all others, are established and cultured in very high WNT conditions, perhaps obscuring the lack of the CMG2-LRP6-WNT interaction. And in fact, the phenotypes of these human CMG2-KO duodenoids were not entirely normal - the CMG2-KO stem-like organoids (even when cultured in high WNT/R-spondin conditions) developed abnormal intercellular blisters consistent with a defect in epithelial structure/function - of unknown cause and not investigated.

      We thank the reviewer for raising this point and we fully agree. We now specify in the text that the human CMG2-KO duodenoids showed blisters, indeed consistent with a defect in epithelial structure/function, and that they were grown in high-Wnt media, which likely obscures the CMG2 requirement.

      I think it would be informative to prepare colon organoids (and duodenoids) from WT and CMG2-KO mice to quantify their WNT dependency during establishment and maintenance of the stem-like (and WNT-dependent) state. If CMG2 acts within the epithelial cell to affect WNT signaling (regardless of WNT source), organoids prepared from colons of CMG2-KO mice would require more WNT in culture media to establish and maintain the stem cell proliferative state - when compared to organoids prepared from WT mice. This can be quantified (and confirmed molecularly by transgene expression if successful). Enhanced dependency on high concentrations of exogenous WNT would be evidence for a primary defect in WNT-(LRP2)-CMG2 signal transduction localized to the epithelial barrier cell - thus addressing the apparent discrepancy with the van Rijin paper - and for my part, advancing the field. And the discovery of a defect in the epithelium itself for WNT signal transduction would implicate a biologically most plausible mechanism for development of protein losing enteropathy.

      By no means do I consider these experiments to be required for publication (especially if considered to be incremental or already defined - WNT-CMG2 is not my field of research). This study already makes a meaningful contribution to the field as I state above. But in the absence of new experimentation, the issue should probably be discussed in greater depth.

      We are working out conditions to grow colon organoids from WT and Cmg2KO mice, varying the concentrations of Wnt in the various media to identify those that would best mimic the regeneration conditions. This is a study in itself. We have however included a discussion on this point in the manuscript as suggested.

      Reviewer #3:

      We thank the reviewer for her/his insightful comments.

      The premise is that the causative germline mutated gene, CMG2/ANTXR2, may have a functional role in colonic epithelium in addition to controlling the ECM composition. There is little background information but one study has shown no primary defect in epithelial organoids grown from patients with the syndrome. This leads the authors to wonder if non-homeostatic conditions might reveal a functional role for the gene in regeneration.

      Reviewer 2 commented on the fact that “human organoids studied in the van Rijin paper, like all others, are established and cultured in very high WNT conditions, perhaps obscuring the lack of the CMG2-LRP6-WNT interaction. And in fact, the phenotypes of these human CMG2-KO duodenoids were not entirely normal - the CMG2-KO stem-like organoids (even when cultured in high WNT/R-spondin conditions) developed abnormal intercellular blisters consistent with a defect in epithelial structure/function - of unknown cause and not investigated”.

      We have now added a discussion on this point in the manuscript.

      The authors' approach to test the hypothesis is to use a mouse germline knockout model and to induce colitis and regeneration by the established protocol of introducing dextran sodium sulfate (DSS) into the drinking water for five days. In brief there is no phenotype apparent in the untreated knockout (KO) but these animals show a more severe response to DSS that requires them to be killed by 10 days after the start of treatment. This effect, following phenotypic characterisation of the colonic epithelium, is interpreted as showing that CMG2 is a Wnt modifier required for the restoration of the intestinal stem cell population in the final stages of repair.

      The experiment and analysis seem reasonably well executed - although a few specific comments follow below. The narrative is simple and easy to understand. However, there are significant caveats that cast doubts on the interpretation made that loss of CMG2 impairs the transition of colonic epithelial cells from a fetal like state to adult ISCs.

      First there is only a single approach and single type of experiment performed. There is a lack of independent validation of the phenotype and how it is mediated.

      We do not fully understand what type of independent validation of the phenotype the reviewer would have liked to see. Is it the induction of intestinal damage using a stress other than DSS?

      The DSS dose in this kind of experiment is often determined empirically in individual units. Here the 3% used is within published range but at upper end. The control animals show a typical response with symptoms of colitis worsening for 2-3 days after the removal of DSS and then recovery commonly over another 5-7 days. Here the CMG2 KO mice fail to recover and are killed by 9 or 10 days. The authors attempt to exploit the time course by identifying normal initial (7days) and defective late (10days) repair phases in KO animals when compared to controls. It is from this comparison that conclusions are drawn. However, the alternative interpretation might be that the epithelium of KO animals is so badly damaged, and indeed non-existent (from viewing Fig2a), that it is incapable of mounting any other response other than death and that the profiling shown is of an epithelium in extremis. The repair capability and dynamics of the KO would have been better tested under more moderate DSS challenge, if this experiment had been regarded as a pilot rather than as definitive.

      The choice of 3% DSS was in fact based on a pilot experiment. As now shown in Fig. S4, we tested different concentrations and found that 3% DSS was the lowest concentration that reliably induced the full spectrum of colitis-associated symptoms, including significant body weight loss, diarrhea, rectal bleeding (summarized in the Disease Activity Index), as well as macroscopic signs such as colon shortening and spleen enlargement. Based on these criteria, we selected 3% DSS for the study described in the manuscript.

      In this model, WT mice showed a typical progression: body weight stabilized rapidly after DSS withdrawal, with resolution of diarrhea and rectal bleeding. Histological analysis at day 9 revealed signs of epithelial regeneration, including hypertrophic crypts and increased epithelial proliferation.

      In contrast, Cmg2KO mice failed to initiate this recovery phase. Clinical signs such as weight loss, diarrhea, and bleeding persisted after DSS withdrawal, ultimately necessitating euthanasia at day 9–10 due to humane endpoint criteria. Unfortunately, this prevented us from exploring later timepoints to determine whether regeneration was delayed or completely abrogated in the absence of Cmg2.

      Regarding the severity of epithelial damage, as raised by Reviewer 1, we now provide detailed histological scoring in the supplementary data. This analysis shows that the severity of inflammation and crypt damage was similar between WT and KO animals, as were inflammatory markers such as Lipocalin-2. The key difference lies in the extent of tissue involvement. While the lesions in WT mice were more localized, Cmg2KO mice displayed widespread and diffuse damage with no sign of regeneration as shown by the absence of hypertrophic crypts and a marked reduction in both epithelial coverage and proliferative cells. Importantly, at day 7, the percentage of epithelial and proliferating cells was comparable between genotypes, further supporting the idea that Cmg2KO mice failed to initiate this recovery phase and present a defective repair response.

      The animals used were young (8 weeks) and lacked any obvious defect in collagen deposition. Does this change with treatment? Even if not, is it possible that there is a defect in peristalsis or transit time of gut contents, resulting in longer dwell times and higher effective dose of DSS to the KO epithelium?

      Collagen deposition, particularly of collagen VI, is known to increase in response to intestinal injury and plays a critical role in promoting tissue repair following DSS-induced damage (Molon et al., PMID: 37272555). As suggested, we investigated whether Cmg2KO mice exhibit abnormal collagen VI accumulation following DSS treatment.

      Our results show that, consistent with published data, WT mice exhibit a marked increase in collagen VI expression during the acute phase of colitis, with levels returning toward baseline following DSS withdrawal. A similar expression pattern was observed in Cmg2KO mice, with no significant differences in Col6a1 mRNA levels between WT and KO animals throughout the entire time course of the experiment. This observation was further confirmed at the protein level by western blot and immunohistochemistry analyses, suggesting that the impaired regenerative capacity observed in Cmg2KO mice is independent of Collagen VI.

      Regarding the possibility of altered peristalsis or intestinal transit time contributing to increased DSS exposure in KO mice, this is indeed a possibility. Although we did not directly measure gut motility in this study, we did not observe any signs of intestinal obstruction or fecal retention in Cmg2KO mice. Indeed, during the experiment, animals were single-caged for 30 min in order to collect feces, and no difference in the amount of feces collected was observed between WT and KO mice, arguing against a substantial difference in transit time (see figure below). The possibility of altered peristalsis and these observations are now mentioned in the Discussion.

      Is CMG2 RNA and protein expressed in the colonic epithelium? It is not indicated or tested in the submitted manuscript. This reviewer struggled to find evidence, notably it did not seem to be referenced in the organoid paper they reference in introduction (ref 13).

      This very valid point was also raised by Reviewers 1 and 2. The expression pattern of Cmg2 in the gut has indeed not been characterized and is essential to understanding its function. To address this gap, we added a figure (Fig. 1) providing data from publicly available RNA-seq datasets and from our RNAscope experiments on Cmg2WT mice. Of note, we unfortunately have never managed to detect Cmg2 protein expression by immunohistochemistry of mouse tissue with any of the antibodies available, commercial or generated in the lab.

      Specific comments:

      Figure 3 c-e and associated text are confusing. In c the Y scale seems inappropriate to show percentages up to 15,000%.

      In this graph, values are normalized to the homeostatic level of WT mice, which represents 100%.

      In d and e the use of percentages may be correct. However, it is claimed in the text that Cyr61 and CTGF are upregulated in the KO. That is not what the plots appear to show, as they compare to WT untreated cells, in which case the KO have not downregulated these genes in the way the controls have.

      As clarified in the text, under regenerative conditions, a transient activation of YAP signaling is crucial to induce a fetal-like reversion of intestinal stem cells. However, in a subsequent phase, the downregulation of YAP and the reactivation of Wnt signaling are necessary to complete intestinal regeneration. Several studies have highlighted a strong interplay between the Wnt and YAP pathways, suggesting that their coordinated regulation is essential for effective gut repair. Nevertheless, the precise mechanisms governing this interaction remain incompletely understood.

      In our model, this critical transition (YAP downregulation and Wnt reactivation) appears to be impaired. The absence of CMG2 may either hinder Wnt reactivation directly or lead to sustained YAP signaling, which in turn suppresses activation of the Wnt pathway. Further studies, using in vivo and organoid models, will be necessary to understand the mechanistic role of Cmg2 in this regulatory process.

      The description of the figure has been clarified as follows: "both of which were significantly upregulated in the injured colons of Cmg2KO mice compared to DSS-injured Cmg2WT mice".

      Referees cross-commenting

      Rev2 Points 1 and 2 made by Referee 1 (and point 4 of Referee 3) appear most reasonable, and if not already done should be.

      We have indeed addressed these 2 points.

      I also noted the more severe morphology of DSS damaged epithelium shown in Fig 2a noted by Referee 3 - and this I agree is a confounding factor. […] For my part, the concern is understandable but likely not operating in a confounding way. And the evidence for the reprogramming of the damaged epithelium into "fetal-like stem cells" (the 1st step in restitution of lost stem cells) occurs in both WT and KO mice - and these data are strong. For this reader, the block convincingly shows up for KO mouse at the WNT dependent step

      The representative image has been updated, and a transverse section has been added to better illustrate that, although both epithelium and crypt structures can be present, the epithelial morphology differs significantly. Indeed, the regenerating epithelium of Cmg2WT mice displays a thick epithelial layer with well-polarized epithelial cells, whereas in Cmg2KO mice, the epithelium appears atrophic, characterized by a thinner epithelial layer and elongated epithelial cells.

      Rev 3

      This reviewer remains sceptical. I agree the authors performed the experiment well to confirm that DSS dosing was as equivalent as possible across the study. But DSS acts to induce colitis because it is concentrated in the colonic lumen as water is absorbed. Also, ECM responses and remodelling are a central part of colitis models. And my concern is that the actual exposure in the KO group is influenced by the transit of faeces/DSS, which is secondary to the known action of CMG2 on collagen deposition. The consequence of this would be a protracted damage phase in which a restoration of adult stem cells would not be expected, leading to epithelial failure.

      However, we differ. I might propose that the authors are asked to investigate and confirm expression of CMG2 in the epithelium and to repeat the analysis of collagen levels they performed on untreated CMG2 KO mice on colons from CMG2 KO mice having received DSS to see if these differ from controls.

      This has now been done.

      Rev 1

      Both reviewer #2 and reviewer #3 make relevant points, from the point of view of extracting as much biological knowledge as we can from the observations reported in the manuscript.

      Reviewer #2's suggestion to use Cmg2[KO] organoids to investigate the dependence of Wnt transduction on Cmg2 is the type of experiment I refrained from proposing. However, I think the "skeleton" of the mechanism is there and is reasonably solid. Fleshing it out may well be another paper.

      I agree with Reviewer #3 objections to the timing and severity of the DSS damage. However, I am not sure how much they invalidate the main tenet of the paper:

      • DSS may affect Cmg2[KO] more severely, but the overall disease score is comparable during the DSS treatment. If this severity was enough to be the main driver of the phenotype, it should have left a mark in the Histological and Disease activity scores. In this regard, I think it would be helpful if the authors provided an expanded version of Figure 2A with examples of the different levels of "Crypt damage" scored, and the proportions for each. This could be in the supplementary material and would balance the impressions induced by a single image.

      As suggested, we have included a detailed breakdown of the histological score, including the crypt damage score, in Supplementary Fig. 3i, which shows no significant differences in crypt damage between Cmg2WT and Cmg2KO mice.

      • If DSS affected the recovery, this would also be compatible with having a more severe histological phenotype (which is not shown overall, just in Fig 2A) because one would also expect the tissue to attempt regeneration during the 7 days of DSS treatment.

      This is an interesting point, and we now allude to this aspect in the manuscript.

      • The only objection that I find difficult to argue is the effective duration of the treatment. If indeed peristalsis is affected, it may be that during the 'recovery' phase there is still DSS in the intestine. This could be perhaps verified using a DS detection assay (e.g. https://arxiv.org/pdf/1703.08663) on the intestinal contents or the faeces of the mice during the 3-day recovery period.

      We attempted to purchase Heparin Red to perform this assay. Unfortunately, the reagent was never delivered. We now also mention the following in the Discussion:

      One could envision that Cmg2KO mice have a defect in peristalsis resulting in longer dwell times and possibly higher effective dose of DSS to the KO epithelium. We however did not observe any signs of intestinal obstruction or fecal retention in Cmg2KO mice. Animals were single-caged for 30 min to collect feces. We did not observe any difference in amounts collected from WT and KO mice, arguing against a substantial difference in transit time of gut contents. Moreover, if DSS affected the recovery, one would have expected a more severe histological phenotype in the colon of Cmg2KO since the tissue likely already attempts regeneration during the 7 days of DSS treatment. But this was not the case. Therefore, while we cannot formally rule out the presence of residual DSS in Cmg2KO mice during the DSS withdrawal phase, there is currently no indication that this was the case.

      I think of what the aim of scholarly publication is, with this paper, and I find myself going back to a statement of the authors' discussion - that this work suggests that infants risking death may be offered (compassionate, I guess) IBD treatment. What does this hinge upon? I think, on the basic observation that diarrhoea (in the mouse model) is not intrinsic but caused by an inflammation-promoting insult. Is this substantiated? I think it is. Could we learn more biology from this disease model, about Wnt and about how ECM affects tissue regeneration? Certainly. Can this learning wait? I believe it can.

      We thank the reviewer for this statement.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      In this work, Bracq and colleagues provide clear evidence that the persistent diarrhoea seen in a mouse model of Hyaline Fibromatosis Syndrome is related to the inability of their intestinal epithelium to properly regenerate. This is very clear and of immediate impact. This aspect of the paper, I think, is ready for publication, and would merit immediate dissemination on its own. It is great that the manuscript is in bioRxiv already.

      I am not so thoroughly convinced about the mechanism that the authors propose to explain the inability of Cmg2[KO] intestinal stem cells to function properly. The authors propose that it is due to their inability to transduce Wnt signals, and while this is plausible, I think there are a few things that the paper should contain before this can be proposed firmly:

      Point #1

      The mouse mutant is just described as 'KO', referring to the previous work by the authors. The cited work simply states that this is a zygotic deletion of exon 3, which somehow leads to a decrease in protein abundance that is almost total in the lung but not so clear in the uterus. Exon 3 happens to be 72 bp long [https://www.ncbi.nlm.nih.gov/nuccore/NM_133738], so its deletion (assuming there are no cryptic splicing sites used) leads to an internal in-frame deletion of 24 amino acids. So, at best, this 'KO' is not a null, but a hypomorphic allele of context-dependent strength. Unfortunately, neither the previous work nor this paper (unless I have missed it!) contains information about the expression levels of Cmg2 in the intestine of KO mice, nor about which cell types usually express it (see below). I think that using anti-Cmg2 in WB and in immunohistofluorescence with ISC markers, on intestine homogenate/sections of wild-type and mutant mice, would be necessary to set the stage for the rest of the work.
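
      The reading-frame arithmetic behind this point can be sketched as follows. This is only an illustration under the assumption stated above (the whole 72 bp exon is skipped and no cryptic splice site is used); the function name is hypothetical and the check says nothing about whether the shortened protein is stable or functional.

```python
# Illustrative check of the reading-frame consequence of deleting an exon.
# Assumption: the entire exon is removed from the coding sequence and
# splicing joins the flanking exons directly (no cryptic splice sites).

EXON3_LENGTH_BP = 72  # exon 3 length quoted in the review (NM_133738)


def deletion_consequence(deleted_bp: int) -> str:
    """Describe whether removing `deleted_bp` coding bases keeps the frame."""
    if deleted_bp <= 0:
        raise ValueError("deleted_bp must be a positive number of bases")
    codons, remainder = divmod(deleted_bp, 3)
    if remainder == 0:
        # A multiple of three removes whole codons: the downstream frame is kept.
        return f"in-frame deletion of {codons} amino acids"
    # Otherwise every codon downstream of the deletion is shifted.
    return f"frameshift ({remainder} bp out of frame)"


print(deletion_consequence(EXON3_LENGTH_BP))
```

      Because 72 is a multiple of 3, the downstream reading frame is preserved, which is why the allele is expected to encode an internally shortened protein rather than a truncated one.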

      Point #2

      Connected to the previous point, the expression pattern of Cmg2 in the intestine is not described. Maybe this is already established in the literature, but the authors do not refer to the data. This is important when considering that the previous work of the authors suggests that Cmg2 might contribute to Wnt signalling transduction through physical, cis interactions with the Wnt co-receptor LRP6. Therefore, one would expect that Cmg2 would be cell-autonomously required in the intestinal stem cells.

      Point #3

      The authors establish that the regenerating crypts of Cmg2[KO] mice are unable to transduce Wnt signalling, but it is not clear whether this situation is provoked by the DSS-induced injury or existed all along. Can Cmg2[KO] intestinal stem cells transduce Wnt signalling before the DSS challenge? If they can, it might suggest that the 'context-dependence' of the Cmg2 role in Wnt signalling is contextual not only because of the tissue, but because of the history of the tissue or its present structure. It would also suggest that Cmg2 mutant mice, unless reared in a germ-free facility for life, would eventually lose intestinal homeostasis, and maybe suggest the level of intervention/monitoring that HFS patients would require. It might also provide an explanation in case Cmg2 was not expressed in ISCs - if the state of the tissue was as important as the presence of the protein, then the effect on Wnt transduction could be indirect and therefore it might not be required cell-autonomously.

      I think points 1 and 2 are absolutely fundamental in a reverse genetics investigation. Point 3 would be nice to know but the outcome would not change the tenet of the paper. I believe that the work needed to deal with these points can be performed on archival material. I do not think the mechanism proposed can be taken from 'plausible' to 'proven' without proposing substantial additional investigation, so I will not suggest any of it, as it could well be another paper.

      A few minor points picked along the way:

      1. Figure 1 legend says "In (c), results are mean ± SEM" - this seems applicable to (d) as (c) does not show error whiskers.
      2. Figure 1 legend says "(d) Body weight loss, (f) the aspect of the feces and presence of occult blood were monitored and used for the (e) DAI. Results are mean ± SEM. Each dot represents the mean of n = 12 mice per genotype". This part looks like it has suffered some rearrangement of words. The first instance of (f) should be (e), I guess, and I am not sure what "(e) DAI" means. And for (e), "mean ± SEM" does not seem applicable. This needs some light revision.
      3. Figure 1H legend does not say which statistical test was made in the survival experiment in (h) - presumably log-rank? A further comment on the survival statistics: euthanised animals should not be counted towards true mortality when that is what is recorded as an 'event'. They should be right-censored. However, in this case, reaching the euthanasia criterion is just as good an indicator of health as mortality itself. So, simply by changing the Y axis from 'survival' to 'event-free survival' (or something to that effect), where 'events' are either death or reaching the euthanasia criterion, leaves the analysis as it is, and authors do not need to clarify that figure 1H shows "apparent mortality", as it is straightforward "complication-free survival" (just not entirely orthogonal to weight loss).
      4. Some density measurements are made unnecessarily on arbitrary units (per field of view) - this should be simple to report in absolute measures (i.e. area of tissue screened or, better still, length of epithelium screened).
      5. Figure 2E should read "percent involvement"
      6. Figure 2J should read "lipocalin..."
      7. In section "CMG2 Is Dispensable for YAP/TAZ-Mediated Reprogramming to Fetal-Like Stem Cells", the authors write "We measured the mRNA levels of two additional YAP target genes, Cyr61 and CTGF...". I presume the "additional" is because Ly6a is also a target of YAP/TAZ, but if the reader does not know, it is puzzling. I would suggest making this link explicit.
      8. In Figures S2, 3 and S3, I think that the measures expressed as "% of homeostatic X in WT" really mean "% of average homeostatic X in WT". This should be made clear somewhere.
      9. In panel C, the nature of the data is not entirely clear. First, the corresponding part of the legend says "Representative images of n=4 mice per genotype" which I presume should refer to panel B. Then, the graph plots 4 data points, which suggests that they correspond to 4 mice - but how many fields of view? Also, the violin plot outline is not described - I presume it captures all the data points from the coarse-grained pixel analysis, but it should be clarified.
      10. In Figure 3H and 3I, I would suggest to add the 7+3 timepoint where the data come from.
      11. In section "CMG2 Is Critical for Restoring the Lgr5+ Intestinal Stem Cell Pool", the authors say "...The mRNA levels of ... LRP6, β-catenin (Fig. S3a-b), and Wnt ligands (Wnt5a, 5b, and 2b) were comparable between the colons of Cmg2WT and Cmg2KO mice (Fig. S3c)..." without clarifying in which context - one needs to read the figure legend to realise this is "timepoint 7+3". I suggest to add "in the recovery phase" or "in regenerating colons" or something shorter, just to guide the reader.
      12. As with the previous point, it is not clear when the immunohistofluorescence of β-catenin is made - not even in the legend, as far as I could see. The only hint is that the authors say "the nuclei of cells in the atrophic crypts of Cmg2KO..." with 'atrophic' probably indicating again the 7+3 timepoint.
      13. A typo in the discussion: tunning for tuning.
      14. In the discussion, the authors talk about the 'CMG2' protein (all caps - formatting convention for human proteins) but before they were referring to 'Cmg2' (formatting convention for mouse proteins). That is fine but some of the statements where "CMG2" is used clearly refer to observations made in the mouse.
      15. Typos in methods: "antigen retrieval by treating [with] Proteinase K"; "Image acquisition and analyze [analysis]"; "All details regarding code[s] used for immunofluorescence analysis"
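
      The distinction drawn in minor point 3 between 'true mortality' (euthanised animals right-censored) and 'event-free survival' (death or reaching the euthanasia criterion both count as events) can be illustrated with a minimal Kaplan-Meier sketch. The animal data below are hypothetical and chosen only for illustration; they are not taken from the manuscript, and a real analysis would normally use an established survival package together with a log-rank test.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of a Kaplan-Meier curve.

    events[i] is True if animal i had an event at times[i] and False if it
    was right-censored (it left the study at times[i] without an event).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Animals still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Events occurring exactly at time t.
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical cohort of six animals: day on which each left the study.
days = [8, 9, 9, 10, 10, 10]
died = [True, False, False, True, False, False]        # spontaneous death
euthanised = [False, True, True, False, False, False]  # reached humane endpoint

# (a) True mortality: euthanised animals are treated as right-censored.
mortality_curve = kaplan_meier(days, died)

# (b) Event-free survival: death OR euthanasia criterion is an event.
composite_event = [d or e for d, e in zip(died, euthanised)]
event_free_curve = kaplan_meier(days, composite_event)

print(mortality_curve)
print(event_free_curve)
```

      With the composite endpoint, no animal has to be censored for having been euthanised, so relabelling the Y axis is enough and the analysis itself can stay as it is.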

      Referees cross-commenting

      This section contains comments from all the reviewers.

      Rev2

      Points 1 and 2 made by Referee 1 (and point 4 of Referee 3) appear most reasonable, and if not already done should be.

      I also noted the more severe morphology of DSS damaged epithelium shown in Fig 2a noted by Referee 3 - and this I agree is a confounding factor. But overall, multiple lines of evidence were assembled to show that the KO mice and WT mice suffered DSS-induced colitis with equal severity - and with closely equal severity of damage to the intestinal epithelium (though the image in Fig 2a is disturbing). For my part, the concern is understandable but likely not operating in a confounding way. And the evidence for the reprogramming of the damaged epithelium into "fetal-like stem cells" (the 1st step in restitution of lost stem cells) occurs in both WT and KO mice - and these data are strong. For this reader, the block convincingly shows up for KO mouse at the WNT dependent step

      Rev 3 This reviewer remains sceptical. I agree the authors performed the experiment well to confirm that DSS dosing was as equivalent as possible across the study. But DSS acts to induce colitis because it is concentrated in the colonic lumen as water is absorbed. Also, ECM responses and remodelling are a central part of colitis models. And my concern is that the actual exposure in the KO group is influenced by the transit of faeces/DSS, which is secondary to the known action of CMG2 on collagen deposition. The consequence of this would be a protracted damage phase in which a restoration of adult stem cells would not be expected, leading to epithelial failure.

      However, we differ. I might propose that the authors are asked to investigate and confirm expression of CMG2 in the epithelium and to repeat the analysis of collagen levels they performed on untreated CMG2 KO mice on colons from CMG2 KO mice having received DSS to see if these differ from controls.

      Rev 1 Both reviewer #2 and reviewer #3 make relevant points, from the point of view of extracting as much biological knowledge as we can from the observations reported in the manuscript.

      Reviewer #2's suggestion to use Cmg2[KO] organoids to investigate the dependence of Wnt transduction on Cmg2 is the type of experiment I refrained from proposing. However, I think the "skeleton" of the mechanism is there and is reasonably solid. Fleshing it out may well be another paper.

      I agree with Reviewer #3 objections to the timing and severity of the DSS damage. However, I am not sure how much they invalidate the main tenet of the paper:

      • DSS may affect Cmg2[KO] more severely, but the overall disease score is comparable during the DSS treatment. If this severity was enough to be the main driver of the phenotype, it should have left a mark in the Histological and Disease activity scores. In this regard, I think it would be helpful if the authors provided an expanded version of Figure 2A with examples of the different levels of "Crypt damage" scored, and the proportions for each. This could be in the supplementary material and would balance the impressions induced by a single image.

      • If DSS affected the recovery, this would also be compatible with having a more severe histological phenotype (which is not shown overall, just in Fig 2A) because one would also expect the tissue to attempt regeneration during the 7 days of DSS treatment.

      • The only objection that I find difficult to argue is the effective duration of the treatment. If indeed peristalsis is affected, it may be that during the 'recovery' phase there is still DSS in the intestine. This could be perhaps verified using a DS detection assay (e.g. https://arxiv.org/pdf/1703.08663) on the intestinal contents or the faeces of the mice during the 3-day recovery period.

      I think of what the aim of scholarly publication is, with this paper, and I find myself going back to a statement of the authors' discussion - that this work suggests that infants risking death may be offered (compassionate, I guess) IBD treatment. What does this hinge upon? I think, on the basic observation that diarrhoea (in the mouse model) is not intrinsic but caused by an inflammation-promoting insult. Is this substantiated? I think it is. Could we learn more biology from this disease model, about Wnt and about how ECM affects tissue regeneration? Certainly. Can this learning wait? I believe it can.

      Significance

      In this work, Bracq and colleagues provide clear evidence that the persistent diarrhoea seen in a mouse model of Hyaline Fibromatosis Syndrome is related to the inability of their intestinal epithelium to properly regenerate. This is very clear and of immediate impact. For instance, the authors themselves point at the possibility of applying treatments for Inflammatory Bowel Disease to HFS patients. While what happens in a mouse model is not necessarily the same as in human patients, the fact that persistent diarrhoea is a life-threatening symptom in HFS makes this proposal, at least in compassionate use of the therapies and until its efficacy is disproven, very plausible. This is a clear gap of knowledge that addresses an unmet medical need.

      I find that the work shows clearly that HFS mouse model subjects have normal intestinal function until challenged with a standard chemically-induced colitis. Then, the histological and health deterioration of the HFS mouse model is clear in comparison with normal mice, which can regenerate appropriately. This is shown with a multiplicity of orthogonal techniques spanning molecular, histological and organismal, which are standard and very well reported in the paper.

      The authors propose a specific cellular and molecular mechanism to explain the incapacity of the intestinal epithelium in the mouse model of HFS to regenerate. According to this mechanism, the protein Cmg2, whose mutation causes HFS in humans, would be necessary for intestinal stem cells to transduce the signal of Wnt ligands and therefore support their behaviour as regenerative cells. This mechanism is plausible, but more basic and advanced work would be needed to take it as proven.

      This work would be of interest to the clinical, biomedical, and basic research communities interested in rare diseases, the gastrointestinal system, collagen and extracellular matrix, and Wnt signalling.

      My general expertise is in developmental and stem cell biology using reverse genetics, transgenesis and immunohistological and molecular methods of data production, and lineage tracing, digital imaging and bioinformatic analytical methods; I work with Drosophila melanogaster and its adult gastrointestinal system.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      (1) The initial high accumulation by all cells followed by the emergence of a sub-population that has reduced its intracellular levels of tachyplesin is a key observation and I agree with the authors' conclusion that this suggests an induced response to the AMP is important in facilitating the bimodal distribution. However, I think the conclusion that upregulated efflux is driving the reduction in signal in the "low accumulator" subpopulation is not fully supported. Steady-state amounts of intracellular fluorescent AMP are determined by the relative rates of influx and efflux and a decrease could be caused by decreasing influx (while efflux remained unchanged), increasing efflux (while influx remained unchanged), or both decreasing influx and increasing efflux. Given the transcriptomic data suggest possible changes in the expression of enzymes that could affect outer membrane permeability and outer membrane vesicle formation as well as efflux, it seems very possible that changes to both influx and efflux are important. The "efflux inhibitors" shown to block the formation of the low accumulator subpopulation have highly pleiotropic or incompletely characterised mechanisms of action so they also do not exclusively support a hypothesis of increased efflux.

      We agree with the reviewer that the emergence of low accumulators after 30 min in the presence of extracellular tachyplesin-NBD (Figure 4A) could be due to either decreased influx while efflux remained unchanged, increased efflux while influx remained unchanged, or both decreasing influx and increasing efflux. Increased proteolytic activity or increased secretion of OMVs could also play a role.
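
      The reviewer's point about relative rates can be made explicit with a minimal one-compartment sketch. This model is an illustrative assumption and is not one used in the manuscript: $C_{\mathrm{in}}$ is the intracellular probe level, $C_{\mathrm{ext}}$ the extracellular concentration, $k_{\mathrm{in}}$ a lumped first-order uptake constant, and $k_{\mathrm{out}}$ a lumped first-order loss constant (efflux, and possibly proteolysis or vesicle secretion).

```latex
\begin{align}
  \frac{dC_{\mathrm{in}}}{dt} &= k_{\mathrm{in}}\,C_{\mathrm{ext}} - k_{\mathrm{out}}\,C_{\mathrm{in}}, \\
  C_{\mathrm{in}}^{*} &= \frac{k_{\mathrm{in}}}{k_{\mathrm{out}}}\,C_{\mathrm{ext}}
  \qquad \text{(steady state, } dC_{\mathrm{in}}/dt = 0\text{)}.
\end{align}
```

      Under these assumptions the steady-state level depends only on the ratio $k_{\mathrm{in}}/k_{\mathrm{out}}$, so a steady-state measurement alone cannot separate reduced influx from increased loss. When $C_{\mathrm{ext}} = 0$, the uptake term vanishes and any change in $C_{\mathrm{in}}$ reflects $k_{\mathrm{out}}$ only.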

      We have now acknowledged that “Reduced intracellular accumulation of tachyplesin-NBD in the presence of extracellular tachyplesin-NBD could be due to decreased drug influx, increased drug efflux, increased proteolytic activity or increased secretion of OMVs.” (lines 313-315).

      However, the emergence of low accumulators after 60 min in the absence of extracellular tachyplesin-NBD in our efflux assays (Figure 4C) cannot be due to decreased influx while efflux remained unchanged, because no extracellular tachyplesin-NBD was present. We acknowledge that in our original manuscript we did not explicitly state that the efflux assays reported in Figure 4C-D were performed in the absence of tachyplesin-NBD in the extracellular environment. We have now clarified this point in our manuscript, added illustrations in Figure 4A and 4C-D, and carried out efflux assays using ethidium bromide (EtBr) to further support our conclusions about the primary role played by efflux in reducing tachyplesin accumulation in low accumulators. We have added the following paragraphs to our revised manuscript:

      “Next, we performed efflux assays using ethidium bromide (EtBr) by adapting a previously described protocol [62]. Briefly, we preloaded stationary phase E. coli with EtBr by incubating cells at a concentration of 254 µM EtBr in M9 medium for 90 min. Cells were then pelleted and resuspended in M9 to remove extracellular EtBr. Single-cell EtBr fluorescence was measured at regular time points in the absence of extracellular EtBr using flow cytometry. This analysis revealed a progressive homogeneous decrease of EtBr fluorescence due to efflux from all cells within the stationary phase E. coli population (Figure S13A). In contrast, when we performed efflux assays by preloading cells with tachyplesin-NBD (46 μg mL<sup>-1</sup> or 18.2 μM), followed by pelleting and resuspension in M9 to remove extracellular tachyplesin-NBD, we observed a heterogeneous decrease in tachyplesin-NBD fluorescence in the absence of extracellular tachyplesin-NBD: a subpopulation retained high tachyplesin-NBD fluorescence, i.e. high accumulators; whereas another subpopulation displayed decreased tachyplesin-NBD fluorescence, 60 min after the removal of extracellular tachyplesin-NBD (Figure 4B). Since these assays were performed in the absence of extracellular tachyplesin-NBD, decreased tachyplesin-NBD fluorescence could not be ascribed to decreased drug influx or increased secretion of OMVs in low accumulators, but could be due to either enhanced efflux or proteolytic activity in low accumulators.

      Next, we repeated efflux assays using EtBr in the presence of 46 μg mL<sup>-1</sup> (or 20.3 µM) extracellular tachyplesin-1. We observed a heterogeneous decrease of EtBr fluorescence with a subpopulation retaining high EtBr fluorescence (i.e. high tachyplesin accumulators) and another population displaying reduced EtBr fluorescence (i.e. low tachyplesin accumulators, Figure S14B) when extracellular tachyplesin-1 was present. Moreover, we repeated tachyplesin-NBD efflux assays in the presence of M9 containing 50 μg mL<sup>-1</sup> (244 μM) carbonyl cyanide m-chlorophenyl hydrazone (CCCP), an ionophore that disrupts the proton motive force (PMF) and is commonly employed to abolish efflux and found that all cells retained tachyplesin-NBD fluorescence (Figure S15B). However, it is important to note that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes [63].

      Taken together, our data demonstrate that in the absence of extracellular tachyplesin, stationary phase E. coli homogeneously efflux EtBr, whereas only low accumulators are capable of performing efflux of intracellular tachyplesin after initial tachyplesin accumulation. In the presence of extracellular tachyplesin, only low accumulators can perform efflux of both intracellular tachyplesin and intracellular EtBr. However, it is also conceivable that besides enhanced efflux, low accumulators employ proteolytic activity, OMV secretion, and variations to their bacterial membrane to hinder further uptake and intracellular accumulation of tachyplesin in the presence of extracellular tachyplesin.”

      These amendments can be found on lines 316-350 and in the new Figure S13 and Figure 4. We have also carried out more tachyplesin-NBD accumulation assays using single and double gene-deletion mutants lacking efflux components, please see Response 3 to reviewer 2 and the data reported in Figure 4B.
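
      The quoted paragraphs give the same mass concentration (46 μg mL-1) as 18.2 μM for tachyplesin-NBD and as 20.3 μM for tachyplesin-1, which is expected if the labelled peptide has the larger molar mass. The short sketch below only back-calculates the molar mass implied by each quoted pair of values; it does not use any independently sourced molar mass, and the function name is hypothetical.

```python
def implied_molar_mass(mass_conc_ug_per_ml: float, molar_conc_um: float) -> float:
    """Molar mass (g/mol) implied by a mass and a molar concentration."""
    grams_per_litre = mass_conc_ug_per_ml * 1e-3  # 1 ug/mL = 1 mg/L = 1e-3 g/L
    moles_per_litre = molar_conc_um * 1e-6        # 1 uM = 1e-6 mol/L
    return grams_per_litre / moles_per_litre


# (mass concentration in ug/mL, molar concentration in uM) as quoted above.
quoted = {
    "tachyplesin-NBD": (46, 18.2),
    "tachyplesin-1": (46, 20.3),
    "CCCP": (50, 244),
}

implied = {name: round(implied_molar_mass(*pair)) for name, pair in quoted.items()}
for name, molar_mass in implied.items():
    print(f"{name}: ~{molar_mass} g/mol")
```

      The two tachyplesin values differ by a few hundred g/mol, consistent with the addition of a small fluorescent label, so the different micromolar figures for the same mass concentration are internally coherent.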

      (2) A conclusion of the transcriptomic analysis is that the lower accumulating subpopulation was exhibiting "a less translationally and metabolically active state" based on less upregulation of a cluster of genes including those involved in transcription and translation. This conclusion seems to borrow from well-described relationships referred to as bacterial growth laws in which the expression of genes involved in ribosome production and translation is directly related to the bacterial growth (and metabolic) rate. However, the assumptions that allow the formulation of the bacterial growth laws (balanced, steady state, exponential growth) do not hold in growth arrest. A non-growing cell could express no genes at all or could express ribosomal genes at a very low level, or efflux pumps at a high level. The distribution of transcripts among the functional classes of genes does not reveal anything about metabolic rates within the context of growth arrest - it only allows insight into metabolic rates when the constraint of exponential growth can be assumed. Efflux pumps can be highly metabolically costly; for example, Tn-Seq experiments have repeatedly shown that mutants for efflux pump gene transcriptional repressors have strong fitness disadvantages in energy-limited conditions. There are no data presented here to disprove a hypothesis that the low accumulators have high metabolic rates but allocate all of their metabolic resources to fortifying their outer membranes and upregulating efflux. This could be an important distinction for understanding the vulnerabilities of this subpopulation. Metabolic rates can be more directly estimated for single cells using respiratory dyes or pulsed metabolic labelling, for example, and these data could allow deeper insight into the metabolic rates of the two subpopulations. 
My main recommendation for additional experiments to strengthen the conclusions of the paper would be to attempt to directly measure metabolic or translational activity in the high- and low-accumulating populations. I do not think that the transcriptomic data are sufficient to draw conclusions about this but it would be interesting to directly measure activity. Otherwise, it might be reasonable to simply soften the language describing the two populations as having different activity levels. They do seem to have different transcriptional profiles, and this is already an interesting observation.

      We agree with the reviewer that it might be misleading to draw conclusions on bacterial metabolic states solely based on transcriptomic data. We have therefore removed the statement “low accumulators displayed a less translationally and metabolically active state”. We have instead stated the following: “Our transcriptomics analysis showed that low tachyplesin accumulators downregulated protein synthesis, energy production, and gene expression processes compared to high accumulators”. Moreover, we have employed the membrane-permeable redox-sensitive dye C<sub>12</sub>-resazurin, which is reduced to the fluorescent C<sub>12</sub>-resorufin in metabolically active cells, to obtain a more direct estimate of the metabolic state of low and high accumulators of tachyplesin. We have added the following paragraph reporting our new data:

      “Our transcriptomics analysis also showed that low tachyplesin accumulators downregulated protein synthesis, energy production, and gene expression compared to high accumulators. To gain further insight on the metabolic state of low tachyplesin accumulators, we employed the membrane-permeable redox-sensitive dye, resazurin, which is reduced to the highly fluorescent resorufin in metabolically active cells. We first treated stationary phase E. coli with 46 μg mL<sup>-1</sup> (18.2 μM) tachyplesin-NBD for 60 min, then washed the cells, and then incubated them in 1 μM resazurin for 15 min and measured single-cell fluorescence of resorufin and tachyplesin-NBD simultaneously via flow cytometry. We found that low tachyplesin-NBD accumulators also displayed low fluorescence of resorufin, whereas high tachyplesin-NBD accumulators also displayed high fluorescence of resorufin (Figure S16), suggesting lower metabolic activity in low tachyplesin-NBD accumulators.”

      These amendments can be found on lines 398-408 and in Figure S16.

      (3) The observation that adding nutrients to the stationary phase cultures pushes most of the cells to the "high accumulator" state is presented as support of the hypothesis that the high accumulator state is a higher metabolism/higher translational activity state. However, it is important to note that adding nutrients will cause most or all of the cells in the population to start to grow, thus re-entering the familiar regime in which bacterial growth laws apply. This is evident in the slightly larger cell sizes seen in the nutrient-amended condition. In contrast to stationary phase cells, growing cells largely do not exhibit the bimodal distribution, and they are much more sensitive to tachyplesin, as demonstrated clearly in the supplement. Growing cells are not necessarily the same as the high-accumulating subpopulation of non-growing cells.

      Following the reviewer’s suggestion, we are no longer using the nutrient supplementation data to support the hypothesis that high accumulators possess higher metabolism or translational activity.

      The nutrient supplementation data is now only used to investigate whether tachyplesin-NBD accumulation and efficacy can be increased, and not to show that high tachyplesin-NBD accumulators are more metabolically or translationally active.

      Furthermore, our previous statement “Our data suggests that such slower-growing subpopulations might display lower antibiotic accumulation and thus enhanced survival to antibiotic treatment.” has now been removed from the discussion.

      (4) It might also be worth adding some additional context around the potential to employ efflux inhibitors as therapeutics. It is very clear that obtaining sufficient antimicrobial drug accumulation within Gram-negative bacteria is a substantial barrier to effective treatments, and large concerted efforts to find and develop therapeutic efflux pump inhibitors have been undertaken repeatedly over the last 25 years. Sufficiently selective inhibitors of bacterial efflux pumps with appropriate drug-like properties have been challenging to find and none have entered clinical trials. Multiple psychoactive drugs have been shown to impact efflux in bacteria but usually using concentrations in the 10-100 μM range (as here). Meanwhile, the Ki values for their human targets are usually in the sub- to low-nanomolar range. The authors rightly note that the concentration of sertraline they have used is higher than that achieved in patients, but this is by many orders of magnitude, and it might be worth expanding a bit on the substantial challenge of finding efflux inhibitors that would be specific and non-toxic enough to be used therapeutically. Many advances in structural biology, molecular dynamics, and medicinal chemistry may make the quest for therapeutic efflux inhibitors more fruitful than it has been in the past but it is likely to remain a substantial challenge.

      We agree with this comment and we have now added the following statement:

      “This limitation underscores the broader challenge of identifying EPIs that are both effective and minimally toxic within clinically achievable concentrations, while also meeting key therapeutic criteria such as broad-spectrum efficacy against diverse efflux pumps, high specificity for bacterial targets, and non-inducers of AMR [117]. However, advances in biochemical, computational, and structural methodologies hold the potential to guide rational drug design, making the search for effective EPIs more promising [118]. Therefore, more investigation should be carried out to further optimise the use of sertraline or other EPIs in combination with tachyplesin and other AMPs.”

      This amendment can be found on lines 535-542.

      (5) My second recommendation is that the transcriptomic data should be made available in full and in a format that is easier for other researchers to explore. The raw data should also be uploaded to a sequence repository, such as the NCBI GEO database or the EMBL ENA. The most useful format for sharing transcriptomic data is a table (such as an Excel spreadsheet) of transcripts per million counts for each gene for each sample. This allows other researchers to do their own analyses and compare expression levels to observations from other datasets. When only fold change data are supplied, data cannot be compared to other datasets at all, because they are relative to levels in an untreated control which are not known. The cluster analysis is one way of gaining insight into biological function revealed by transcriptional profile, but it can hide interesting additional complexities. For example, rpoS is named as one of the transcription-associated genes that are higher in the high accumulator subpopulation and evidence of generally increased activity. But RpoS is the stress sigma factor that drives much lower levels of expression generally than the housekeeping sigma factor RpoD, even though it recognises many of the same promoters (and some additional stress-specific promoters). Therefore, increased RpoS occupancy of RNAP would be expected to result in overall lower levels of transcription. However, it is also true that the transcript level for the rpoS gene is a particularly poor indicator of expression - rpoS is largely post-transcriptionally regulated. More generally, annotations are always evolving and key functional insights related to each gene might change in the future, so the results are a more durable resource if they are presented in a less analysed form as well as showing the analysis steps. It can also be important to know which genes were robustly expressed but did not change, versus genes that were not detected.

      Sequencing data associated with this study have now been uploaded and linked under NCBI BioProject accession number PRJNA1096674 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1096674).

      We have added this link to the methods under subheading “Accession Numbers” on lines 858-860. Additionally, transcripts per million counts for each gene for each sample have been added to the Figure 3 - Source Data file as requested by the reviewer.
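
      For readers reusing the source data, the following is a minimal sketch of the standard transcripts-per-million (TPM) normalisation. It is not the authors' pipeline, which is not described in this response; the function name and the toy counts and gene lengths are hypothetical and serve only to illustrate why TPM values, unlike fold changes, can be compared across samples and datasets.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts     -- raw read counts per gene for one sample
    lengths_bp -- gene (or effective transcript) lengths in base pairs
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: length-normalise each gene (reads per kilobase).
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Step 2: scale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rates]


# Hypothetical toy sample with three genes (illustrative values only).
toy_counts = [100, 200, 300]
toy_lengths = [1000, 2000, 1500]
toy_tpm = tpm(toy_counts, toy_lengths)
```

      Because every sample sums to the same total, a gene that is robustly expressed but unchanged remains visible in such a table, whereas it would be indistinguishable from an undetected gene in a fold-change list.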

      (6) In the introduction, the susceptibility of AMP efficacy to resistance mechanisms is discussed:

      "However, compared to small molecule antimicrobials, AMP resistance genes typically confer smaller increases in resistance, with polymyxin-B being a notable exception [7, 8]. Moreover, mobile resistance genes against AMPs are relatively rare, and horizontal acquisition of AMP resistance is hindered by phylogenetic barriers owing to functional incompatibility with the new host bacteria [9], again with plasmid-transmitted polymyxin resistance being a notable exception."

      It seems worth pointing out that polymyxins are the only AMPs that can reasonably be compared with small molecule antibiotics in terms of resistance acquisition since they are the only AMPs that have been widely used as drugs and therefore had similar chances to select for resistance among diverse global microbial populations.

      We have now clarified that we are referring to laboratory evolutionary analyses of resistance towards small molecule antibiotics and AMPs (Spohn et al., 2019) and that polymyxins are the only AMPs that have been used in antibiotic treatment to date.

      We have added the following statement to address this point:

      “Bacteria have developed genetic resistance to AMPs, including proteolysis by proteases, modifications in membrane charge and fluidity to reduce affinity, and extrusion by AMP transporters. However, compared to small molecule antimicrobials, AMP resistance genes typically confer smaller increases in resistance in experimental evolution analyses, with polymyxin-B and CAP18 being notable exceptions [8]. Moreover, mobile resistance genes against AMPs are relatively rare and horizontal acquisition of AMP resistance is hindered by phylogenetic barriers owing to functional incompatibility with the new host bacteria [9]. Plasmid-transmitted polymyxin resistance constitutes a notable exception [10], possibly because polymyxins are the only AMPs that have been in clinical use to date [9].”

      This amendment can be found on lines 57-65.

      (7) In the description of Figure 4, "tachyplesin monotherapy" is mentioned. It is not really appropriate to describe the treatment of a planktonic culture of bacteria in a test tube as a therapy since there is no host that is benefitting.

      We have now replaced “tachyplesin monotherapy” with “tachyplesin treatment”.

      (8) In the discussion, it is stated that "tachyplesin accumulates intracellularly only in bacteria that do not survive tachyplesin exposure" but this is clearly not true. All bacteria accumulate tachyplesin intracellularly initially, but if the bacteria are non-growing during the exposure, some of them are able to reduce their intracellular levels. The fraction of survivors is roughly correlated with the fraction of bacteria that do not maintain high intracellular levels of tachyplesin and that do not stain with propidium iodide, but for any given cell it seems that there is no clear point at which a high intracellular level of tachyplesin means that it will definitely not survive.

      We have now clarified this statement as follows: “We show that after an initial homogeneous tachyplesin accumulation within a stationary phase E. coli population, tachyplesin is retained intracellularly by bacteria that do not survive tachyplesin exposure, whereas tachyplesin is retained only in the membrane of bacteria that survive tachyplesin exposure.”

      This amendment can be found on lines 443-446.

      (9) Also in the discussion: "Our data suggests that such slower-growing subpopulations might display lower antibiotic accumulation and thus enchanced [sic] survival to antibiotic treatment." This does not really relate to the results here because the bimodal distributions were primarily studied in the absence of growth. In the LB/exponential growth situations where the population was growing but a very small subpopulation of low accumulators was observed, no measurements were made to indicate subpopulation growth rates.

      We have now removed this statement from the manuscript.

      (10) In discussion, L-Ara4N appears to be referred to as both positively charged and negatively charged; this should be clarified.

      We have now clarified that L-Ara4N is positively charged.

      This amendment can be found on line 496.

      (11) Discussion of TF analysis seems to overstate what is supported by the evidence. The correlation of up- and downregulated genes with previously described TF regulons (probably measured in very different conditions) does not really demonstrate TF activity. This could be measured directly with additional experiments but in the absence of those experiments claims about detecting TF activity should probably be avoided. The attempts to directly demonstrate the importance of those transcription factors to the observed accumulation activity were not successful.

      We have now removed the previous paragraph related to the TF analysis from the discussion. We have also modified the results section reporting the TF analysis as follows: “Next, we sought to infer transcription factor (TF) activities via differential expression of their known regulatory targets [61]. A total of 126 TFs were inferred to exhibit differential activity between low and high accumulators (Data Set S4). Among the top ten TFs displaying higher inferred activity in low accumulators compared to high accumulators, four regulate transport systems, i.e. Nac, EvgA, Cra, and NtrC (Figure S12). However, further experiments should be carried out to directly measure the activity of these TFs.”

      Finally, we have also moved the TFs’ data from Figure 3 to Figure S12 in the Supplementary information.

      These amendments can be found on lines 288-293.

      (12) When discussing the possibility of nutrient supplementation versus efflux inhibition as a potential therapeutic strategy, it could be noted that nutrient supplementation cannot be done in many infection contexts. The host immune system and host/bacterial cell density control nutrient access.

      We have now added the following statement: “Moreover, nutrient supplementation as a therapeutic strategy may not be viable in many infection contexts, as host density and the immune system often regulate access to nutrients [3]”.

      These amendments can be found on lines 553-555.

      Reviewer 2:

      (1) Some questions regarding the mechanism remain. One shortcoming of the setup of the transcriptomics experiment is that the tachyplesin-NBD probe itself has antibiotic efficacy and induces phenotypes (and eventually cell death) in the ´high accumulator´ cells. This makes it challenging to interpret whether any differences seen between the two groups are causative for the observed accumulation pattern or if they are a consequence of differential accumulation and downstream phenotypic effects.

      We agree with the reviewer and we have now acknowledged that “tachyplesin-NBD has antibiotic efficacy (see Figure 2) and has an impact on the E. coli transcriptome (Figure 3). Therefore, we cannot conclude whether the transcriptomic differences reported between low and high accumulators of tachyplesin-NBD are causative for the distinct accumulation patterns or if they are a consequence of differential accumulation and downstream phenotypic effects.”

      These amendments can be found on lines 283-287.

      (2) It would be relevant to test and report the MIC of sertraline for the strain tested, particularly since in Figure 4G an initial reduction in CFUs is observed for sertraline treatment, which suggests the existence of biological effects in addition to efflux inhibition.

      We have now measured the MIC of sertraline against E. coli BW25113 finding the MIC value to be 128 μg mL<sup>-1</sup> (418 µM). This value is more than four times higher compared to the sertraline concentration employed in our study, i.e. 30 μg mL<sup>-1</sup> (98 μM).

      These amendments can be found on lines 389-391 and data has been added to Figure 4 – Source Data.

      (3) The role of efflux systems is further supported by the finding that efflux pump inhibitors sensitize E. coli to tachyplesin and prevent the occurrence of the tolerant ´low accumulator´ subpopulations. In principle, this is a great way of validating the role of efflux pumps, but the limited selectivity of these inhibitors (CCCP is an uncoupling agent, and for sertraline direct antimicrobial effects on E. coli have been reported by Bohnert et al.) leaves some ambiguity as to whether the synergistic effect is truly mediated via efflux pump inhibition. To strengthen the mechanistic angle of the work analysis of tachyplesin-NBD accumulation in mutants of the identified efflux components would be interesting.

      We have now performed tachyplesin-NBD accumulation assays using 28 single and 4 double E. coli BW25113 gene-deletion mutants of efflux components and transcription factors regulating efflux. While for the majority of the mutants we recorded bimodal distributions of tachyplesin-NBD accumulation similar to the distribution recorded for the E. coli BW25113 parental strain (Figure 4B and Figure S13), we found unimodal distributions of tachyplesin-NBD accumulation consisting only of high accumulators for both ΔqseB and ΔqseBΔqseC mutants as well as reduced numbers of low accumulators for the ΔacrAΔtolC mutant (Figure 4B). Considering that the AcrAB-TolC tripartite RND efflux system is known to confer genetic resistance against AMPs like protamine and polymyxin-B [29,30] and that the quorum sensing regulators qseBC might control the expression of acrA [64], these data further corroborate the hypothesis that low accumulators can efflux tachyplesin and survive treatment with this AMP.

      These amendments can be found on lines 351-361, in the new Figure 4B and in the new Figure S14.

      Moreover, we have carried out further efflux assays with both ethidium bromide and tachyplesin-NBD to further demonstrate the role of efflux in reduced accumulation of tachyplesin, and we have acknowledged that other mechanisms (i.e. reduced influx, increased protease activity or increased secretion of OMVs) could play an important role; please see Response 1 to Reviewer 1.

      (4) The authors imply that protease could contribute to the low accumulator mechanism. Proteases could certainly cleave and thus inactivate AMPs/tachyplesin, but would this effect really lead to a reduction in fluorescence levels since the fluorophore itself would not be affected by proteolytic cleavage?

      We agree with the reviewer that nitrobenzoxadiazole (NBD) might not be cleaved by proteases that inactivate tachyplesin and other AMPs. Therefore, inactivation of tachyplesin by proteases might not affect cellular fluorescence levels unless efflux of NBD is possible following the cleavage of tachyplesin-NBD. We have therefore removed the statement “Conversely, should efflux or proteolytic activities by proteases underpin the functioning of low accumulators, we should observe high initial tachyplesin-NBD fluorescence in the intracellular space of low accumulators followed by a decrease in fluorescence due to efflux or proteolytic degradation.” We have now stated the following: “Low accumulators displayed an upregulation of peptidases and proteases compared to high accumulators, suggesting a potential mechanism for degrading tachyplesin (Table S1 and Data Set S3).”

      These amendments can be found on lines 280-282.

      (5) To facilitate comparison with other literature (e.g. papers on sertraline) it would be helpful to state compound concentrations also as molar concentrations.

      We have now added the molar concentrations alongside all instances where concentrations are stated in μg mL<sup>-1</sup>.
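
      As an arithmetic check on the conversions quoted in this response, the sketch below converts mass concentrations (μg mL<sup>-1</sup>) to molar concentrations (μM). The molecular weights are our assumptions (sertraline free base, CCCP and verapamil free base, taken from their standard chemical formulae), not values stated in the response, and the function name is hypothetical.

```python
def ug_per_ml_to_um(conc_ug_per_ml, mw_g_per_mol):
    """Convert a mass concentration in ug/mL to a molar concentration in uM.

    ug/mL equals mg/L; dividing by g/mol gives mmol/L; times 1000 gives umol/L.
    """
    return conc_ug_per_ml / mw_g_per_mol * 1000.0


# Assumed molecular weights in g/mol (not stated in the response).
ASSUMED_MW = {
    "sertraline": 306.23,  # free base, C17H17Cl2N
    "CCCP": 204.62,        # C9H5ClN4
    "verapamil": 454.60,   # free base, C27H38N2O4
}

sertraline_mic_um = ug_per_ml_to_um(128, ASSUMED_MW["sertraline"])
sertraline_used_um = ug_per_ml_to_um(30, ASSUMED_MW["sertraline"])
cccp_um = ug_per_ml_to_um(50, ASSUMED_MW["CCCP"])
verapamil_um = ug_per_ml_to_um(50, ASSUMED_MW["verapamil"])

# Ratio of the sertraline MIC to the concentration used in the study.
mic_fold = 128 / 30
```

      Under these assumed molecular weights, the rounded results agree with the molar values given in the responses (418 μM and 98 μM for sertraline, 244 μM for CCCP, 110 μM for verapamil), and the MIC-to-working-concentration ratio of about 4.3 is consistent with "more than four times higher".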

      (6) The authors tested a series of efflux pump inhibitors and found that CCCP and sertraline prevented the generation of the low accumulator subpopulation, whereas other inhibitors did not. An overview and discussion of the known molecular targets and mode of action of the different selected inhibitors could reveal additional insights into the molecular mechanism underlying the synergy with tachyplesin.

      We have now added molecular targets and mode of action of the different inhibitors where known. “Moreover, we repeated tachyplesin-NBD efflux assays in the presence of M9 containing 50 μg mL<sup>-1</sup> (244 μM) carbonyl cyanide m-chlorophenyl hydrazone (CCCP), an ionophore that disrupts the proton motive force (PMF) and is commonly employed to abolish efflux and found that all cells retained tachyplesin-NBD fluorescence (Figure S15B). However, it is important to note that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes [63].” And “Interestingly, M9 containing 30 µg mL<sup>-1</sup> (98 μM) sertraline (Figure 4D and S15C), an antidepressant which inhibits efflux activity of RND pumps, potentially through direct binding to efflux pumps [65] and decreasing the PMF [66], or 50 µg mL<sup>-1</sup> (110 μM) verapamil (Figure S15D), a calcium channel blocker that inhibits MATE transporters [67] by a generally accepted mechanism of PMF generation interference [68,69], was able to prevent the emergence of low accumulators. Furthermore, tachyplesin-NBD cotreatment with sertraline simultaneously increased tachyplesin-NBD accumulation and PI fluorescence levels in individual cells (Figure 4E and F, p-value < 0.0001 and 0.05, respectively). The use of berberine, a natural isoquinoline alkaloid that inhibits MFS transporters [70] and RND pumps [71], potentially by inhibiting conformational changes required for efflux activity [70], and baicalein, a natural flavonoid compound that inhibits ABC [72] and MFS [73,74] transporters, potentially through PMF dissipation [75], prevented the formation of a bimodal distribution of tachyplesin accumulation, however displayed reduction in fluorescence of the whole population (Figure S15E and F). 
Phenylalanine-arginine beta-naphthylamide (PAβN), a synthetic peptidomimetic compound that inhibits RND pumps [76] through competitive inhibition [77], reserpine, an indole alkaloid that inhibits ABC and MFS transporters, and RND pumps [78], by altering the generation of the PMF [69], and 1-(1-naphthylmethyl)piperazine (NMP), a synthetic piperazine derivative that inhibits RND pumps [79], through non-competitive inhibition [80], did not prevent the emergence of low accumulators (Figure S15G-I).”

      These amendments can be found on lines 337-342 and 367-385.

      (7) Page 8. The term ´medium accumulators´ for a 1:1 mix of low and high accumulators is misleading.

      We have now replaced the term “medium accumulators” with “a 1:1 (v/v) mixture of low and high accumulators”.

      These amendments to the description can be found on lines 238-239.

      (8) Figure 3. It may be more appropriate to rephrase the title of the figure to ´biological processes associated with low tachyplesin accumulation´ (rather than ´facilitate accumulation´). The same applies to the section title on page 8.

      We have amended the title of Figure 3 as requested by the reviewer.

      (9) The fact that the low accumulation phenotype depends on the growth media and conditions and can be prevented by nutrients is highly relevant. I would encourage the authors to consider showing the corresponding data in the main manuscript rather than in the SI.

      We have created a new Figure 5, displaying the impact of the nutritional environment and bacterial growth phase on both tachyplesin-NBD accumulation and efficacy.

      (10) In the discussion the authors state: ´Heterogeneous expression of efflux pumps within isogenic bacterial populations has been reported [29,32,33,67-69]. However, recent reports have suggested that efflux is not the primary mechanism of antimicrobial resistance within stationary-phase bacteria [31,70].´ In light of the authors´ findings that the response to tachyplesin is induced by exposure and is not pre-selected, could they speculate on why this specific response can be induced in stationary, but not exponential cells? Could there be a combination of pre-existing traits and induced responses at play? Could e.g. the reduced growth rate/metabolism in these cells render these cells less susceptible to the intracellular effects of tachyplesin and slow down the antibiotic efficacy, giving the cells enough time to mount additional protective responses that then lead to the low accumulation phenotype?

      We have now acknowledged that it is conceivable that other pre-existing traits of low accumulators also contribute to reduced tachyplesin accumulation. For example, reduced protein synthesis, energy production and gene expression in low accumulators could slow down tachyplesin efficacy, giving low accumulators more time to mount efflux as an additional protective response.

      “As our accumulation assay did not require the prior selection for phenotypic variants, we have demonstrated that low accumulators emerge subsequent to the initial high accumulation of tachyplesin-NBD, suggesting enhanced efflux as an induced response. However, it is conceivable that other pre-existing traits of low accumulators also contribute to reduced tachyplesin accumulation. For example, reduced protein synthesis, energy production, and gene expression in low accumulators could slow down tachyplesin efficacy, giving low accumulators more time to mount efflux as an additional protective response.”

      This amendment can be found on lines 482-489.

      (11) In the abstract: Is it true that low accumulators ´sequester´ the drug in their membrane? In my understanding ´sequestering´ would imply that low accumulators would bind higher levels of tachyplesin-NBD in their membrane compared to high accumulators (thereby preventing it from entering the cells). According to Figure 1J, K, it rather seems that the fluorescent signal around the membrane is also stronger in high accumulators.

      We have now removed the sentence “low accumulators sequester the drug in their membrane” from the abstract. We have instead stated: “These phenotypic variants display enhanced efflux activity to limit intracellular peptide accumulation.”

      These amendments can be found on lines 34-35.

      Reviewer 3:

      (1) The authors' claims about high efflux being the main mechanism of survival are unconvincing, given the current data. There can be several alternative hypotheses that could explain their results, such as lower binding of the AMP, lower rate of internalization, metabolic inactivity, etc. It is unclear how efflux can be important for survival against a peptide that the authors claim binds externally to the cell. The addition of efflux assays would be beneficial for clear interpretations. Given the current data, the authors' claims about efflux being the major mechanism in this resistance are unconvincing (in my humble opinion). Some direct evidence is necessary to confirm the involvement of efflux. The data with CCCP in Figure 4C can only indicate accumulation, not efflux. The authors are encouraged to perform direct efflux assays using known methods (e.g., PMIDs 20606071, 30981730, etc.). Figure 4A: The data does not support the broad claims about efflux. First, if the peptide is accumulated on the outside of the outer membrane, how will efflux help in survival? The dynamics shown in 4A may be due to lower binding, lower entry, or lower efflux. These mechanisms are not dissected here. Second, the heterogeneity can be preexisting or a result of the response to this stress. Either way, whether active efflux or dynamic transcriptomic changes are responsible for these patterns is not clear. Direct efflux assays are crucial to conclude that efflux is a major factor here.

      This important comment is similar in scope to the first comment of reviewer 1 and it is partly due to the fact that we had not clearly explained our efflux assays reported in Figure 4 in the original manuscript. We kindly refer this reviewer to our extensive response 1 to reviewer 1 and corresponding amendments on lines 316-350 and in the new Figure S13 and Figure 4 (reported in the response 1 to reviewer 1 above), where we have now fully addressed this reviewer’s and reviewer 1’s concerns, and performed new experiments following their important suggestions and the methods described in PMID 20606071 suggested by this reviewer.

      (2) The fluorescent imaging experiments can be conducted in the presence of externally added proteases, such as proteinase K, which has multiple cleavage sites on tachyplesin. This would ensure that all the external peptides (both free and bound) are removed. If the signal is still present, it can be concluded that the peptide is present internally. If the peptide is primarily external, the authors need to explain how efflux could help with externally bound peptides. Figure 1J-K: How are the authors sure about the location of the intensity? The peptide can be inside or outside and still give the same signal. To prove that the peptide is inside or outside, a proteolytic cleavage experiment is necessary (proteinase K, Arg-C proteinase, clostripain, etc.).

      We thank the reviewer for this important suggestion.

      We have now performed experiments where stationary phase E. coli was incubated in 46 μg mL<sup>-1</sup> (18.2 μM) tachyplesin-NBD in M9 for 60 min. Next, cells were pelleted and washed to remove extracellular tachyplesin-NBD and then incubated in either M9 or 20 μg mL<sup>-1</sup> (0.7 μM) proteinase K in M9 for 120 min. We found that the fluorescence of low accumulators decreased over time in the presence of proteinase K; in contrast, the fluorescence of high accumulators did not decrease over time in the presence of proteinase K. These data therefore suggest that tachyplesin-NBD is present only on the cell membrane of low accumulators and both on the membrane and intracellularly in high accumulators.

      Moreover, confocal microscopy using tachyplesin-NBD along with the membrane dye FM™ 4-64FX further confirmed that tachyplesin-NBD is present only on the cell membrane of low accumulators and both on the membrane and intracellularly in high accumulators.

      These amendments can be found on lines 173-179, lines 188-192 and in the new Figures S4 and S6.

      (3) Further genetic experiments are necessary to test whether efflux genes are involved at all. The genetic data presented by the authors in Figure S11 is crucial and should be further extended. The problem with fitting this data to the current hypothesis is as follows: If specific efflux pumps are involved in the resistance mechanism, then single deletions would cause some changes to the resistance phenotype, and the data in Figure S11 would look different. If there is redundancy (as is the case in many efflux phenotypes), the authors may consider performing double deletions on the major RND regulators (for example, evgA and marA). Additionally, the deletion of pump components such as TolC (one of the few OM components) and adaptors (such as acrA/D) might also provide insights. If the peptide is present in the periplasm, then deletions involving outer components would become important.

      This important comment is similar in scope to the third comment of reviewer 2. We have now performed tachyplesin-NBD accumulation assays using 28 single and 4 double E. coli BW25113 gene-deletion mutants of efflux components and transcription factors regulating efflux. While for the majority of the mutants we recorded bimodal distributions of tachyplesin-NBD accumulation similar to the distribution recorded for the E. coli BW25113 parental strain (Figure 4B and Figure S13), we found unimodal distributions of tachyplesin-NBD accumulation constituted only of high accumulators for both ΔqseB and ΔqseBΔqseC mutants as well as reduced numbers of low accumulators for the ΔacrAΔtolC mutant.

      These amendments can be found on lines 351-361, in the new Figure 4B and in the new Figure S14. Please also see our response to comment 3 of reviewer 2.

      (4) Line numbers would have been really helpful. Please mention the size of the peptide (length and spatial) for readers.

      We have now added line numbers to the revised manuscript. The length and molecular weight of tachyplesin-1 have now been added on line 75.

      (5) Figure S4 is unclear. How were the low accumulators collected? What prompted the low-temperature experiment? The conclusion that it accumulates at the outer membrane is unjustified. Where is the data for high accumulators?

      We have now corrected the results section to state that tachyplesin-NBD accumulates on the cell membranes, rather than at the outer membrane of E. coli cells.

      These amendments can be found on lines 178 and 190.

      We would like to clarify that in Figure S4 we compare the distribution of tachyplesin-NBD single-cell fluorescence at low temperature versus 37 °C across the whole stationary phase E. coli population; we did not collect low accumulators only.

      The low-temperature experiment was prompted by a previous publication (Zhou Y et al. 2015: doi: 10.1021/ac504880r. Epub 2015 Mar 24. PMID: 25753586) that showed that non-specific adherence of antimicrobials to the bacterial surface occurs at low temperatures and that passive and active transport of antimicrobials across the membrane is significantly diminished. Additionally, there are previous reports that suggest low temperatures inhibit post-binding peptide-lipid interactions, but not the primary binding step (PMID: 16569868; PMCID: PMC1426969; PMID: 3891625; PMCID: PMC262080).

      Therefore, the low-temperature experiment was performed to quantify the fluorescence of cells due to non-specific binding. This quantification allowed us to deduce that the fluorescence of high accumulators in excess of the measured non-specific binding fluorescence (measured in the low-temperature experiment for the whole stationary phase E. coli population) is the result of intracellular tachyplesin-NBD accumulation. In contrast, the comparable fluorescence levels between all the cells in the low-temperature experiment and the low accumulator subpopulation at 37 °C suggest that tachyplesin-NBD is predominantly accumulated on the cell membranes of low accumulators instead of intracellularly.

      Please also see our response to comment 2 above for further evidence supporting that tachyplesin-NBD accumulates only on the cell membranes of low accumulators and both on the cell membranes and intracellularly in high accumulators.

      (6) Figure S5: Describe the microfluidic setup briefly. Why did the distribution pattern change (compared to Figure 1A)? Now, there are more high accumulators. Does the peptide get equally distributed between daughter cells?

      We have now added a brief description of the microfluidic setup on lines 182-184.

      The difference in the abundance of low and high accumulators between the microfluidics and flow cytometry measurements is likely due to differences in cell density, i.e. a few cells per channel vs millions of cells in a tube. A second major difference is that tachyplesin-NBD is continuously supplied in the microfluidic device for the entire duration of the experiment; therefore, the extracellular concentration of tachyplesin-NBD does not decrease over time. In contrast, tachyplesin-NBD is added to the tube only at the beginning of the experiment; therefore, the extracellular concentration of tachyplesin-NBD likely decreases over time as it is accumulated by the bacteria. The relative abundance of low and high accumulators changes with the extracellular concentration of tachyplesin-NBD as shown in Figure 1A.

      We have added a sentence to acknowledge this discrepancy on lines 186-187.

      No instances of cell division were observed in stationary phase E. coli in the absence of nutrients in all microfluidics assays. Therefore, we cannot comment on the distribution of tachyplesin-NBD across daughter cells.

      (7) How did the authors conclude this: "tachyplesin accumulation on the bacterial membrane may not be sufficient for bacterial eradication"? It is completely unclear to this reviewer.

      We presented this hypothesis at the end of the section “Tachyplesin accumulates primarily in the membranes of low accumulators” as a link to the following section “Tachyplesin accumulation on the bacterial membranes is insufficient for bacterial eradication” where we test this hypothesis. For clarity, we have now moved this sentence to the beginning of the section “Tachyplesin accumulation on the bacterial membranes is insufficient for bacterial eradication”.

      (8) What is meant by membrane accumulation? Outside, inside, periplasm? Where? Figure 2H conclusions are unjustified. Bacterial killing with many antibiotics is associated with membrane damage, which is an aftereffect of direct antibiotic action. How can the authors state that "low accumulators primarily accumulate tachyplesin-NBD on the bacterial membrane, maintaining an intact membrane, strongly contributing to the survival of the bacterial population"? This reviewer could not find justifications for the claims about the location of the accumulation or cells actively maintaining an intact membrane. Also, PI staining reports damage both membranes.

      Based on the experiments that we have carried out after this reviewer’s suggestions, please see response 2 above, it is likely that tachyplesin-NBD is present only on the bacterial surface, i.e. in or on the outer membrane of low accumulators, considering that their fluorescence decreases during treatment with proteinase K. However, to take a more conservative approach we have now written on the cell membranes throughout the manuscript, i.e. either the outer or the inner membrane.

      We have also rephrased the statement reported by the reviewer as follows:

      “Taken together with PI staining data indicating membrane damage caused by high tachyplesin accumulation, these data demonstrate that low accumulators, which primarily accumulate tachyplesin-NBD on the bacterial membranes, maintain membrane integrity and strongly contribute to the survival of the bacterial population in response to tachyplesin treatment.”

      These amendments can be found on lines 228-232.

      (9) Figure 3: The findings about cluster 2 and cluster 4 genes do not correlate logically. If the cells are in a metabolically low active state, how are the cells getting enough energy for active efflux and membrane transport? This scenario is possible, but the authors must confirm the metabolic activity by measuring respiration rates. Also, metabolically less-active cells may import a lower number of peptides to begin with. That also may contribute to cell survival. Additionally, lowered metabolism is a known strategy of antibiotic survival that is distinctly different from efflux-mediated survival.

      Following this reviewer’s comment and comment 2 of reviewer 1, we have now carried out further experiments to estimate the metabolic activity of low and high accumulators. Please see our response to comment 2 of reviewer 1 above.

      (10) Figure S10: How did the authors test their hypothesis that cardiolipin is involved in the binding of the peptide to the membrane? The transcriptome data does not confirm it. Genetic experiments are necessary to confirm this claim.

      We would like to clarify that we have not set out to test the hypothesis that cardiolipin is involved in the binding of tachyplesin-NBD. We have only stated that cardiolipin could bind tachyplesin due to its negative charge. We have now cited two previous studies that suggest that tachyplesin has an increased affinity for lipid mixtures containing either cardiolipin (Edwards et al. ACS Inf Dis 2017) or PG lipids (Matsuzaki et al. BBA 1991), i.e. the main constituents of cardiolipins.

      These amendments can be found on lines 264-267.

      (11) Figure 4B-F: There are several controls missing. For Sertraline treatment, the authors must test that the metabolic profile, transcriptomic changes, or import of the peptide are not responsible for enhanced survival. CCCP will not only abolish efflux but also many other respiration-associated or all other energy-driven processes.

      Figure 4D presents data acquired in efflux assays in the absence of extracellular tachyplesin-NBD. Therefore, altered tachyplesin-NBD import cannot contribute to the lack of formation of the low accumulator subpopulation.

      We have now acknowledged that it is conceivable that increased tachyplesin efficacy is due to metabolic and transcriptomic changes induced by sertraline.

      These amendments can be found on lines 396-397.

      We have also acknowledged that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes.

      These amendments can be found on lines 341-342.

    1. Author response:

      eLife Assessment

      This manuscript introduces a useful protein-stability-based fitness model for simulating protein evolution and unifying non-neutral models of molecular evolution with phylogenetic models. The model is applied to four viral proteins that are of structural and functional importance. The justification of some hypotheses regarding fitness is incomplete, as well as the evidence for the model's predictive power, since it shows little improvement over neutral models in predicting protein evolution.

      We are grateful for the constructive comments that helped improve our study. Regarding the comment about justification of fitness, we will include in the revised manuscript additional information to support the relevance of modeling protein evolution accounting for protein folding stability. We agree that increasing the parameterization of the developed birth-death model is interesting, if it does not lead to overfitting. The model presented considers the fitness of protein variants to determine their reproductive success through the corresponding birth and death rates, varying among lineages, and it is biologically meaningful and technically correct (Harmon 2019). Following a suggestion of the first reviewer to allow variation of the global birth-death rate among lineages, we will additionally incorporate this aspect into the model and evaluate its performance with the data used for the evaluation of the models. The integration of structurally constrained substitution models of protein evolution, as Markov models, into the birth-death process was made following standard approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012) and we will provide more information about it in the revised manuscript. Regarding the predictive power, our study showed good accuracy in predicting the real folding stability of forecasted protein variants. On the other hand, predicting the exact sequences proved to be more challenging, indicating the need for further development in the field of substitution models of molecular evolution. Altogether, we believe our findings provide a significant contribution to the field, as accurately forecasting the folding stability of future real proteins is fundamental for predicting their protein function and enabling a variety of applications. Additionally, we implemented the models into a freely available computer framework, with detailed documentation and diverse practical examples.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is constrained by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral structural proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which have struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death. Unfortunately, though, the model shows little improvement over neutral models in predicting protein evolution, and this ultimately appears to be due to fundamental conceptual problems with how fitness is modeled and linked to the phylodynamic birth-death model.

      We thank the reviewer for the positive comments about our work.

      Regarding predictive power, the study showed good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. However, predicting the exact sequences was more challenging. For example, amino acids with similar physicochemical properties can result in similar folding stability while differing in the specific sequence; more accurate substitution models of molecular evolution are required in the field. We consider that forecasting the folding stability of future real proteins is an important advancement in forecasting protein evolution, given the essential role of folding stability in protein function and its variety of applications. Regarding the conceptual concerns related to fitness modeling, we clarify this issue in detail in our responses to the specific comments below.

      Major concerns:

      (1) Fitness model: All lineages have the same growth rate r = b-d because the authors assume b+d=1. But under a birth-death model, the growth r is equivalent to fitness, so this is essentially assuming all lineages have the same absolute fitness since increases in reproductive fitness (b) will simply trade off with decreases in survival (d). Thus, even if the SCS model constrains sequence evolution, the birth-death model does not really allow for non-neutral evolution such that mutations can feed back and alter the structure of the phylogeny.

      We thank the reviewer for this comment that aims to improve the realism of our model. In the model presented, the fitness predicted from a protein variant is used to obtain the corresponding birth rate of that variant (see below for an additional model, derived from the proposal of the reviewer, that we are now implementing into the framework and applying to the data used for the evaluation of the models). In this way, protein variants with high fitness have high birth rates leading to overall more birth events, while protein variants with low fitness have low birth rates resulting in overall more extinction events, which has biological meaning for the study system. The statement “All lineages have the same growth rate r = b-d” does not hold for our model because b and d can vary among lineages according to the fitness. For example, a lineage might have b=0.9, d=0.1, r=0.8, while another lineage could have b=0.6, d=0.4, r=0.2. Likewise, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect. Assuming that all lineages have the same fitness would not make sense: in that situation the folding stability of the forecasted protein variants would be similar under any model, which is not the case as shown in the results. In our model, the fitness affects the reproductive success, where protein variants with a high fitness have higher birth rates leading to more birth events, while those with lower fitness have higher death rates leading to more extinction events. This parameterization is meaningful for protein evolution because the fitness of a protein variant can affect its survival (birth or extinction) without necessarily affecting its rate of evolution. While faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a faster rate. 
Regarding the phylogenetic structure, the model presented considers variable birth and death events across different lineages according to the fitness of the corresponding protein variants, and this alters the derived phylogeny (i.e., protein variants selected against can go extinct while others with high fitness can produce descendants). We are not sure about the meaning of the term “mutations can feed back” in the context of our system. Note that we use Markov models of evolution, which are well-established in the field (despite their limitations), and substitutions are fixed mutations, which still could be reverted later if selected by the substitution model (Yang 2006). Altogether, we find that the presented birth-death model is technically correct and appropriate for modeling our biological system. Its integration with structurally constrained substitution (SCS) models of protein evolution, as Markov models, is correct following general approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012). We will provide a more detailed description of the model in the revised manuscript.

      Apart from these clarifications about the birth-death model used, we understand the point of the reviewer and following the suggestion we are now incorporating an additional birth-death model that accounts for variable global birth-death rate among lineages. Specifically, we are following the model proposed by Neher et al (2014), where the death rate is considered as 1 and the birth rate is modeled as 1 + fitness. In this model, the global birth-death rate varies among lineages. We are now implementing this model into the computer framework and applying it to the data used for the evaluation of the models. Preliminary results, which will be finally presented in the revised manuscript, indicate that this model yields similar predictive accuracy compared to the previous birth-death model. If this is confirmed, accounting for variability in the global birth-death rate does not appear to play a major role in the studied systems of protein evolution. We will present this additional birth-death model and its results in the revised manuscript.
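
To make the difference between the two parameterizations concrete, the following minimal sketch computes birth rate, death rate and growth rate for a lineage under each of them. It is an illustration only, not the ProteinEvolver2 implementation: the function names are hypothetical, and the direct mapping b = fitness (with fitness already scaled to [0, 1]) is an assumption made here so that b + d = 1 holds.

```python
def rates_constant_total(fitness):
    """Parameterization with b + d = 1 (total event rate fixed).

    Assumption for illustration: fitness is scaled to [0, 1] and is used
    directly as the birth rate b; the death rate is then d = 1 - b.
    Returns (b, d, r) with growth rate r = b - d.
    """
    if not 0.0 <= fitness <= 1.0:
        raise ValueError("fitness must lie in [0, 1] for this sketch")
    b = fitness
    d = 1.0 - b
    return b, d, b - d


def rates_variable_total(fitness):
    """Parameterization attributed above to Neher et al. (2014).

    The death rate is fixed at 1 and the birth rate is 1 + fitness, so the
    total rate b + d changes from lineage to lineage.
    Returns (b, d, r) with growth rate r = b - d = fitness.
    """
    d = 1.0
    b = 1.0 + fitness
    return b, d, b - d


if __name__ == "__main__":
    # The two lineages used as an example in the response above.
    for f in (0.9, 0.6):
        print("b + d = 1:", rates_constant_total(f))
        print("d = 1, b = 1 + f:", rates_variable_total(f))
```

In both functions the growth rate r differs between lineages with different fitness; what changes is whether the total event rate b + d is held constant or allowed to vary.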

      (2) Predictive performance: Similar performance in predicting amino acid frequencies is observed under both the SCS model and the neutral model. I suspect that this rather disappointing result owes to the fact that the absolute fitness of different viral variants could not actually change during the simulations (see comment #1).

      The study shows similar performance in predicting the sequences of the forecasted proteins under both the SCS model and the neutral model, but shows differences in predicting the folding stability of the forecasted proteins between these models. Indeed, as explained in the previous answer, the birth-death model accounts for variation in fitness among lineages, leading to differences among lineages in reproductive success. The new birth-death model that we are now implementing, which incorporates variation of the global birth-death rate among lineages, is producing similar preliminary results. In addition to these considerations, it is known that SCS models applied to phylogenetics (such as ancestral molecular reconstruction) can model protein evolution with high accuracy in terms of folding stability. However, inferring sequences (i.e., ancestral sequences) is considerably more challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). The observed sequence diversity is much greater than the observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions among amino acids with similar physicochemical properties can result in protein variants with similar folding stability but different specific amino acid sequences; further work is needed in the field of substitution models of molecular evolution. We will expand the discussion of this aspect in the revised manuscript.

      (3) Model assessment: It would be interesting to know how much the predictions were informed by the structurally constrained sequence evolution model versus the birth-death model. To explore this, the authors could consider three different models: 1) neutral, 2) SCS, and 3) SCS + BD. Simulations under the SCS model could be performed by simulating molecular evolution along just one hypothetical lineage. Seeing if the SCS + BD model improves over the SCS model alone would be another way of testing whether mutations could actually impact the evolutionary dynamics of lineages in the phylogeny.

      In the present study, we compare the neutral model + birth-death (BD) with the SCS model + BD. Markov substitution models Q are applied over an evolutionary time (i.e., branch length, t), which allows the probability of substitution events during that time period to be determined [P(t) = exp(Qt)]. This approach is traditionally used in phylogenetics to model the incorporation of substitutions over time. Therefore, to compare the neutral and SCS models, an evolutionary time is required; in this case it is provided by the birth-death process. Models 1) and 2) suggested by the reviewer cannot be compared without an underlying evolutionary history. However, comparisons in terms of likelihood, and other aspects, between models that ignore the protein structure and the implemented SCS models are already available in our previous studies based on coalescent simulations or given phylogenetic trees (Arenas, et al. 2013; Arenas, et al. 2015). There, SCS models produced proteins with more realistic folding stability than models that ignore evolutionary constraints from the protein structure, and those findings are consistent with the results from the present study where we explore the application of these models to forecasting protein evolution. We would like to emphasize that forecasting the folding stability of future real proteins is a significant and novel finding, because folding stability is fundamental to protein function and has diverse implications. While accurately forecasting the exact sequences would indeed be ideal, this remains a challenging task with current substitution models. In this regard, we will discuss in the revised manuscript the need to develop more accurate substitution models.
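
The relation P(t) = exp(Qt) mentioned above can be illustrated with a small, dependency-free sketch. The rate matrix below is a hypothetical two-state example chosen only because its transition probabilities have a simple closed form to check against; real amino acid models use a 20 x 20 matrix Q, and the scaling-and-squaring routine shown here is a generic numerical method, not the one used in the software discussed in this response.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) by scaling and squaring.

    Q t is divided by 2**squarings, exponentiated with a truncated Taylor
    series, and the result is squared `squarings` times.
    """
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term becomes m**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


if __name__ == "__main__":
    # Hypothetical two-state rate matrix; each row sums to zero.
    a, b = 0.3, 0.1
    q = [[-a, a], [b, -b]]
    p = transition_matrix(q, t=2.0)
    # Closed form for a two-state chain, used here only as a check.
    expected_p01 = a / (a + b) * (1.0 - math.exp(-(a + b) * 2.0))
    print(p[0][1], expected_p01)
```

Each row of P(t) sums to one, and a longer branch length t moves the rows toward the stationary distribution of Q, which is why an evolutionary time (here supplied by the birth-death process) is needed before substitution probabilities can be evaluated.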

      (4) Background fitness effects: The model ignores background genetic variation in fitness. I think this is particularly important as the fitness effects of mutations in any one protein may be overshadowed by the fitness effects of mutations elsewhere in the genome. The model also ignores background changes in fitness due to the environment, but I acknowledge that might be beyond the scope of the current work.

      This comment made us realize that more information about the features of the implemented SCS models should be included in the manuscript. In particular, the implemented SCS models consider a negative design based on the observed residue contacts in nearly all proteins available in the Protein Data Bank (Arenas, et al. 2013; Arenas, et al. 2015). This data is provided as an input file and it can be updated to incorporate new structures (see the framework documentation and the practical examples). Therefore, the prediction of folding stability is a combination of positive design (direct analysis of the target protein) and negative design (consideration of background proteins to reduce biases), thus incorporating background molecular diversity. This important feature was not sufficiently described in the manuscript, and we will add more details in the revised version. Regarding the fitness caused by the environment, we agree with the reviewer. This is a challenge for any method aiming to forecast evolution, as future environmental shifts are inherently unpredictable and may impact the accuracy of the predictions. Although one might attempt to incorporate such effects into the model, doing so risks overparameterization, especially when the additional factors are uncertain or speculative. We will include a discussion in the revised manuscript about our perspective on the potential effects of environmental changes on forecasting evolution.

      (5) In contrast to the model explored here, recent work on multi-type birth-death processes has considered models where lineages have type-specific birth and/or death rates and therefore also type-specific growth rates and fitness (Stadler and Bonhoeffer, 2013; Kunhert et al., 2017; Barido-Sottani, 2023). Rasmussen & Stadler (eLife, 2019) even consider a multi-type birth-death model where the fitness effects of multiple mutations in a protein or viral genome collectively determine the overall fitness of a lineage. The key difference with this work presented here is that these models allow lineages to have different growth rates and fitness, so these models truly allow for non-neutral evolutionary dynamics. It would appear the authors might need to adopt a similar approach to successfully predict protein evolution.

      We agree with the reviewer that robust birth-death models have been developed applying statistics and, in many cases, the primary aim of those studies is the development and refinement of the model itself. Regarding the study by Rasmussen and Stadler 2019, it incorporates an external evaluation of mutation events where the fitness used is specific to the proteins investigated in that study, which may pose challenges for users interested in analyzing other proteins. In contrast, our study takes a different approach. We implement a fitness function that can be predicted and evaluated for any type of protein (Goldstein 2013), making it broadly applicable. In addition, we provide a freely available and well-documented computational framework to facilitate its use. The primary aim of our study is not the development of novel or complex birth-death models. Rather, we aim to explore the integration of a standard birth-death model with structurally constrained substitution models for the purpose of predicting protein evolution. In the context of protein evolution, substitution models are a critical factor (Liberles, et al. 2012; Wilke 2012; Bordner and Mittelmann 2013; Echave, et al. 2016; Arenas, et al. 2017; Echave and Wilke 2017), and their combination with a birth-death model constitutes a first approximation upon which future studies can build to better understand this biological system. We will include these considerations in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this study, "Forecasting protein evolution by integrating birth-death population models with structurally constrained substitution models", David Ferreiro and co-authors present a forward-in-time evolutionary simulation framework that integrates a birth-death population model with a fitness function based on protein folding stability. By incorporating structurally constrained substitution models and estimating fitness from ΔG values using homology-modeled structures, the authors aim to capture biophysically realistic evolutionary dynamics. The approach is implemented in a new version of their open-source software, ProteinEvolver2, and is applied to four viral proteins from HIV-1 and SARS-CoV-2.

      Overall, the study presents a compelling rationale for using folding stability as a constraint in evolutionary simulations and offers a novel framework and software to explore such dynamics. While the results are promising, particularly for predicting biophysical properties, the current analysis provides only partial evidence for true evolutionary forecasting, especially at the sequence level. The work offers a meaningful conceptual advance and a useful simulation tool, and sets the stage for more extensive validation in future studies.

      We also thank this reviewer for the positive comments on our study. Regarding the predictive power, our results showed good accuracy in predicting the folding stability of the forecasted protein variants. However, predicting the specific sequences of these variants is more challenging. For example, substitutions among amino acids with similar physicochemical properties can result in different sequences with similar folding stability. We believe that these findings are realistic and interesting, as they indicate that while forecasting folding stability is feasible, forecasting the specific sequence evolution is more complex than one could anticipate.

      Strengths:

      The results demonstrate that fitness constraints based on protein stability can prevent the emergence of unrealistic, destabilized variants - a limitation of traditional, neutral substitution models. In particular, the predicted folding stabilities of simulated protein variants closely match those observed in real variants, suggesting that the model captures relevant biophysical constraints.

      We agree with the reviewer and appreciate the consideration that forecasting the folding stability of future real proteins is a relevant finding. For instance, folding stability is fundamental for protein function and affects several other molecular properties.

      Weaknesses:

      The predictive scope of the method remains limited. While the model effectively preserves folding stability, its ability to forecast specific sequence content is not well supported.

      It is known that structurally constrained substitution (SCS) models applied to phylogenetics (such as ancestral molecular reconstruction) can model protein evolution with high accuracy in terms of folding stability, while inferring sequences (i.e., ancestral sequences) remains considerably more challenging (Arenas, et al. 2017; Arenas and Bastolla 2020). The observed sequence diversity is much higher than the observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can result in protein variants with similar folding stability but with different specific amino acid composition. We will expand the discussion of this aspect in the manuscript.

      Only one dataset (HIV-1 MA) is evaluated for sequence-level divergence using KL divergence; this analysis is absent for the other proteins. The authors use a consensus Omicron sequence as a representative endpoint for SARS-CoV-2, which overlooks the rich longitudinal sequence data available from GISAID. The use of just one consensus from a single time point is not fully justified, given the extensive temporal and geographical sampling available. Extending the analysis to include multiple timepoints, particularly for SARS-CoV-2, would strengthen the predictive claims. Similarly, applying the model to other well-sampled viral proteins, such as those from influenza or RSV, would broaden its relevance and test its generalizability.

      The evaluation of forecasting evolution using real datasets is complex due to several conceptual and practical aspects. In contrast to traditional phylogenetic reconstruction of past evolutionary events and ancestral sequences, forecasting evolution often begins with a variant that is evolved forward in time and requires a rough fitness landscape to select among possible future variants (Lässig, et al. 2017). Another concern for validating the method is the need to know the initial variant that gives rise to the corresponding forecasted variants, and it is not always known. Thus, we investigated systems where the initial variant, or a close approximation, is known, such as scenarios of in vitro monitored evolution. In the case of SARS-CoV-2, the Wuhan variant is commonly used as the starting variant of the pandemic. Next, since forecasting evolution is highly dependent on the used model of evolution, unexpected external factors can be dramatic for the predictions. For this reason, systems with minimal external influences provide a more controlled context for evaluating forecasting evolution. For instance, scenarios of in vitro monitored virus evolution avoid some external factors such as host immune response. Another important aspect is the availability of data at two (i.e., present and future) or more time points along the evolutionary trajectory, with sufficient genetic divergence between them to identify clear evolutionary signatures. Additionally, using consensus sequences can help mitigate effects from unfixed mutations, which should not be modeled by a substitution model of evolution. Altogether, not all datasets are appropriate to properly evaluate forecasting evolution. We will include these considerations in the revised manuscript.

      Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016). We will provide additional information on this aspect in the manuscript.
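      The data requirement described above (an observed and an estimated distribution of amino acid frequencies at each site) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pseudocount used to avoid zero frequencies, and the averaging of the divergence over sites are all assumptions made here for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_frequencies(column, pseudocount=1e-3):
    """Amino acid frequencies at one alignment column.

    The pseudocount is an illustrative assumption; it keeps every
    frequency positive so the logarithm below is always defined.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def kl_divergence(p, q):
    """KL(p || q) over the 20 amino acids, in nats."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)

def mean_site_kl(observed_seqs, predicted_seqs):
    """Average per-site KL divergence between two aligned sequence samples."""
    n_sites = len(observed_seqs[0])
    total = 0.0
    for i in range(n_sites):
        p = site_frequencies([s[i] for s in observed_seqs])
        q = site_frequencies([s[i] for s in predicted_seqs])
        total += kl_divergence(p, q)
    return total / n_sites
```

      With a single consensus sequence per time point, each site's observed distribution collapses to one residue, which is why a comparison of this kind needs multiple independent sequences at each sampling point.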

      Regarding the Omicron dataset, we used 384 curated sequences of the Omicron variant of concern to construct the study dataset, and we believe that it is a representative sample. The sequence used for the initial time point was the Wuhan variant (Wu, et al. 2020), which is commonly assumed to be the origin of the pandemic in SARS-CoV-2 studies. As previously indicated, the use of consensus sequences is convenient to avoid variants with unfixed mutations. Regarding extending the analysis to other time points (other variants of concern), we respectfully disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method. We noted that earlier variants of concern show a small number of fixed mutations in the study proteins, despite the availability of large numbers of sequences in databases such as GISAID.

      Additionally, we investigated the evolutionary trajectories of HIV-1 protease (PR) in 12 intra-host viral populations.

      Next, following the proposal of the reviewer, we will incorporate the analysis of an additional viral dataset (probably influenza following the suggestion of the reviewer) to further assess the generalizability of the method. Still, as previously indicated, not all datasets are suitable for a proper evaluation of forecasting evolution. Factors such as the shape of the fitness landscape and the amount of genetic variation over time can influence the accuracy of predictions. We will present the results of the analysis of the new data in the revised manuscript.

      It would also be informative to include a retrospective analysis of the evolution of protein stability along known historical trajectories. This would allow the authors to assess whether folding stability is indeed preserved in real-world evolution, as assumed in their model.

      Our present study is not focused on investigating the evolution of the folding stability over time, although it provides this information indirectly at the studied time points. Instead, the present study shows that the folding stability of the forecasted protein variants is similar to the folding stability of the corresponding real protein variants for diverse viral proteins, which is an important evaluation of the method. Next, the folding stability can indeed vary over time in both real and modeled evolutionary scenarios, and our present study is not in conflict with this. Although this is not the aim of our present study, some previous phylogenetic-based studies have reported temporal fluctuations in folding stability for diverse data (Arenas, et al. 2017; Olabode, et al. 2017; Arenas and Bastolla 2020; Ferreiro, et al. 2022).

      Finally, a discussion on the impact of structural templates - and whether the fixed template remains valid across divergent sequences - would be valuable. Addressing the possibility of structural remodeling or template switching during evolution would improve confidence in the model's applicability to more divergent evolutionary scenarios.

      This is an important point. For the datasets that required homology modeling (in several cases it was not necessary because the sequence was present in a protein structure of the PDB), the structural templates were selected using SWISS-MODEL, and we applied the best-fitting template. We will include additional details about the parameters of the homology modeling in the revised version. Indeed, our method assumes that the protein structure is maintained over the studied evolutionary time, which can be generally reasonable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). Over longer evolutionary timescales, structural changes may occur, and in such cases, modeling the evolution of the protein structure would be necessary. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, may offer promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real data with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We will include this discussion in the revised manuscript.

      Cited references

      Arenas M. 2012. Simulation of Molecular Data under Diverse Evolutionary Scenarios. PLoS Comput Biol 8:e1002495.

      Arenas M, Bastolla U. 2020. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods Ecol Evol 11:248-257.

      Arenas M, Dos Santos HG, Posada D, Bastolla U. 2013. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 29:3020-3028.

      Arenas M, Lorenzo-Redondo R, Lopez-Galindez C. 2016. Influence of mutation and recombination on HIV-1 in vitro fitness recovery. Molecular Phylogenetics and Evolution 94:264-270.

      Arenas M, Sanchez-Cobos A, Bastolla U. 2015. Maximum likelihood phylogenetic inference with selection on protein folding stability. Molecular Biology and Evolution 32:2195-2207.

      Arenas M, Weber CC, Liberles DA, Bastolla U. 2017. ProtASR: An Evolutionary Framework for Ancestral Protein Reconstruction with Selection on Folding Stability. Systematic Biology 66:1054-1064.

      Bordner AJ, Mittelmann HD. 2013. A new formulation of protein evolutionary models that account for structural constraints. Molecular Biology and Evolution 31:736-749.

      Carvajal-Rodriguez A. 2010. Simulation of genes and genomes forward in time. Current Genomics 11:58-61.

      Echave J, Spielman SJ, Wilke CO. 2016. Causes of evolutionary rate variation among protein sites. Nature Reviews Genetics 17:109-121.

      Echave J, Wilke CO. 2017. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 46:85-103.

      Ferreiro D, Khalil R, Gallego MJ, Osorio NS, Arenas M. 2022. The evolution of the HIV-1 protease folding stability. Virus Evol 8:veac115.

      Goldstein RA. 2013. Population Size Dependence of Fitness Effect Distribution and Substitution Rate Probed by Biophysical Model of Protein Thermostability. Genome Biol Evol 5:1584-1593.

      Harmon LJ. 2019. Introduction to birth-death models. In. Phylogenetic Comparative Methods. p. https://lukejharmon.github.io/pcm/chapter10_birthdeath/.

      Hoban S, Bertorelle G, Gaggiotti OE. 2012. Computer simulations: tools for population and evolutionary genetics. Nature Reviews Genetics 13:110-122.

      Illergard K, Ardell DH, Elofsson A. 2009. Structure is three to ten times more conserved than sequence--a study of structural response in protein cores. Proteins 77:499-508.

      Lässig M, Mustonen V, Walczak AM. 2017. Predicting evolution. Nature Ecology & Evolution 1:0077.

      Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning AP, Dokholyan NV, Echave J, et al. 2012. The interface of protein structure, protein biophysics, and molecular evolution. Protein Science 21:769-785.

      Neher RA, Russell CA, Shraiman BI. 2014. Predicting evolution from the shape of genealogical trees. Elife 3.

      Olabode AS, Kandathil SM, Lovell SC, Robertson DL. 2017. Adaptive HIV-1 evolutionary trajectories are constrained by protein stability. Virus Evol 3:vex019.

      Pascual-Garcia A, Abia D, Mendez R, Nido GS, Bastolla U. 2010. Quantifying the evolutionary divergence of protein structures: the role of function change and function conservation. Proteins 78:181-196.

      Wilke CO. 2012. Bringing molecules back into molecular evolution. PLoS Comput Biol 8:e1002572.

      Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, et al. 2020. A new coronavirus associated with human respiratory disease in China. Nature 579:265-269.

      Yang Z. 2006. Computational Molecular Evolution. Oxford, England: Oxford University Press.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors have developed SPLASH+, a micro-assembly and biological interpretation framework that expands on their previously published reference-free statistical approach (SPLASH) for sequencing data analysis.

      Thank you for this thorough overview of our work.

      Strengths:

      (1) The methodology developed by the authors seems like a promising approach to overcome many of the challenges posed by reference-based single-cell RNA-seq analysis methods.

      Thank you for your positive comment on the potential of our approach to address the limitations of reference-based methods for scRNA-Seq analysis.

      (2) The analysis of the RNU6 repetitive small nuclear RNA provides a very compelling example of a type of transcript that is very challenging to analyze with standard reference-based methods (e.g., most reads from this gene fail to align with STAR, if I understood the result correctly).

      We thank the reviewer for their positive comment. We agree that the variation in RNU6 detected by SPLASH+ underscores the potential of our reference-free method to make discoveries in cases where reference-based approaches fall short.

      Weaknesses:

      (1) The manuscript presents a number of case studies from very diverse domains of single-cell RNA-seq analysis. As a result, the manuscript has been challenging to review, because it requires domain expertise in centromere biology, RNA splicing, RNA editing, V(D)J transcript diversity, and repeat polymorphisms.

      We appreciate the reviewer’s effort in thoroughly evaluating this manuscript, especially given the broad range of biological domains discussed. Our main goal in presenting a wide range of applications was to highlight the key strength of the SPLASH+ framework: its ability to unify diverse biological discoveries within a single method that operates directly on sequencing reads.

      (2) Although the paper focuses on SmartSeq2 full-length single-cell RNA-seq data analysis, the vast majority of single-cell RNA-seq data that is currently being generated comes from droplet-based methods (e.g., 10x Genomics) that sequence only the 3' or 5' ends of transcripts. As a result, it is unclear if SPLASH+ is also applicable to these types of data.

      We thank the reviewer for this comment. Due to the specific data format of barcoded single-cell sequencing platforms such as 10x Genomics, extending the SPLASH framework to support 10x analysis required engineering a specialized preprocessing tool. We have addressed this in a recent work, which is now available as a preprint (https://doi.org/10.1101/2024.12.24.630263).

      (3) The criteria used for the selection of the 10 'core genes' have not been sufficiently justified.

      We chose these genes because SPLASH+ detected regulated splicing for them in nearly all tissues (18 out of 19) analyzed in our study (i.e., it identified anchors classified as splicing anchors in those tissues). Our subsequent analysis showed that all these genes are involved in either splicing regulation or histone modification. We will further clarify this selection criterion in the revision.

      (4) It is currently unclear how the splicing diversity discovered in this paper relates to the concept of noisy splicing (i.e., there are likely many low-frequency transcripts and splice junctions that are unlikely to have a significant functional impact beyond triggering nonsense-mediated decay).

      In our analysis, to ensure sufficient read coverage, we considered significant anchors supported by more than 50 reads and detected in over 10 cells. Additionally, our downstream analyses (including splicing analysis) are based on assembled sequences (compactors) generated through our micro-assembly step. This process effectively acts as a denoising step by filtering out sequences likely caused by sequencing errors or with very low read support. However, we agree that the detected splice variants have not been fully functionally characterized, and further functional experiments may be needed.
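      The support thresholds mentioned above (more than 50 reads and more than 10 cells per anchor) can be sketched as a simple filter. The record layout `(anchor, cell_id, read_count)` and the function name are hypothetical and chosen only for illustration; they do not reflect the actual SPLASH+ output format or code.

```python
from collections import defaultdict

def filter_anchors(records, min_reads=50, min_cells=10):
    """Keep anchors supported by more than min_reads reads in more than min_cells cells.

    records: iterable of (anchor, cell_id, read_count) tuples
    (a hypothetical layout used only for this sketch).
    """
    reads = defaultdict(int)   # total reads per anchor
    cells = defaultdict(set)   # distinct cells in which each anchor was seen
    for anchor, cell_id, n in records:
        if n > 0:
            reads[anchor] += n
            cells[anchor].add(cell_id)
    return {
        a for a in reads
        if reads[a] > min_reads and len(cells[a]) > min_cells
    }
```

      A filter like this only removes poorly supported anchors; it does not by itself distinguish functional isoforms from well-supported but noisy splicing, which is the reviewer's broader question.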

      (5) The paper presents only a very superficial discussion of the potential weaknesses of the SPLASH+ method.

      We discussed two potential limitations of SPLASH+ in the Conclusions section: (1) it is not suitable for differential gene expression analysis, and (2) although we provide a framework for interpreting and analyzing SPLASH results, further work is still needed to improve the annotation of calls lacking BLAST matches. We will expand the discussion of both limitations in the revision.

      (6) The cursory mention of metatranscriptome in the conclusion of the paper is confusing, as it might suggest the presence of microbial cells in sterile human tissues (which has recently been discredited in cancer, see e.g. https://www.science.org/content/article/journal-retracts-influential-cancer-microbiome-paper).

      We will remove the mention of metatranscriptome in the revised manuscript.

      Reviewer #2 (Public review):

      The authors extend their SPLASH framework with single-cell RNA-seq in mind, in two ways. First, they introduce "compactors", which are possible paths branching out from an anchor. Second, they introduce a workflow to classify compactors according to the type of biological sequence variation represented (splicing, SNV, etc). They focus on simulated data for fusion detection, and then focus on analyzing the Tabula sapiens Smart-seq2 data, showing extensive results on alternative splicing analysis, VDJ, and repeat elements.

      This is strong work with an impressive array of biological investigations and results for a methods paper. I have various concerns about terminology and comparisons, as follows (in a somewhat arbitrary order, apologies).

      Thank you for this thorough overview of our work and your positive comment on the strength of our work.

      (1) The discussion of the weaknesses of the consensus sequence approach of SPLASH is an odd way to motivate SPLASH+ in my opinion, in that SPLASH is not yet so widely used, so the baseline for SPLASH+ is really standard alignment-based approaches. It is fine to mention consensus sequence issues briefly, but it felt belabored.

      We thank the reviewer and agree that the primary comparison for SPLASH+ is with reference-based methods. However, since SPLASH+ builds upon SPLASH, we also aimed to highlight the limitations of the consensus step in the original SPLASH and how SPLASH+ addresses them. To keep the main focus of the paper on the comparison with reference-based methods and on the biological investigations, the discussion of the consensus step was provided in a Supplementary Figure. We will shorten this discussion in the revision.

      (2) Regarding compactors reducing alignment cost: the comparison should really be between compactor construction and alignment vs read alignment (and maybe vs modern contig construction algorithms and alignment).

      Since the SPLASH framework is fundamentally reference-free and does not require read alignment, we compared the number of sequence alignments for compactors to the total read alignments required by a reference-based method to show that while compactors are aligned to the reference, the number of alignments needed is still orders of magnitude less than a reference-based approach requiring alignment of all the reads.

      (3) The language around "compactors" is a bit confusing, where the authors sometimes refer to the tree of possibilities from an anchor as a "compactor", and sometimes a compactor is a single branch. Presumably, ideally, compactors should be DAGs, not trees, i.e., they can connect back together. Perhaps the authors could comment on whether this matters/would be a valuable extension.

      We thank the reviewer for their comment. We refer to each generated assembled sequence as “a compactor”, and we attempted to make this clear in the paper. We will review the text further to ensure this definition is clear in the revised version.

      (4) The main oddness of the splicing analysis to me is not using cell-type/state in any way in the statistical testing. This need not be discrete cell types: psiX, for example, tested whether exonic PSI was variable with reference to a continuous gene expression embedding. Intuitively, such transcriptome-wide signal should be valuable for a) improving power and b) distinguishing cell-type intrinsic/"noisy" from cell-type specific splicing variation. A straightforward way of doing this would be pseudobulking cell types. Possibly a more sophisticated hierarchical model could be constructed also.

      We appreciate the reviewer’s concern regarding SPLASH+ not using cell type metadata. SPLASH, which performs the core statistical inference in SPLASH+, is an unsupervised tool specifically designed to make biological discoveries without relying on metadata (such as cell type annotations in scRNA-Seq). This is particularly useful in scRNA-seq, where cell type labels could be missing, imprecise, or may miss important within-cell-type variation. As shown in the paper, even without using metadata, SPLASH+ outperformed both SpliZ and Leafcutter (two metadata-dependent tools), achieving higher concordance and identifying more differentially spliced genes. Regarding pseudobulking, as shown in the SpliZ paper (https://doi.org/10.1038/s41592-022-01400-x), pseudobulking requires multiple pseudobulked replicates per cell type for reliable inference, which is often not feasible in scRNA-seq settings, making such methods statistically suboptimal for single-cell studies. We will add a discussion on pseudobulking in the revision.

      (5) A secondary weakness is that some informative reads will not be used, for example, unspliced reads aligning to an alternative exon. This relates to the broader weakness of SPLASH that it is blind to changes in coverage that are not linked to a specific anchor (which should be acknowledged somewhere, maybe in the Discussion). In the deeply sequenced SS2 data, this is likely not an issue, but might be more limiting in sparser data. A related issue is that coverage change indicative of, e.g., alternative TSS or TES (that do not also include a change in splice junction use) will not be detected. In fairness, all these weaknesses are shared by LeafCutter. It would be valuable to have a comparison to a more "traditional" splicing analysis approach (pick your favorite of rMATS, MISO, SUPPA).

      We thank the reviewer for their comment. As noted in the Conclusion, the SPLASH framework is not designed for differential gene expression analysis, which relies on quantifying read coverage. Rather, it focuses on detecting differential sequence diversity arising from mechanisms like alternative splicing or RNA editing. We will clarify this limitation further in the revised Conclusion. 

      Regarding splicing evaluation, we have performed extensive comparisons with two widely used and recent methods—SpliZ and Leafcutter—for both bulk and single-cell splicing analysis. While we appreciate the reviewer’s suggestion to include an additional method, given the current length of the paper and the fact that leafcutter has previously been shown to outperform rMATS, MAJIQ, and Cufflinks2 (https://www.nature.com/articles/s41588-017-0004-9), we believe the current comparisons provide sufficient support for the evaluation of the splicing detection by SPLASH+.

      (6) "We should note that there is no difference between gene fusions and other RNA variants (e.g., RNA splicing) from a sequence assembly viewpoint". Maybe this is true in an abstract sense, but I don't think it is in reality. AS can produce hundreds of isoforms from the same gene, and be variable across individual cells. Gene fusions are generally less numerous/varied and will be shared across clonal populations, so the complexity is lower. That simplicity is balanced against the challenge that any genes could, in principle, fuse.

      We selected the fusion benchmarking dataset solely to evaluate how well compactors reconstruct sequences. Since our goal was to assess the accuracy of reconstructed compactor sequences, we needed a benchmarking dataset with ground truth sequences, which this dataset provides. We explained our reason for selecting the fusion dataset in the text, but we will clarify it further in the revision.

      (7) For the fusion detection assessment, SPLASH+ is given the correct anchor for detection. This feels like cheating since this information wouldn't usually be available. Can the authors motivate this? Are the other methods given comparable information? Also, TPM>100 seems like a very high expression threshold for the assessment.

      We agree with the reviewer that the fusion benchmarking dataset should not be used to assess the entire SPLASH+ framework. In fact, we did not use this dataset to evaluate SPLASH+; it was used exclusively to evaluate the performance of compactors as a standalone module. Specifically, we tested how well compactors can reconstruct fusion sequences when provided with seed sequences corresponding to fusion junctions. This aligns with our expectation from compactors in SPLASH+, that they should correctly reconstruct the sequence context for the detected anchors. As noted in our previous response, since our goal was to assess the accuracy of reconstructed compactor sequences, we required a benchmarking dataset with ground truth sequences, which this dataset provides. We will clarify this further in the revision.

      We appreciate the reviewer’s concern that a TPM of 100 is high. In Figure 1C, we presented the full TPM distribution for fusions missed or detected by compactors. The 100 threshold was an arbitrary benchmark to illustrate the clear difference in TPM profiles between these two sets of fusions. We will clarify this point in the revised manuscript.

      (8) Why are only 3'UTRs considered and not 5'? Is this because the analysis is asymmetric, i.e., only considering upstream anchors and downstream variation? If so, that seems like a limitation: how much additional variation would you find if including the other direction?

      We thank the reviewer for their comment. SPLASH+ can, in principle, detect variation in 5’ UTR regions, as demonstrated by the variations observed in the 5’ UTRs of the genes ANPC16 and ARPC2. If sequence variation exists in the 5′ UTR, SPLASH+ can still detect it by identifying an anchor upstream of the variable region, as it directly parses sequencing reads to find anchors with downstream sequence diversity. Even when the variation occurs near the 5′ end of the 5′ UTR, SPLASH+ can still capture this diversity if the user selects a shorter anchor length.

      (9) I don't find the theoretical results very meaningful. Assuming independent reads (equivalently binomial counts) has been repeatedly shown to be a poor assumption in sequencing data, likely due to various biases, including PCR. This has motivated the use of overdispersed distributions such as the negative Binomial and beta binomial. The theory would be valuable if it could say something at a specified level of overdispersion. If not, the caveat of assuming no overdispersion should be clearly stated.

      We appreciate the reviewer’s comment. We will clarify this in the revised paper.
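      The reviewer's point about overdispersion can be made concrete with a standard result: for counts out of n reads with success probability p, a beta-binomial with intra-class correlation rho inflates the binomial variance by the factor 1 + (n - 1) * rho. The sketch below only evaluates this textbook formula; the function names and the example values are illustrative assumptions and are not taken from the manuscript's theory.

```python
def binomial_variance(n, p):
    """Variance of a Binomial(n, p) count (independent reads)."""
    return n * p * (1 - p)

def beta_binomial_variance(n, p, rho):
    """Variance of a beta-binomial count with mean n*p.

    rho = 1 / (alpha + beta + 1) is the intra-class correlation;
    rho = 0 recovers the binomial case.
    """
    return n * p * (1 - p) * (1 + (n - 1) * rho)

def variance_inflation(n, rho):
    """Factor by which overdispersion inflates the binomial variance."""
    return 1 + (n - 1) * rho
```

      For example, with n = 100 reads and rho = 0.1, the variance is inflated roughly elevenfold relative to the independent-read assumption, which gives a sense of how far results derived under binomial sampling could be from data with this level of overdispersion.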

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for the careful review of our manuscript. Overall, they were positive about our use of cutting-edge methods to identify six inversions segregating in Lake Malawi. Their distribution in ~100 Lake Malawi species demonstrated that they were differentially segregating in different ecogroups/habitats and could potentially play a role in local adaptation, speciation, and sex determination. Reviewers were positive about our finding that the chromosome 10 inversion was associated with sex determination in a deep benthic species and its potential role in regulating traits under sexual selection. They agree that this work is an important starting point in understanding the role of these inversions in the amazing phenotypic diversity found in the Lake Malawi cichlid flock.

      There were two main criticisms that were made which we summarize:

      (1) Lack of clarity. It was noted that the writing could be improved to make many technical points clearer. Additionally, certain discussion topics were not included that should be.

      We will rewrite the text and add additional figures and tables to address the issues that were brought up in a point-by-point response. We will improve or include (1) the nomenclature used to describe the inversions in different lineages, (2) the descriptions of the various genomic approaches, (3) a figure documenting the samples and technologies used for each ecogroup, and (4) the integration of long-read (LR) sequences to identify inversion breakpoints at the finest resolution possible.

      (2) We overstate the role that selection plays in the spread of these inversions and neglect other evolutionary processes that could be responsible for their spread.

      We agree with the overarching point. We did not show that selection is involved in the spread of these inversions, and other forces can be at play. Additionally, there were concerns with our model that the inversions introgressed from a Diplotaxodon ancestor into benthic ancestors, and that incomplete lineage sorting or balancing selection (via sex determination) could be at play instead. Overall, we agree with the reviewers, with the following caveats. 1. Our analysis of the genetic distance between Diplotaxodons and benthic species in the inverted regions is more consistent with their spread through introgression than with incomplete lineage sorting or balancing selection. 2. Further, the role of these inversions is likely different in different species. For example, the inversions on chromosomes 10 and 11 play a role in sex determination in some species but not others, and the potential pressures acting on the inverted and non-inverted haplotypes will be very different. These are very interesting and important questions both for understanding the adaptive radiations in Lake Malawi and in general, and we are actively studying crosses to understand the role of these inversions in phenotypic variation between two species. We will modify the text to make all of these points clearer.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using high-quality genomic data (long-reads, optical maps, short-reads) and advanced bioinformatic analysis, the authors aimed to document chromosomal rearrangements across a recent radiation (Lake Malawi Cichlids). Working on 11 species, they achieved a high-resolution inversion detection and then investigated how inversions are distributed within populations (using a complementary dataset of short-reads), associated with sex, and shared or fixed among lineages. The history and ancestry of the inversions is also explored.

      On one hand, I am very enthusiastic about the global finding (many inversions well-characterized in a highly diverse group!) and impressed by the amount of work put into this study. On the other hand, I have struggled so much to read the manuscript that I am unsure about how much the data supports some claims. I'm afraid most readers may feel the same and really need a deep reorganisation of the text, figures, and tables. I reckon this is difficult given the complexity brought by different inversions/different species/different datasets but it is highly needed to make this study accessible.

      The methods of comparing optical maps, and looking at inversions at macro-evolutionary scales can be useful for the community. For cichlids, it is a first assessment that will allow further tests about the role of inversions in speciation and ecological specialisation. However, the current version of the manuscript is hardly accessible to non-specialists and the methods are not fully reproducible.

      Strengths:

      (1) Evidence for the presence of inversion is well-supported by optical mapping (very nice analysis and figure!).

      (2) The link between sex determination and inversion in chr 10 in one species is very clearly demonstrated by the proportion in each sex and additional crosses. This section is also the easiest to read in the manuscript and I recommend trying to rewrite other result sections in the same way.

      (3) A new high-quality reference genome is provided for Metriaclima zebra (and possibly other assemblies? - unclear).

      (4) The sample size is great (31 individuals with optical maps if I understand well?).

      (5) Ancestry at those inversions is explored with outgroups.

      (6) Polymorphism for all inversions is quantified using a complementary dataset.

      Weaknesses:

      (1) Lack of clarity in the paper: As it currently reads, it is very hard to follow the different species, ecotypes, samples, inversions, etc. It would be useful to provide a phylogeny explicitly positioning the samples used for assembly and the habitat preference. Then the text would benefit from being organised either by variant or by subgroups rather than by successive steps of analysis.

      We have extensively rewritten the paper to improve the clarity. With respect to this point, we moved Figure 6 to Figure 1, which places the phylogeny of Lake Malawi cichlids at the beginning of the paper. We incorporated information about samples/technologies by ecogroup into this figure to help the reader gain an overview of the technologies involved. We added information about habitat for each ecogroup as well. While we considered a change to the text organization suggested here, we thought it was clearer to keep the original headings.

      (2) Lack of information for reproducibility: I couldn't find clearly the filters and parameters used for the different genomic analyses for example. This is just one example and I think the methods need to be re-worked to be reproducible. Including the codes inside the methods makes it hard to follow, so why not put the scripts in an indexed repository?

      We now provide a link to a GitHub repository (https://github.com/ptmcgrat/CichlidSRSequencing/tree/Kumar_eLife) containing the scripts used for the major analyses in the paper. Because our data is behind a secure Dropbox account, readers will not be able to run the analysis; however, they can see the exact programs, filters, and parameters used for the manuscript, which are embedded within each script.

      (3) Further confirmation of inversions and their breakpoints would be valuable. I don't understand why the long-reads (that were available and used for genome assembly) were not also used for SV detection and breakpoint refinement.

      We did use long reads to confirm the presence of the inversions by creating five new genome assemblies from the PacBio HiFi reads: two additional Metriaclima zebra samples and three Aulonocara samples. Alignment of these five genomes to the MZ_GT3 reference is shown in Figures S2 – S7. These genome assemblies were also used to identify the breakpoints of the inversions. However, because of the extensive amount of repetitive DNA at the breakpoints (which is known to be important for the formation of large inversions), our ability to resolve the breakpoints was limited.

      (4) Lack of statistical testing for the hypothesis of introgression: Although cichlids are known for high levels of hybridization, inversions can also remain balanced for a long time. what could allow us to differentiate introgression from incomplete lineage sorting?

      The coalescent time of the inversions between Diplotaxodons and benthics should allow us to distinguish these two mechanisms. Our finding that the genetic distance, which is related to coalescent time, is smaller within the inversions than across the whole genome is supportive of introgression. However, we did not perform any simulations or statistical tests. We now make it clearer in the text that incomplete lineage sorting remains a possible mechanism for the distribution of inversions within these ecogroups.

      (5) The sample size is unclear: possibly 31 for Bionano, 297 for short-reads, how many for long-reads or assemblies? How is this sample size split across species? This would deserve a table.

      We have included this information in the new Figure 1.

      (6) Short read combines several datasets but batch effect is not tested.

      We did not test for batch effects. However, we note that all of the datasets were analyzed by the same pipeline starting from alignment, so batch effects would be restricted to aspects of the reads themselves. Additionally, samples from the different datasets clustered as expected by lineage and inferred inversion, so batch effects are unlikely to have affected the analysis for these purposes.

      (7) It is unclear how ancestry is determined because the synteny with outgroups is not shown.

      Ancestry analysis was determined using the genome alignments of two outgroups from outside of Lake Malawi. This is shown in Figure S8.

      (8) The level of polymorphism for the different inversions is difficult to interpret because it is unclear whether replicated are different species within an eco-group or different individuals from the same species. How could it be that homozygous references are so spread across the PCA? I guess the species-specific polymorphism is stronger than the ancestral order but in such a case, wouldn't it be worth re-doing the PCa on a subset?

      The genomic PCA plots reflect the evolutionary histories that are observed in the whole genome phylogenies. Because the distribution of the inverted alleles violates the species tree, they form separate clusters on the PCA plots that can be used to genotype specific species. We have also performed this analysis on benthics (utaka/shallow benthics/deep benthics), and the distribution matches the expectation.

      Reviewer #2 (Public review):

      Summary:

      Chromosomal inversions have been predicted to play a role in adaptive evolution and speciation because of their ability to "lock" together adaptive alleles in genomic regions of low recombination. In this study, the authors use a combination of cutting-edge genomic methods, including BioNano and PacBio HiFi sequencing, to identify six large chromosomal inversions segregating in over 100 species of Lake Malawi cichlids, a classic example of adaptive radiation and rapid speciation. By examining the frequencies of these inversions present in species from six different linages, the authors show that there is an association between the presence of specific inversions with specific lineages/habitats. Using a combination of phylogenetic analyses and sequencing data, they demonstrate that three of the inversions have been introduced to one lineage via hybridization. Finally, genotyping of wild individuals as well as laboratory crosses suggests that three inversions are associated with XY sex determination systems in a subset of species. The data add to a growing number of systems in which inversions have been associated with adaptation to divergent environments. However, like most of the other recent studies in the field, this study does not go beyond describing the presence of the inversions to demonstrate that the inversions are under sexual or natural selection or that they contribute to adaptation or speciation in this system.

      Strengths:

      All analyses are very well done, and the conclusions about the presence of the six inversions in Lake Malawi cichlids, the frequencies of the inversions in different species, and the presence of three inversions in the benthic lineages due to hybridization are well-supported. Genotyping of 48 individuals resulting from laboratory crosses provides strong support that the chromosome 10 inversion is associated with a sex-determination locus.

      Weaknesses:

      The evidence supporting a role for the chromosome 11 inversion and the chromosome 9 inversion in sex determination is based on relatively few individuals and therefore remains suggestive. The authors are mostly cautious in their interpretations of the data. However, there are a few places where they state that the inversions are favored by selection, but they provide no evidence that this is the case and there is no consideration of alternative hypotheses (i.e. that the inversions might have been fixed via drift).

      We have removed mention of chromosome 9’s potential role in sex determination from the paper. While our analysis of sex association with chromosome 11 was limited compared to our analysis of chromosome 10, it was still statistically significant, and we believe it should be left in the paper. The role of chromosome 11 (and of 9 and 10) in sex determination was also demonstrated using an independent dataset by Blumer et al. (https://doi.org/10.1101/2024.07.28.605452).

      We agree that we did not properly consider alternative hypotheses in the original submission and have rewritten the Discussion substantially to consider various alternative hypotheses.

      Reviewer #3 (Public review):

      This is a very interesting paper bringing truly fascinating insight into the genomic processes underlying the famous adaptive radiation seen in cichlid fishes from Lake Malawi. The authors use structural and sequence information from species belonging to distinct ecotypic categories, representing subclades of the radiation, to document structural variation across the evolutionary tree, infer introgression of inversions among branches of the clade, and even suggest that certain rearrangements constitute new sex-determining loci. The insight is intriguing and is likely to make a substantial contribution to the field and to seed new hypotheses about the ecological processes and adaptive traits involved in this radiation.

      I think the paper could be clarified in its prose, and that the discussion could be more informative regarding the putative roles of the inversions in adaptation to each ecotypic niche. Identifying key, large inversions shared in various ways across the different taxa is really a great step forward. However, the population genomics analysis requires further work to describe and decipher in a more systematic way the evolutionary forces at play and their consequences on the various inversions identified.

      The model of evolution involving multiple inversions putatively linking together co-adapted "cassettes" could be better spelled out since it is not entirely clear how the existing theory on the recruitment of inversions in local adaptation (e.g. Kirkpatrick and Barton) operates on multiple unlinked inversions. How such loci correspond to distinct suites of integrated traits, or not, is not very easy to envision in the current state of the manuscript.

      This is a very interesting point, and we agree that it creates complications for a simple model of local adaptation. We imagine, though, that the actual evolutionary history was much more complicated than a single Rhamphochromis-type species separating from a single Diplotaxodon-type species, and that it could have occurred sequentially, involving multiple species that are now extinct. A better understanding of the role each of these inversions plays in phenotypic diversity could potentially help us determine whether different inversions carry variation that could be linked to distinct habitat differences. We have added a line to the discussion.

      The role of one inversion in sex determination is apparent and truly intriguing. However, the implication of such locus on ecological adaptation is somewhat puzzling. Also, whether sex determination loci can flow across species via introgression seems quite important as a route to chromosomal sex determination, so this could be discussed further.

      Another very interesting point. If the inversions are involved in ecological adaptation (an important caveat), then potentially the inverted and non-inverted haplotypes play dual roles in the Aulonocara animals with the inverted haplotype carrying adaptive alleles to deep water and the non-inverted haplotype carrying alleles resolving sexual conflict. We have broadened our discussion about their function at the origin including non-adaptive roles.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Overall, the paper is well-written and clear. I do have a few suggestions for changes that would help the reader:

      (1) Figure 1: the figure legend could be expanded here to help the reader; what are the blue and yellow lines? Why are there two lines for the GT3a assembly? And, I had to somehow read the legend a few times to understand that the top line is the UMD2a reference assembly, and the next line is the new Bionano map.

      Fixed in what is now Figure 2

      (2) Paragraph starting on line 133: you use the word "test" to refer to the Bionano analyses; it is not clear whether anything is being tested. Perhaps "analyse the maps" or just "map" would be more clear? Or more explanation?

      The text has been modified to address this point

      (3) L145-146: perhaps change "a single inversion" and "a double inversion" to "single inversions" and "double inversions".

      The text has been modified to address this point

      (4) L157: suppression of recombination in inversion heterozygotes is "textbook" material and perhaps does not need a reference. Or, you could reference an empirical paper that demonstrates this point. Though I love the Kirkpatrick and Barton paper, it certainly is not the correct reference for this point.

      The Kirkpatrick reference was incorrectly included here. The correct reference was an empirical demonstration (Conte) that there were regions of suppressed recombination that have been observed in the location of the inversions. We have also moved this reference further up in the sentence to a more appropriate position

      (5) L173: how do you know this is an assembly error and not polymorphism?

      The text has been modified to address this point

      (6) L277(?): "currently growing in the lab" is probably unnecessary.

      The text has been modified to address this point

      (7) L298: "the inversion on 10 acts as an XY sex determiner": the inversion itself is not the sex determination gene; rather, it is linked. I think it would be more precise, here and throughout the paper, to say that these inversions likely harbor the sex determination locus (for example, the wording on lines 369-370 is misleading).

      We agree with the larger point that the inversion might not be causal for sex determination; however, it could still be causal through positional effects. We have modified the text to make it clear that it could also carry the causal locus (or loci).

      (8) Figure 6: overall, this figure is very helpful! However, it contains several problematic statements. In no case do you have evidence that these inversions are "favored by selection"; such statements should be deleted. Also, in point 3, you state that inversions 9, 11, and 20 are transferred to benthic lineages, and then that these inversions are involved in sex determination. But, your data suggests that it is chromosomes 9, 10, and 11 that are linked to sex determination.

      This figure is now Figure 1. We have removed these problematic statements.

      (9) L356-360: I would move the references that are currently at the end of the sentence to line 357 after the statement about the previous work on hybridization. Otherwise, it reads as if these previous papers demonstrated what you have demonstrated in your work.

      The text has been modified to address this point

      (10) Overall, the discussion focuses completely on adaptive explanations for your results, and I would like to see at least an acknowledgement that drift could also be involved unless you have additional data to support adaptive explanations.

      We have rewritten the text to account for the possibility of drift (line 404 and 405).

      Reviewer #3 (Recommendations for the authors):

      The paper utilizes heterogeneous datasets coming from different sources, and it is not always clear which specimens were used to generate structural information (bionano) or sequence information. A diagram summarizing the sequence data, methodologies, and research questions would be beneficial for the reader to navigate in this paper.

      Much of this information has been added to what is now Figure 1. All of this data is also found in Table S2.

      The authors performed genome alignments to analyze and homologize inversion, but this process is not clearly described. For the PCA, SNP information likely involves mapping onto a common reference genome. However, it is not clear how this was achieved given the different species and varying divergence times involved.

      We now include a link to the GitHub repository that contains the commands that were run. Because the overall level of sequence divergence between cichlid species is quite low (2*10^-3; Malinsky et al.), mapping different species onto a common reference is commonly performed in Lake Malawi cichlids.

      The introgression scenario is very intriguing but its role in local adaptation of the ecogroup types is not easy to understand. I understand this is still an outstanding question, but it is unclear how the directionality of introgressions was estimated. This can be substantiated using tree topology analysis, comparative estimates of sequence divergence, and accumulation of DNA insertions. The diagram does not clearly indicate which ones are polymorphic. In some cases, polymorphic inversions could result from the coexistence of native and introgressed haplotypes.

      We agree that this analysis would be interesting but is beyond the scope of this paper.

      The alternative model of introgression proposed in the cited preprint is interesting and should deserve a formal analysis here. The authors consider unclear what would drive "back" introgressions of non-inverted haplotypes, but this would depend on the selection regimes acting on the inversions themselves, which can include forms of balancing selection and a role for recessive lethals (heterozygote advantage). For instance, a standard haplotype could be favored if it shelters deleterious mutations carried by an inversion. Testing the introgression history over a wider range of branches and directions would provide further insights.

      We agree that this analysis would be interesting but is beyond the scope of this paper.

      The prose in the paper is occasionally muddled and somewhat unclear. Referring to chromosomes solely by their numbers (e.g.. "inversion on 11") complicates readability.

      This is the standard way to refer to chromosomes in cichlids and we believe while it complicates readability, any other method would be inconsistent with other papers. Changes to nomenclature might improve the readability of this paper, but would make it more difficult to compare results for these chromosomes from other papers with what we have found.

    1. Billionaires

      I believe that harassment is never justified. Harassment involves actions like online insults, cyberstalking, and the exposure of personal information to harm a user. Some people may think that harassment is acceptable when directed at extremists such as racists, white supremacists, or sexists. I strongly disagree: there are clearly better ways to address such issues than resorting to harassment. For example, we can use facts and logic to refute their views instead of launching personal attacks, or report their behavior through legal and official channels.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The Authors investigated the anatomical features of the excitatory synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of the synapse, the macular or the perforated appearance and the size of the synaptic active zone, the number and volume of the mitochondria, the number of the synaptic and the dense core vesicles, also differentiating between the readily releasable, the recycling and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The Authors conclude that the subcellular morphology of the layer 1 synapses is suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow the glutamate spillover from the synapses enhancing synaptic crosstalk within this cortical layer.

      Strengths:

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially that derived from the human brain) are very valuable, since this is a highly time- and energy consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the Authors are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion.

      Comments on latest version:

      The corrected version of the article titled “Ultrastructural sublaminar specific diversity of excitatory synaptic boutons in layer 1 of the adult human temporal lobe neocortex” has been improved thanks to the comments and suggestions of the reviewers. The Authors implemented several of my comments and suggestions. However, many of them were not addressed. It is understandable that the Authors did not start a whole new series of experiments investigating inhibitory synapses (as this was a misunderstanding affecting two of the three reviewers). But the English text is still very hard to understand and has many mistakes, although I suggested an extensive review of the use of English. Furthermore, my suggestions about avoiding many abbreviations in the abstract, analysing and discussing the perforated synapses in more detail, improving the figure presentation (Figure 3), and including data about the astrocytic coverage in the Results section were not implemented. My questions about the number of docked vesicles and p10 vesicles, as well as about the different categories of the vesicle pools, have not been answered either. Many other minor comments and suggestions were answered, corrected and implemented, but I think the manuscript could have been improved more if the Authors had taken into account all of the reviewers' suggestions, not only some of them. I still have several main and minor concerns, including a few new ones that I did not notice earlier but still think are important.

      We would like to thank the reviewer for the comments.

      - We worked on the English again and tried to improve the language.

      - We avoided using too many abbreviations in the Abstract and reduced them to a minimum.

      - We included a small paragraph about non-perforated vs. perforated active zones in both the Results and Discussion sections. However, since the majority of active zones in all cortical layers of the human TLN were of the macular type, we concluded that it is not relevant to describe their function in more detail.

      - In Figure 3 A-C we added contour lines to the boutons to make their outlines more visible.

      - We completed the data about the astrocytic coverage in the Results section (see also below).

      - Concerning the vesicle pools please see below.

      Main concerns:

      (1) Epileptic patients:

      As all patients were epileptic, it is not correct to state in the abstract that non-epileptic tissue was investigated. Even if the seizure onset zone was not in the region investigated, seizures usually invade the temporal lobe in TLE. If you can prove that no spiking activity occurred in the sample you investigated and that the seizures did not invade that region, then you can write that it is presumably non-epileptic. I would suggest writing “L1 of the human temporal lobe neocortical biopsy tissue”. See also Methods lines 608-612. Write “non-epileptic” or “non-affected” only if you verified it with ECoG. If this was the case, please write a few sentences about it in the Methods.

      We rephrased Material and Methods concerning this point and added that patients were monitored with EEG, MRI and multielectrode recordings. In addition, we stated that the epileptic focus was always far away from the neocortical tissue samples. Furthermore, we added a small paragraph that functional studies using the same methodology have shown that neocortical access tissue samples taken from epilepsy surgery do not differ in electrophysiological properties and synaptic physiology when compared with acute slice preparations in experimental animals and we quoted the relevant papers.

      We hope that the reviewer is now convinced that our tissue samples can be regarded as non-affected.

      (2) About the inhibitory/excitatory synapses.

      “Since our focus was on excitatory synaptic boutons as already stated in the title we have not analyzed inhibitory SBs.” Now I do understand that only excitatory synapses were investigated. Although it was written in the title, I did not realize it, since throughout the manuscript the Authors were writing about synapses, distinguishing between inhibitory and excitatory synapses in the text, showing numerous excitatory and inhibitory synapses in Figure 2, and discussing inhibitory interneurons in the Discussion as well. Maybe this was the reason why two of the three reviewers (including myself) thought you had investigated both types of synapses but had not differentiated between them. So please emphasize in the Abstract (line 40), the Introduction (e.g. lines 92-97) and the Discussion (line 369) that only excitatory synaptic boutons were investigated.

      As this paper investigated only excitatory synaptic boutons, I think it is irrelevant to write such a long section in the Discussion about inhibitory interneurons and their functions in L1 of the human temporal lobe neocortex. The same applies to the schematic drawing of the possible wiring of L1 (Figure 7). As neither inhibitory interneurons nor the connections between the different excitatory cells were examined, only the morphology of single synaptic boutons without any reference to their origin, I think this figure does not illustrate the work done in this paper. It could be a figure in a review paper about the human L1, but it is inappropriate in this study.

      We followed the reviewer’s suggestion and pointed out explicitly that we only investigated excitatory synaptic boutons. We also changed the Discussion and focused more on circuitry in L1 and the role of CR-cells.

      (3) Perforated synapses

      “The findings of the Geinisman group suggesting that perforated synapses are more efficient than non-perforated ones is nowadays very controversially discussed.” I did not ask the Authors to say that perforated synapses are more efficient. However, based on the literature (e.g. Harris et al., 1992; Carlin and Siekevitz, 1982; Nieto-Sampedro et al., 1982), the presence of perforated synapses is indeed a good sign of synapse division/formation, which in turn might be coupled to synaptic plasticity (Geinisman et al., 1993), increased synaptic activity (Vrensen and Cardozo, 1981), LTP (Geinisman et al., 1991; Harris et al., 2003), pathological axonal sprouting (Frotscher et al., 2006), etc. I think it is worth mentioning this at least in the Discussion.

      We agree with the reviewer and added a small paragraph in the Results section about the two types of AZs in L1 of the human TLN. We pointed out that both types are present, macular non-perforated and perforated AZs, but that the majority in all layers were of the non-perforated type. In the Discussion we added some papers pointing out the role of perforated synapses.

      (4) Question about the vesicle pools

      Results, Line 271: It is still not understandable why the RRP was defined as ≤10 nm and ≤20 nm. Why did you use two categories? One would be sufficient (for example ≤20 nm). Or were the vesicles between 10 and 20 nm considered to be part of the RRP? In this case there is a typo, and it should be ≥10 nm and ≤20 nm.

      The Authors' answer to my question was: “We decided that also those very close within 10 and 20 nm away from the PreAZ, which is less than a SV diameter may also contribute to the RRP since it was shown that SVs are quite mobile.”

      This does not clarify why you used two categories. Furthermore, I did not receive an answer (nor did Referee #2) to my question on how you could have 3x as many docked vesicles as vesicles ≤10 nm. The category ≤10 nm should also contain the docked vesicles. If this is not the case, please clarify what your categories were.

      We thank the reviewer for pointing out that mentioning two distance criteria (p10 and p20) to define one physiological entity (RRP) is somewhat confusing and we acknowledge that the initial response to the reviewers falls short of explaining this choice. This is indeed only understandable in the context of the original paper by Sätzler et al. 2002, where these criteria were first introduced. We therefore referenced this publication more prominently in the paragraph in question.

      To explain this, we first would like to clarify the definition of the two RRP classification criteria used (p10 and p20), which has caused some confusion amongst the reviewers as to which vesicles were included or not:

      - p10 criterion: p ≤ 10 nm (SVs have a minimum distance less than or equal to 10 nm from the PreAZ), including ‘docked’ vesicles which have a distance of zero or less (p0)

      - p20 criterion: p ≤ 20 nm (SVs have a minimum distance less than or equal to 20 nm from the PreAZ), including vesicles of the p10 criterion.
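
      The nesting of the three criteria can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the analysis code used in the study: the function name, the data class, and the example distances are hypothetical, and the only assumption taken from the text is that each vesicle is described by its minimum distance (in nm) from the PreAZ, with docked vesicles at zero or less.

```python
from dataclasses import dataclass


@dataclass
class PoolCounts:
    p0: int   # 'docked' vesicles: distance of zero or less
    p10: int  # distance <= 10 nm, includes the p0 vesicles
    p20: int  # distance <= 20 nm, includes the p10 vesicles


def count_putative_rrp(distances_nm):
    """Count vesicles under the nested p0, p10 and p20 distance criteria."""
    p0 = sum(1 for d in distances_nm if d <= 0)
    p10 = sum(1 for d in distances_nm if d <= 10)
    p20 = sum(1 for d in distances_nm if d <= 20)
    return PoolCounts(p0=p0, p10=p10, p20=p20)


# Hypothetical minimum distances (nm) from the PreAZ for one bouton.
example_distances = [0.0, -1.5, 4.0, 10.0, 12.5, 20.0, 35.0, 80.0]
counts = count_putative_rrp(example_distances)
print(counts)
```

      Because the criteria are nested, the counts can never decrease from p0 to p10 to p20 when they are computed on the same set of vesicles.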

      As mentioned, these criteria were first introduced in Sätzler et al. 2002, which looked at the Calyx of Held synapse. In that paper, we tried to establish a morphological correlate to existing physiological measurements, which included the RRP. As there is no known marker that would allow vesicles contributing to the RRP to be discriminated anatomically, we looked at existing physiological experiments such as Schneggenburger et al. 1999, Wu and Borst 1999, and Sun and Wu 2001, and compared their total numbers to our measurements. As the number of docked vesicles (p0, see above) was on the lower side of these physiological estimates, we also looked at vesicles close to the AZ, which we think could be recruited within a short time (≤ 10 msec). Comparing with existing literature, we found that at p20 we get pool sizes comparable to midrange estimates of reported RRP sizes. In order to account for the variability of the observed physiological pool sizes, we reported all three measurements (p0, p10, p20) not only for the original Calyx of Held, but in all subsequent studies of different CNS synapses by our group since then.

      As it remains uncertain whether such a correlate indeed exists, we followed the suggestion to rephrase RRP and RP as putative RRP and putative RP (see also Rollenhagen et al. 2007). We thank both reviewers for pointing out this omission.

      Concerning the difference between ‘docked’ vesicles and vesicles within the p10 perimeter criterion: first of all, the reviewer is right in saying that the category p10 (≤10 nm) should also contain the docked vesicles (see above). The fact that there are 3x as many ‘docked’ vesicles in our TEM tomography as in the p10 distance analysis could be partly explained, on the one hand, by a very high variability between patients (as expressed by the high SD, Table 1) and, on the other hand, by a high intraindividual synaptic bouton variability. In both sublayers, there is a huge difference in the number of vesicles within the p10 criterion of individual synaptic boutons, ranging from 0 to ~40 with a mean value of ~1 to ~4 (calculated per patient), the upper level being close to the values calculated with TEM tomography for the ‘docked’ vesicles.

      (5) Astrocytic coverage

      On Fig. 6 data are presented on the astrocytic coverage derived from L1 and L4. In my previous review I asked to include this in the text of the Results as well, but I still do not see it. It is also lacking from the Results how many samples from which layer were investigated in this analysis. Only percentages are given, and only for L1 (but how many patients, L1a and/or L1b and/or L4 is not provided). In contrast, Figure 6 and Supplementary Table 2 (patient table) contains the information that this analysis has been made in L4 as well. Please, include this information in the text as well (around lines 348-360).

      In our previous revised version, we had included the values shown in Fig. 6 for both L1 and L4 in the Results section (L4: lines 352 – 355: ‘The findings in L1…’). However, we agree with the reviewer and have now also added the number of patients and synapses investigated (now lines 359 – 365).

      About how to determine glial elements. I cannot agree with the Authors that glial elements can be determined with high certainty based only on the anatomical features of the profiles seen in the EM. “With 25 years of experience in (serial) EM work" I would say, that glial elements can be very similar to spine necks and axonal profiles.

      All in all, if similar methods were used to determine the glial coverage in the different layers of the human neocortex, then it can be compared (I guess this is the case). However, I would say in the text that proper determination would need immunostaining and a new analysis. This only gives an estimation with the possibility of a certain degree of error.

      We do not entirely agree with the reviewer on this point. As stated in the text, there are structural criteria to identify astrocytic elements (see citations quoted). These gold standard criteria are also commonly used by other well-known groups (DeFelipe and co-workers, Francisco Clasca and co-workers, the late Michael Frotscher and co-workers, etc.). However, in a past paper about astrocytic coverage of synaptic complexes in L5 of the human TLN, immunohistochemistry against glutamine synthetase, a key enzyme in astrocytes, was carried out to describe the coverage. This experiment supports our findings in the other cortical layers of the human TLN. As the reviewer might know, immunohistochemistry always leads to a reduction in ultrastructural preservation, so we decided not to use immunohistochemistry for the further publications on the other cortical layers. We added a short note on this in the Material and Methods section.

      (6) Large interindividual differences in the synapse density should be discussed in the Discussion.

      As suggested by the reviewer, we have included a sentence in the Discussion stating that interindividual differences can be related to differences in age, gender, or the use of different methodology, as suggested by DeFelipe and co-workers (1999).

      Reviewer #2 (Public review):

      Summary:

      The study of Rollenhagen et al. examines the ultrastructural features of Layer 1 of human temporal cortex. The tissue was derived from drug-resistant epileptic patients undergoing surgery, and was selected as further from the epilepsy focus, and as such considered to be non-epileptic. The analysis included 4 patients differing in age, sex, medication and onset of epilepsy. The MS is a follow-on study to 3 previous publications from the same authors on different layers of the temporal cortex:

      Layer 4 - Yakoubi et al 2019 eLife

      Layer 5 - Yakoubi et al 2019 Cerebral Cortex,

      Layer 6 - Schmuhl-Giesen et al 2022 Cerebral Cortex

      They find that the L1 synaptic boutons mainly have a single active zone and a very large pool of synaptic vesicles, and are mostly devoid of astrocytic coverage.

      Strengths:

      The MS is well written and easy to read. The Results section gives a detailed set of figures showing many morphological parameters of synaptic boutons and surrounding glial elements. The authors provide comparative data of all the layers examined by them so far in the Discussion. Given that anatomical data in human brain are still very limited, the current MS has substantial relevance.

      The work appears to be generally well done, the EM and EM tomography images are of very good quality. The analysis is clear and precise.

      Weaknesses:

      The authors made all the corrections required, answered most of my concerns, included additional data sets, and clarified statements where needed.

      My remaining points are:

      Synaptic vesicle diameter (which has been established to be ~40 nm independent of species) can properly be measured with EM tomography only, as it provides the possibility to find the largest diameter of every given vesicle. Measuring it in 50 nm thick sections results in underestimation (as here, where the values are ~25 nm), because the measured diameter will be smaller than the true diameter if the vesicle is not cut in the middle (which is the least probable scenario). The authors have the EM tomography data set for measuring the vesicle diameter properly.

      We thank the reviewer for the helpful comments. We followed the recommendation to measure the vesicle diameter using our TEM tomography tilt series, but came to similar results concerning this synaptic parameter. As stated in our Material and Methods section, we only counted (measured) clear ring-like structures according to a paper by Abercrombie (1963). Since our results are similar for both methods, we do believe that our measurements are correct. Even random single measurements on the original 3D tilt-series yielded comparable results (Lübke and co-workers, personal observation). Furthermore, our results are within ranges, although with high variability, also described by other groups (see discussion lines 436 - 449). We therefore hope that the reviewer will now accept our measurements.
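      The reviewer's geometric argument can be illustrated with a minimal sketch. It assumes an idealized case that is not the experimental geometry: a spherical vesicle cut by an infinitely thin plane whose distance from the vesicle centre is uniformly distributed. The function name and the ~40 nm input are illustrative only; a real 50 nm section is thicker than the vesicle, so this shows the direction of the bias rather than its measured size.

```python
import math

def mean_profile_diameter(true_diameter_nm, n_steps=100_000):
    """Mean diameter of circular profiles from random thin cuts through a sphere.

    Assumption (hypothetical, idealized): the cutting plane is infinitely thin
    and its offset h from the sphere centre is uniform on [0, R].
    """
    radius = true_diameter_nm / 2.0
    total = 0.0
    for i in range(n_steps):
        # Midpoint rule over the offset h of the cutting plane.
        h = (i + 0.5) * radius / n_steps
        total += 2.0 * math.sqrt(radius * radius - h * h)
    return total / n_steps

# Closed form for the same idealized case: (pi / 4) * true diameter.
apparent = mean_profile_diameter(40.0)
```

      Under these assumptions the mean apparent diameter is π/4 of the true diameter, so random thin cuts alone would bias measured diameters downward.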

      It is a bit misleading to call vesicle populations at certain arbitrary distances from the presynaptic active zone readily releasable pool, recycling pool and resting pool, as these are functional categories and cannot directly be translated to vesicles at certain distances. It is even debated whether the morphologically docked vesicles are the ones that are readily releasable, as further molecular steps, such as proper priming, are also prerequisites for release.

      It would help to call these pools as "putative" correlates of the morphological categories.

      We followed the suggestion by the reviewer and renamed our vesicle pools as putative RRP, putative RP and putative resting pools.

      Reviewer #3 (Public review):

      Summary:

      Rollenhagen et al. offer a detailed description of layer 1 of the human neocortex. They use electron microscopy to assess the morphological parameters of presynaptic terminals, active zones, vesicle density/distribution, mitochondrial morphology and astrocytic coverage. The data is collected from tissue from four patients undergoing epilepsy surgery. As the epileptic focus was localized in all patients to the hippocampus, the tissue examined in this manuscript is considered non-epileptic (access) tissue.

      Strengths:

      The quality of the electron microscopic images is very high, and the data is analyzed carefully. Data from human tissue is always precious and the authors here provide a detailed analysis using adequate approaches, and the data is clearly presented.

      Weaknesses:

      The text connects functional and morphological characteristics in a very direct way. For example, connecting plasticity to any measurement the authors present would be rather difficult without any additional functional experiments. References to various vesicle pools based on the location of the vesicles is also more complex than it is suggested in the manuscript. The text should better reflect the limitations of the conclusions that can be drawn from the authors' data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Astrocytic coverage

      On Fig. 6 data are presented on the astrocytic coverage derived from L1 and L4. In my previous review I asked to include this in the text of the Results as well, but I still do not see it. It is also lacking from the Results how many samples from which layer were investigated in this analysis. Only percentages are given, and only for L1 (but how many patients, L1a and/or L1b and/or L4 is not provided). In contrast, Figure 6 and Supplementary Table 2 (patient table) contains the information that this analysis has been made in L4 as well. Please, include this information in the text as well (around lines 348-360).

      See above.

      About how to determine glial elements. I cannot agree with the Authors that glial elements can be determined with high certainty based only on the anatomical features of the profiles seen in the EM. “With 25 years of experience in (serial) EM work" I would say, that glial elements can be very similar to spine necks and axonal profiles. Please, see the photos below, out of the 16 circled profiles (2nd picture, very similar to each other) only 3 belong to an astroglial cell (last picture, purple profiles-purple cell), 10 are spines/spine necks/small caliber dendrites of pyramidal cells, 3 are axonal profiles (last but one picture, blue profiles, marked with arrows on the right side). If you follow in your serial sections those elements which you think are glial processes and indeed they are attached to a confidently identifiable glial cell, I agree, it is a glial process. But identifying small, almost empty profiles without any specific staining, from one single EM section, as glial process is very uncertain. Please, check the database of the Allen Institute made from the V1 visual cortex of a mouse. It is a large series of EM sections where they reconstructed thousands of neurons, astroglial and microglial cells. It is possible to double click on the EM picture on a profile and it will show the cell to which that profile belongs. https://portal.brain-map.org/connectivity/ultrastructural-connectomics Pictures included here: https://elife-rp.msubmit.net/eliferp_files/2024/11/25/00132644/02/132644_2_attach_21_29456_convrt.pdf

      All in all, if similar methods were used to determine the glial coverage in the different layers of the human neocortex, then it can be compared (I guess this is the case). However, I would say in the text that proper determination would need immunostaining and a new analysis. This only gives an estimation with the possibility of a certain degree of error.

      As stated above, we carried out glutamine synthetase immunohistochemistry in L5 of the human TLN and came to the same results. However, we added a sentence on this in the chapter on astrocytic coverage in the Material and Methods section. Additionally, we modified this chapter according to the reviewer’s suggestion.

      Minor comments

      Introduction: Last sentence is not understandable (lines 101-103), please rephrase. (contribute to understand or contribute in understanding or contribute to the understanding of..., but definitely not contribute to understanding). The authors should check and review extensively for improvements to the use of English, or use a program such as Grammarly.

      Results: Grammar (line 107): L1 in the adult mammalian neocortex represents a relatively...

      Line 173: “Some SBs in both sublaminae were seen to establish either two or three SBs on the same spine, spines of other origin or dendritic shafts." - Some SBs established two or three SBs? I would write Some SBs established two or three synapses on...

      Line 243: “The synaptic cleft size were slightly, but non-significantly different"

      Line 260: “DCVs play an important role in endo- and exocytosis, the build-up of PreAZs by releasing Piccolo and Bassoon (Schoch and Gundelfinger 2006; Murkherjee et al. 2010)," - please, correct this.

      We have made the corrections as suggested by the reviewer.

      Line 374: No point at the end of the last phrase.

      Discussion:

      Lines 400-404: “The majority of SBs in L1 of the human TLN had a single at most three AZs that could be of the non perforated macular or perforated type comparable with results for other layers in the human TLN but by ~1.5-fold larger than in rodent and non-human primates." - What is comparable with the other layers, but different from animals? Please rephrase this sentence, it is not understandable. I already mentioned this sentence in my previous review, but nothing happened.

      Lines 435-437: “Remarkably, the total pool sizes in the human TLN were significantly larger by more than 6-fold (~550 SVs/AZ), and ~4.7-fold (~750 SVs/AZ;) than those in L4 and L5 (Yakoubi et al. 2019a, b; see also Rollenhagen et al. 2018) in rats." Please rethink what you wished to say and compare to the sentence meaning. I think you wanted to compare human TLN L1 pool size to L4 and L5 in the human TLN (Yakoubi 2019a and b) and to rat (Rollenhagen 2018). Instead, you compared all layers of the human TLN to L4 and L5 in rats (with partly wrong references). Please rephrase this. Lines 483-484: “Astrocytes serve as both a physical barrier to glutamate diffusion and as mediate neurotransmitter uptake via transporters".

      This sentence is grammatically incorrect, please rephrase.

      We corrected the sentences as suggested by the reviewer.

      Methods:

      In the text, there are only 4 patients (lines 603-604), but in the supplementary table there are 9 patients (5 new included for L4 astrocytic coverage). Please, correct it in the text.

      Lines 608-609: “neocortical access tissue samples were resected to control the seizures for histological inspection by neuropathologists." - What is the meaning of this? Please, rephrase.

      We thank the reviewer for the comment and included the 5 patients used for L4 to the Material and Methods section, as well as in the Results section.

      The reviewer is right, and we rephrased and corrected the sentence concerning the inspection by neuropathologists.

      Figures

      Figures 5B: The legend says “SB (sb) synapsing on a stubby spine (sp) with a prominent spine apparatus (framed area) and a thick dendritic segment (de) in L1b" - In my opinion this is not one synaptic bouton, but two. Clearly visible membranes separate them, close to the spine.

      Supplemental Table 2 (patient table). If there is no information about Hu_04 patient's epilepsy, please write N/A (=non available) instead of - (which means it does not exist).

      The reviewer is right, and we corrected the figure and the legend, as well as the table accordingly.

      Reviewer #2 (Recommendations for the authors):

      The authors addressed almost all of my concern, only this one remained:

      If there is, however, relevant literature on "methods based on EM tomography" and "stereological methods to estimate both types of error" (over- and underestimates) that we are missing out on, we would appreciate the reviewer providing us with the corresponding references so that we can include such calculations in our paper.

      There is a very detailed new study on calculating correction for TEM 2D 3D, Rothman et al 2023 PLOS One. That addresses most of these issues.

      We thank the reviewer for drawing our attention to the publication by Rothman et al. 2023, which is a very detailed and comprehensive study looking at accurately estimating distributions of 3D size and densities of particles from 2D measurements using – amongst others – ET and TEM images as well as synaptic vesicles for validating their method. However, we do not see how this would be relevant to the reported mean diameters and their corresponding variances. And even if we had reported on vesicle size/diameter distributions (referred to as G(d) in Rothman et al. 2023), the authors themselves state that “… the results from our ET and TEM image analysis highlight the difficulty in computing a complete G(d) of MFT vesicles due to their small size…

    1. In addition, homophobia has diverse roots, so being more aware of the different biases and anxieties behind its expressions can be key to challenging it and to challenging transphobia and other forms of exclusion as well. Even in the midst of thinking about bias and ensuring a fully educational response, there is a danger in letting homophobia define how and why lessons on sexual minorities are included in school. Institutional and legal restrictions have shaped the lives of sexual minority people, yet it would be a vast oversimplification to say that is the only reality of their lives. Sexuality, as discussed in Chapter 1, has a long and varied history - indeed histories of identities and subjectivities may bear little resemblance to the categories by which we currently define sexual identity. As much as those communities and identity formations were related to restrictions on individuals' ability to live, they nonetheless formed cultures and associations, and - like other minorities living in a cultural context shaped by bias - reshaped their worlds. Tactically, it may be possible to convince people who initially do not want to include sexual minority issues in schooling that to do so would help address the risks that LGBTQ students face. However, we also need to be careful not to frame LGBTQ issues as only risk or deficit ones. We need to provide the opportunity to examine the positive aspects of LGBTQ communities and cultures and the abilities of sexuality and gender diverse people to live lives beyond institutional constraints.

      This section really made me think about how LGBTQ topics are often framed around danger, risk, or trauma. While those realities are important, it's limiting if that’s all we focus on. I like how the reading reminds us that LGBTQ communities also have resilience, joy, and rich cultural histories. Including those aspects in education helps move the conversation from tolerance to genuine respect.

    2. particular relationship to one another? How are sexual identities also defined by intense relationships, desires that may not be acted upon? How are attractions defined through ideas about gender, race, and class? In other words, as we think about making schools safer for sexual minorities, how do we even begin to address important issues, for instance, whether racial harassment is part of homophobia?

      This reaffirms that sexuality and gender are far more slippery and complex than categories can imply. It reinforces that even when schools try to place "normal" expectations upon them, people's experiences of identity cannot be constrained within firm boxes. By inquiring how sexuality intersects with race, class, and gender, the book highlights that safe schools for LGBTQ students require responding to broader systems of oppression rather than discrete cases of bullying and discrimination. It challenges us to examine more thoroughly how all students, regardless of identity, do well when schools push back on narrow definitions of what is "normal."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Using a cross-modal sensory selection task in head-fixed mice, the authors attempted to characterize how different rules reconfigured representations of sensory stimuli and behavioral reports in sensory (S1, S2) and premotor cortical areas (medial motor cortex or MM, and ALM). They used silicon probe recordings during behavior, a combination of single-cell and population-level analyses of neural data, and optogenetic inhibition during the task.

      Strengths:

      A major strength of the manuscript was the clarity of the writing and motivation for experiments and analyses. The behavioral paradigm is somewhat simple but well-designed and well-controlled. The neural analyses were sophisticated, clearly presented, and generally supported the authors' interpretations. The statistics are clearly reported and easy to interpret. In general, my view is that the authors achieved their aims. They found that different rules affected preparatory activity in premotor areas, but not sensory areas, consistent with dynamical systems perspectives in the field that hold that initial conditions are important for determining trial-based dynamics.

      Weaknesses:

      The manuscript was generally strong. The main weakness in my view was in interpreting the optogenetic results. While the simplicity of the task was helpful for analyzing the neural data, I think it limited the informativeness of the perturbation experiments. The behavioral read-out was low dimensional -a change in hit rate or false alarm rate- but it was unclear what perceptual or cognitive process was disrupted that led to changes in these read-outs. This is a challenge for the field, and not just this paper, but was the main weakness in my view. I have some minor technical comments in the recommendations for authors that might address other minor weaknesses.

      I think this is a well-performed, well-written, and interesting study that shows differences in rule representations in sensory and premotor areas and finds that rules reconfigure preparatory activity in the motor cortex to support flexible behavior.

      Reviewer #2 (Public Review):

      Summary:

      Chang et al. investigate neuronal activity firing patterns across various cortical regions in an interesting context-dependent tactile vs visual detection task, developed previously by the authors (Chevee et al., 2021; doi: 10.1016/j.neuron.2021.11.013). The authors report the important involvement of a medial frontal cortical region (MM, probably a similar location to wM2 as described in Esmaeili et al., 2021 & 2022; doi: 10.1016/j.neuron.2021.05.005; doi: 10.1371/journal.pbio.3001667) in mice for determining task rules.

      Strengths:

      The experiments appear to have been well carried out and the data well analysed. The manuscript clearly describes the motivation for the analyses and reaches clear and well-justified conclusions. I find the manuscript interesting and exciting!

      Weaknesses:

      I did not find any major weaknesses.

      Reviewer #3 (Public Review):

      This study examines context-dependent stimulus selection by recording neural activity from several sensory and motor cortical areas along a sensorimotor pathway, including S1, S2, MM, and ALM. Mice are trained to either withhold licking or perform directional licking in response to visual or tactile stimulus. Depending on the task rule, the mice have to respond to one stimulus modality while ignoring the other. Neural activity to the same tactile stimulus is modulated by task in all the areas recorded, with significant activity changes in a subset of neurons and population activity occupying distinct activity subspaces. Recordings further reveal a contextual signal in the pre-stimulus baseline activity that differentiates task context. This signal is correlated with subsequent task modulation of stimulus activity. Comparison across brain areas shows that this contextual signal is stronger in frontal cortical regions than in sensory regions. Analyses link this signal to behavior by showing that it tracks the behavioral performance switch during task rule transitions. Silencing activity in frontal cortical regions during the baseline period impairs behavioral performance.

      Overall, this is a superb study with solid results and thorough controls. The results are relevant for context-specific neural computation and provide a neural substrate that will surely inspire follow-up mechanistic investigations. We only have a couple of suggestions to help the authors further improve the paper.

      (1) We have a comment regarding the calculation of the choice CD in Fig S3. The text on page 7 concludes that "Choice coding dimensions change with task rule". However, the motor choice response is different across blocks, i.e. lick right vs. no lick for one task and lick left vs. no lick for the other task. Therefore, the differences in the choice CD may be simply due to the motor response being different across the tasks and not due to the task rule per se. The authors may consider adding this caveat in their interpretation. This should not affect their main conclusion.

      We thank the Reviewer for the suggestion. We have discussed this caveat and performed a new analysis to calculate the choice coding dimensions using right-lick and left-lick trials (Fig. S3h) on page 8. 

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).”
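      The alignment measure quoted above (magnitude of the dot product between two choice CDs) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a coding dimension is the unit-normalized difference of trial-averaged population activity between two trial types, and the function names and array layout (trials × neurons) are hypothetical.

```python
import numpy as np

def coding_dimension(trials_a, trials_b):
    """Unit vector along the difference of trial-averaged population activity.

    Assumption: each input is a (trials x neurons) array for one trial type.
    """
    a = np.asarray(trials_a, dtype=float)
    b = np.asarray(trials_b, dtype=float)
    diff = a.mean(axis=0) - b.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0:
        raise ValueError("the two trial types have identical mean activity")
    return diff / norm

def cd_alignment(cd_one, cd_two):
    """Magnitude of the dot product between two unit coding dimensions (0 to 1)."""
    return float(abs(np.dot(cd_one, cd_two)))

# Toy data: two neurons, one choice CD per (hypothetical) block type.
cd_touch = coding_dimension([[1.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
cd_light = coding_dimension([[0.0, 2.0]], [[0.0, 0.0]])
```

      With unit vectors, a value near 1 indicates nearly parallel coding dimensions and a value near 0 indicates nearly orthogonal ones; a comparison against shuffled trial labels, as reported in the quoted text, would be a separate step.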

      We also have included the caveats for using right-lick and left-lick trials to calculate choice coding dimensions on page 13.

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks_{unrewarded port} / # of licks_{total}, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      (2) We have a couple of questions about the effect size on single neurons vs. population dynamics. From Fig 1, about 20% of neurons in frontal cortical regions show task rule modulation in their stimulus activity. This seems like a small effect in terms of population dynamics. There is somewhat of a disconnect from Figs 4 and S3 (for stimulus CD), which show remarkably low subspace overlap in population activity across tasks. Can the authors help bridge this disconnect? Is this because the neurons showing a difference in Fig 1 are disproportionally stimulus selective neurons?

      We thank the Reviewer for the insightful comment and agree that it is important to link the single-unit and population results. We have addressed these questions by (1) improving our analysis of task modulation of single neurons  (tHit-tCR selectivity) and (2) examining the relationship between tHit-tCR selective neurons and tHit-tCR subspace overlaps.  

      Previously, we averaged the AUC values of time bins within the stimulus window (0-150 ms, 10 ms bins). If the 95% CI on this averaged AUC value did not include 0.5, this unit was considered to show significant selectivity. This approach was highly conservative and may underestimate the percentage of units showing significant selectivity, particularly any units showing transient selectivity. In the revised manuscript, we now define a unit as showing significant tHit-tCR selectivity when three consecutive time bins (>30 ms, 10 ms bins) of AUC values were significant. Using this new criterion, the percentage of tHit-tCR selective neurons increased compared with the previous analysis. We have updated Figure 1h and the results on page 4:

      “We found that 18-33% of neurons in these cortical areas had area under the receiver-operating curve (AUC) values significantly different from 0.5, and therefore discriminated between tHit and tCR trials (Fig. 1h; S1: 28.8%, 177 neurons; S2: 17.9%, 162 neurons; MM: 32.9%, 140 neurons; ALM: 23.4%, 256 neurons; criterion to be considered significant: Bonferroni corrected 95% CI on AUC did not include 0.5 for at least 3 consecutive 10-ms time bins).”
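      The consecutive-bin criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-bin confidence intervals on the AUC are taken as given (already Bonferroni corrected), and the function name and argument names are hypothetical.

```python
import numpy as np

def is_selective(ci_low, ci_high, chance=0.5, min_run=3):
    """Return True if the AUC CI excludes `chance` in at least `min_run` consecutive bins.

    Assumption: ci_low and ci_high hold the lower and upper CI bounds on the
    AUC for each time bin, in temporal order.
    """
    ci_low = np.asarray(ci_low, dtype=float)
    ci_high = np.asarray(ci_high, dtype=float)
    # A bin is significant when its CI lies entirely above or below chance.
    significant = (ci_low > chance) | (ci_high < chance)
    run = 0
    for flag in significant:
        run = run + 1 if flag else 0
        if run >= min_run:
            return True
    return False

# A unit with only a transient (three-bin) effect still passes the criterion.
transient = is_selective([0.45, 0.55, 0.56, 0.57, 0.45],
                         [0.60, 0.70, 0.70, 0.70, 0.60])
```

      Note that this sketch counts any run of significant bins, whether above or below 0.5; whether the sign must be consistent within a run is not stated in the quoted text.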

      Next, we have checked how tHit-tCR selective neurons were distributed across sessions. We found that the percentage of tHit-tCR selective neurons in each session varied (S1: 9-46%, S2: 0-36%, MM:25-55%, ALM:0-50%). We examined the relationship between the numbers of tHit-tCR selective neurons and tHit-tCR subspace overlaps. Sessions with more neurons showing task rule modulation tended to show lower subspace overlap, but this correlation was modest and only marginally significant (r= -0.32, p= 0.08, Pearson correlation, n= 31 sessions). While we report the percentage of neurons showing significant selectivity as a simple way to summarize single-neuron effects, this does neglect the magnitude of task rule modulation of individual neurons, which may also be relevant. 

      In summary, the apparent disconnect between the effect sizes of task modulation of single neurons and of population dynamics could be explained by (1) the percentages of tHit-tCR selective neurons were underestimated in our old analysis, (2) tHit-tCR selective neurons were not uniformly distributed among sessions, and (3) the percentages of tHit-tCR selective neurons were weakly correlated with tHit-tCR subspace overlaps. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      For the analysis of choice coding dimensions, it seems that the authors are somewhat data limited in that they cannot compare lick-right/lick-left within a block. So instead, they compare lick/no lick trials. But given that the mice are unable to initiate trials, the interpretation of the no lick trials is a bit complicated. It is not clear that the no lick trials reflect a perceptual judgment about the stimulus (i.e., a choice), or that the mice are just zoning out and not paying attention. If it's the latter case, what the authors are calling choice coding is more of an attentional or task engagement signal, which may still be interesting, but has a somewhat different interpretation than a choice coding dimension. It might be worth clarifying this point somewhere, or if I'm totally off-base, then being more clear about why lick/no lick is more consistent with choice than task engagement.

      We thank the Reviewer for raising this point. We have added a new paragraph on page 13 to clarify why we used lick/no-lick trials to calculate choice coding dimensions, and we now discuss the caveat regarding task engagement.  

      “No-lick trials included misses, which could be caused by mice not being engaged in the task. While the majority of no-lick trials were correct rejections (respond-to-touch: 75%; respond-to-light: 76%), we treated no-licks as one of the available choices in our task and included them to calculate choice coding dimensions (Fig. S4c,d,f). To ensure stable and balanced task engagement across task rules, we removed the last 20 trials of each session and used stimulus parameters that achieved similar behavioral performance for both task rules (Fig. 1d; ~75% correct for both rules).”

      In addition, to address a point made by Reviewer 3 as well as this point, we performed a new analysis to calculate choice coding dimensions using right-lick vs left-lick trials. We report this new analysis on page 8:

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).” 

      We added discussion of the limitations of this new analysis on page 13:

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks_{unrewarded port} / # of licks_{total}, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      The authors find that the stimulus coding direction in most areas (S1, S2, and MM) was significantly aligned between the block types. How do the authors interpret that finding? That there is no major change in stimulus coding dimension, despite the change in subspace? I think I'm missing the big picture interpretation of this result.

      That there is no significant change in stimulus coding dimensions but a change in subspace suggests that the subspace change largely reflects a change in the choice coding dimensions.

      As I mentioned in the public review, I thought there was a weakness with interpretation of the optogenetic experiments, which the authors generally interpret as reflecting rule sensitivity. However, given that they are inhibiting premotor areas including ALM, one might imagine that there might also be an effect on lick production or kinematics. To rule this out, the authors compare the change in lick rate relative to licks during the ITI. What is the ITI lick rate? I assume pretty low, once the animal is well-trained, in which case there may be a floor effect that could obscure meaningful effects on lick production. In addition, based on the reported CI on delta p(lick), it looks like MM and AM did suppress lick rate. I think in the future, a task with richer behavioral read-outs (or including other measurements of behavior like video), or perhaps something like a psychological process model with parameters that reflect different perceptual or cognitive processes could help resolve the effects of perturbations more precisely.

      Eighteen and ten percent of trials had at least one lick in the ITI in respond-to-touch and respond-to-light blocks, respectively. These relatively low rates of ITI licking could indeed make an effect of optogenetics on lick production harder to observe. We agree that future work would benefit from more complex tasks and measurements, and have added the following to make this point (page 14):

      “To more precisely dissect the effects of perturbations on different cognitive processes in rule-dependent sensory detection, more complex behavioral tasks and richer behavioral measurements are needed in the future.”

      Reviewer #2 (Recommendations For The Authors):

      I have the following minor suggestions that the authors might consider in revising this already excellent manuscript :

      (1) In addition to showing normalised z-score firing rates (e.g. Fig 1g), I think it is important to show the grand-average mean firing rates in Hz.

      We thank the Reviewer for the suggestion and have added the grand-average mean firing rates as a new supplementary figure (Fig. S2a). To provide more details about the firing rates of individual neurons, we have also added to this new figure the distribution of peak responses during the tactile stimulus period (Fig. S2b).

      (2) I think the authors could report more quantitative data in the main text. As a very basic example, I could not easily find how many neurons, sessions, and mice were used in various analyses.

      We have added relevant numbers at various points throughout the Results, including within the following examples:

      Page 3: “To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM and ALM (Fig. 1e-g, Fig. S1a-h, and Fig. S2a; S1: 6 mice, 10 sessions, 177 neurons, S2: 5 mice, 8 sessions, 162 neurons, MM: 7 mice, 9 sessions, 140 neurons, ALM: 8 mice, 13 sessions, 256 neurons).”

      Page 5: “As expected, single-unit activity before stimulus onset did not discriminate between tactile and visual trials (Fig. 2d; S1: 0%, 177 neurons; S2: 0%, 162 neurons; MM: 0%, 140 neurons; ALM: 0.8%, 256 neurons). After stimulus onset, more than 35% of neurons in the sensory cortical areas and approximately 15% of neurons in the motor cortical areas showed significant stimulus discriminability (Fig. 2e; S1: 37.3%, 177 neurons; S2: 35.2%, 162 neurons; MM: 15%, 140 neurons; ALM: 14.1%, 256 neurons).”

      Page 6: “Support vector machine (SVM) and Random Forest classifiers showed similar decoding abilities (Fig. S3a,b; medians of classification accuracy [true vs shuffled]; SVM: S1 [0.6 vs 0.53], 10 sessions, S2 [0.61 vs 0.51], 8 sessions, MM [0.71 vs 0.51], 9 sessions, ALM [0.65 vs 0.52], 13 sessions; Random Forests: S1 [0.59 vs 0.52], 10 sessions, S2 [0.6 vs 0.52], 8 sessions, MM [0.65 vs 0.49], 9 sessions, ALM [0.7 vs 0.5], 13 sessions).”

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).” 

      Page 8: “In contrast, we found that S1, S2 and MM had stimulus CDs that were significantly aligned between the two block types (Fig. S4e; magnitude of dot product between the respond-to-touch stimulus CDs and the respond-to-light stimulus CDs, mean ± 95% CI for true vs shuffled data: S1: 0.5 ± [0.34, 0.66] vs 0.21 ± [0.12, 0.34], 10 sessions; S2: 0.62 ± [0.43, 0.78] vs 0.22 ± [0.13, 0.31], 8 sessions; MM: 0.48 ± [0.38, 0.59] vs 0.24 ± [0.16, 0.33], 9 sessions; ALM: 0.33 ± [0.2, 0.47] vs 0.21 ± [0.13, 0.31], 13 sessions).”

      Page 9: “For respond-to-touch to respond-to-light block transitions, the fractions of trials classified as respond-to-touch for MM and ALM decreased progressively over the course of the transition (Fig. 5d; rank correlation of the fractions calculated for each of the separate periods spanning the transition, Kendall’s tau, mean ± 95% CI: MM: -0.39 ± [-0.67, -0.11], 9 sessions, ALM: -0.29 ± [-0.54, -0.04], 13 sessions; criterion to be considered significant: 95% CI on Kendall’s tau did not include 0).”

      Page 11: “Lick probability was unaffected during S1, S2, MM and ALM experiments for both tasks, indicating that the behavioral effects were not due to an inability to lick (Fig. 6i, j; 95% CI on Δ lick probability for cross-modal selection task: S1/S2 [-0.18, 0.24], 4 mice, 10 sessions; MM [-0.31, 0.03], 4 mice, 11 sessions; ALM [-0.24, 0.16], 4 mice, 10 sessions; Δ lick probability for simple tactile detection task: S1/S2 [-0.13, 0.31], 3 mice, 3 sessions; MM [-0.06, 0.45], 3 mice, 5 sessions; ALM [-0.18, 0.34], 3 mice, 4 sessions).”
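
      The alignment measure quoted for page 8 above (magnitude of the dot product between two stimulus CDs, compared against shuffled data) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a CD is the normalized difference of mean population responses between two trial types, and the shuffle (permuting neuron identity) is a hypothetical null chosen only for illustration. All function names are invented here.

```python
import math
import random


def coding_direction(trials_a, trials_b):
    """Unit vector along the difference of mean population responses.

    Each argument is a list of trials; each trial is a list of per-neuron
    firing rates. The difference-of-means definition is an assumption.
    """
    n_neurons = len(trials_a[0])
    mean_a = [sum(t[i] for t in trials_a) / len(trials_a) for i in range(n_neurons)]
    mean_b = [sum(t[i] for t in trials_b) / len(trials_b) for i in range(n_neurons)]
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("the two trial types have identical mean responses")
    return [d / norm for d in diff]


def alignment(cd_1, cd_2):
    """Magnitude of the dot product of two unit vectors (0 = orthogonal, 1 = parallel)."""
    return abs(sum(x * y for x, y in zip(cd_1, cd_2)))


def shuffled_alignment(cd_1, cd_2, n_shuffles=1000, seed=0):
    """Mean alignment after permuting neuron identity in cd_2 (hypothetical null)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_shuffles):
        permuted = cd_2[:]
        rng.shuffle(permuted)
        total += alignment(cd_1, permuted)
    return total / n_shuffles
```

      In use, one CD would be computed per block type within a session, and the true alignment compared with the shuffled value across sessions.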

      (3) Please include a clearer description of trial timing. Perhaps a schematic timeline of when stimuli are delivered and when licking would be rewarded. I may have missed it, but I did not find explicit mention of the timing of the reward window or if there was any delay period.

      We have added the following (page 3): 

      “For each trial, the stimulus duration was 0.15 s and an answer period extended from 0.1 to 2 s from stimulus onset.”

      (4) Please include a clear description of statistical tests in each figure legend as needed (for example please check Fig 4e legend).

      We have added details about statistical tests in the figure legends:

      Fig. 2f: “Relationship between block-type discriminability before stimulus onset and tHit-tCR discriminability after stimulus onset for units showing significant block-type discriminability prior to the stimulus. Pearson correlation: S1: r = 0.69, p = 0.056, 8 neurons; S2: r = 0.91, p = 0.093, 4 neurons; MM: r = 0.93, p < 0.001, 30 neurons; ALM: r = 0.83, p < 0.001, 26 neurons.” 

      Fig. 4e: “Subspace overlap for control tHit (gray) and tCR (purple) trials in the somatosensory and motor cortical areas. Each circle is a subspace overlap of a session. Paired t-test, tCR – control tHit: S1: -0.23, 8 sessions, p = 0.0016; S2: -0.23, 7 sessions, p = 0.0086; MM: -0.36, 5 sessions, p < 0.001; ALM: -0.35, 11 sessions, p < 0.001; significance: ** for p<0.01, *** for p<0.001.”

      Fig. 5d,e: “Fraction of trials classified as coming from a respond-to-touch block based on the pre-stimulus population state, for trials occurring in different periods (see c) relative to respond-to-touch → respond-to-light transitions. For MM (top row) and ALM (bottom row), progressively fewer trials were classified as coming from the respond-to-touch block as analysis windows shifted later relative to the rule transition. Kendall’s tau (rank correlation): MM: -0.39, 9 sessions; ALM: -0.29, 13 sessions. Left panels: individual sessions, right panels: mean ± 95% CI. Dashed lines are chance levels (0.5). e, Same as d but for respond-to-light → respond-to-touch transitions. Kendall’s tau: MM: 0.37, 9 sessions; ALM: 0.27, 13 sessions.”

      Fig. 6: “Error bars show bootstrap 95% CI. Criterion to be considered significant: 95% CI did not include 0.”
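
      The significance criterion used in several of these legends (a bootstrap 95% CI that does not include 0) can be illustrated with a short sketch. It assumes a percentile bootstrap on the mean of one value per session; the resampling unit, the number of resamples and the function names are assumptions made here, not details taken from the manuscript.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI on the mean of `values` (e.g. one value per session)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, take the mean each time, then sort the means.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def is_significant(ci):
    """Criterion quoted in the legends: the CI must not include 0."""
    lo, hi = ci
    return lo > 0 or hi < 0
```

      For example, per-session changes in lick probability or per-session Kendall’s tau values could be passed as `values`; an interval straddling 0 would be read as no detectable effect.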

      (5) P. 3 - "To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM, and ALM using 64-channel silicon probes (Fig. 1e-g and Fig. S1a-h)." Please specify if these areas were recorded simultaneously or not.

      We have added “We recorded from one of these cortical areas per session, using 64-channel silicon probes.”  on page 3.  

      (6) Figure 4b - Please describe what gray and black lines show.

      The gray traces are the distance between tHit and tCR trajectories in individual sessions and the black traces are the averages across sessions in different cortical areas. We have added this information on page 6 and in the Figure 4b legend. 

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).”

      Fig. 4b: “Distance between tHit and tCR trajectories in S1, S2, MM and ALM. Gray traces show the time varying tHit-tCR distance in individual sessions and black traces are session-averaged tHit-tCR distance (S1:10 sessions; S2: 8 sessions; MM: 9 sessions; ALM: 13 sessions).”
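
      The distance measure described above (Euclidean distance between matching time points, over all tHit and tCR trajectory pairs, averaged per session) can be sketched as below. The data layout is an assumption: each trajectory is a list of time bins, and each bin is a point in PC space (for example, the top 3 PCs). Function names are illustrative only.

```python
import math
from itertools import product


def trajectory_distance(traj_a, traj_b):
    """Euclidean distance at each matching time point between two trajectories."""
    return [math.dist(p, q) for p, q in zip(traj_a, traj_b)]


def session_distance(thit_trajs, tcr_trajs):
    """Average the time-resolved distance over all tHit x tCR trajectory pairs."""
    pairs = list(product(thit_trajs, tcr_trajs))
    n_bins = len(thit_trajs[0])
    summed = [0.0] * n_bins
    for a, b in pairs:
        for t, d in enumerate(trajectory_distance(a, b)):
            summed[t] += d
    return [s / len(pairs) for s in summed]
```

      Under these assumptions, each gray trace in Fig. 4b would correspond to one call to `session_distance`, and the black trace to the average of those outputs across sessions.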

      (7) In addition to the analyses shown in Figure 5a, when investigating the timing of the rule switch, I think the authors should plot the left and right lick probabilities aligned to the timing of the rule switch time on a trial-by-trial basis averaged across mice.

      We thank the Reviewer for suggesting this addition. We have added a new figure panel to show the probabilities of right- and left-licks during rule transitions (Fig. 5a).

      Page 8: “The probabilities of right-licks and left-licks showed that the mice switched their motor responses during block transitions depending on task rules (Fig. 5a, mean ± 95% CI across 12 mice).” 

      (8) P. 12 - "Moreover, in a separate study using the same task (Finkel et al., unpublished), high-speed video analysis demonstrated no significant differences in whisker motion between respond-to-touch and respond-to-light blocks in most (12 of 14) behavioral sessions.". Such behavioral data is important and ideally would be included in the current analysis. Was high-speed videography carried out during electrophysiology in the current study?

      Finkel et al. has been accepted in principle for publication and will be available online shortly. Unfortunately we have not yet carried out simultaneous high-speed whisker video and electrophysiology in our cross-modal sensory selection task.

      Reviewer #3 (Recommendations For The Authors):

      (1) Minor point. For subspace overlap calculation of pre-stimulus activity in Fig 4e (light purple datapoints), please clarify whether the PCs for that condition were constructed in matched time windows. If the PCs are calculated from the stimulus period 0-150ms, the poor alignment could be due to mismatched time windows.

      We thank the Reviewer for the comment and clarify our analysis here. We previously used time-matched windows to calculate subspace overlaps. However, the pre-stimulus activity was much weaker than the activity during the stimulus period, so the subspaces of reference tHit were subject to noise and we were not able to obtain reliable PCs. This caused the subspace overlap values between the reference tHit and control tHit to be low and variable (mean ± SD, S1: 0.46 ± 0.26, n = 8 sessions, S2: 0.46 ± 0.18, n = 7 sessions, MM: 0.44 ± 0.16, n = 5 sessions, ALM: 0.38 ± 0.22, n = 11 sessions). Therefore, we used the tHit activity during the stimulus window to obtain PCs and projected pre-stimulus and stimulus activity in tCR trials onto these PCs. We have now added a more detailed description of this analysis in the Methods (page 32).

      “To calculate the separation of subspaces prior to stimulus delivery, pre-stimulus activity in tCR trials (-100 to 0 ms from stimulus onset) was projected to the PC space of the tHit reference group and the subspace overlap was calculated. In this analysis, we used tHit activity during stimulus delivery (0 to 150 ms from stimulus onset) to obtain reliable PCs.”

      We acknowledge this time alignment issue and have now removed the reported subspace overlap between tHit and tCR during the pre-stimulus period from Figure 4e (light purple). However, we think the correlation between pre- and post- stimulus-onset subspace overlaps should remain similar regardless of the time windows that we used for calculating the PCs. For the PCs calculated from the pre-stimulus period (-100 to 0 ms), the correlation coefficient was 0.55 (Pearson correlation, p <0.01, n = 31 sessions). For the PCs calculated from the stimulus period (0-150 ms), the correlation coefficient was 0.68 (Figure 4f, Pearson correlation, p <0.001, n = 31 sessions). Therefore, we keep Figure 4f.  

      (2) Minor point. To help the readers follow the logic of the experiments, please explain why PPC and AMM were added in the later optogenetic experiment since these are not part of the electrophysiology experiment.

      We have added the following rationale on page 9.

      “We recorded from AMM in our cross-modal sensory selection task and observed visually-evoked activity (Fig. S1i-k), suggesting that AMM may play an important role in rule-dependent visual processing. PPC contributes to multisensory processing (refs. 51–53) and sensory-motor integration (refs. 50, 54–58). Therefore, we wanted to test the roles of these areas in our cross-modal sensory selection task.”

      (3) Minor point. We are somewhat confused about the timing of some of the example neurons shown in figure S1. For example, many neurons show visually evoked signals only after stimulus offset, unlike tactile evoked signals (e.g. Fig S1b and f). In addition, the reaction time for visual stimulus is systematically slower than tactile stimuli for many example neurons (e.g. Fig S1b) but somehow not other neurons (e.g. Fig S1g). Are these observations correct?

      These observations are all correct. We have a manuscript from a separate study using this same behavioral task (Finkel et al., accepted in principle) that examines and compares (1) the onsets of tactile- and visually-evoked activity and (2) the reaction times to tactile and visual stimuli. The reaction times to tactile stimuli were slightly but significantly shorter than the reaction times to visual stimuli (tactile vs visual, 397 ± 145 vs 521 ± 163 ms, median ± interquartile range [IQR], Tukey HSD test, p = 0.001, n = 155 sessions). We examined how well activity of individual neurons in S1 could be used to discriminate the presence of the stimulus or the response of the mouse. For discriminability for the presence of the stimulus, S1 neurons could signal the presence of the tactile stimulus but not the visual stimulus. For discriminability for the response of the mouse, the onsets for significant discriminability occurred earlier for tactile compared with visual trials (two-sided Kolmogorov-Smirnov test, p = 1 × 10^-16, n = 865 neurons with DP onset in tactile trials, n = 719 neurons with DP onset in visual trials).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines a range of advanced ultrastructural imaging approaches to define the unusual endosomal system of African trypanosomes. Compelling images show that instead of a distinct set of compartments, the endosome of these protists comprises a continuous system of membranes with functionally distinct subdomains as defined by canonical markers of early, late and recycling endosomes. The findings suggest that the endocytic system of bloodstream stages has evolved to facilitate the extraordinarily high rates of membrane turnover needed to remove immune complexes and survive in the blood, which is of interest to anyone studying infectious diseases.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Bloodstream stages of the parasitic protist, Trypanosoma brucei, exhibit very high rates of constitutive endocytosis, which is needed to recycle the surface coat of Variant Surface Glycoproteins (VSGs) and remove surface immune complexes. While many studies have shown that the endo-lysosomal systems of T. brucei BF stages contain canonical domains, as defined by classical Rab markers, it has remained unclear whether these protists have evolved additional adaptations/mechanisms for sustaining these very high rates of membrane transport and protein sorting. The authors have addressed this question by reconstructing the 3D ultrastructure and functional domains of the T. brucei BF endosome membrane system using advanced electron tomography and super-resolution microscopy approaches. Their studies reveal that, unusually, the BF endosome network comprises a continuous system of cisternae and tubules that contain overlapping functional subdomains. It is proposed that a continuous membrane system allows higher rates of protein cargo segregation, sorting and recycling than can otherwise occur when transport between compartments is mediated by membrane vesicles or other fusion events.

      Strengths:

      The study is a technical tour-de-force using a combination of electron tomography, super-resolution/expansion microscopy, immune-EM of cryo-sections to define the 3D structures and connectivity of different endocytic compartments. The images are very clear and generally support the central conclusion that functionally distinct endocytic domains occur within a dynamic and continuous endosome network in BF stages.

      Weaknesses:

      The authors suggest that this dynamic endocytic network may also fulfil many of the functions of the Golgi TGN and that the latter may be absent in these stages. Although plausible, this comment needs further experimental support. For example, have the authors attempted to localize canonical markers of the TGN (e.g. GRIP proteins) in T. brucei BF and/or shown that exocytic carriers bud directly from the endosomes?

      We agree with the criticism and have shortened the discussion accordingly and clearly marked it as speculation. However, we do not want to completely abandon our hypothesis.

      The paragraph now reads:

      Lines 740 – 751:

      “Interestingly, we did not find any structural evidence of vesicular retrograde transport to the Golgi. Instead, the endosomal ‘highways’ extended throughout the posterior volume of the trypanosomes approaching the trans-Golgi interface. It is highly plausible that this region represents the convergence point where endocytic and biosynthetic membrane trafficking pathways merge. A comparable merging of endocytic and biosynthetic functions has been described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019). As we could not find structural evidence for the existence of a TGN we tentatively propose that trypanosomes may have shifted the central orchestrating function of the TGN as a sorting hub at the crossroads of biosynthetic and recycling pathways to the endosome. Although this is a speculative scenario, it is experimentally testable.”

      Furthermore, we removed the lines 51 - 52, which included the suggestion of the TGN as a master regulator, from the abstract.

      Reviewer #2 (Public Review):

      The authors suggest that the African trypanosome endomembrane system has unusual organisation, in that the entire system is a single reticulated structure. It is not clear if this is thought to extend to the lysosome or MVB. There is also a suggestion that this unusual morphology serves as a trans-(post)Golgi network rather than the more canonical arrangement.

      The work is based around very high-quality light and electron microscopy, as well as utilising several marker proteins, Rab5A, 11 and 7. These are deemed as markers for early endosomes, recycling endosomes and late or pre-lysosomes. The images are mostly of high quality but some inconsistencies in the interpretation, appearance of structures and some rather sweeping assumptions make this less easy to accept. Two perhaps major issues are claims to label the entire endosomal apparatus with a single marker protein, which is hard to accept as certainly this reviewer does not really even know where the limits to the endosomal network reside and where these interface with other structures. There are several additional compartments that have been defined by Rab proteins as well, and which are not even mentioned. Overall I am unconvinced that the authors have demonstrated the main things they claim.

      The endomembrane system in bloodstream form T. brucei is clearly delimited. Compared to mammalian cells it is tidy and confined to the posterior part of the spindle-shaped cell. The endoplasmic reticulum is linked to one side of the longitudinal cell axis, marked by the attached flagellum, while the mitochondrion locates to the opposite side. Glycosomes are easily identifiable as spheres, as are acidocalcisomes, which are smaller than glycosomes and – in electron micrographs – are characterized by high electron density. All these organelles extend beyond the nucleus, which is not the case for the endosomal compartment, the lysosome and the Golgi. The vesicles found in the posterior half of the trypanosome cell are quantitatively identifiable as COP1, CCVI or CCVII vesicles, or exocytic carriers. The lysosome has a higher degree of morphological plasticity, but this is not the topic of the present work. Thus, the endomembrane system in T. brucei is comparatively well structured and delimited, which is why we have chosen trypanosomes as a cell biological model.

      We have published EP1::GFP as marker for the endosome system and flagellar pocket back in 2004. We have defined the fluid phase volume of the trypanosome endosome in papers published between 2002 and 2007. This work was not intended to represent the entirety of RAB proteins. We were only interested in 3 canonical markers for endosome subtypes. We do not claim anything that is not experimentally tested, we have clearly labelled our hypotheses as such, and we do not make sweeping assumptions.

      The approaches taken are state-of-the-art but not novel, and because of the difficulty in fully addressing the central tenet, I am not sure how much of an impact this will have beyond the trypanosome field. For certain this is limited to workers in the direct area and is not a generalisable finding.

      To the best of our knowledge, there is no published research that has employed 3D Tokuyasu or expansion microscopy (ExM) to label endosomes. The key takeaway from our study, which is the concept that "endosomes are continuous in trypanosomes" certainly is novel. We are not aware of any other report that has demonstrated this aspect.

      The doubts formulated by the reviewer regarding the impact of our work beyond the field of trypanosomes are not timely. Indeed, our results, and those of others, show that the conclusions drawn from work with just a few model organisms are not generalisable. We are finally on the verge of a new cell biology that considers the plethora of evolutionary solutions beyond opisthokonts. We believe that this message should be widely acknowledged and considered. And we are certainly not the only ones who are convinced that the term "general relevance" is unscientific and should no longer be used in biology.

      Reviewer #3 (Public Review):

      Summary:

      As clearly highlighted by the authors, a key plank in the ability of trypanosomes to evade the mammalian host’s immune system is its high rate of endocytosis. This rapid turnover of its surface enables the trypanosome to ‘clean’ its surface removing antibodies and other immune effectors that are subsequently degraded. The high rate of endocytosis is likely reflected in the organisation and layout of the endosomal system in these parasites. Here, Link et al., sought to address this question using a range of light and three-dimensional electron microscopy approaches to define the endosomal organisation in this parasite.

      Before this study, the vast majority of our information about the make-up of the trypanosome endosomal system was from thin-section electron microscopy and immunofluorescence studies, which did not provide the necessary resolution and 3D information to address this issue. Therefore, it was not known how the different structures observed by EM were related. Link et al., have taken advantage of the advances in technology and used an impressive combination of approaches at the LM and EM level to study the endosomal system in these parasites. This innovative combination has now shown the interconnected-ness of this network and demonstrated that there are no ‘classical’ compartments within the endosomal system, with instead different regions of the network enriched in different protein markers (Rab5a, Rab7, Rab11).

      Strengths:

      This is a generally well-written and clear manuscript, with the data well-presented supporting the majority of the conclusions of the authors. The authors use an impressive range of approaches to address the organisation of the endosomal system and the development of these methods for use in trypanosomes will be of use to the wider parasitology community.

      I appreciate their inclusion of how they used a range of different light microscopy approaches even though for instance the dSTORM approach did not turn out to be as effective as hoped. The authors have clearly demonstrated that trypanosomes have a large interconnected endosomal network, without defined compartments and instead show enrichment for specific Rabs within this network.

      Weaknesses:

      My concerns are:

      i) There is no evidence for functional compartmentalisation. The classical markers of different endosomal compartments do not fully overlap but there is no evidence to show a region enriched in one or other of these proteins has that specific function. The authors should temper their conclusions about this point.

      The reviewer is right in stating that Rab presence does not necessarily mean Rab function. However, this assumption is as old as the Rab literature. That is why we have focused on the 3 most prominent endosomal marker proteins. We report that for endosome function you do not necessarily need separate membrane compartments. This is backed by our experiments.

      ii) The quality of the electron microscopy work is very high but there is a general lack of numbers. For example, how many tomograms were examined? How often were fenestrated sheets seen? Can the authors provide more information about how frequent these observations were?

      The fenestrated sheets can be seen in the majority of the 37 tomograms recorded of the posterior volume of the parasites. Furthermore, we have randomly generated several hundred tiled (= very large) electron micrographs of bloodstream form trypanosomes for unbiased analyses of endomembranes. In these 2D-datasets the “footprint” of the fenestrated flat and circular cisternae is frequently detectable in the posterior cell area.

      We now have included the corresponding numbers in all EM figure legends.

      iii) The EM work always focussed on cells which had been processed before fixing. Now, I understand this was important to enable tracers to be used. However, given the dynamic nature of the system these processing steps and feeding experiments may have affected the endosomal organisation. Given their knowledge of the system now, the authors should fix some cells directly in culture to observe whether the organisation of the endosome aligns with their conclusions here.

      This is a valid criticism; however, it is the cell culture that provides an artificial environment. As for a possible effect of cell harvesting by centrifugation on the integrity and functionality of the endosome system, we consider this very unlikely for one simple reason. The mechanical forces acting in and on the parasites as they circulate in the extremely crowded and confined environment of the mammalian bloodstream are obviously much higher than the centrifugal forces involved in cell preparation. This becomes particularly clear when one considers that the mass of the particle to be centrifuged determines the actual force exerted by the g-forces. Nevertheless, the proposed experiment is a good control, although much more complex than proposed, since tomography is a challenging technique. We have performed the suggested experiment and acquired tomograms of unprocessed cells. The corresponding data is now included as supplementary movie 2, 3 and 4. We refer to it in lines 202 – 206: To investigate potential impacts of processing steps (cargo uptake, centrifugation, washing) on endosomal organization, we directly fixed cells in the cell culture flask, embedded them in Epon, and conducted tomography. The resulting tomograms revealed endosomal organization consistent with that observed in cells fixed after processing (see Supplementary movie 2, 3, and 4).

      We furthermore thank the reviewer for the experiment suggestion in the acknowledgments.

      iv) The discussion needs to be revamped. At the moment it is just another run through of the results and does not take an overview of the results presenting an integrated view. Moreover, it contains reference to data that was not presented in the results.

      We have improved the discussion accordingly.

      Recommendations for the authors:

      The reviewers concurred about the high calibre of the work and the importance of the findings.

      They raised some issues and made some suggestions to improve the paper without additional experiments - key issues include

      (1) Better referencing of the trypanosome endocytosis/ lysosomal trafficking literature.

      The literature, especially the experimental and quantitative work, is very limited. We now provide a more complete set of references. However, we would like to mention that we had cited a recent review that critically references the trypanosome literature with emphasis on the extensive work done with mammalian cells and yeast.

      (2) Moving the dSTORM data, which detracts from otherwise strong data, to a supplementary figure.

      We have done this.

      (3) Removal of the conclusion that the continuous endosome fulfils the functions of TGN, without further evidence.

      As stated above, this was not a conclusion in our paper, but rather a speculation, which we have now more clearly marked as such. Lines 740 to 751 now read:

      “Interestingly, we did not find any structural evidence of vesicular retrograde transport to the Golgi. Instead, the endosomal ‘highways’ extended throughout the posterior volume of the trypanosomes approaching the trans-Golgi interface. It is highly plausible that this region represents the convergence point where endocytic and biosynthetic membrane trafficking pathways merge. A comparable merging of endocytic and biosynthetic functions was already described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019). As we could not find structural evidence for the existence of a TGN we tentatively propose that trypanosomes may have shifted the central orchestrating function of the TGN as a sorting hub at the crossroads of biosynthetic and recycling pathways to the endosome. Although this is a speculative scenario, it is experimentally testable.”

      (4) Broader discussion linking their findings to other examples of organelle maturation in eukaryotes (e.g. cisternal maturation of the Golgi).

      We have improved the discussion accordingly.

      Reviewer #1 (Recommendations For The Authors):

      What are the multi-vesicular vesicles that surround the marked endosomal compartments in Fig 1? Do they become labelled with fluid phase markers with longer incubations (e.g. late endosome/ lysosomal)?

      The function of MVBs in trypanosomes is still far from being clear. They are filled with fluid phase cargo, especially ferritin, but are devoid of VSG. Hence it is likely that MVBs are part of the lysosomal compartment. In fact, this part of the endomembrane system is highly dynamic. MVBs can be physically connected to the lysosome or can form elongated structures. The surprising dynamics of the trypanosome lysosome will be published elsewhere.

      Figure 2. The compartments labelled with EP1::Halo are very poorly defined due to the low levels of expression of the reporter protein and/or sensitivity of detection of the Halo tag. Based on these images, it would be hard to conclude whether the endosome network is continuous or not. In this respect, it is unclear why the authors didn't use EP1-GFP for these analyses? Given the other data that provides more compelling evidence for a single continuous compartment, I would suggest removing Fig 2A.

      We have used EP1::GFP to label the entire endosome system (Engstler and Boshart, 2004). Unfortunately, GFP is not suited for dSTORM imaging. By creating the EP1::Halo cell line, we were able to utilize the most prominent dSTORM fluorescent dye, Alexa 647. This was not primarily done to generate super resolution images, but rather to measure the dynamics of the GPI-anchored, luminal protein EP with single molecule precision. The results from this study will be published separately. But we agree with the reviewer and have relocated the dSTORM data to the supplementary material.

      The observation that Rab5a/7 can be detected in the lumen of lysosome is interesting. Mechanistically, this presumably occurs by invagination of the limiting membrane of the lysosome. Is there any evidence that similar invagination of cytoplasmic markers occurs throughout or in subdomains of the endocytic network (possibly indicative of a 'late endosome' domain)?

      So far, we have not observed this. The structure of the lysosome and the membrane influx from the endosome are currently being investigated.

      The authors note that continuity of functionally distinct membrane compartments in the secretory/endocytic pathways has been reported in other protists (e.g T. cruzi). A particular example that could be noted is the endo-lysosomal system of Dictyostelium discoideum which mediates the continuous degradation and eventual expulsion of undigested material.

      We tried to include this in the discussion but ultimately decided against it because the Dictyostelium system cannot be easily compared to the trypanosome endosome.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Not sure that 'common' is the correct term here. Frequent, near-universal..... it would be true that endocytosis is common across most eukaryotes.

      We have changed the sentence to “common process observed in most eukaryotes” (line 33).

      Immune evasion - the parasite does not escape the immune system, but does successfully avoid its impact, at least at the population level.

      We have replaced the word “escape” with “evasion” (line 35).

      The third sentence needs to follow on correctly from the second. Also, more than Igs are internalised and potentially part of immune evasion, such as C3, Factor H, ApoL1 etcetera.

      We believe that there may be a misunderstanding here. The process of endocytic uptake and lysosomal degradation has so far only been demonstrated in the context of VSG-bound antibodies, which is why we only refer to this. Of course, the immune system comprises a wide range of proteins and effector molecules, all of which could be involved in immune evasion.

      I do not follow the logic that the high flux through the endocytic system in trypanosomes precludes distinct compartmentalisation - one could imagine a system where a lot of steps become optimised for example. This idea needs expanding on if it is correct.

      Membrane transport by vesicle transfer between several separate membrane compartments would be slower than the measured rate of membrane flux.

      Again I am not sure 'efficient' on line 40. It is fast, but how do you measure efficiency? Speed and efficiency are not the same thing.

      We have replaced the word “efficient” with “fast” (line 42).

      The basis for suggesting endosomes as a TGN is unclear. Given that there are AP complexes, retromer, exocyst and other factors that are part of the TGN or at least post-G differentiation of pathways in canonical systems, this seems a step too far. There really is no evidence in the rest of the MS that seems to support this.

      Yes, we agree and have clarified the discussion accordingly. We have not completely removed the discussion on the TGN but have labelled it more clearly as speculation.

      I am aware I am being pedantic here, but overall the abstract seems to provide an impression of greater novelty than may be the case and makes several very bold claims that I cannot see as fully valid.

      We are not aware of any claim in the summary that we have not substantiated with experiments, or any hypothesis that we have not explained.

      Moreover, the concept of fused or multifunctional endosomes (or even other endomembrane compartments) is old, and has been demonstrated in metazoan cells and yeast. The concept of rigid (in terms of composition) compartments really has been rejected by most folks with maturation, recycling and domain structures already well-established models and concepts.

      We agree that the (transient) presence of multiple Rab proteins decorating endosomes has been demonstrated in various cell types. This finding formed the basis for the endosomal maturation model in mammals and yeast, which has replaced the previous rigid compartment model.

      However, we do not appreciate attempts to question the originality of our study by claiming that similar observations have been made in metazoans or yeast. This is simply wrong. There are no reports of a functionally structured, continuous, single and large endosome in any other system. The only membrane system that might be similar was described in the American parasite Trypanosoma cruzi, however, without the use of endosome markers or any functional analysis. We refer to this study in the discussion.

      In summary, the maturation model falls short in explaining the intricacies of the membrane system we have uncovered in trypanosomes. Therefore, one plausible interpretation of our data is that the overall architecture of the trypanosome endosomes represents an adaptation that enables the remarkable speed of plasma membrane recycling observed in these parasites. In our view, both our findings and their interpretation are novel and worth reporting. Again, modern cell biology should recognize that evolution has developed many solutions for similar processes in cells, about whose diversity we have learned almost nothing because of our reductionist view. A remarkable example of this is the Picozoa, tiny bipartite eukaryotes that pack the entire nutritional apparatus into one pouch and the main organelles with the locomotor system into the other. Another is the “extreme” cell biology of many protozoan parasites such as Giardia, Toxoplasma or Trypanosoma.

      Higher plants have been well characterised, especially at the level of Rab/Arf proteins and adaptins.

      We now mention plant endosomes in our brief discussion of the trypanosome TGN. Lines 744 – 747:

      “A comparable merging of endocytic and biosynthetic functions was already described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/ or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019).”

      The level of self-citing in the introduction is irritating and unscholarly. I have no qualms with crediting the authors with their own excellent contributions, but work from Dacks, Bangs, Field and others seems to be selectively ignored, with an awkward use of the authors' own publications. Diversity between organisms for example has been a mainstay of the Dacks lab output, Rab proteins and others from Field and work on exocytosis and late endosomal systems from Bangs. These efforts and contributions surely deserve some recognition?

      This is an original article and not a review. For a comprehensive overview the reviewer might read our recent overview article on exo- and endocytic pathways in trypanosomes, in which we have extensively cited the work of Mark Field, Jay Bangs and Joel Dacks. In the present manuscript, we have cited all papers that touch on our results or are otherwise important for a thorough understanding of our hypotheses. We do not believe that this approach is unscientific, but rather improves the readability of the manuscript. Nevertheless, we have now cited additional work.

      For the uninitiated, the posterior/anterior axis of the trypanosome cell as well as any other specific features should be defined.

      In lines 102 - 110 we wrote:

      “This process of antibody clearance is driven by hydrodynamic drag forces resulting from the continuous directional movement of trypanosomes (Engstler et al., 2007). The VSG-antibody complexes on the cell surface are dragged against the swimming direction of the parasite and accumulate at the posterior pole of the cell. This region harbours an invagination in the plasma membrane known as the flagellar pocket (FP) (Gull, 2003; Overath et al., 1997). The FP, which marks the origin of the single attached flagellum, is the exclusive site for endo- and exocytosis in trypanosomes (Gull, 2003; Overath et al., 1997). Consequently, the accumulation of VSG-antibody complexes occurs precisely in the area of bulk membrane uptake.”

      We think this sufficiently introduces the cell body axes.

      I don't understand the comment concerning microtubule association. In mammalian cells, such association is well established, but compartments still do not display precise positioning. This likely then has nothing to do with the microtubule association differences.

      We have clarified this in the text (lines 192 – 199). There is no report of cytoplasmic microtubules in trypanosomes. All microtubules appear to be either subpellicular or within the flagellum. To maintain their structure and position, the endosomes should be associated either with subpellicular microtubules, as is the case with the endoplasmic reticulum, or with the more enigmatic actomyosin system of the parasites. We have been working on the latter possibility and intend to publish a follow-up paper to the present manuscript.

      The inability to move past the nucleus is a poor explanation. These compartments are dynamic. Even the nucleus does interesting things in trypanosomes and squeezes past structures during development in the tsetse fly.

      The distance between the nucleus and the microtubule cytoskeleton remains relatively constant even in parasites that squeeze through microfluidic channels. This is not unexpected as the nucleus can be highly deformed. A structure the size of the endosome will not be able to physically pass behind the nucleus without losing its integrity. In fact, the recycling apparatus is never found in the anterior part of the trypanosome, most probably because the flagellar pocket is located at the posterior cell pole.

      L253 What is the evidence that EP1 labels the entire FP and endosomes? This may be extensive, but this claim requires rather more evidence. This is again suggested at l263. Again, please forgive me for being pedantic, but this is an overstatement unless supported by evidence that would be incredibly difficult to obtain. This is even sort of acknowledged on l271 in the context of non-uniform labelling. This comes again in l336.

      The evidence that EP1 labels the entire FP and endosomes is presented here: Engstler and Boshart, 2004; 10.1101/gad.323404).

      Perhaps I should refrain from comments on the dangers of expansion microscopy, or asking what has actually been gained here. Oddly, the conclusion on l290 is a fair statement that I am happy with.

      An in-depth discussion regarding the advantages and disadvantages of expansion microscopy is beyond the manuscript's intended scope. Our approach involved utilizing various imaging techniques to confirm the validity of our findings. We appreciate that our concluding sentence is pleasing.

      F2 - The data in panel A seem quite poor to me. I also do not really understand why the DAPI stain in the first and second columns fails to coincide or why the kinetoplast is so diffuse in the second row. The labelling for EP1 presents as very small puncta, and hence is not evidence for a continuum. What is the arrow in A IV top? The data in panel B are certainly more in line with prior art, albeit that there is considerable heterogeneity in the labelling and of the FP for example. Again, I cannot really see this as evidence for continuity. There are gaps.... Albeit I accept that labelling of such structures is unlikely to ever be homogenous.

      We agree that the dSTORM data represents the least robust aspect of the findings we have presented, and we concur with relocating it to the supplementary material.

      F3 - Rather apparent, and specifically for Rab7, that there is differential representation - for example, Cell 4 presents a single Rab7 structure while the remaining examples demonstrate more extensive labelling. Again, I am content that these are highly dynamic structures but this needs to be addressed at some level and commented upon. If the claim is for continuity, the dynamics observed here suggest the usual; some level of obvious overlap of organellar markers, but the representation in F3 is clever but not sure what I am looking at. Moreover, the title of the figure is nothing new. What is also a bit odd is that the extent of the Rab7 signal, and to some extent the other two Rabs used, is rather variable, which makes this unclear to me as to what is being detected. Given that the Rab proteins may be defining microdomains or regions, I would also expect a region of unique staining as well as the common areas. This needs to at least be discussed.

      The differences in the representation result from the dynamics of the labelled structures. Therefore, we have selected different cells to provide examples of what the labelling can look like. We now mention this in the results section.

      The overlap of the different Rab signals was perhaps to be expected, but we now have demonstrated it experimentally. Importantly, we performed a rigorous quantification by calculating the volume overlaps and the Pearson correlation coefficients.
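
      For readers unfamiliar with the two measures named above, the following is a minimal sketch of how a Pearson correlation coefficient and a thresholded volume overlap can be computed for two fluorescence channels. It is an illustration only and is not the analysis pipeline used in the study: the function names, the flattened voxel lists, and the intensity thresholds are all assumptions made for this example.

```python
import math


def pearson_coefficient(voxels_a, voxels_b):
    """Pearson correlation between two equally long lists of voxel intensities.

    Hypothetical helper: each list is one channel of a 3D stack, flattened.
    """
    if len(voxels_a) != len(voxels_b) or len(voxels_a) == 0:
        raise ValueError("channels must be non-empty and of equal length")
    n = len(voxels_a)
    mean_a = sum(voxels_a) / n
    mean_b = sum(voxels_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(voxels_a, voxels_b))
    spread_a = sum((a - mean_a) ** 2 for a in voxels_a)
    spread_b = sum((b - mean_b) ** 2 for b in voxels_b)
    if spread_a == 0 or spread_b == 0:
        return float("nan")  # undefined when one channel is constant
    return covariance / math.sqrt(spread_a * spread_b)


def volume_overlap_fraction(voxels_a, voxels_b, threshold_a, threshold_b):
    """Fraction of the above-threshold volume of channel A that is also
    above threshold in channel B. Thresholds are assumed, user-chosen values.
    """
    if len(voxels_a) != len(voxels_b):
        raise ValueError("channels must be of equal length")
    volume_a = 0
    shared = 0
    for a, b in zip(voxels_a, voxels_b):
        if a > threshold_a:
            volume_a += 1
            if b > threshold_b:
                shared += 1
    if volume_a == 0:
        return float("nan")  # no segmented volume in channel A
    return shared / volume_a
```

      Note that the overlap fraction is asymmetric: the share of marker A's volume that overlaps with marker B generally differs from the share of B's volume that overlaps with A, so pairwise overlaps would be reported in both directions.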

      In previous studies the data were presented as maximal intensity projections, which inherently lack the complete 3D information.

      We found that Rab proteins define microdomains and that there are regions of unique staining as well as common areas, as shown in Figure 3. The volumes do not completely overlap. This is now more clearly stated in lines 315 – 319:

      “These objects showed areas of unique staining as well as partially overlapping regions. The pairwise colocalization of different endosomal markers is shown in Figure 3 A, XI - XIII and 3 B. The different cells in Figure 3 B were selected to represent the dynamic nature of the labelled structures. Consequently, the selected cells provide a variety of examples of how the labelling can appear.”

      This had already been stated in lines 331 – 336:

      “In summary, the quantitative colocalization analyses revealed that on the one hand, the endosomal system features a high degree of connectivity, with considerable overlap of endosomal marker regions, and on the other hand, TbRab5A, TbRab7, and TbRab11 also demarcate separated regions in that system. These results can be interpreted as evidence of a continuous endosomal membrane system harbouring functional subdomains, with a limited amount of potentially separated early, late or recycling endosomes.”

      F4-6 - Fabulous images. But a couple of issues here; first, as the authors point out, there is distance between the gold and the antigen. So, this of course also works in the z-plane as well as the x/y-planes and some of the gold may well be associated with membraneous figures that are out of the plane, which would indicate an absence of colinearity on one specific membrane. Secondly, in several instances, we have Rab7 essentially mixed with Rab11 or Rab5 positive membrane. While data are data and should be accepted, this is difficult to reconcile when, at least to some level, Rab7 is a marker for a late-endosomal structure and where the presence of degradative activity could reside. As division of function is, I assume, the major reason for intracellular compartmentalisation, such a level of admixture is hard to rationalise. A continuum is one thing but the data here seem to be suggesting something else, i.e. almost complete admixture.

      We are grateful for the positive feedback regarding the image quality. It is true that the "linkage error," representing the distance between the gold and the antigen, also functions to some extent in the z-axis. However, it's important to note that the z-dimension of the section in these Figures is 55 nm. Nevertheless, it is interesting to observe gold label that likely marks the corresponding Rab antigen on membranes not visible within the section itself (Figure 4C, indicated by arrows).

      We have clarified this in lines 397 – 400:

      “Consequently, gold particles located further away may represent cytoplasmic TbRab proteins or, as the “linkage error” can also occur in the z-plane, correspond to membranes that are not visible within the 55 nm thickness of the cryosection (Figure 4, panel C, arrows).”

      The coexistence of different Rabs is most likely concentrated in regions where transitions between different functions are likely. Our focus was primarily on imaging membranes labelled with two markers. We wanted to show that the prevailing model of separate compartments in the trypanosome literature is not correct.

      F7 - Not sure what this adds beyond what was published by Grunfelder.

      First, this figure is an important control that links our results to published work (Grünfelder et al. (2003)). Second, we include double staining of cargo with Rab5, Rab7, and Rab11, whereas Grünfelder focused only on Rab11. Therefore, our data is original and of such high quality that it warrants a main figure.

      F8 - and l583. This is odd as the claim is 'proof' which in science is a hard thing to claim (and this is definitely not at a six sigma level of certainty, as used by the physics community). However, I am seeing structures in the tomograms which are not contiguous - there are gaps here between the individual features (Green in the figure).

      We have replaced the term "proof". It is important to note that the structures in individual tomograms cannot all be completely continuous because the sections are limited to a thickness of 250 nm. Therefore, it is likely that they have more connectivity above and below the imaged section. Nevertheless, we believe that the quality of the tomograms is satisfactory, considering that 3D Tokuyasu is a very demanding technique and the production of serial Tokuyasu tomograms is not feasible in practice.

      Discussion - Too long and the self-citing of four papers from the corresponding author to the exclusion of much prior work is again noted, with concerns about this as described above. Moreover, at least four additional Rab proteins are known associated with the trypanosome endosomal system, 4, 5B, 21 and 28. These have been completely ignored.

      We have outlined our position on referencing in original articles above. We also explained why we focused on the key marker proteins associated with early (Rab5), late (Rab7) and recycling endosomes (Rab11). We did not ignore the other Rabs, we just did not include them in the present study.

      Overall this is disappointing. I had expected a more robust analysis, with a clearer discussion and placement in context. I am not fully convinced that what we have here is as extreme as claimed, or that we have a substantial advance. There is nothing here that is mechanistic or the identification of a new set of gene products, process or function.

      We do not think that this is constructive feedback.

      This MS suggests that the endosomal system of African trypanosomes is a continuum of membrane structures rather than representing a set of distinct compartments. A combination of light and electron microscopy methods are used in support. The basic contention is very challenging to prove, and I'm not convinced that this has been. Furthermore, I am also unclear as to the significance of such an organisation; this seems not really addressed.

      We acknowledge and respect varying viewpoints, but we hold a differing perspective in this matter. We are convinced that the data decisively supports our interpretation. May future work support or refute our hypothesis.

      Reviewer #3 (Recommendations For The Authors):

      Line 81 - delete 's

      Done.

      Generally, the introduction was very well written and clearly summarised our current understanding but the paragraph beginning line 134 felt out of place and repeated some of the work mentioned earlier.

      We have removed this paragraph.

      For the EM analysis throughout quantification would be useful as highlighted in the public review. How many tomograms were examined, and how often were types of structures seen? I understand the sample size is often small but this would help the reader appreciate the diversity of structures seen.

      We have included the numbers.

      Following on from this how were the cells chosen for tomogram analysis? For example, the dividing cell in 1D has palisades associating with the new pocket - is this commonly seen? Does this reflect something happening in dividing cells. This point about endosomal division was picked up in the discussion but there was little about in the main results.

      This issue is undoubtedly inherent to the method itself, and we have made efforts to mitigate it by generating a series of tomograms recorded randomly. We have refrained from delving deeper into the intricacies of the cell cycle in this manuscript, as we believe that it warrants a separate paper.

      As the authors prosecute, the co-localisation analysis highlights the variable nature of the endosome and the overlap of different markers. When looking at the LM analysis, I was struck by the variability in the size and number of labelled structures in the different cells. For example, in 3A Rab7 is 2 blobs but in 3B Cell 1 it is 4/5 blobs. Is this just a reflection of the increase in the endosome during the cell cycle?

      The variability in representation is a direct consequence of the dynamic nature of the labelled structures. For this reason, we deliberately selected different cells to represent examples of what the labelling can look like. We have decided not to mention the dynamics of the endosome during the cell cycle. This will be the subject of a further report.

      Moreover, Rab 11 looks to be the marker covering the greatest volume of the endosomal system - is this true? I think there's more analysis of this data that could be done to try and get more information about the relative volumes etc of the different markers that haven't been drawn out. The focus here is on the co-localisation.

      Precisely because we recognize the importance of this point, we intend to turn our attention to the cell cycle in a separate publication.

      I appreciate that it is an awful lot of work to perform the immuno-EM and the data is of good quality but in the text, there could be a greater effort to tie this to the LM data. For example, from the Rab11 staining in LM you would expect this marker to be the most extensive across the networks - is this reflected in the EM?

      For the immuno-EM there were no numbers, the authors had measured the position of the gold but what was the proportion of gold that was in/near membranes for each marker? This would help the reader understand both the number of particles seen and the enrichment of the different regions.

      Our original intent was to perform a thorough quantification (using stereology) of the immuno-EM data. However, we later realized that the necessary random imaging approach is not suitable for Tokuyasu sections of trypanosomes. In short, the cells are too far apart, and the cell sections are only occasionally cut so that the endosomal membranes are sufficiently visible. Nevertheless, we continue to strive to generate more quantitative data using conventional immuno-EM.

      The innovative combination of Tokuyasu tomograms with immuno-EM was great. I noted though that there was a lack of fenestration in these models. Does this reflect the angle of the model or the processing of these samples?

      We are grateful to the referee, as we have asked ourselves the same question. However, we do not attribute the apparent lack of fenestration to the viewing angle, since we did not find fenestration in any of the Tokuyasu tomograms. Our suspicion is more directed towards a methodological problem. In the Tokuyasu workflow, all structures are mainly fixed with aldehydes. As a result, lipids are only effectively fixed through their association with membrane proteins. We suggest that the fenestration may not be visible because the corresponding lipids may have been lost due to incomplete fixation.

      We now clearly state this in the lines 563 – 568.

      “Interestingly, these tomograms did not exhibit the fenestration pattern identified in conventional electron tomography. We suspect that this is due to methodological reasons. The Tokuyasu procedure uses only aldehydes to fix all structures. Consequently, effective fixation of lipids occurs only through their association with membrane proteins. Thus, the lack of visible fenestration is likely due to possible loss of lipids during incomplete fixation.”

      The discussion needs to be reworked. Throughout it contains references to results not in the main results section such as supplementary movie 2 (line 735). The explicit references to the data and figures felt odd and more suited to the results rather than the discussion. Currently, each result is discussed individually in turn and more effort needs to be made to integrate the results from this analysis here but also with previous work and the data from other organisms, which at the moment sits in a standalone section at the end of the discussion.

      We have improved the discussion and removed the previous supplementary movies 2 and 3. Supplementary movie 1 is now mentioned in the results section.

      Line 693 - There was an interesting point about dividing cells describing the maintenance of endosomes next to the old pocket. Does that mean there was no endosome by the new pocket and if so where is this data in the manuscript? This point relates back to my question about how cells were chosen for analysis - how many dividing cells were examined by tomography?

      The fate of endosomes during the cell cycle is not the subject of this paper. In this manuscript we show only one dividing cell using tomography. An in-depth analysis focusing on what happens during the cell cycle will be published separately.

      Line 729 - I'm unclear how this represents a polarization of function in the flagellar pocket. The pocket I presume is included within the endosomal system for this analysis but there was no specific mention of it in the results and no marker of each position to help define any specialisation. From the results, I thought the focus was on endosomal co-localisation of the different markers. If the authors are thinking about specialisation of the pocket this paper from Mark Field shows there is evidence for the exocyst to be distributed over the entire surface of the pocket, which is relevant to the discussion here. Boehm, C.M. et al. (2017) The trypanosome exocyst: a conserved structure revealing a new role in endocytosis. PLoS Pathog. 13, e1006063

      We have formulated our statement more cautiously. However, we are convinced that membrane exchange cannot physically work without functional polarization of the pocket. We know that Rab11, for example, is not evenly distributed on the pocket. By the way, in Boehm et al. (2017) the exocyst is not shown to cover the entire pocket (as shown in Supplementary Video 1).

      We now refer to Boehm et al. (Lines 700 – 703):

      “Boehm et al (2017) report that in the flagellar pocket endocytic and exocytic sites are in close proximity but do not overlap. We further suggest that the fusion of EXCs with the flagellar pocket membrane and clathrin-mediated endocytosis take place on different sites of the pocket. This disparity explains the lower colocalization between TbRab11 and TbRab5A.”

      Line 735 - link to data not previously mentioned I think. When I looked at this data I couldn't find a key to explain what all the different colours related to.

      We have removed the previous supplementary movies 2 and 3. We now reference supplementary movie 1 in the results section.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Du et al. address the cell cycle-dependent clearance of misfolded protein aggregates mediated by the endoplasmic reticulum (ER) associated Hsp70 chaperone family and ER reorganisation. The observations are interesting and impactful to the field.

      Strength:

      The manuscript addresses the connection between the clearance of misfolded protein aggregates and the cell cycle using a proteostasis reporter targeted to ER in multiple cell lines. Through imaging and some biochemical assays, they establish the role of BiP, an Hsp70 family chaperone, and Cdk1 inactivation in aggregate clearance upon mitotic exit.

      Furthermore, the authors present an initial analysis of the role of ER reorganisation in this clearance. These are important correlations and could have implications for ageing-associated pathologies. Overall, the results are convincing and impactful to the field.

      Weakness:

      The manuscript still lacks a mechanistic understanding of aggregate clearance. Even though the authors have provided the role of different cellular components, such as BiP, Cdk1 and ATL2/3 through specific inhibitors, at least an outline establishing the sequence of events leading to clearance is missing. Moreover, the authors show that the levels of ER-FlucDM-eGFP do not change significantly throughout the cell cycle, indicating that protein degradation is not in play. Therefore, addressing/elaborating on the mechanism of disassembly can add value to the work. Also, the physiological relevance of aggregate clearance upon mitotic exit has not been tested, nor have the cellular targets of this mode of clearance been identified or discussed.

      Thank you for your suggestions. 

      We have added descriptions about the sequence of events leading to clearance in the abstract (line 33) and discussion (line 316). 

      We have commented on the future work that could address the molecular mechanisms behind the aggregate clearance in the discussion (line 388). 

      It has been difficult to address the physiological relevance of aggregate clearance during cell division, because inhibiting BiP or depleting ATL2/3, both of which prevent aggregate clearance, causes cellular consequences that are not specific to aggregate clearance. Future work that leads to an understanding of aggregate clearance at the molecular level may allow us to address this more specifically. Furthermore, based on the proteomic analysis, we have commented on the potential defects that could arise from perturbed cellular health in cells expressing ER-FlucDM-eGFP (line 359).

      To identify pathological targets that undergo clearance in the same way as ER-FlucDM-eGFP, we tested three pathological mutants (CFTR-∆F508, AAT S and Z variants) that are known to misfold and accumulate in the ER. Unfortunately, expression of these mutants did not result in the confinement of aggregates in the nucleus. The related data have been added as Figure S1E and S1F (line 102) in this revised manuscript. We have also commented in the discussion that pathological targets are yet to be identified and could be a part of future work (line 392).

      Reviewer #2 (Public review):

      This paper describes an interesting observation that ER-targeted misfolded proteins are trapped within vesicles inside nucleus to facilitate quality control during cell division. This work supports the concept that transient sequestration of misfolded proteins is a fundamental mechanism of protein quality control. The authors satisfactorily addressed several points asked in the review of first submission. The manuscript is improved but still unable to fully address the mechanisms.

      Strengths:

      The observations in this manuscript are very interesting and open up many questions on proteostasis biology.

      Weaknesses:

      Despite inclusions of several protein-level experiments, the manuscript remained a microscopy-driven work and missed the opportunity to work out the mechanisms behind the observations.

      Thank you for your suggestions. We believe that our study has provided a genetic basis for the involvement of ER reorganization and BiP during cell division in aggregate clearance, which is a new observation. We have also commented in this revised manuscript about the future work that could address the molecular mechanisms behind the aggregate clearance in the discussion (line 388).  

      Reviewer #3 (Public review):

      This paper describes a new mechanism for the clearance of protein aggregates associated to endoplasmic reticulum re-organization that occurs during mitosis.

      Experimental data showing clearance of protein aggregates during mitosis is solid, statistically significant, and very interesting. The authors made several new experiments included in the revised version to address the concerns raised by reviewers. A new proteomic analysis, co-localization of the aggregates with the ER membrane Sec61beta protein, expression of the aggregate-prone protein in the nucleus does not result in accumulation of aggregates, detection of protein aggregates in the insoluble fraction after cell disruption and most importantly knockdown of ATL proteins involved in the organization of ER shape and structure impaired the clearance mechanism. This last observation addresses one of the weakest points of the original version which was the lack of experimental correlation between ER structure capability to re-shape and the clearance mechanism.

      In conclusion, this new mechanism of protein aggregate clearance from the ER was not completely understood in this work but the manuscript presented, particularly in the revised version, an ensemble of solid observations and mechanistic information to scaffold future studies that clarify more details of this mechanism. As stated by the authors: "How protein aggregates are targeted and assembled into the intranuclear membranous structure waits for future investigation". This new mechanism of aggregate clearance from the ER is not expected to be fully understood in a single work but this paper may constitute one step to better comprehend the cell capability to resolve protein aggregates in different cell compartments.

      We thank the reviewer for the comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript presents a very interesting set of observations that could have significant implications on age-related protein misfolding and aggregate clearance. There are a few places in the manuscript that still need more clarity. Some are listed below, which I think can improve the manuscript.

      - The new data associated with proteomic analysis is appreciated, but the information gained has not been explored or elaborated sufficiently in the manuscript. Based on the differential expression of cell cycle proteins, how the authors interpret cellular health is unclear. Also, the physiological role of this mode of aggregate clearance remains unclear.

      We have added our interpretation of perturbed cellular health in cells expressing ER-FlucDM-eGFP in the discussion (line 359). 

      It has been difficult to address the physiological relevance of aggregate clearance during cell division, because inhibiting BiP or depleting ATL2/3, both of which prevent aggregate clearance, causes cellular consequences that are not specific to aggregate clearance. Future work that leads to an understanding of aggregate clearance at the molecular level may allow us to address this more specifically.

      - In Figure 3A, have the authors measured the total GFP intensity from interphase through early G1? Even though the number and area of the aggregates decrease significantly, the cytoplasmic GFP signal does not seem to increase. Considering new CHX chase experiments and total Fluorescence intensity calculations (Figure S7D), which indicate no difference, one would expect an increase in cytoplasmic signal upon the disassembly of aggregates. Therefore, the data from Figures 3A and 7D seem contradictory. Can the authors please explain?

      We apologize for the confusion. The images in Figure 3A were derived from fixed cells, so a different cell is shown for each cell cycle phase and the images are not suitable for quantification. Fluorescence intensity changes can be better appreciated in Figure 3C or 4D, as these are time-lapse microscopy images of live cells progressing through mitosis and cytokinesis. The data used to quantify fluorescence intensity in Figure S7D were derived from live cells imaged at specific time points to avoid unwanted fluorescence bleaching during time-lapse microscopy. 

      - Do the authors expect a similar clearance of pathological aggregates such as mutant FUS or TDP43 condensates? Showing aggregate disassembly of disease-relevant aggregates would be an excellent addition to the manuscript, but it might be beyond the scope of the current version. However, the authors can comment/speculate how their study might extend to pathological condensates.

      We tested three pathological mutants (CFTR-∆F508, AAT S and Z variants) that are known to misfold and accumulate in the ER. Unfortunately, expression of these mutants did not result in the confinement of aggregates in the nucleus. The related data have been added as Figure S1E and S1F (line 102) in this revised manuscript. We have commented that pathological targets are yet to be identified and could be a part of future work (line 392).

      - The presence of ER membrane around these aggregates is an interesting observation. This membrane is retained even after nuclear membrane breakdown. What could be the relevance of membrane-bound aggregates, especially since the membrane can limit the access of chaperones involved in disassembly? This observation becomes more important since the depletion of ER membrane fusion proteins also leads to the accumulation of aggregates. Are the membranes a beacon for disassembly? The authors may comment/ speculate. This could also be an important aspect of the mechanism of clearance.

      We think that the ER membranes around the aggregates are disassembled when the ER networks reorganize during mitotic exit and this may allow accessibility of BiP to disaggregate the aggregates. We have added this in the discussion (line 316).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: In this work, the authors present a careful study of the lattice of the indirect flight muscle (IFM) in Drosophila using data from a morphometric analysis. To this end, an automated tool is developed for precise, high-throughput measurements of sarcomere length and myofibril width, and various microscopy techniques are used to assess sub-sarcomeric structures. These methods are applied to analyze sarcomere structure at multiple stages in the process of myofibrillogenesis. In addition, the authors present various factors and experimental methods that may affect the accurate measurement of IFM structures. Although the comprehensive structural study is appreciated, there are major issues with the presentation/scope of the work that need to be addressed:

      Major Comments: 1. The main weakness of the paper is in its claim of presenting a model of the sarcomere. Indeed, the paper reports a structural study that is drawn onto a 3D schematic. There is no myofibrillogenesis model that would provide insights into mechanisms. Therefore, the use of the word model is grossly overstated.

      In biology, the term “model” is used in various contexts, but it generally refers to a simplified representation of a biological system, a structure or a process. Accordingly, we consider “model” the most fitting term for what we present in Figure 4 (Figure 7 in the revised manuscript). These are not arbitrary 3D schematics; they are scaled representations in which the length, the number and the relative three-dimensional arrangement of thin and thick filaments are based on measurements. These measurements are primarily based on our own data (presented in the main text and provided in the supplementary materials), as published data were either lacking or inconsistent. Moreover, we would like to highlight that we do not claim to present a conceptual or mechanistic model of myofibrillogenesis, but we do present structural reconstructions or models for four developmental time points. Therefore, we disagree with the remark that “the use of the word model is grossly overstated”, as our wording fully corresponds to common usage.

      In general, the major focus and contribution of the work is unclear. How does the comprehensive nature of the measurements contribute to existing literature?

      We significantly revised the text to highlight the main points more firmly, and added an additional section to help non-specialist readers to better understand our aims and findings.

      Figure labels are often rather confusing - for example it is unclear why there is a B, B', B' etc instead of B,C,D, etc.

      The figure labels have been revised in accordance with the reviewer’s recommendation.

      Some comments in the text are not clearly tied to the figures. For example, in lines 108-109, are the authors referring to the shadow along the edges of the myofibril when saying they are not clearly defined (Figure 1C)?

      The lines refer to the fact that identifying the boundary of an “object” in a fluorescence microscopy image is inherently challenging - even under ideal conditions where the object’s image is not affected by nearby signals or background noise. To improve clarity, we revised this section and now it reads: The other key parameter - myofibril diameter - is typically measured using phalloidin staining. However, accurately delineating their boundaries in micrographs is difficult - even under optimal conditions (high signal‑to‑noise ratio, no overlapping fibers, etc.; Fig. 1C). This limitation arises from the fundamental nature of light microscopy as the image produced is a blurred version of the actual structure, due to convolution with the microscope’s point spread function.
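      The blurring described above can be illustrated numerically. The following is a minimal sketch, not part of the authors' IMA tool: it assumes an idealized top-hat cross-section for the myofibril and a Gaussian approximation of the point spread function, and the function names, pixel size and field width are hypothetical choices made for illustration only. It shows that an object with perfectly sharp edges acquires an edge of finite width once it is convolved with the PSF.

```python
import numpy as np

def blurred_profile(true_width_um, psf_fwhm_um, pixel_um=0.01, field_um=6.0):
    """Convolve a top-hat cross-section with a Gaussian PSF (both idealized)."""
    x = np.arange(-field_um / 2, field_um / 2, pixel_um)
    # Idealized object: intensity 1 inside the myofibril, 0 outside.
    obj = (np.abs(x) <= true_width_um / 2).astype(float)
    # Convert the PSF full width at half maximum to a Gaussian sigma.
    sigma = psf_fwhm_um / (2 * np.sqrt(2 * np.log(2)))
    half = int(np.ceil(4 * sigma / pixel_um))
    kx = np.arange(-half, half + 1) * pixel_um  # symmetric kernel support
    psf = np.exp(-kx**2 / (2 * sigma**2))
    psf /= psf.sum()  # normalize so total intensity is conserved
    img = np.convolve(obj, psf, mode="same")
    return x, obj, img

def edge_width_10_90(x, profile):
    """Distance over which the left edge rises from 10% to 90% of the peak."""
    peak = profile.max()
    left = x < 0
    x10 = x[left][np.argmax(profile[left] >= 0.1 * peak)]
    x90 = x[left][np.argmax(profile[left] >= 0.9 * peak)]
    return x90 - x10
```

      With a sharp object the 10-90% edge width is zero, whereas after convolution it grows in proportion to the PSF width, which is why any threshold chosen to mark the boundary is to some extent arbitrary.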

      In line 116, it is unclear what "surrounding structures" the authors are referring to if the myofibrils are isolated.

      We revised the text for clarity. It now states: Once isolated, myofibrils lie flat on the coverslip, aligning with the focal plane of the objective lens. This orientation allows for high-resolution, undistorted imaging and accurate two-dimensional measurements, free from interference by neighboring biological structures (e.g.: other myofibrils).

      In lines 141-142, there is no reference of data to back up the claim of validation.

      We addressed this mistake by including a reference to Fig. S1E (Fig. S1D in the revised manuscript).

      In line 170, the authors mention the mef2-Gal4/+ strain as a Gal4 driver line but do not clearly state how this strain is different from the wildtypes or how this impacts their results.

      Mef2-Gal4 is a muscle-specific Gal4 driver, often used in Drosophila muscle studies. It is a convention among Drosophila geneticists that the presence of a transgene (i.e. Mef2-Gal4) changes the genetic background, and although it does not necessarily cause any phenotypic effect, it is clearly distinguished from the wild type situation, and whenever relevant, Mef2-Gal4/+ is the preferred choice (if not the correct choice) as a control instead of wild type. As is clear from our data, the presence of the Mef2-Gal4 driver does not affect the length or width of IFM sarcomeres as compared to wild type.

      In lines 182-185, the authors discuss the effects of tissue embedding on morphometrics. Were factors such as animal sex, age, fiber type, etc. conserved in these experiments? If not, any differences in results may be confounding.

      We fully agree with the reviewer that when testing the effect of a single variable, all other variables should remain constant. This is actually one of the main points emphasized in the results section. Additionally, this information is already provided in the Source Data files for each panel.

      In lines 199-201, the authors discuss results of myofibril diameter using different preparation methods, yet no data is cited to support the claims. In line 220, the phrase "6 independent experiments" is unclear. Is each independent experiment performed using a different animal? Furthermore, are 6 experiments performed for each time point?

      We substantially revised the relevant paragraphs and ensured that the corresponding data (Figure 2A in the revised manuscript) is cited each time when it is discussed. We conducted six independent experiments at each time point. This is consistently indicated in the figures and can be verified in the SourceData files (specifically, Fig3SourceData in this case). To clarify what we mean by "independent experiments," we added the following sentence to the Methods section: Experiments were considered independent when specimens came from different parental crosses, and each experiment included approximately six animals to capture individual variability.

      In line 254, the authors refer to "number of sarcomeres". It must be clearly stated if this refers to sarcomeres per myofibril, image area, etc.

      It is now clearly stated as: "number of sarcomeres per myofibril".

      In line 274, the authors refer to "myofilament number". It must be clearly stated if this refers to myofilaments per myofibril, image area, etc.

      We counted the number of myofilaments in developing myofibrils, and this is now clearly stated in the text and in the legend of Figure 3 (Figure 4 in the revised manuscript).

      In line 299, the authors mention that thin filaments measured less than 560 nm in length, yet no data is cited to support this.

      The previously missing reference to Figure 4 (Figure 7 in the revised manuscript) has now been added in addition to the revised Supplementary Figure 5.

      In the "Quantifying sarcomere growth dynamics" section of the summary (starting from line 402) the authors introduce data that would be more naturally placed in the results and discussion section.

      As suggested by the reviewer, we incorporated the key aspects of sarcomere growth dynamics into the Results and Discussion section.

      In lines 422-423, it is not mentioned what the controls are for.

      This was already explained in the main text between lines 167 and 173.

      In the caption of Figure 1C, it is not mentioned what the red dashed lines in the microscope images represent.

      The caption has been updated to include the following clarification: The red dashed lines border the ROI used for generating the intensity profiles.

      In the caption of Figure 1D, the difference between the lighter and darker grey points is not mentioned.

      This was already explained in each relevant figure legend. In this specific case, it is stated between lines 850 and 852: “Light gray dots represent individual measurements of sarcomere length and myofibril diameter, while the larger dots indicate the mean values from independent experiments.”

      In line 849, the stated p-value (0.003) does not match that mentioned in the figure (0.0003).

      We thank the reviewer for noticing this small mistake; correction was made to display the accurate p-value of 0.0003 at both places.

      In line 874, it is not clear what an "independent experiment" refers to (different animal, etc.?).

      We refer the reviewer to point 9, where this question has already been addressed.

      Figure 2A is hard to read. Using different colored dots for different time points might help.

      As suggested by the reviewer, we generated a plot with the individual points color-coded by time.

      The significant figures presented in Figure 4 give a completely inaccurate representation of the variability of the measurements achieved with these techniques.

      Certainly, each measured parameter exhibits inherent biological and technical variability. We have made all the raw data available to the reader through the SourceData files, and this variability is also evident in Figures 1, 2, 3, Supplementary Figure 1, 3, and 5 (Figure 1, 2, 3, 4, 6, and Supplementary Figure 1 in the revised manuscript). We have also included an additional plot (Supplementary Figure 5 in the revised manuscript) that presents the calculated thin and thick filament lengths and their uncertainty. However, in Figure 4 (Figure 7 in the revised manuscript), our goal was to present an easily understandable visual representation of the sarcomeric structures for each time point, based on the averages of the relevant measurements.

      In line 877, it should be mentioned that the number of filaments is counted per myofibril. The y-axes in the figure should also be adjusted to clarify this.

      As suggested by the reviewer, both the figure legend and the plot have been updated to clearly indicate that the filament count refers to the number per myofibril.

      In line 883, it is not clear what an "independent experiment" refers to (different animal, etc.?).

      We refer the reviewer to point 9, where this question has already been addressed.

      The statement of sample sizes in all figures is a little confusing.

      Following general guidelines, we used SuperPlots to effectively present the data, as nicely demonstrated in the JCB viewpoint article by Lord et al., 2020 (PMID: 32346721). Individual measurements are shown as pooled data points, allowing readers to appreciate the spread, distribution and number of measurements. Overlaid on these pooled dot plots are the mean values from each independent experiment, with error bars representing variability between independent experiments. Sample sizes are provided for both individual measurements and independent experiments. This is now clearly explained in the Materials and Methods section, and we corrected the legends to improve clarity (“n” indicates the number of independent experiments/individual measurements).
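      The two levels of summary behind a SuperPlot can be sketched in a few lines. This is an illustrative example rather than the authors' analysis code: the function name and the dictionary layout are hypothetical, and because the response does not state whether the error bars show the standard deviation or the standard error between experiments, the sketch reports both.

```python
from math import sqrt
from statistics import mean, stdev

def superplot_summary(measurements):
    """Summarize pooled data at two levels.

    measurements: dict mapping an experiment id to a list of individual values.
    """
    # Level 1: one mean per independent experiment.
    per_experiment = {k: mean(v) for k, v in measurements.items()}
    means = list(per_experiment.values())
    n_exp = len(means)
    # Level 2: statistics across experiment means, not across pooled points.
    sd = stdev(means) if n_exp > 1 else 0.0
    return {
        "n_experiments": n_exp,
        "n_measurements": sum(len(v) for v in measurements.values()),
        "experiment_means": per_experiment,
        "mean_of_means": mean(means),
        "sd_between_experiments": sd,
        "sem_between_experiments": sd / sqrt(n_exp) if n_exp > 1 else 0.0,
    }
```

      Computing the spread from the experiment means rather than from the pooled points keeps the number of independent experiments, not the much larger number of individual measurements, as the effective sample size.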

      In lines 1007-1008, the authors imply that the lattice model is needed for calculation of myofilament length. However, from the equations and previous data, it seems that this can be estimated using the confocal and dSTORM images.

      As the reviewer correctly noted, myofilament length can be estimated using measurements from confocal and dSTORM images, following the equations provided. However, constructing even a simplified model requires multiple constraints to be defined and applied in a specific order. In practice, one must first determine the number and arrangement of myofilaments in a cross-sectional view of an “average sarcomere” before attempting to build a longitudinal model, where length calculations become relevant. This is now clarified in the text.

      A more specific discussion of future directions is needed to put this paper in context. For example: Can anything from the overall process be used to better understand sarcomere dynamics in larger animals/humans? Can this be applied to disease modelling?

      To address these questions, we have added a section titled STUDY LIMITATIONS, which states: “Our study is focused on describing the growth of IFM sarcomeres during myofibrillogenesis at the level of individual myofilaments. Additionally, we developed a user-friendly software tool for precise sarcomere size measurements and demonstrate that these measurements are sensitive to varying conditions. Although this tool can also be used successfully on whole muscle fiber preparations, our pipeline was intentionally optimized for individual IFM myofibrils, ensuring higher measurement precision in our hands than other types of preparation. Thus, we predict that future work will be required to extend it to sarcomeres from other muscle tissues or species. Nevertheless, our study exemplifies a workflow for measuring sarcomere dimensions precisely. With some variations, it should be possible to adapt it for other muscles, including vertebrate and human striated muscles. To facilitate this and to enhance the accessibility and usability of this dataset, we welcome any feedback and suggestions from researchers in the field.”

      One of the major claims of the paper is that there is a measurable variability with sex and other parameters. However, this data is never clearly summarized, presented (except for supplement), or discussed for its implications.

      We followed the suggestion of the reviewer, and we moved this supplementary data into a main figure, and thoroughly revised the corresponding paragraphs to present and discuss the findings more clearly.

      Minor Comments: 1. Lines 60-65 seem to break the flow of the introduction. As the authors discuss existing methods in literature for IFM analysis in the previous couple sentences, the following sentences should clearly state the limitations of existing methods/current gap in literature and a general idea of what the current work is contributing.

      We agree with this remark, and we substantially revised the Introduction to clearly define the existing gap in the literature and to articulate how our work addresses this gap.

      In line 104, the acronym for ZASPs is not spelled out.

      The acronym has now been spelled out for clarity.

      **Referee Cross-commenting**

      I agree as well.

      Reviewer #1 (Significance (Required)):

      In summary, this paper provides a multi-scale characterization of Drosophila flight muscle sarcomere structure under a variety of conditions, which is potentially a significant contribution for the field. However, the paper scope is overstated in that it does not provide an actual sarcomere model. Further, there are multiple issues with data presentation that impact the readability of the manuscript.

      Although it is somewhat unclear what the reviewer would consider “an actual sarcomere model”, we cannot accept that we made an overstatement by using the word “model”, because one of the main outcomes of our work is indeed the set of myofilament-level sarcomere models depicted in Figure 4 (Figure 7 in the revised manuscript). As said above, we do not claim that these are molecular, mechanistic or developmental models, but it makes no sense (even in common terms!) to say that our scaled graphical representations (based on a wealth of measurements) should not or cannot be called models.

      As to the comment with data presentation, we thank the reviewer for the numerous suggestions, and we substantially revised the manuscript to increase clarity and overall readability.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript titled "A myofilament lattice model of Drosophila flight muscle sarcomeres based on multiscale morphometric analysis during development," Görög et al. perform a detailed analysis of morphological parameters of the indirect flight muscle (IFM) of D. melanogaster. The authors start by illustrating the range of measurements reported in the literature for mature IFM sarcomere length and width, showing a need to revisit and determine a standardized measurement. They develop a new Python-based tool, IMA, to analyze sarcomere lengths from confocal micrographs of isolated myofibrils stained with phalloidin and a z-disc marker. Using this tool, they demonstrate that sample preparation (especially mounting medium), as well as fiber type, sex, and age influence sarcomere measurements. Combining IMA, TEM, and STORM data, they measure sarcomere parameters across development, providing a comprehensive and up-to-date set of "standardized" sarcomere measurements. Using these data, they generate a model integrating all of the parameters to model sarcomeres at four discrete timepoints of development, recapitulating key phases of sarcomere formation and growth.

      Major comments: Line 200 & 901 - Figure S1B - The authors make a strong statement about the use of liquid versus hardening media, and it is clear from the image provided in Figure S1 that there is a difference in the apparent sarcomere width. The identity of the "liquid media" versus the "hardening media" should be clearly identified in the Results, in addition to the legend for Figure S1. The authors show that "glycerol-based solutions" increase sarcomere width, but the Materials only list 90% glycerol and PBS. However, a frequently used liquid mounting media is Vectashield. Based on the literature, measurements in liquid Vectashield show diameters significantly less than 2.2 microns observed here with presumably 90% glycerol or PBS. Can the authors qualify this statement, or provide data that all forms of liquid mounting media cause this effect? Does this also apply to hemi-thorax and sectioned preparations, or just isolated myofibrils?

      We used a PBS-based solution containing 90% glycerol as our liquid medium, as now stated in the main text. In response to the reviewer’s suggestion, we also tested a non-hardening version of Vectashield (H-1000). Myofibrils in Vectashield were significantly thicker than those in ProLong Gold but still thinner than those in the 90% glycerol–PBS solution, shown in Figure 2B. The mechanisms that could potentially explain these observations have been described in several studies (Miller et al., 2008; Tanner et al., 2011, 2012). Briefly, IFM is a densely packed macromolecular assembly. Upon removal of the cell membrane, myofibrillar proteins attract water, leading to overhydration of the myofilament lattice. This increases the spacing between filaments, resulting in an expansion of overall myofibril diameter. The extent of hydration depends on the osmolarity of the surrounding medium, as the system eventually reaches osmotic equilibrium. While both liquid media induced significant swelling, the observed differences likely reflect variations in their osmotic properties. In contrast, dehydration - an essential step in electron microscopy sample preparation - reduces the spacing between filaments, making myofibrils appear thinner. This explains why EM micrographs consistently show significantly smaller myofibril diameters (Chakravorty et al., 2017).

      Hardening media such as ProLong Gold introduce additional artifacts: during polymerization, these media shrink, exerting compressive forces on the tissue (Jonkman et al., 2020). We therefore propose that isolated myofibrils first expand due to overhydration in the dissection solution, and are then compressed back toward their *in vivo* dimensions during incubation in ProLong Gold. The average *in vivo* diameter of IFM myofibrils can be estimated without direct measurements, as it is determined by two key factors: (i) the number of myofilaments, which has been quantified in EM cross-sections in several studies (Fernandes & Schöck, 2014; Shwartz et al., 2016; Chakravorty et al., 2017) including our own, and (ii) the spacing between filaments, which can be measured by X-ray diffraction even in live *Drosophila* or under various experimental conditions (Irving & Maughan, 2000; Miller et al., 2008; Tanner et al., 2011, 2012). Our findings suggest that the effects of lattice overhydration and media-induced shrinkage are most pronounced in isolated myofibrils. In larger tissue preparations, the inter-myofibrillar space likely acts as a mechanical and osmotic buffer, reducing the extent of such distortions.
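      The estimate from filament number and lattice spacing can be sketched geometrically. The example below is an assumption-laden illustration and not necessarily the calculation used in the manuscript: it treats the thick filaments as a perfect hexagonal lattice, approximates the myofibril cross-section as a circle of equal area, and ignores edge effects. The function names are hypothetical.

```python
import math

def thick_filament_spacing_nm(d10_nm):
    """Centre-to-centre thick-filament spacing from the d1,0 lattice spacing.

    For a hexagonal lattice the d1,0 plane spacing equals (sqrt(3)/2) times
    the centre-to-centre distance between neighbouring filaments.
    """
    return 2.0 * d10_nm / math.sqrt(3.0)

def myofibril_diameter_nm(n_thick_filaments, spacing_nm):
    """Diameter of a circle with the same area as N hexagonal unit cells."""
    # Each filament in a hexagonal lattice occupies (sqrt(3)/2) * spacing^2.
    cell_area = (math.sqrt(3.0) / 2.0) * spacing_nm ** 2
    total_area = n_thick_filaments * cell_area
    return 2.0 * math.sqrt(total_area / math.pi)
```

      Under these assumptions the diameter scales with the square root of the filament count and linearly with the lattice spacing, so a change in hydration that alters the spacing translates directly into a proportional change in apparent diameter.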
      

      Can the authors comment on whether the length of fixation or fixation buffer solution, in addition to the mounting medium, make a difference on sarcomere length and diameter measurements? This is another source of variation in published protocols.

      The effect of fixation time on sarcomere morphometrics in whole-mount IFM preparations has been previously demonstrated by DeAguero et al. (2019), as briefly noted in our manuscript. To extend these findings, we performed a comparison using isolated myofibrils, assessing morphometric parameters after fixation for 10, 20 (standard) and 60 minutes. We found no difference between the 10- and 20-minute fixation conditions; however, fixation for 60 minutes resulted in significantly increased myofibril diameter (and these data are now shown in Supplementary Figure 1C). A comparable increase in thickness was also observed when using a glutaraldehyde-based fixative. These results suggest that more extensively fixed myofibrils may better resist the compressive forces exerted by hardening media.

      Line 237-238. The authors conclude that premyofibrils are much thinner than previously measured. The use of Airyscan to more accurately measure myofibril width at this timepoint is a good contribution, as indeed diffraction and light scatter likely contribute to increased width measured in light microscopy images. I also wonder, though, how well the IMP software performs in measuring width at 36h APF, given how irregular the isolated myofibrils at this stage look (wide z-lines but thinner and weaker H and I bands as shown in Fig. 2B)?

      The reviewer is correct that measurements during the early stages of myofibrillogenesis require additional effort. However, in addition to its automatic mode, IMA can also operate in semi-automatic or manual modes, ensuring complete control over the measurements. Myofibril width is determined from the phalloidin channel at the Z-line (as described in the software’s User Guide and Supplementary Figure 2), where it is at its thickest.

      Also, how much of the difference in sarcomere width arises due to effects of "stripping" components off of the sarcomere at the earliest timepoint (for example alpha-actinin or Zasp proteins)?

      A comparison between isolated myofibrils and those from microdissected muscles (Supplementary Figure 3B, Figure 3C in the revised manuscript) shows that the isolation process does not alter the morphometric measurements of sarcomeres. Moreover, the measured myofibril width aligns well with what we expect based on the number of myofilaments observed in TEM cross-sections of myofibrils at 36 hours APF (Figure 3A, now Figure 4A in the revised manuscript), supporting the consistency of our model.

      Myofibrils at early timepoints do contain more than 4-12 sarcomeres in a line (they extend the full length of the myofiber), so it is possible they are breaking due to the detergent and mechanical disruption induced by the isolation method.

      The reviewer is correct - myofibrils likely span the full length of the myofiber from the onset of myofibrillogenesis. However, during the isolation of individual myofibrils, they often break, and even mature myofibrils typically fragment into pieces of about 300 µm in length (illustrated in Figure 1E, now Figure 2A in the revised manuscript). Importantly, our measurements show that this fragmentation does not affect the assessed sarcomere length or width (as shown in Supplementary Figure 3B, now Figure 3C in the revised manuscript).

      Line 312 - What does "stable association" mean in this context? The authors mention early timepoints lack stable association of alpha-Actinin or Zasp52, and they reference Fig. S4C, but this figure only shows 72h and 24 AE, not 36h and 48 h APF. Previous reports have seen localization of both alpha-Actinin and Zasp52, so presumably the detergent or mechanical isolation is stripping these components off of the isolated myofibrils up until 72h.

      In agreement with previous reports, we also detected both α-Actinin (as shown in former Supplementary Figure 3B, now Figure 3C) and Zasp52 in microdissected IFM starting from 36 hours APF. However, these markers were largely absent from the isolated myofibrils of young pupae (36 to 60 hours APF). By 60 hours APF, strong α-Actinin and Zasp52 staining became evident in isolated myofibrils, whereas dTitin epitopes were clearly detectable from the earliest time point examined. This indicates that some proteins, such as α-Actinin and Zasp52, can be lost during the isolation process, whereas others like dTitin are retained and this differential sensitivity appears to depend on developmental stage. A likely explanation is that α-Actinin and Zasp52 are recruited early to Z-bodies but are only fully incorporated as more mature Z-disks form between 48 and 60 hours APF. This incomplete incorporation at the earlier stages could account for their loss during the isolation process. This interpretation is supported by our morphological analysis of the Z-discs, as shown in the dSTORM dataset (former Figure 3B, B’’, now Figure 4C, E) and in longitudinal TEM sections (former Supplementary Figure 5B, now in Figure 6B). Because α-Actinin and Zasp52 are not detected in isolated myofibrils at 36 and 48 hours APF, they are not included in Figure S4C (Figure 5C in the revised manuscript). This is explained in the updated figure legend.

      This same type of issue comes up again in Lines 325-334, where the authors talk about 3E8 and MAC147. They state that 3E8 signal significantly declines in later stages and that MAC147 is not suitable to label myofibrils in young pupae, but they only show data from 72 APF and 24 AE (which looks to have decent staining for both 3E8 and MAC147). A clearer explanation here would be helpful.

      To put it simply: we used one myosin antibody to label the A-band in the IFM of 36h APF and 48h APF animals, and a different antibody for the 72h APF and 24h AE stages. In more detail: Myosin 3E8 is a monoclonal antibody targeting the myosin heavy chain and labels the entire length of mature thick filaments except for the bare zone (former Supplementary Figure 4D, now in Figure 5D), suggesting its epitope is near the head domain. As a result, we expect a uniform A-band staining - excluding the bare zone - which is exactly what we observe in the IFM of young pupae (36h APF and 48h APF; formerly Figure 3B, now Figure 4C in the revised manuscript). However, at 72h APF and 24h AE, Myosin 3E8 produces a different staining pattern: two narrow stripes flanking the bare zone and two broader, more diffuse stripes near the A/I band junction (former Supplementary Figure 4D, now Figure 5D). This change is likely due to restricted antigen accessibility at these later developmental stages - a common issue in the densely packed IFM - making this antibody unsuitable for reliably measuring thick filament length in these stages.

      MAC147 is another monoclonal antibody against Mhc that recognizes an epitope near the head domain. However, it only works reliably in more mature myofibrils (72h APF and 24h AE; formerly Figure 3B, now Figure 4C in the revised manuscript), likely due to its specificity for a particular Mhc isoform. This is why we do not include images from earlier developmental stages using this antibody. We added a revised, concise explanation in the main text for general readers, and provided a more detailed description for specialist readers in the legend of Supplementary Figure 4D (updated as Figure 5D in the revised manuscript).

      Figure 3B. The authors show the H, Z, and I lengths in B', B', and B' and discuss these lengths in the text (lines 305-320). It would also be nice to actually have the plots showing the measured/calculated lengths for thin and thick filaments. These are mentioned in the results, but I cannot find the plots in the figures and there is no panel reference.

      A summary table of the measured and calculated parameters is provided in Fig4SourceData (Fig7Source Data in the revised manuscript). However, following the reviewer’s suggestion, we also generated an additional plot (Supplementary Figure 5 in the revised manuscript) that displays the calculated thin and thick filament lengths.
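      To make the relationship between measured band widths and calculated filament lengths concrete, the following is a minimal sketch of one common geometric convention. It is an assumption for illustration and not necessarily the calculation used in the manuscript: the function names, the definition of the I-band as the full thick-filament-free gap across the Z-disc, and the placeholder values are all hypothetical.

```python
def thick_filament_length(sarcomere_length, i_band_width):
    """Thick filament (A-band) length.

    Assumes the I-band width is the full gap between the ends of two
    neighbouring thick filament arrays, measured across the Z-disc.
    """
    return sarcomere_length - i_band_width


def thin_filament_length(sarcomere_length, h_zone_width, z_overlap_width):
    """Thin filament length from sarcomere length, H-zone and Z-disc overlap.

    Assumes each thin filament runs from the edge of the H-zone to the far
    side of the actin overlap region centred on the Z-disc.
    """
    return (sarcomere_length - h_zone_width + z_overlap_width) / 2.0


# Placeholder values in micrometres, for illustration only (not measured data).
example_sl, example_h, example_z, example_i = 3.0, 0.2, 0.1, 0.2
print(thick_filament_length(example_sl, example_i))
print(thin_filament_length(example_sl, example_h, example_z))
```

      Under this convention, a wider Z-disc overlap at a fixed sarcomere length and H-zone width translates directly into a longer apparent thin filament array.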

      Line 400. Does the model in Figure 4 actually have molecular resolution as the authors claim? From these views, thick and thin filaments appear to be represented by cylindrical objects. Localization of specific molecules would require further modeling with individual proteins. Or do the authors mean localization from STORM imaging relative to the ends of the thick and/or thin filaments? The model itself is a useful contribution, but based on Figure 4, resolution of individual molecules is not evident.

      The reviewer is correct, and we fully agree: we do not present a molecular model of sarcomeres in this study, nor do we claim to. Instead, we present a myofilament-level model. Nevertheless, the scaled myofilament lattice model we introduce could serve as a geometric constraint when constructing supramolecular models of sarcomeres. As the reviewer rightly notes, implementing such an approach would require additional effort.

      The main Results section of the text is condensed into 4 figures. However, I found myself flipping back and forth between the main figures and the supplement continuously, especially parts of Supplemental Figures 1, 3, 4, and 5. With such large amounts of detail in the Results relying on the supplement, it may be worth considering reorganizing the main and supplemental figures, and having 7 main figures, to include important panels that are currently in the supplement (esp. Fig S1B, S1C, S1D, S3B, S4, S5).

      We found it a very useful suggestion, and we substantially reorganized the figures in the revised manuscript according to the recommendations of the reviewer.

      Minor comments: On the plots in Fig. S1B, D, and F, it is hard to see the color of the dots because the red error bars are on top of them. Can the other distribution dots be tinted the correct color or the x-axis labels be added, so it is clear which dataset is which?

      We significantly enlarged the dots to enhance visual clarity.

      Line 142 needs a reference to Figure S1, Panel E, which shows the accuracy and precision measurements.

      The requested panel reference has now been included in the revised manuscript.

      Lines 198 - is this range from the above publications? Needs to be clearly cited.

      The range has indeed been estimated using measurements from the aforementioned publications, and this point is now further clarified in the revised text.

      Figure S3B is confusing - why do the blow-ups overlap both the top (presumably microdissected) and the bottom (presumably isolated) images? The identity of microdissected images should be labeled, as they are hard to see underneath of the blown-up images and the identity of individual image planes wasn't immediately obvious.

      We refined the panel structure of Figure S3B (Figure 3C in the revised manuscript) to enhance clarity as the reviewer suggested.

      Line 298. By "misaligned," do the authors mean the pointed ends are not uniformly anchored in the z-disc, leading to the wide z-disc measurements? At this early stage, I'm not sure "misaligned" is the right word - perhaps "were not yet aligned in register at the z-disc" or something similar.

      We revised the text for clarity. It now reads: At 36 hours APF, thin filaments had not yet aligned in perfect register at the Z-disc, with most measuring less than 560 nm in length - and exhibiting considerable variability.

      Figure S6 - spelling mistake in label of panel A, "sarcomer" should be "sarcomere"

      The typo is corrected.

      Line 487. Spelling "Zaps52" should be "Zasp52"

      The typo is corrected.

      Line 887. Spelling "Myofilement" should be "Myofilament"

      The typo is corrected.

      Line 946-947. In the legend for Supp. Fig. 3., the authors should specify which published datasets on sarcomere length are shown in the figure by including the references in the legend. Presumably the "isolated individual myofibrils" are the blue "this study" lines, leaving the "microdissected muscles" as the magenta "previous reports" on the figure. Without the reference, it is not clear if these are microdissected, isolated myofibrils, hemi-thorax sections, cryosections, or another preparation method for the "previous reports" data.

      The references have now been added to both the figure and its legend.

      **Referee Cross-commenting**

      I agree with the comments from the other reviewers. Many of the major themes are consistent across the reviews, including regarding the model, preparation methods, and the software tool.

      Reviewer #2 (Significance (Required)):

      Strengths: This manuscript is an important contribution to the field of sarcomere development. The authors use modern technologies to revisit variation in morphometric measurements in the literature, and they identify parameters that influence this variation. Notably, sex-specific differences, DLM versus DVM measurements, and mounting media are potential contributors to the variability. Combining TEM and STORM with a confocal timecourse of isolated myofibrils, they refine previously published values of sarcomere length and width, and add more comprehensive data for filament length, number and spacing. This highly accurate timecourse demonstrates continual growth of sarcomeres after 48 h APF, and corrects some inconsistencies from previous large-scale timecourse datasets. These data are very valuable to the field, especially Drosophila muscle biologists, and will serve as a comparative resource for future studies. Weaknesses: At early timepoints, loss of sarcomere components through mechanical or detergent-mediated artifacts may influence the authors' measurements. In addition, isolating myofibrils is not always the most ideal approach, as it loses information on myofiber structure as well as organization and structure of the myofibrils in vivo.

      We believe that the control experiments we presented here adequately demonstrate that sarcomere measurements are not affected by the myofibril isolation process at early timepoints (Figure 3C). Nevertheless, we certainly agree with the reviewer that isolated myofibrils alone cannot capture the entire complexity of muscle tissues, and additional approaches should also be applied in complex projects. Yet, we are confident that our approach offers the most reliable and efficient method for precise morphometric analysis of the sarcomeres, and although alone it is very unlikely to be sufficient to address all questions of a muscle development project, it can still be applied as a very useful and robust tool.

      The point regarding liquid versus hardening mounting media is valuable, but remains to be tested and validated with the diverse liquid and hardening media used by other labs.

      Whereas it would not be feasible for us to test all possible liquid and hardening media used by others in all possible conditions, we tested the effect of Vectashield (the most commonly used liquid media) according to the suggestion of the reviewer, and the results are now included in the manuscript. We think that this is a valuable extension of the list of the materials and conditions we tested, although we need to point out that our primary goal was not necessarily to test as many conditions as possible (because the number of those conditions is virtually endless), rather to raise awareness among colleagues that these variables can significantly impact the data obtained and affect their comparability.

      The IMA software seems to be designed specifically for analysis of isolated myofibrils, and it is unclear if it would work for other types of IFM preparations.

      As stated in the manuscript, IMA is a specialized tool designed for the analysis of individual myofibrils. While it can also process other types of IFM preparations in semi-automatic or manual modes, we believe these approaches compromise both efficiency and accuracy. This is further clarified in the revised manuscript.

      A last point is that TEM and STORM may not be available on a regular basis to many labs, hindering wide implementation of the approach used in this manuscript to generate very accurate and detailed measurements of sarcomere morphometrics.

      Regarding the availability of TEM and STORM, we acknowledge that these techniques are not universally accessible. However, this is precisely one major value of our work: our open-source software tool now allows researchers to generate valuable data using only a confocal microscope in combination with our published datasets.

      Audience: Scientists who study sarcomerogenesis or Drosophila muscle biology.

      My expertise: I study muscle development in the Drosophila model.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript presents a computational tool to quantify sarcomere length and myofibril width of the Drosophila indirect flight muscles, including developmental samples. This tool was applied to confocal and STORM super-resolution images of isolated myofibrils from adult and developing flight muscles. Thick filament numbers per myofibril were counted during development of flight muscles. A myofilament model of developing flight muscle myofibrils is presented that remains speculative for the early developmental stages.

      Major comments: 1. The title of the manuscript appears unclear. What is a lattice model? Lattice is an ordered array. The filament array parameters for mature flight muscles were already measured. It appears that the authors speculate how this order might be generated during sarcomere assembly, which is not studied in this manuscript as it is limited to periodic arrays after 36h APF.

      As the reviewer correctly points out, a lattice refers to an ordered array - in the case of IFM sarcomeres, this includes both thin and thick filaments. Therefore, the phrase "myofilament lattice model of Drosophila flight muscle sarcomeres" specifically describes a model representing the spatial organization of these filament arrays within the sarcomere. To provide additional clarity for readers, we have revised the title to include more context. It now reads: Developmental Remodeling of Drosophila Flight Muscle Sarcomeres: A Scaled Myofilament Lattice Model Based on Multiscale Morphometrics

      To create a model of these arrays, three essential pieces of information are required:

      1) The length of the filaments,

      2) The number of filaments, and

      3) The relative position of the filaments.

      While some direct measurements are available in the literature, and others can be used to calculate the necessary values, the available data are often contradictory or inconsistent with one another (as described in our manuscript), making them unsuitable for constructing scaled models of the myofilament arrays. In contrast, here we present a comprehensive and consistent set of measurements that enabled us to build models not only of mature sarcomeres but also of sarcomeres at three other significant developmental time points.

      Regarding the mention of "sarcomere assembly" in line 37, we intended it to refer to the growth of the sarcomeres, not their initial formation. We do not speculate about sarcomere assembly anywhere in the text. In fact, we have clearly stated multiple times that our focus is on the growth of the IFM myofilament array during myofibrillogenesis. Nevertheless, to avoid confusion, we revised the phrase in line 37 to "sarcomere growth".

      The authors review the flight muscle sarcomere length literature and conclude it is variable because of imprecise measurements. Likely this is partially true, however, more importantly is that the sarcomere length and width changes during isolation methods of the myofibrils, as well as by various embedding methods, as the authors show here as well in Figure 1B-E.

      We dedicated two sections of the Results - “An automated method to accurately measure sarcomeric parameters” and “IFM sarcomere morphometrics are affected by sex, age, fiber type, and sample preparation” - to exploring potential sources of variability in published IFM sarcomere measurements. Based on these analyses, we conclude that such variability stems from both measurement imprecision and biological or technical factors, including sex, age, fiber type and, foremost, sample preparation. Because it is difficult to quantify the relative impact of each variable across published studies, we have refrained from speculating about the relative contribution of the different factors in the revised manuscript.

      Hence, I find the strongly claims the authors make here surprising, while they are isolating the myofibrils. Hence, these myofibrils are ruptured at the ends, relaxed or contracted, depending on buffer choice and passive tension is released. On page 8, the authors correctly state that the embedding medium causes shrinkage of the myofibrils. While isolation is state of the art for electron microscopy techniques, other methods including sectioning or even whole mount preparation have been developed for high resolution microscopy of IFMs that avoid these artifacts. Unfortunately, this manuscript only uses isolated myofibrils that were fixed and then mechanically dissociated by pipetting. This method likely induces variations as seen by the large spread of sarcomere length reported in Figure 1C (2.8-3.9µm?) and even bigger spreads for myofibril widths. Are these also seen in tissue without dissections? Unfortunately, no comparision to intact flight muscles are reported with the here presented quantification tool. The sarcomere length spread in the developmental samples is even larger.

      The major issue raised in this paragraph is the use of isolated myofibril versus intact flight muscle preparations. The reviewer claims that the latter might be superior because the isolated myofibrils are ruptured at their ends. Clearly, the intact IFMs cannot be imaged in vivo by light microscopy because the adult fly cuticle is opaque. To visualize these muscles, one must open the thorax, but neither microdissection nor sectioning preserves them perfectly: even the cleanest longitudinal cuts sever some myofibrils, and dissection itself can damage the tissue. Although published images often show only the most pristine regions, the practice of selective cropping cannot be taken as a scientific argument. Here, by comparing sarcomere lengths measured in isolated myofibrils with those from whole-mount longitudinal DLM sections and microdissected IFM myofibers, we demonstrate that isolation does not alter sarcomere length (Figure 1E, now Figure 2A in the revised manuscript). As to myofibril width, it is determined by two parameters: the number of myofilaments and the spacing between them. In vivo filament spacing has been measured directly, and filament counts can be obtained from EM cross-sections of DLM fibers. Combining these values gives an expected in vivo myofibril diameter. While isolated myofibrils measure thinner than those in whole-mount or microdissected samples (Figure 1E, now Figure 2A in the revised manuscript), their diameter closely matches this in vivo estimate (see manuscript, lines 187–198). Therefore, we conclude that isolated myofibrils (even if it seems counterintuitive for this reviewer) are better suited to sarcomere measurements than whole-mount preparations - and that is why we primarily rely on them here.

      Despite that, we certainly recognize that isolated myofibrils cannot recapitulate every aspect of an IFM fiber, and the need for whole-mount preparations during our IFM studies is not questioned by us.
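      The width argument above (filament count combined with lattice spacing gives an expected diameter) can be sketched as follows. This is a minimal illustration under stated assumptions, not the calculation from the manuscript: it assumes a perfect hexagonal lattice of thick filaments, a roughly circular myofibril cross-section, and hypothetical function names and placeholder inputs.

```python
import math


def lattice_area(n_thick, spacing):
    """Cross-sectional area occupied by n_thick filaments on a hexagonal lattice.

    Each lattice point of a hexagonal (triangular) lattice with
    centre-to-centre spacing d occupies an area of sqrt(3)/2 * d**2.
    """
    return n_thick * (math.sqrt(3) / 2.0) * spacing ** 2


def equivalent_diameter(n_thick, spacing):
    """Diameter of a circle with the same area as the filament lattice."""
    return 2.0 * math.sqrt(lattice_area(n_thick, spacing) / math.pi)


# Placeholder inputs for illustration only (not measured data):
# number of thick filaments and centre-to-centre spacing in micrometres.
print(equivalent_diameter(n_thick=800, spacing=0.05))
```

      Because the area grows linearly with the filament count, the expected diameter scales with the square root of the count at a fixed spacing, which is why both parameters are needed to compare a measured width with an in vivo estimate.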

      In addition to this general answer to the issues raised in the above paragraph of the reviewer, we would like to respond specifically to some of the remarks:

      „Unfortunately, this manuscript only uses isolated myofibrils that were fixed and then mechanically dissociated by pipetting.”

      This is a false statement that “this manuscript only uses isolated myofibrils” as we used different preparation methods for initial comparisons (see Figure 1E, now Figure 2A in the revised manuscript). Additionally, unlike the reviewer assumed, the myofibrils were first dissociated and then fixed, and not vice versa (as described in the Materials and Methods section).

      „This method likely induces variations as seen by the large spread of sarcomere length reported in Figure 1C (2.8-3.9µm?) and even bigger spreads for myofibril widths. Are these also seen in tissue without dissections?”

      This remark makes absolutely no sense, as we do not report sarcomere length values in Figure 1C at all. By assuming that the reviewer meant to refer to Figure 1B, it still remains a misunderstanding or a false statement, because that panel refers to the variations found in published data (not in our current data), and this is clearly explained both in the figure legend and the main text. Regardless of that, the stated spread does not appear unusual. In the article by Spletter et al. (2018), the authors report a similar spread (2.576–3.542 µm) for sarcomere length in mature IFM using whole-mount DLM cross-sections. As to the second question here, we do observe a comparable spread in other preparations as well (see Figure 1E, now Figure 2A in the revised manuscript), which is again the opposite conclusion as compared to the (clearly false) assumption of the reviewer.

      „Unfortunately, no comparison to intact flight muscles are reported with the here presented quantification tool.”

      This is also a false statement, as we do report a comparison to whole-mount cross-sections, which we believe the reviewer considers „intact”, in Figure 1E (Figure 2A in the revised manuscript).

      „The sarcomere length spread in the developmental samples is even larger.”

      The spread is not larger at all than in previous reports, as clearly shown in Supplementary Figure 3A.

      The authors suggest that there are sex differences in sarcomere length and pupal development duration. This is potentially interesting, unfortunately they then use mixed sex samples to analyse sarcomeres during flight muscle development.

      In the revised manuscript, we now provide a more detailed description of a subtle post-eclosion difference in IFM sarcomere metrics between male and female Drosophila. We attribute this variation to the well-established observation that female pupae develop slightly faster than males, a property that may last till shortly after eclosion. Confirming this experimentally would require considerable effort with limited scientific benefit. Nonetheless, the subtle nature of this sex-linked variation reinforced our decision to include IFM sarcomeres from both male and female flies in our comprehensive developmental analysis.

      The IMA software tool lacks critical assessment of its performance compared to other tools and the validation presented is too limited. IMA seems to generate systematic errors, based on Fig S1E, as it does not report the ground truth. These have to be discussed and compared to available tools. The principles of fitting used in IMA seem well adapted to IFM myofibrils in low noise conditions, but may not be usable in other situations. This should be assessed and discussed.

      IMA is a specialized software tool developed to address a specific need, notably, to accurately and efficiently measure sarcomere length and myofibril diameter in individual IFM myofibril images labeled with both phalloidin and Z-disc markers. For our purposes, it remains the most suitable and reliable option, and we are confident that IMA outperforms all other available tools. To demonstrate this, we have included a table comparing the few alternatives (MyofibrilJ, SarcGraph, and sarcApp) capable of both measurements, which further supports our conclusion. Given IMA's focused application, extensive validation under artificially low signal-to-noise conditions is unnecessary. While IMA may introduce minor systematic errors (~0.01 µm for sarcomere length and ~0.03 µm for myofibril diameter), these are negligible errors relative to the limitations of the simulated ground truth data used for benchmarking. This point is now addressed in the manuscript.
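      For readers unfamiliar with how systematic error and precision are separated in this kind of benchmark, a minimal sketch follows. It is not IMA code: the function name and the example numbers are hypothetical, and it simply assumes that repeated measurements of one simulated structure with a known dimension are available.

```python
import statistics


def accuracy_and_precision(measured, ground_truth):
    """Return (bias, spread) for repeated measurements of a known dimension.

    bias   : mean signed error relative to the ground truth (systematic error)
    spread : sample standard deviation of the measurements (precision)
    """
    errors = [m - ground_truth for m in measured]
    bias = statistics.mean(errors)
    spread = statistics.stdev(measured)
    return bias, spread


# Hypothetical sarcomere length measurements (micrometres) of a simulated
# myofibril whose true sarcomere length is 3.20 micrometres.
bias, spread = accuracy_and_precision([3.21, 3.19, 3.23], 3.20)
print(bias, spread)
```

      A small non-zero bias with a small spread indicates a reproducible offset, whereas a large spread indicates imprecision even when the mean matches the ground truth.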

      It is claimed that validation was achieved on simulated IFM images: do the authors rather mean simulated isolated IFM myofibril images? This is not quite the same in terms of algorithm complexity and this should be corrected if this is the case.

      Indeed, we used simulated individual IFM myofibril images, where both phalloidin labeling and Z-disc labeling are present. This is clearly shown in Supplementary Figure 1A, and stated in the text when first introduced: „we generated artificial images of IFM myofibrils with known dimensions, simulating the image formation process”
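      The idea of validating a measurement on simulated data with known dimensions can be illustrated with a deliberately simplified one-dimensional sketch. It is not the image simulation or the fitting procedure used in the manuscript: the Gaussian blur, the peak-based estimator, and all names and parameter values are assumptions chosen for illustration.

```python
import math


def simulate_z_line_profile(sarcomere_length, n_sarcomeres, pixel_size, sigma):
    """1D intensity profile along the myofibril axis, one Gaussian per Z-line."""
    centers = [(k + 0.5) * sarcomere_length for k in range(n_sarcomeres + 1)]
    n_pixels = int(round((n_sarcomeres + 1) * sarcomere_length / pixel_size))
    profile = []
    for p in range(n_pixels):
        x = p * pixel_size
        profile.append(sum(math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers))
    return profile


def estimate_sarcomere_length(profile, pixel_size):
    """Mean spacing between Z-line peaks, refined by parabolic interpolation."""
    threshold = 0.5 * max(profile)
    peaks = []
    for p in range(1, len(profile) - 1):
        left, mid, right = profile[p - 1], profile[p], profile[p + 1]
        if mid > left and mid >= right and mid > threshold:
            denom = left - 2.0 * mid + right
            shift = 0.5 * (left - right) / denom if denom != 0 else 0.0
            peaks.append((p + shift) * pixel_size)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(spacings) / len(spacings)


# Hypothetical parameters in micrometres: known sarcomere length, pixel size, blur.
true_length = 3.2
profile = simulate_z_line_profile(true_length, n_sarcomeres=8, pixel_size=0.07, sigma=0.15)
estimate = estimate_sarcomere_length(profile, pixel_size=0.07)
print(estimate - true_length)  # signed error against the known ground truth
```

      Because the true dimension is set by the simulation, any deviation of the estimate can be attributed to the measurement procedure rather than to biological variability.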

      The authors need to revise their comparison to other tools. It is incomplete and seemingly incorrect. It should be clearly stated that IMA is limited to isolated myofibrils, which is a far easier segmentation task than what other tools can do, such as sarcApp (Neininger-Castro et al. 2023, PMID: 37921850). Defining the acronym would be valuable in that sense. The claim line 129-130 "none can adequately measure myofibril diameter from regular side view images" is unclear. What do the authors refer to as "side view images"? Sarc-Graph from Zhao et al 2021, PMID: 34613960, and sarcApp from Neininger-Castro et al. 2023 provide sarcomere width, in conditions that are very similar to what IMA does, e.g. on xy images based on the documentation provided on github. A performance comparison with these tools would be valuable. Does installation and use of IMA require computational skills?

      Motivated by the reviewer’s comments, we revised the section introducing IMA. However, we chose not to include an extensive comparison with other software tools, as this would divert the manuscript’s focus without impacting the main conclusions. Instead, we added a summary table highlighting the key requirements for analyzing IFM sarcomere morphometrics from Z-stacks of phalloidin- and Z-line-labeled individual myofibrils and compared the available tools accordingly. In our experience, most software tools are developed to address very specific problems, even those marketed as general-purpose solutions. Consequently, applying them beyond their intended scope often results in reduced efficiency and suboptimal performance.

      Although sarcApp was initially available as a free tool, one of its dependencies (PySimpleGUI 5) has since adopted a commercial license model. Using a trial version of PySimpleGUI 5, we evaluated sarcApp on our dataset. The software is limited to single-plane image input, hence raw image stacks must be preprocessed into a suitable format, which is a time-consuming step. Furthermore, implementation requires basic programming proficiency, as parameter adjustments must be performed directly within the source code to accommodate dataset-specific configurations. Once appropriately configured, sarcApp reliably quantifies both sarcomere length and myofibril width with accuracy comparable to that of IMA. However, it lacks built-in diagnostic feedback or visualization tools to facilitate measurement verification or troubleshooting during batch processing.

      SarcGraph also supports only single-plane image inputs and requires prior image preprocessing. Additionally, images must be loaded manually one by one, which further reduces processing efficiency. Parameter optimization relies on direct code modification through a trial-and-error process, demanding a certain level of programming proficiency. Even with these adjustments, the software frequently introduces artifacts - such as Z-line splitting - when applied to our dataset. Even when segmentation is successful, sarcomere length is often overestimated, whereas myofibril diameter is consistently underestimated.

      In contrast, IMA was designed for ease of use and does not require any programming experience to install or operate. It can automatically handle raw microscopic image formats without the need for preprocessing. Segmentation is fully automated, with no requirement for parameter tuning. The tool provides visual feedback during both the segmentation and fitting steps, allowing users to confidently assess and validate the results. IMA produces accurate and precise measurements of sarcomere length and diameter. Batch processing is enabled by default, significantly improving efficiency when analyzing multiple images. Finally, contrary to the reviewer’s statement, IMA is not limited to isolated myofibrils. It is optimized for isolated myofibrils (i.e. full performance is achieved on these samples), but it can also work on whole-mount preparations in semi-automatic and manual mode, which still allow precise measurements (with some reduction in processing efficiency).

      As to the minor comments, the acronym IMA was already defined in lines 541 and 917–918 of the original submission, as well as on the software’s GitHub page. Additionally, we replaced the phrase "side view images" with "longitudinal myofibril projections" to improve clarity.

      How do the authors know that the bright phallodin signal visible that the Z-disc at 36h and 48h APF is due to actin filament overlap, as suggested? An alternative solution are more short actin filaments at the early Z-discs.

      It is widely accepted that the bright phalloidin signal at the Z-line in mature sarcomeres reflects actin filament overlap (e.g., Littlefield and Fowler, 2002; PMID: 11964243). Accordingly, in slightly stretched myofibrils, this bright signal diminishes, and in more significantly stretched myofibrils, a small gap appears (e.g., Kulke et al., 2001; PMID: 11535621). The width of this bright phalloidin signal corresponds to the electron-dense band seen in longitudinal EM sections (Figure 3B and Supplementary Figure 5B, now Figure 4B and Figure 6B in the revised manuscript) and matches the actin filament overlap observed in Z-disc cryo-EM reconstructions from other species (Yeganeh et al., 2023; Rusu et al., 2017), where individual thin filaments can be resolved. By extension, we interpret the bright phalloidin signals at the Z-discs observed at 36 h and 48 h APF as arising from similar actin filament overlaps, given their comparable width to the electron-dense Z-bodies described both in our study (Supplementary Figure 5B, now Figure 6B in the revised manuscript) and by Reedy and Beall (1993). While we cannot fully rule out the reviewer’s alternative interpretation, for the time being it remains a bold speculation without supporting evidence, and therefore we prefer to stay with the conventional view.

      The authors seem to doubt their own interpretation that actin filaments shrink when reading line 304 and following. This is obviously critical for the "model" presented.

      Contrary to what the reviewer implies, we certainly do not doubt our own interpretation. To avoid confusion, we revised the corresponding paragraph in the manuscript to explain it in more detail, and we provide a brief overview here. Between 36 h and 48 h APF we observe a pronounced structural transition in the IFM sarcomeres. In EM cross-sections, the previously irregular myofilament lattice becomes organized into a regular hexagonal pattern (Figure 3A, now Figure 4A in the revised manuscript) with filament spacing typical of mature myofibrils (Supplementary Figure 5A, now Figure 6A in the revised manuscript). In longitudinal EM sections, the elongated, amorphous Z-bodies condense along the myofibril axis to form well-defined, adult-like Z-discs (Supplementary Figure 5B, now Figure 6B in the revised manuscript). Similarly, dSTORM imaging shows that the Z-disc associated D-Titin epitopes become more compact and organized during this period (Supplementary Figure 4E, now Figure 5E in the revised manuscript). The edges of the thick filament arrays also become more sharply defined, and the appearance of a distinct bare zone indicates the establishment of a regular register (Figure 3B, now Figure 4B in the revised manuscript). If a similar reorganization occurs within the thin filament array, the apparent length of the thin filament array would decrease, not because individual filaments shorten but because their alignment improves. Although we cannot directly resolve single thin filaments, this reorganization offers the most plausible explanation for the observed change.

      Minor comments: 1. Figure S1B is not called out in the text.

      The reviewer might have missed this, but in fact, it is explicitly called out in line 181.

      Fig. 1: Please state whenever images are simulations?

      We appreciate the reviewer’s observation that the simulated IFM myofibril images are indistinguishable from the real ones, as this confirms the adequacy of these images for testing our software tool. However, this is already clearly indicated: Figure 1B features simulated images, as noted in the figure legend (line 824), and Supplementary Figure 1A similarly shows simulated images, as stated both in the legend (line 886) and in the figure.

      Fig. 2: Length-width correlation - please provide individual points color-coded by time point?

      As suggested by the reviewer, we generated a plot with the individual points color-coded by time.

      "newly eclosed males and females, we observed that males have slightly shorter sarcomeres and narrower myofibrils". Please provide a statistical test supporting the difference.

      In the revised manuscript, we compared sarcomere length and myofibril width between males and females from 0 to 96 hours AE using a two-way ANOVA with Sidak’s multiple comparisons test. We expanded our description of these observations in the main text, and details of the statistical analysis are now included in the revised figure legend (Figure 1E). Briefly, newly eclosed males showed slightly shorter sarcomeres than females - a consistent but non-significant trend (p = 0.9846) - which resolved by 12 h AE, with sarcomere lengths remaining similar thereafter (p = 0.1533; Figure 1E). In contrast, myofibril width was significantly narrower in the newly eclosed males (p = 0.0374), but this difference disappeared between 24 and 48 h AE as myofibrils expanded in diameter during post-eclosion development (p

      Were statistical tests performed using animals as sample numbers? Please clarify in the images what are animal and what are sarcomere numbers.

      Following standard guidelines, statistical tests were performed using the means of independent experiments, as noted in the figure legends. For each experiment, we used approximately 6 animals, and this information is now included in the Materials and Methods section.

      mef2-Gal4 should be spelled Mef2-GAL4 according to Flybase.

      This has been corrected in the revised text and figures.

      Are the images shown in Figure 2B representative? 96h AE appears thicker than 24h AE but the graph reports no difference.

      We aimed to show representative images; however, in the case of 96h AE we may have selected an unrepresentative example. We have now replaced the image with a more appropriate one.

      The authors only found Zasp52 and alpha-Actinin at the Z-discs from 72h APF onwards, which is different to what others have reported.

      Similarly to former reports, we detected both α-Actinin (see Supplementary Figure 3B, now Figure 3C in the revised manuscript) and Zasp52 in microdissected IFMs as early as 36 hours APF. However, these markers were largely absent in isolated myofibrils from the early pupal stages (36–60 hours APF). By 60 hours APF, strong α-Actinin and Zasp52 signals were clearly visible in isolated myofibrils (the closest timepoint captured by dSTORM is 72h APF). As discussed in the manuscript, a likely explanation is that α-Actinin and Zasp52 are recruited to developing Z-bodies early on but are only fully incorporated into mature Z-discs between 48 and 60 hours APF. Their incomplete integration at earlier stages may lead to their loss during the isolation procedure.

      Thick filament length during development has also been estimated by Orfanos and Sparrow, which should be cited (PMID: 23178940)

      Contrary to the reviewer’s claim, the article 'Myosin isoform switching during assembly of the Drosophila flight muscle thick filament lattice' does not provide any measurements or estimates of thick filament length; it only includes a schematic illustration where the length of the thick filaments is not based on empirical data.

      **Referee Cross-commenting**

      I also agree with my colleagues' comments, which are largely consistent.

      Reviewer #3 (Significance (Required)):

      This paper introduces a tool to measure sarcomere length. Easy to use tools that do this as well already exist. The tool can also measure sarcomere width, which it claims as unique point, which is not the case, see above comment.

      We are aware that other tools exist to measure sarcomere parameters (and we did not claim the opposite in our ms), nevertheless, we need to emphasize that based on our comparisons, IMA is superior to all three alternatives. Three software tools could, in principle, be used to measure both sarcomere length and myofibril diameter: MyofibrilJ, SarcGraph, and sarcApp. However, two of them - MyofibrilJ and SarcGraph - consistently under- or overestimate these values. The only tool capable of performing these measurements reliably, sarcApp, is no longer freely available, it requires programming expertise, and it does not support raw image file formats, making it difficult to use in practice (see above comments for more details). In contrast, IMA is user-friendly and does not require any programming expertise to install or operate. It can automatically process raw microscopic image formats without the need for preprocessing. Segmentation is fully automated, and no parameter tuning is necessary. The tool offers visual feedback on both the segmentation and fitting processes, enabling users to validate results with confidence. IMA delivers accurate and precise measurements of sarcomere length and diameter. Additionally, batch processing is enabled by default, significantly enhancing workflow efficiency.

      This manuscript shows that, depending on the isolation and embedding media, sarcomere and myofibril width changes and hence artifacts can be introduced. While this is not surprising, it has not been well controlled in a number of previous publications.

      Furthermore, this paper measures sarcomere length and width during flight muscle development and consolidates what was already known from previous publications. Sarcomeres are added until 48 h APF, then they grow in diameter. Despite strong claims in the text, I do not see any significant novel findings how sarcomeres grow in length or width or any significant deviations from what has been published before. This is even documented in the supplementary graphs by comparing to published data. It is close to identical.

      The overall process has been quantitatively described in four previous studies (Reedy and Beall, 1993, Orfanos et al., 2015, Spletter et al., 2018, Nikonova et al., 2024). While there is general agreement on the pattern of sarcomere development, significant discrepancies exist among these datasets; differences that become particularly problematic when attempting to build structural models. More specifically: Reedy and Beall (1993) report substantially shorter sarcomeres compared to all other datasets, including ours. This discrepancy likely stems from two factors: (i) their use of longitudinal EM sections, where sample preparation is known to cause considerable tissue shrinkage; and (ii) the maintenance of their flies at 23 °C, a temperature that clearly delays development relative to the more commonly used 25 °C. Interestingly, Spletter et al. (2018) and Nikonova et al. (2024) conducted their experiments at 27 °C, which also deviates from standard conditions and may complicate comparisons. Orfanos et al. (2015) suggested that mature sarcomere length is reached by approximately 88 hours after puparium formation (APF). In contrast, our measurements show that sarcomeres continue to elongate beyond this point, reaching mature length between 12 and 24 hours post-eclosion. All four earlier studies report a mature sarcomere length around 3.2-3.3 µm, only slightly longer than the ~3.2 µm length of thick filaments (Katzemich et al., 2012; Gasek et al., 2016). This would imply an I-band length below ~100 nm, which is an implausibly short distance. In contrast, our data, along with several recent studies (González-Morales et al., 2019; Deng et al., 2021; Dhanyasi et al., 2020; DeAguero et al., 2019), support a mature sarcomere length of approximately 3.45 µm, placing the length of the I-band at around 250 nm. 
This estimate is more consistent with high-resolution structural observations from longitudinal EM sections and fluorescent nanoscopy (Szikora et al., 2020; Schueder et al., 2023). Although Reedy and Beall (1993) provide limited data on myofibril diameter during myofibrillogenesis, a more detailed quantitative analysis is presented by Spletter et al. (2018) and by Nikonova et al. (2024). Interestingly, Spletter et al. report two separate datasets - one based on longitudinal sections and another on cross-sections of DLM fibers. While the measurements are consistent during early pupal stages, they diverge significantly in mature IFMs (1.116 ± 0.1025 µm vs. 1.428 ± 0.0995 µm), a discrepancy that is not addressed in their publication. Nikonova et al. (2024) report even narrower myofibril widths (0.9887 ± 0.1273 µm). Moreover, the reported diameters of early myofibrils in all three datasets are nearly twice as large as those reported by Reedy and Beall (1993) and in our own measurements, directly contradicting the reviewer's claim that the values are “close to identical.” Finally, our data clearly demonstrate that both the length and diameter of IFM sarcomeres reach a plateau in young adults, which is a key developmental feature not examined in previous studies.
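      The I-band argument above rests on simple arithmetic, which can be sketched as follows. This is an illustrative calculation only: the function name is hypothetical, and it assumes, as the text does, that the I-band length is the difference between sarcomere length and thick filament length.

```python
def i_band_length_nm(sarcomere_um: float, thick_filament_um: float) -> float:
    """Return the sarcomere length not covered by thick filaments, in nm.

    Assumption (taken from the argument in the text): I-band length is
    sarcomere length minus thick filament length.
    """
    return round((sarcomere_um - thick_filament_um) * 1000.0, 1)


# Values quoted in the response (all in micrometres).
THICK_FILAMENT_UM = 3.2          # approximate thick filament length
EARLIER_STUDIES_MAX_UM = 3.3     # upper end of the 3.2-3.3 um range
THIS_STUDY_UM = 3.45             # mature sarcomere length reported here

for label, sarcomere_um in [
    ("earlier studies (upper bound)", EARLIER_STUDIES_MAX_UM),
    ("this study", THIS_STUDY_UM),
]:
    print(label, i_band_length_nm(sarcomere_um, THICK_FILAMENT_UM), "nm")
```

      With the quoted values, the earlier estimates leave at most about 100 nm for the I-band, whereas a 3.45 µm sarcomere leaves about 250 nm.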

      In summary, we did not and we do not intend to claim that our conclusions are novel as to the general mechanisms of myofibril and sarcomere growth. Rather, our contribution lies in providing a high-precision, robust analysis of the growth process using a state-of-the-art toolkit, resulting in a comprehensive description that aligns with structural data obtained from TEM and dSTORM. We therefore believe that expert readers will recognize numerous valuable aspects of our approaches that will advance research in the field.

      Counting the total number of thick filaments during myofibril development is nice, however, this also has been done (REEDY, M. C. & BEALL, C. 1993, PMID: 8253277). In this old study, the authors reported the number of filaments across one myofibril. How does this compare to the new data here counting all filaments? Unfortunately, this is not discussed.

      Indeed, the study by Reedy and Beall (1993) was primarily based on longitudinal DLM sections, which were used to estimate myofibril width and to count the number of thick filaments on these lateral-view images (e.g., ~15 thick filaments wide at 75 hours APF), but total thick filament numbers were not provided. While such data could theoretically be used to estimate the number of myofilaments per myofibril, these estimations would depend on the unverified assumption that the section includes the full width of the myofibril. Additionally, the study did not provide standard deviations or the number of measurements, limiting the interpretability and reproducibility of their findings. These points highlight the need for a more rigorous and quantitative approach. For these reasons, we chose to quantify myofilament number using cross-sections, providing more accurate and reliable assessments.

      Besides the difference between the lateral versus cross sections, a direct comparison of our studies is further complicated by differences in the developmental time points and experimental conditions used. Reedy and Beall (1993) report data from pupae aged 42, 60, 75 and 100 hours, as well as from adults, whereas we present data from 36, 48, and 72 hours APF, and from 24 hours after eclosion, which corresponds to approximately 124 hours APF. Moreover, their experiments were carried out at 23 °C, a temperature that somewhat slows down pupal development and results in adult eclosion at around 112 hours APF, as stated in their study. In contrast, our experiments were carried out at the more commonly used 25 °C, where adults typically emerge around 100 hours APF.

      Collectively, these differences prevented meaningful comparisons between the two datasets, and therefore we preferred to avoid lengthy discussions on this issue.
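      The time-point conversion used in the comparison above can be sketched as a short calculation. The function name is hypothetical, and the eclosion times are the approximate values quoted in the response (about 100 h APF at 25 °C and about 112 h APF at 23 °C); this is an illustration of the arithmetic, not a developmental staging tool.

```python
# Approximate eclosion times quoted in the response, in hours APF.
ECLOSION_H_APF = {
    25: 100,  # rearing temperature used in this study
    23: 112,  # rearing temperature used by Reedy and Beall (1993)
}


def hours_apf(hours_after_eclosion: float, temperature_c: int) -> float:
    """Convert a post-eclosion age (h AE) to hours after puparium formation."""
    return ECLOSION_H_APF[temperature_c] + hours_after_eclosion


# 24 h AE at 25 degrees C corresponds to roughly 124 h APF.
print(hours_apf(24, 25))
```

      The same post-eclosion age maps to a later APF value at 23 °C, which is one reason the two sets of time points do not line up directly.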

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2025-02953

      Corresponding author(s): Andreas, Villunger

      1. General Statements [optional]

      *We would like to thank the reviewers for their constructive input and overall support. We appreciate the opportunity to provide a provisional revision plan, as outlined here, and are happy to engage in additional communication with journal editors via video call, in case further clarifications are needed.*

      2. Description of the planned revisions

      Reviewer #1

      __Evidence, reproducibility and clarity __

      Summary: This manuscript by Leone et al describes the role of the PIDDosome in cardiomyocytes. Using a series of whole body and cardiomyocyte specific knockouts, the authors show that the PIDDosome maintains correct ploidy in these cells. It achieves this through inducing cell cycle arrest in cardiomyocytes in a p53 dependent manner. Despite this effect on ploidy, PIDDosome-deficient hearts show no structural or functional defects. Statistics and rigor appear to be adequate.

      We thank this referee for taking the time to evaluate our work and for their valuable comments. We assume that the reviewer mistakenly states that the phenomenon we describe depends on p53. As outlined in the abstract and throughout the manuscript, the effect is independent of p53, but may additionally still involve p21, acting alongside or in parallel to the PIDDosome.

      Major comments: 1. Figure 1 uses fluorescent intensity of a nuclear stain to determine ploidy per nucleus and they further separate the results into mononucleated, binucleated or multinucleated cells. It is hard to know how to interpret these results without further information or controls. Is there a good positive control that can be used to help to show whether this assay is quantitative? The differences are larger with the Raidd and caspase-2 knockouts than with the Pidd knockouts but this is not addressed.

      *We appreciate this concern. Regarding a "good positive control", we follow the state of the art in the cardiomyocyte field: studies by the Evans (PMID: 36622904), Kuhn (PMID: 32109383), Bergmann (PMID: 26544945) and Patterson labs (PMID: 28783163, 36912240) all use the identical approach to discriminate 2n from 4n nuclei in microscopy images at the cellular level. The fact that the majority of rodent CM nuclei are indeed diploid (PMID: 31175264, 31585517 and 32078450), and that a large number of nuclei have been evaluated to assess their mean fluorescence intensity (MFI), reduces the risk of a systematic bias in our analysis. Moreover, we have used an orthogonal approach that is indeed quantitative to define DNA content, i.e., flow-cytometry-based evaluation of DNA content in isolated CM nuclei (Fig. 1C). We are hence confident that our assays are quantitative.*

      Regarding the fact that loss of Pidd1 causes a more subtle phenotype, we can offer to discuss this in light of the fact that Pidd1 has additional functions outside the PIDDosome (PMID: 35343572), and that we made similar observations when analyzing ploidy in hepatocytes (PMID: 31983631). Given that all components of the PIDDosome show a similar phenotype, and that this phenotype is mimicked by loss of the protein that connects PIDD1 and centrosomes, ANKRD26 (Fig. 4a), we are confident that this biological variation in our analysis does not affect our conclusions.

      On line 459 the authors state that the increase in polyploidy in PIDDosome knockouts occurs in adulthood but this is not directly tested. In fact, in the next section the polyploidy is assessed in early postnatal development. This statement should be explained or removed.

      We see that our statement here was unclear. In fact, we first noted increases in ploidy in the adult heart and then defined the time window in development when this happens. This sentence will be rephrased.

      In Figure 4, the authors obtained RNAseq data for P1, P7 and P14 but only show the differences with and without caspase-2 at P7. Given that the differences in ploidy are more significant at P14 (Fig 3D), all the comparisons should be shown along with analysis of whether the same genes/gene families are altered in the absence of caspase-2.

      The reason why we focus on postnatal day 7 (P7) is that data from Alkass et al. (PMID: 26544945) and other labs (PMID: 31175264) document that the initial wave of binucleation peaks on this day. Hence, we hypothesized that the PIDDosome must be active in most CMs, which aligns well with the increased mRNA levels of all of its components (Figure 3). Interestingly, its action seems to be tightly regulated, as mRNA levels of PIDDosome components drop on P10, suggesting PIDDosome shut-down or downregulation. Similar findings have been noted in the liver (PMID: 31983631). Alkass and colleagues also show that very few CMs enter another round of DNA synthesis between P7 and P14, and hence possible transcriptome changes in the absence of the PIDDosome will be strongly diluted.

      Please note that on P1, no difference between genotypes is to be expected, as all CMs are mononucleated diploids and cytokinesis competent, as previously demonstrated (PMID: 26544945). Moreover, PIDDosome expression levels are extremely low (Fig. 3A). As such, no differences between genotypes are expected on P1. In addition, on P14 the ploidy phenotype observed in PIDDosome knockout mice reaches its maximum and ploidy increases are comparable to adult tissue. Thus, at this time the trigger for PIDDosome activation (cytokinesis failure) is no longer observed, as the majority of CMs are post-mitotic (PMID: 26247711). As such, the impact of PIDDosome activation on the P14 transcriptome is most likely negligible. However, if desired, we can expand our bioinformatics analysis, which summarizes findings related to DEGs over time in wt animals, by also comparing genotypes on day 1 and day 14. In light of the above, the comparison between genotypes on P7 still appears to be the most meaningful.

      Some validation of the RNAseq and/or proteomics results would be an important addition to this study

      We agree with this notion and propose to validate key candidates related to cardiomyocyte proliferation and polyploidization, some of which we found to be differentially expressed at the mRNA level on day 7 in the RNAseq data (e.g., p21, Foxm1, Kif18a, Lin37 and others).

      Regarding the proteomics results, we face the challenge that we can only test in silico, using DeepCleave, whether candidate proteins are likely caspase substrates, and potentially pick one or two candidates linked to CM differentiation for further analysis in vitro and in heterologous cell-based assays (e.g. 293T cells), as no bona fide ventricular cardiomyocyte cell lines exist. Primary postnatal CMs are extremely difficult to transfect, and they neither proliferate without drug treatment nor fail cytokinesis ex vivo.

      Figure 4D: the authors conclude that p21 is downstream of PIDD (and p53 independent). However, this is not supported by the data because the increase in 4N cells/decrease in 2N cells, although statistically significant, is nowhere near that of caspase-2 KO and caspase-2/p21 KO. Statistics should also compare p32KO with c2KO. In the absence of any other data, the more likely conclusion is that p21 is not involved.

      *We agree that the findings related to the impact seen upon loss of p21 suggest that it is not the only effector involved in ploidy control, and it may not even be an effector engaged by caspase-2, as C2/p21 DKO mice have an even higher ploidy increase, albeit not statistically significant. However, it is important to highlight that p21 (Cdkn1a) was found to be downregulated in our transcriptomic analysis, suggesting an involvement in the caspase-2 cascade. We are happy to highlight this when presenting the results and in the discussion.*

      *We assume that this referee refers to p73 KO data that should be compared to Casp2 KO data (this could be read as p73 or p53, but the latter is already compared side by side with Casp2 in Fig. 4). As p73 KO mice were not found to be viable beyond day 7 (our attempt to find animals on day 10 failed, in line with published literature (PMID: 24500610, 10716451)), we can only offer to compare this data set to the data presented in Figure 3C, where we have analyzed ploidy increases on day 7 in wt and PIDDosome mutant mice. This re-analysis will show that only Caspase-2 mutant mice display a significant ploidy increase on P7 when compared to wt or p73 mutant animals, while no differences are noted between wt and p73 mutant mice (to be included in new Suppl. Fig. 3C).*

      Minor comments: Suggest moving Figure 4A to Figure 3 as it seems to fit better there based on the citation of this figure in the text

      *We can see some benefit in this recommendation and included panel 4A now in an updated version of Figure 3. *

      Recommend enhancing the brightness of microscopy images in Figure 1E and 2D

      We will try to improve image quality; the low brightness may have been due to PDF conversion.

      Significance

      This study provides interesting information for the role of the PIDDosome in protecting from polyploidy and adds to the body of work by this same group studying this pathway in the liver.

      The main weakness in terms of significance is the lack of a phenotype in the hearts of these animals. Therefore, it is clear that ploidy (or at least PIDDosome dependent ploidy) has minimal impact on cardiac development.

      We respectfully disagree with the comment that the lack of impact on cardiac function constitutes a weakness of our findings. Several studies on ploidy control in the liver (PMID: 34228992), but importantly also in the heart (PMID: 36622904), have failed to document a clear impact of increased ploidy on organ function. This does not imply insignificance, but rather that the context in which this becomes relevant has not yet been identified. We are happy to expand on this in our discussion.

      The authors mention that they have not tried giving these mice a myocardial infarct (MI) or inducing any other type of cardiac damage. Although it is understood that these experiments are likely outside of the scope of the present study, without this information the impact of this study is moderate. I recommend expanding the discussion to provide a more in-depth possible rationale as to why ploidy perturbations do not lead to structural changes like in the liver.

      Despite this, the insights to the pathway itself are interesting to investigators in the caspase-2 field if a little underdeveloped, especially concerning the role of p21.

      My expertise is in cell death and caspase biology (especially caspase-2). I have sufficient expertise to evaluate all parts of this paper.

      *As mentioned above, we will amend our conclusions on p21, in light of potential findings made when validating DEG candidates, as stated above. *

      *We hope that the changes and amendments proposed here will be satisfactory to this referee to recommend publication of a revised manuscript. *

      Reviewer #2

      __Evidence, reproducibility and clarity: __

      __Summary: __

      In this study, the authors investigated the role of the PIDDosome during cardiomyocyte polyploidization. The PIDDosome is a multi-protein complex that activates the endopeptidase Caspase-2 and has been shown to be involved in eliminating cells with extra centrosomes or in response to genotoxic stress (Burigotto & Fava, 2021, Sladky and Villunger, 2020). In both cases, the PIDDosome is recruited in an ANKRD26-dependent manner at the centrosomes leading to p53 stabilization and cell death (Burigotto & Fava, 2021; Evans et al., 2020; Burigotto et al., 2021).

      Here, by studying mouse cardiomyocyte differentiation, the authors showed that PIDDosome is imposing ploidy restriction during cardiomyocyte differentiation. Importantly, in contrast to a previous report in the liver (Sladky et al., 2020), they showed that PIDDosome acts in a p53-independent manner in cardiomyocytes. Indeed, they suggested that PIDDosome controls ploidy in cardiomyocytes through p21 activation.

      We want to thank this reviewer for the time taken to evaluate our work and provide critical feedback that will help to improve our revised manuscript.

      __Major comments: __

      In general the conclusions of the authors are well supported by the experiments. However, I would suggest the following experiments/analysis to strengthen the paper:

      The authors should improve the Figure 1 to help the readers who are not familiar with cardiomyocyte polyploidization. For instance, I would suggest to add a scheme to summarize cardiomyocyte polyploidization (in terms of nuclear size, mono vs multi and so on).

      We agree that a visual summary of the postnatal timing of CM polyploidization will be helpful for the generalist not familiar with the topic and have added a scheme, adapted from a study by Alkass et al. (PMID: 26544945), who elegantly defined the timing of this process during postnatal mouse life (now Fig. 1A).

      Based on the images they presented in 1B, the authors should also measure the nuclear area or volume in the different conditions in which components of the PIDDosome were depleted. Indeed, these two parameters should be easier to conceptualize for the readers (instead of the fluorescence nuclear intensity). This could help to understand if the nuclear size is maintained between the different conditions and if this is comparable between mono, bi or multinucleated cardiomyocytes.

      We have acquired this data and it can be used to provide additional information on nuclear area and/or volume. We propose to focus on re-analyzing data from wt, Casp2 and XMLC2CRE/Casp2f/f mice. The additional information can be included in Figures 1 & 2, respectively.

      • In Figure 2A, the authors presented cross sections of hearts from animals showing that PIDDosome depletion has no effect on heart size. This is a surprising result since cardiomyocytes have higher ploidy levels and this could have an effect on their function. Given the importance of this observation, the authors should present a quantification of the heart size in the different conditions shown in Figure 2A.

      We agree with this comment. We can measure the heart vs. body weight ratio or tibia length in adult Casp2-/- vs. WT (3 month old) in order to indirectly evaluate possible increases in CM size linked to increased ploidy.

      Also, the percentage of cardiomyocytes presenting higher levels of ploidy seems quite low. The authors should discuss this point. In particular because this could explain the absence of consequences on heart size and function at steady state.

      We agree with this conclusion and will expand on this in our discussion. It is important to note that, as opposed to findings made in liver (PMID: 31983631), genetic manipulation of ploidy regulators such as E2f7/8 (PMID: 36622904) only led to modest changes in CM ploidy, suggesting either that only a narrow bandwidth is compatible with normal heart function, or that additional mechanisms take control when the thresholds set by the PIDDosome or E2f7/8 are exceeded. These mechanisms could involve Cyclin G (PMID: 20360255) or TNNI3K (PMID: 31589606). Importantly, a recent publication has shown that overexpression of Plk1(T210D) and Ect2 from birth causes increased heart weight coupled with a minor decrease in CM size. These mice die prematurely (PMID: 39912233), suggesting that CM polyploidization is a tightly regulated process controlled by several independent mechanisms during heart development.

      In Figure 2D, the authors measured the cardiomyocyte cross-sectional area and concluded that removing PIDDosome components has no effect on cardiomyocyte cell size. Since it has been shown that ploidy increase is normally associated with an increase in cell area, the authors should measure cell area of cardiomyocytes analyzed in Figure 1B. It could then be interesting to establish a correlation with nuclear area and the mono, bi or multinucleated status. This will strengthen the results showing that ploidy increases without affecting cell area.

      Indeed, studies in PIDDosome-deficient livers suggest that the tissue contains fewer but bigger cells (PMID: 31983631). As opposed to the liver, the percentage of cardiomyocytes presenting higher levels of ploidy is relatively low. Thus, a possible increase in CM size in PIDDosome-deficient mice may be masked in heart cross-sections. In order to better correlate ploidy with cell size, we propose to reanalyze the microscopy images used to extract the data displayed in Fig. 1D. We may run into the problem, though, that the number of cells acquired becomes limiting for achieving sufficient statistical power. In this case we could pool data from different PIDDosome mutant CMs to increase statistical power. Again, we propose to initially prioritize wt vs. Casp2 vs. XMLC2/Casp2f/f mice. In addition, we can offer to quantify heart to body weight ratio or tibia length as an additional read-out (see answer to a previous reviewer comment).

      The authors should discuss the fact that PIDDosome depletion leads only to a mild increase in ploidy levels (4N) in a small percentage of cardiomyocytes. If the PIDDosome is controlling ploidy, one could expect that removing it should lead to a drastic increase in the ploidy levels. Is PIDDosome depletion leading to cell death in some cardiomyocytes? The authors should address this point in the discussion or, if relevant, show a staining with an apoptosis marker. Is another mechanism compensating to prevent higher ploidy levels in cardiomyocytes?

      These are valid thoughts, some of which we contemplated before. In part, we have addressed them in our response to Reviewer #1, above, discussing similar findings made in E2f7/8-deficient hearts (PMID: 36622904) or Cyclin G-overexpressing hearts (PMID: 20360255), where also only modest changes in ploidy were achieved. Together, these observations suggest either that alternative control mechanisms are able to act, or that tolerance towards larger shifts in ploidy is limited because such shifts are incompatible with proper cell function and survival. To this end, we can offer to test whether we find increased signs of cell death in PIDDosome mutant hearts by TUNEL staining of histological sections. Of note, we did not find evidence for such a phenomenon in the liver (PMID: 31983631).

      Even if the authors presented RNAseq data suggesting that the PIDDosome is activated during cardiomyocyte differentiation, they should clearly demonstrate this point to strengthen the message of the paper. Indeed, the conclusions are based on the absence of PIDDosome components triggering higher ploidy in cardiomyocytes. However, we don't know whether (and when) the PIDDosome is activated during cardiomyocyte differentiation to control their ploidy levels. I would suggest to analyze PIDDosome activation markers by immunofluorescence in cardiomyocytes at different developmental stages.

      We agree with this referee that direct proof of PIDDosome activation would be helpful and that we only infer from loss-of-function phenotypes when and where the PIDDosome becomes activated. However, several technical issues prevent us from collecting more direct evidence of PIDDosome activation in the developing heart. 1) Polyploidization in heart CM appears to happen gradually from day 3 on, with a peak at day 7 (PMID: 26544945). Hence, this is not a synchronous process in which we could pinpoint simultaneous activation of the PIDDosome in all cells, which would facilitate biochemical analysis, e.g., by western blotting for signs of Caspase-2 activation (i.e., the loss of its pro-form, PMID: 28130345). 2) Our most reliable readout, MDM2 cleavage by caspase-2 giving rise to specific fragments detectable by western blotting, is not applicable to mouse tissue, as the antibody we use only detects human MDM2 (PMID: 28130345) and no other MDM2 Ab we tested gave satisfactory results. Independent of that, 3) we do not see involvement of p53 in CM ploidy control (arguing against a role of MDM2).

      As such, we can only offer to look at extra centrosome clustering in postnatal binucleated CM (as also suggested further below) as a putative trigger for PIDDosome activation. However, this has been published before by the first author of this study (PMID: 31301302). Given that we have made a significant effort to time-resolve the increase in ploidy in postnatal mice (please note that several hearts needed to be pooled for each time point and analyzed in multiple biological replicates), we think that our conclusions are well justified based on the genetic data provided.

      Concerning the methods, the authors must add the references for each product they used and not only the origin. When relevant, the RRID should be indicated. Without this information the method and the data cannot be reproduced.

      We will update this information where it is relevant for reproducing our results.

      Minor comments:

      In general, the text and the figures are clear. Nevertheless, I would suggest the following changes:

      • Figures 1B, 2B and 2C: the y-axis must start at 0.

      We will adapt the axes accordingly.

      Figure 4A: The authors should stain centrosomes in cardiomyocytes. This should strengthen the conclusion drawn by the authors based on the results obtained in mice depleted for ANKRD26. Indeed, for the moment these results are insufficient to conclude about the role of the centrosomes. The authors should show that centrosomes cluster in cardiomyocytes (a condition necessary for PIDDosome activation in polyploid cells) and, if possible, that components of the PIDDosome are recruited there.

      This point is well taken and addressed in part above. Clustering of extra centrosomes has been documented and published by the first author of this study in rat polyploid cardiomyocytes (PMID; cited). We can offer to show clustering of centrosomes in mouse CM isolated from day 7 hearts, but while PIDD1 can be detected well in MEF, we repeatedly failed to stain for PIDD1 in primary CMs.

      Figure 4F: I would suggest to modify the working model to emphasize more the differences between WT and PIDDosome KO.

      We will aim to improve this cartoon/graphical abstract

      The prior studies are referenced appropriately.

      Reviewer #2 (Significance (Required)):

      How polyploid cells control their ploidy levels during differentiation remains poorly understood. The data presented here thus represent an advance concerning this question. The current model of PIDDosome activation relies on the presence of extra centrosomes that drive the ANKRD26-dependent recruitment of the PIDDosome. Then, Caspase 2 is activated, leading to a p53-p21 dependent cell cycle arrest (Burigotto & Fava, 2021, Sladky and Villunger, 2020; Janssens & Tinel, 2012; Evans et al., 2020; Burigotto et al., 2021). In this study, the authors showed that a similar pathway takes place during cardiomyocyte differentiation to control ploidy levels. These data are reminiscent of previous work showing PIDDosome involvement during hepatocyte polyploidization (Sladky et al. 2020). Together, these data highlight the prominent role of the PIDDosome complex in controlling ploidy levels in a physiological context. Importantly, this study identified that the classical p53-dependent cell cycle arrest described after PIDDosome activation is not involved here. Instead, the data established that, independently of p53, p21 contributes to the control of cardiomyocyte ploidy. In consequence, this study extends the initial pathway associated with PIDDosome activation and suggests that other mechanisms could take place to restrain cell proliferation upon PIDDosome activation. Overall, this makes this paper significant and of interest for the following fields: polyploidy, heart/cardiomyocyte development and PIDDosome.

      My field of expertise includes polyploidy, cell cycle and genetic instability.

      We thank this reviewer for the time taken and the positive feedback provided.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      N/A

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      • As outlined above, limited tools are available to validate putative caspase-2 substrates, identified in proteomics analysis, in an impactful manner.
      • Also, as discussed above, we deem myocardial infarction experiments in mice unsuitable to improve our work, as in all likelihood they will yield negative results.
    2. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2025-02953

      Corresponding author(s): Andreas Villunger

      [The “revision plan” should delineate the revisions that authors intend to carry out in response to the points raised by the referees. It also provides the authors with the opportunity to explain their view of the paper and of the referee reports.

      The document is important for the editors of affiliate journals when they make a first decision on the transferred manuscript. It will also be useful to readers of the preprint and help them to obtain a balanced view of the paper.

      If you wish to submit a full revision, please use our "Full Revision" template. It is important to use the appropriate template to clearly inform the editors of your intentions.]

      1. General Statements [optional]

      We would like to thank the reviewers for their constructive input and overall support. We appreciate the opportunity to provide a provisional revision plan, as outlined here, and are happy to engage in additional communication with journal editors via video call, in case further clarifications are needed.

      2. Description of the planned revisions

      Insert here a point-by-point reply that explains what revisions, additional experimentations and analyses are planned to address the points raised by the referees.

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary: This manuscript by Leone et al describes the role of the PIDDosome in cardiomyocytes. Using a series of whole body and cardiomyocyte specific knockouts, the authors show that the PIDDosome maintains correct ploidy in these cells. It achieves this through inducing cell cycle arrest in cardiomyocytes in a p53 dependent manner. Despite this effect on ploidy, PIDDosome-deficient hearts show no structural or functional defects. Statistics and rigor appear to be adequate.

      We thank this referee for taking the time to evaluate our work and for their valuable comments. We assume that this reviewer indicates by mistake that the phenomenon we describe depends on p53. As outlined in the abstract and throughout the manuscript, the effect is independent of p53, but may additionally still involve p21, acting along or parallel to the PIDDosome.

      Major comments: 1. Figure 1 uses fluorescent intensity of a nuclear stain to determine ploidy per nucleus and they further separate the results into mononucleated, binucleated or multinucleated cells. It is hard to know how to interpret these results without further information or controls. Is there a good positive control that can be used to help to show whether this assay is quantitative? The differences are larger with the Raidd and caspase-2 knockouts than with the Pidd knockouts but this is not addressed.

      We appreciate this concern. Regarding a “good positive control”, we can say that we follow the state of the art in the cardiomyocyte field: studies by the Evans (PMID: 36622904), Kuhn (PMID: 32109383), Bergmann (PMID: 26544945) and Patterson labs (PMID: 28783163, 36912240) all use the identical approach to discriminate 2n from 4n nuclei in microscopy images at the cellular level. The fact that the majority of rodent CM nuclei are indeed diploid (PMID: 31175264, 31585517 and 32078450) and that a large number of nuclei have been evaluated to assess their mean fluorescence intensity (MFI) reduces the risk of a systematic bias in our analysis. Moreover, we have used an orthogonal approach that is indeed quantitative to define DNA content, i.e., flow cytometry-based evaluation of DNA content in isolated CM nuclei (Fig. 1C). We are hence confident that our assays are quantitative.

      Regarding the fact that loss of Pidd1 causes a more subtle phenotype, we can offer to discuss this in light of the fact that Pidd1 has additional functions outside the PIDDosome (PMID: 35343572), and that we made similar observations when analyzing ploidy in hepatocytes (PMID: 31983631). Given that all components of the PIDDosome show a similar phenotype, and that this phenotype is mimicked by loss of the protein that connects PIDD1 and centrosomes, ANKRD26 (Fig. 4a), we are confident that this biological variation in our analysis is not affecting our conclusions.

      On line 459 the authors state that the increase in polyploidy in PIDDosome knockouts occurs in adulthood, but this is not directly tested. In fact, in the next section the polyploidy is assessed in early postnatal development. This statement should be explained or removed.

      We see that we have made an unclear statement here. In fact, we first noted increases in ploidy in the adult heart and then defined the time window in development when this happens. This sentence will be rephrased.

      In Figure 4. The authors obtained RNAseq data for P1, P7 and P14 but only show the differences with and without caspase-2 at P7. Given that the differences in ploidy are more significant at P14 (Fig 3D), all the comparisons should be shown along with analysis of whether the same genes/gene families are altered in the absence of caspase-2.

      The reason why we focus on postnatal day 7 (P7) is that data from Alkass et al. (PMID: 26544945) and other labs (PMID: 31175264) document that the initial wave of binucleation peaks on this day. Hence, we hypothesized that the PIDDosome must be active in most CM, which aligns well with the increased mRNA levels of all of its components (Figure 3). Interestingly, it seems that its action is tightly regulated, as mRNA levels of PIDDosome components drop on P10, suggesting PIDDosome shut-down or downregulation. Similar findings have been noted in the liver (PMID: 31983631). Alkass and colleagues also show that very few CMs enter another round of DNA synthesis between P7 and P14, and hence possible transcriptome changes in the absence of the PIDDosome will be strongly diluted.

      Please note that on P1, no difference between genotypes is to be expected, as all CM are mononucleated diploids and cytokinesis competent, as previously demonstrated (PMID: 26544945). Moreover, PIDDosome expression levels are extremely low (Fig. 3A). In addition, on P14 the ploidy phenotype observed in PIDDosome knockout mice reaches its maximum, and ploidy increases are comparable to adult tissue. Thus, at this time the trigger for PIDDosome activation (cytokinesis failure) is no longer observed, as the majority of CMs are post-mitotic (PMID: 26247711). As such, the impact of PIDDosome activation on the P14 transcriptome is most likely negligible. However, if desired, we can expand our bioinformatics analysis, which summarizes findings related to DEGs over time in wt animals, by also comparing genotypes on day 1 and day 14. In light of the above, the comparison between genotypes on P7 still appears to be the most meaningful one.

      Some validation of the RNAseq and/or proteomics results would be an important addition to this study

      We agree with this notion and propose to validate key candidates related to cardiomyocyte proliferation and polyploidization, some of which we found to be differentially expressed at the mRNA level on day 7 in the RNAseq data (e.g., p21, Foxm1, Kif18a, Lin37 and others).

      Regarding the proteomics results, we face the challenge that we can only try to confirm in silico, using DeepCleave, whether candidate proteins are likely caspase substrates, and potentially pick one or two candidates linked to CM differentiation for further analysis in vitro and in heterologous cell-based assays (e.g., 293T cells), as no bona fide ventricular cardiomyocyte cell lines exist. Primary postnatal CMs are extremely difficult to transfect, and they neither proliferate without drug treatment nor fail cytokinesis ex vivo.

      Figure 4D: the authors conclude that p21 is downstream of PIDD (and p53 independent). However, this is not supported by the data because the increase in 4N cells/decrease in 2N cells, although statistically significant, is nowhere near that of caspase-2 KO and caspase-2/p21 KO. Statistics should also compare p32KO with c2KO. In the absence of any other data, the more likely conclusion is that p21 is not involved.

      We agree that the findings related to the impact seen upon loss of p21 suggest that it is not the only effector involved in ploidy control, and it may not even be an effector engaged by caspase-2, as C2/p21 DKO mice have an even higher ploidy increase, albeit not statistically significant. However, it is important to highlight that p21 (Cdkn1a) was found to be downregulated in our transcriptomic analysis, suggesting an involvement in the caspase-2 cascade. We are happy to highlight this when presenting the results and in the discussion.

      We assume that this referee refers to p73 KO data that should be compared to Casp2 KO data (could be read as p73 or p53, but the latter we already compare side by side with Casp2 in Fig. 4). As p73 KO mice were not found to be viable beyond day 7 (our attempt to find animals on day 10 failed, in line with published literature (PMID: 24500610, 10716451)), we can only offer to compare this data set to the data presented in Figure 3C, where we have analyzed ploidy increases on day 7 in wt and PIDDosome mutant mice. This re-analysis will show that only Caspase-2 mutant mice display a significant ploidy increase on P7 when compared to wt or p73 mutant animals, while no differences are noted between wt and p73 mutant mice (to be included in new Suppl. Fig. 3C).

      Minor comments: Suggest moving Figure 4A to Figure 3 as it seems to fit better there based on the citation of this figure in the text

      We can see some benefit in this recommendation and included panel 4A now in an updated version of Figure 3.

      Recommend enhancing the brightness of microscopy images in Figure 1E and 2D

      We will try to improve image quality; the low brightness may have been due to PDF conversion.


      Significance

      This study provides interesting information for the role of the PIDDosome in protecting from polyploidy and adds to the body of work by this same group studying this pathway in the liver.

      The main weakness in terms of significance is the lack of a phenotype in the hearts of these animals. Therefore, it is clear that ploidy (or at least PIDDosome dependent ploidy) has minimal impact on cardiac development.

      We respectfully disagree with the comment that the lack of impact on cardiac function constitutes a weakness of our findings. Several studies on ploidy control in the liver (PMID: 34228992), but importantly also in the heart (PMID: 36622904), have failed to document a clear impact of increased ploidy on organ function. This does not imply insignificance, but rather that the context in which this becomes relevant may not yet have been identified. We are happy to expand on this in our discussion.

      The authors mention that they have not tried giving these mice a myocardial infarct (MI) or inducing any other type of cardiac damage. Although it is understood that these experiments are likely outside of the scope of the present study, without this information the impact of this study is moderate. I recommend expanding the discussion to provide a more in-depth possible rationale as to why ploidy perturbations do not lead to structural changes like in the liver.

      Despite this, the insights to the pathway itself are interesting to investigators in the caspase-2 field if a little underdeveloped, especially concerning the role of p21.

      My expertise is in cell death and caspase biology (especially caspase-2). I have sufficient expertise to evaluate all parts of this paper.

      As mentioned above, we will amend our conclusions on p21, in light of potential findings made when validating DEG candidates, as stated above.

      We hope that the changes and amendments proposed here will be satisfactory to this referee to recommend publication of a revised manuscript.


      Reviewer #2

      Evidence, reproducibility and clarity:

      Summary:

      In this study, the authors investigated the role of the PIDDosome during cardiomyocyte polyploidization. The PIDDosome is a multi-protein complex that activates the endopeptidase Caspase-2 and has been shown to be involved in eliminating cells with extra centrosomes or in response to genotoxic stress (Burigotto & Fava, 2021, Sladky and Villunger, 2020). In both cases, the PIDDosome is recruited in an ANKRD26-dependent manner at the centrosomes, leading to p53 stabilization and cell death (Burigotto & Fava, 2021; Evans et al., 2020; Burigotto et al., 2021).

      Here, by studying mouse cardiomyocyte differentiation, the authors showed that PIDDosome is imposing ploidy restriction during cardiomyocyte differentiation. Importantly, in contrast to a previous report in the liver (Sladky et al., 2020), they showed that PIDDosome acts in a p53-independent manner in cardiomyocytes. Indeed, they suggested that PIDDosome controls ploidy in cardiomyocytes through p21 activation.

      We want to thank this reviewer for the time taken to evaluate our work and provide critical feedback that will help to improve our revised manuscript.

      Major comments:

      In general the conclusions of the authors are well supported by the experiments. However, I would suggest the following experiments/analysis to strengthen the paper:

      The authors should improve the Figure 1 to help the readers who are not familiar with cardiomyocyte polyploidization. For instance, I would suggest to add a scheme to summarize cardiomyocyte polyploidization (in terms of nuclear size, mono vs multi and so on).

      We agree that a visual summary of the postnatal timing of CM polyploidization will be helpful for the generalist not familiar with the topic and have added a scheme, adapted from a study by Alkass et al. (PMID: 26544945), who elegantly defined the timing of this process during postnatal mouse life (now Fig. 1A).

      Based on the images they presented in 1B, the authors should also measure the nuclear area or volume in the different conditions in which components of the PIDDosome were depleted. Indeed, these two parameters should be easier to conceptualize for the readers (instead of the fluorescence nuclear intensity). This could help to understand if the nuclear size is maintained between the different conditions and if this is comparable between mono, bi or multinucleated cardiomyocytes.

      We have acquired this data and it can be used to provide additional information on nuclear area and/or volume. We propose to focus on re-analyzing data from wt, Casp2 and XMLC2CRE/Casp2f/f mice. The additional information can be included in Figures 1 & 2, respectively.

      • In Figure 2A, the authors presented cross-sections of hearts from animals showing that PIDDosome depletion has no effect on heart size. This is a surprising result, since cardiomyocytes have higher ploidy levels and this could have an effect on their function. Given the importance of this observation, the authors should present a quantification of the heart size in the different conditions shown in Figure 2A.

      We agree with this comment. We can measure the heart vs. body weight ratio or tibia length in adult Casp2-/- vs. WT (3 month old) in order to indirectly evaluate possible increases in CM size linked to increased ploidy.

      Also, the percentage of cardiomyocytes presenting higher levels of ploidy seems quite low. The authors should discuss this point. In particular because this could explain the absence of consequences on heart size and function at steady state.

      We agree with this conclusion and will expand on this in our discussion. It is important to note that, as opposed to findings made in liver (PMID: 31983631), genetic manipulation of ploidy regulators such as E2f7/8 (PMID: 36622904) only led to modest changes in CM ploidy, suggesting that either a small bandwidth compatible with normal heart function exists, or that additional mechanisms exist that take control when the thresholds set by the PIDDosome or E2f7/8 are exceeded. These mechanisms could involve Cyclin G (PMID: 20360255) or TNNI3K (PMID: 31589606). Importantly, a recent publication has shown that overexpression of Plk1(T210D) and Ect2 from birth causes increased heart weight coupled with a minor decrease in CM size. These mice undergo premature death (PMID: 39912233), suggesting that CM polyploidization is a tightly regulated process controlled by several independent mechanisms during heart development.

      In Figure 2D, the authors measured the cardiomyocyte cross-sectional area and concluded that removing PIDDosome components has no effect on cardiomyocyte cell size. Since it has been shown that ploidy increase is normally associated with an increase in cell area, the authors should measure the cell area of the cardiomyocytes analyzed in Figure 1B. It could then be interesting to establish a correlation with nuclear area and the mono, bi or multinucleated status. This will strengthen the results showing that ploidy increases without affecting cell area.

      Indeed, studies in PIDDosome-deficient livers suggest that the tissue contains fewer but bigger cells (PMID: 31983631). As opposed to the liver, the percentage of cardiomyocytes presenting higher levels of ploidy is relatively low. Thus, a possible increase in CM size in PIDDosome-deficient mice may be masked in heart cross-sections. To better correlate ploidy with cell size, we propose to reanalyze the microscopy images used to extract the data displayed in Fig. 1D. We may run into the problem, though, that the number of cells acquired becomes limiting for achieving sufficient statistical power. In this case, we could pool data from different PIDDosome mutant CM to increase statistical power. Again, we propose to initially prioritize wt vs. Casp2 vs. XMLC2/Casp2f/f mice. In addition, we can offer to quantify heart to body weight ratio or tibia length as an additional read-out (see answer to a previous reviewer comment).

      The authors should discuss the fact that PIDDosome depletion leads only to a mild increase in ploidy levels (4N) in a small percentage of cardiomyocytes. If the PIDDosome is controlling ploidy, one could expect that removing it should lead to a drastic increase in the ploidy levels. Is PIDDosome depletion leading to cell death in some cardiomyocytes? The authors should address this point in the discussion or, if relevant, show a staining with an apoptosis marker. Is another mechanism compensating to prevent higher ploidy levels in cardiomyocytes?

      These are valid thoughts, some of which we contemplated before. In part, we have addressed them in our response to Reviewer #1, above, discussing similar findings made in E2f7/8-deficient hearts (PMID: 36622904) or Cyclin G-overexpressing hearts (PMID: 20360255), where also only modest changes in ploidy were achieved. Together, these observations suggest either that alternative control mechanisms are able to act, or that tolerance towards larger shifts in ploidy is limited because such shifts are incompatible with proper cell function and survival. To this end, we can offer to test whether we find increased signs of cell death in PIDDosome mutant hearts by TUNEL staining of histological sections. Of note, we did not find evidence for such a phenomenon in the liver (PMID: 31983631).

      Even if the authors presented RNAseq data suggesting that the PIDDosome is activated during cardiomyocyte differentiation, they should clearly demonstrate this point to strengthen the message of the paper. Indeed, the conclusions are based on the absence of PIDDosome components triggering higher ploidy in cardiomyocytes. However, we don't know whether (and when) the PIDDosome is activated during cardiomyocyte differentiation to control their ploidy levels. I would suggest to analyze PIDDosome activation markers by immunofluorescence in cardiomyocytes at different developmental stages.

      We agree with this referee that direct proof of PIDDosome activation would be helpful and that we only infer from loss-of-function phenotypes when and where the PIDDosome becomes activated. However, several technical issues prevent us from collecting more direct evidence of PIDDosome activation in the developing heart. 1) Polyploidization in heart CM appears to happen gradually from day 3 on, with a peak at day 7 (PMID: 26544945). Hence, this is not a synchronous process in which we could pinpoint simultaneous activation of the PIDDosome in all cells, which would facilitate biochemical analysis, e.g., by western blotting for signs of Caspase-2 activation (i.e., the loss of its pro-form, PMID: 28130345). 2) Our most reliable readout, MDM2 cleavage by caspase-2 giving rise to specific fragments detectable by western blotting, is not applicable to mouse tissue, as the antibody we use only detects human MDM2 (PMID: 28130345) and no other MDM2 Ab we tested gave satisfactory results. Independent of that, 3) we do not see involvement of p53 in CM ploidy control (arguing against a role of MDM2).

      As such, we can only offer to look at extra centrosome clustering in postnatal binucleated CM (as also suggested further below) as a putative trigger for PIDDosome activation. However, this has been published before by the first author of this study (PMID: 31301302). Given that we have made a significant effort to time-resolve the increase in ploidy in postnatal mice (please note that several hearts needed to be pooled for each time point and analyzed in multiple biological replicates), we think that our conclusions are well justified based on the genetic data provided.

      Concerning the methods, the authors must add the references for each product they used and not only the origin. When relevant, the RRID should be indicated. Without this information the method and the data cannot be reproduced.

      We will update this information where it is relevant for reproducing our results.

      Minor comments:

      In general, the text and the figures are clear. Nevertheless, I would suggest the following changes:

      • Figures 1B, 2B and 2C: the y-axis must start at 0.

      We will adapt the axes accordingly.

      Figure 4A: The authors should stain centrosomes in cardiomyocytes. This should strengthen the conclusion drawn by the authors based on the results obtained in mice depleted for ANKRD26. Indeed, for the moment these results are insufficient to conclude about the role of the centrosomes. The authors should show that centrosomes cluster in cardiomyocytes (a condition necessary for PIDDosome activation in polyploid cells) and, if possible, that components of the PIDDosome are recruited there.

      This point is well taken and addressed in part above. Clustering of extra centrosomes has been documented and published by the first author of this study in rat polyploid cardiomyocytes (PMID; cited). We can offer to show clustering of centrosomes in mouse CM isolated from day 7 hearts, but while PIDD1 can be detected well in MEF, we repeatedly failed to stain for PIDD1 in primary CMs.

      Figure 4F: I would suggest to modify the working model to emphasize more the differences between WT and PIDDosome KO.

      We will aim to improve this cartoon/graphical abstract

      The prior studies are referenced appropriately.

      Reviewer #2 (Significance (Required)):

      How polyploid cells control their ploidy levels during differentiation remains poorly understood. The data presented here thus represent an advance concerning this question. The current model of PIDDosome activation relies on the presence of extra centrosomes that drive the ANKRD26-dependent recruitment of the PIDDosome. Then, Caspase 2 is activated, leading to a p53-p21 dependent cell cycle arrest (Burigotto & Fava, 2021, Sladky and Villunger, 2020; Janssens & Tinel, 2012; Evans et al., 2020; Burigotto et al., 2021). In this study, the authors showed that a similar pathway takes place during cardiomyocyte differentiation to control ploidy levels. These data are reminiscent of previous work showing PIDDosome involvement during hepatocyte polyploidization (Sladky et al. 2020). Together, these data highlight the prominent role of the PIDDosome complex in controlling ploidy levels in a physiological context. Importantly, this study identified that the classical p53-dependent cell cycle arrest described after PIDDosome activation is not involved here. Instead, the data established that, independently of p53, p21 contributes to the control of cardiomyocyte ploidy. In consequence, this study extends the initial pathway associated with PIDDosome activation and suggests that other mechanisms could take place to restrain cell proliferation upon PIDDosome activation. Overall, this makes this paper significant and of interest for the following fields: polyploidy, heart/cardiomyocyte development and PIDDosome.

      My field of expertise includes polyploidy, cell cycle and genetic instability.

      We thank this reviewer for the time taken and the positive feedback provided.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      N/A

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      • *As outlined above, limited tools are available to validate putative caspase-2 substrates, identified in proteomics analysis, in an impactful manner. *

      • *Also, as discussed above, we deem myocardial infarction experiments in mice unsuitable to improve our work, as in all likelihood they will yield negative results. *
    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates the potential of targeting specific regions within the RNA genome of the Porcine Epidemic Diarrhea Virus (PEDV) for antiviral drug development. The authors used SHAPE-MaP to analyze the structure of the PEDV RNA genome in infected cells. They categorized different regions of the genome based on their structural characteristics, focusing on those that might be good targets for drugs or small interfering RNAs (siRNAs).

      They found that dynamic single-stranded regions can be stabilized by compounds (e.g., to form G-quadruplexes), which inhibit viral proliferation. They demonstrated this by targeting a specific G4-forming sequence with a compound called Braco-19. The authors also describe stable (structured) single-stranded regions that they used to design siRNAs showing that they effectively inhibited viral replication.

      Strengths:

      There are a number of strengths to highlight in this manuscript.

      (1) The study uses a sophisticated technique (SHAPE-MaP) to analyze the PEDV RNA genome in situ, providing valuable insights into its structural features.

      (2) The authors provide a strong rationale for targeting specific RNA structures for antiviral development.

      (3) The study includes a range of experiments, including structural analysis, compound screening, siRNA design, and viral proliferation assays, to support their conclusions.

      (4) Finally, the findings have potential implications for the development of new antiviral therapies against PEDV and other RNA viruses.

      Overall, this interesting study highlights the importance of considering RNA structure when designing antiviral therapies and provides a compelling strategy for identifying promising RNA targets in viral genomes.

      Weaknesses:

      I have some concerns about the utility of the 3D analyses, the effects of their synonymous mutants on expression/proliferation, a potentially missed control for studies of mutants, and the therapeutic utility of the compound they tested vs. G-quadruplexes.

      We thank the reviewer for their positive assessment and insightful comments. Below, we address each point of concern:

      (1) The utility of the 3D analyses:

      In the revised manuscript, we have toned down this discussion and moved Figure 3A to the supplementary materials to reduce any sense of fragmentation in the overall story. While SHAPE-MaP technology is mature and convenient to use and can indeed capture some RNA structural elements with special functions in certain cases, we acknowledge that its application to 3D analyses requires further validation. We believe this approach will become more prevalent in future research.

      (2) The effects of synonymous mutants on expression/proliferation:

      In the PEDV genome, the PQS1 mutation site encodes lysine (AAG). Given that lysine has only two codons (AAG and AAA), the G3109A synonymous mutation represented our sole viable option. Published studies (Ding et al., 2024) confirm that neither AAG nor AAA are classified as rare or dominant codons in mammalian cells. Therefore, the observed changes in viral proliferation levels are likely to stem from alterations in RNA secondary structure rather than codon usage effects.

      REFERENCES:

      Ding W, Yu W, Chen Y, et al. Rare codon recoding for efficient noncanonical amino acid incorporation in mammalian cells. Science. 2024;384(6700):1134-1142. 
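
      The codon argument in point (2) can be checked directly against the standard genetic code. The sketch below is only an illustration: the table is the standard code written in UCAG order, and the helper name `synonymous_alternatives` is hypothetical and not taken from the manuscript. It lists the codons that could replace a given codon without changing the encoded amino acid.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; amino acids listed in UCAG x UCAG x UCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_alternatives(codon):
    """Return every other codon that encodes the same amino acid (hypothetical helper)."""
    codon = codon.upper().replace("T", "U")
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid and c != codon)


# Lysine (K) is encoded by AAG and AAA only, so AAG has a single synonymous option.
print(synonymous_alternatives("AAG"))
```

      For a lysine AAG codon the only alternative returned is AAA, which is consistent with the statement that a G-to-A change at the third position was the sole synonymous choice.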

      (3) Potentially missed control for studies of mutants:

      In the revised manuscript, we have incorporated additional control experiments evaluating Braco-19's therapeutic effects on the PQS3 mutant strain (Figure 4 – figure supplement 3).

      (4) The therapeutic utility of Braco-19 vs. G-quadruplexes:

      While Braco-19 is indeed a broad-spectrum G4 ligand, our data clearly show that not all PQSs in the viral genome can form G4 structures. Our findings primarily provide proof-of-concept that sequences with high G4-forming potential in viral genomes represent viable targets for antiviral therapy. Future studies could leverage SHAPE-guided structural insights to design ligands with enhanced specificity for viral G4s, potentially improving therapeutic utility while minimizing off-target effects.

      Reviewer #2 (Public review):

      Summary:

      Luo et al. use SHAPE-MaP to find suitable RNA targets in Porcine Epidemic Diarrhoea Virus. Results show that dynamic and transient structures are good targets for small molecules, and that exposed strand regions are adequate targets for siRNA. This work is important to segment the RNA targeting.

      Strengths:

      This work is well done and the data supports its findings and conclusions. When possible, more than one technique was used to confirm some of the findings.

      Weaknesses:

      The study uses a cell line that is not porcine (not the natural target of the virus).

      We thank the reviewer for their insightful comments and recognition of our study's value. The most commonly employed cell models for in vitro PEDV studies are monkey-derived Vero E6 cells and porcine PK1 cells. However, PEDV (particularly our strain) exhibits significantly lower replication efficiency in PK1 cells compared to Vero cells, and no cytopathic effects were observed in PK1 cells. In our preliminary attempts to perform SHAPE-MaP experiments using infected PK1 cells, the sequencing data showed less than 0.03% alignment to the PEDV genome, rendering subsequent analysis and downstream experiments unfeasible.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Luo et al. applied SHAPE-MaP to analyze the secondary structure of the Porcine Epidemic Diarrhoea Virus (PEDV) RNA genome in infected cells. By combining SHAPE reactivity and Shannon entropy, the study indicated that the folding of the PEDV genomic RNA was nonuniform, with the 5' and 3' untranslated regions being more compactly structured, which revealed potentially antiviral targetable RNA regions. Interestingly, the study also suggested that compounds bound to well-folded RNA structures in vitro did not necessarily exhibit antiviral activity in cells, because the binding of these compounds did not necessarily alter the functions of the well-folded RNA regions. Later in the manuscript, the authors focus on guanine-rich regions, which may form G-quadruplexes and be potential targets for small interfering RNA (siRNA). The manuscript shows the binding effect of Braco-19 (a G-quadruplex-binding ligand) to a predicted G4 region in vitro, along with the inhibition of PEDV proliferation in cells. This suggests that targeting high SHAPE-high Shannon G4 regions could be a promising approach against RNA viruses. Lastly, the manuscript identifies 73 single-stranded regions with high SHAPE and low Shannon entropy, which demonstrated high success in antiviral siRNA targeting.

      Strengths:

      The paper presents valuable data for the community. Additionally, the experimental design and data analysis are well documented.

      Weakness:

      The manuscript presents the effect of Braco-19 on PQS1, a single G4 region with high SHAPE and high Shannon entropy, to suggest that "the compound can selectively target the PQS1 of the high SHAPE-high Shannon region in cells" (lines 625-626). While the effect of Braco-19 on PQS1 is supported by strong evidence in the manuscript, the conclusion regarding the G4 region with high SHAPE and high Shannon entropy is based on a single target, PQS1.

      We thank the reviewer for their positive assessment of our methodology and dataset. We propose that dynamic RNA structures in high SHAPE-high Shannon regions, when stabilized by small molecules, can serve as viable targets for antiviral therapy. G-quadruplexes represent a characteristic type of such dynamic structures that compete with local stem-loop formations in the genome. While we identified seven highly conserved PQSs in the PEDV genome, only PQS1 was located within a high SHAPE-high Shannon region. To further validate this concept, we have supplemented the revised manuscript with Thioflavin T (ThT) fluorescence turn-on assays (Figures 3D, 3E, and Figure 3 – figure supplement 6), which provide additional evidence for the differential G4-forming capabilities of PQSs across regions with distinct structural features.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Comments:

      (1) It could be valuable for the authors to spend some more effort comparing their approach to siRNA target discovery and design to current methods for siRNA design. It would be good to highlight which components are novel, and which might offer superior performance with respect to other existing methods.

      We thank the reviewer for highlighting this important point. In response, we have rewritten the relevant section in the discussion:

      “Our approach uniquely integrates in situ RNA structural data (SHAPE reactivity and Shannon entropy) to prioritize siRNA targets within stable single-stranded regions (high SHAPE reactivity, low Shannon entropy), which are experimentally validated as accessible in infected cells. This represents a significant departure from traditional siRNA design methods that rely primarily on sequence conservation, thermodynamic rules (e.g., Tuschl rules), or in vitro structural predictions (Ali Zaidi et al., 2023; Qureshi et al., 2018; Tang and Khvorova, 2024), which may not accurately reflect intracellular RNA accessibility. Bowden-Reid et al. designed 39 antiviral siRNAs against various SARS-CoV-2 variants based on sequence conservation, ultimately identifying 8 highly effective sequences (Bowden-Reid et al., 2023). Notably, five of these effective sequences targeted regions that were located in high SHAPE-high Shannon regions according to SARS-CoV-2 SHAPE datasets (Supplementary Table 8) (Manfredonia et al., 2020). This independent finding aligns perfectly with our conclusions and demonstrates that SHAPE-based siRNA design outperforms sequence/structure-agnostic approaches, at least in terms of significantly improving antiviral siRNA screening efficiency. Given the growing availability of SHAPE datasets for numerous viruses, we are confident that our methodology will facilitate more precise design of antiviral siRNAs.”

      (2) The section targeting their discovered G4 structure with Braco-19 is interesting, particularly showing effects on viral proliferation; however, it's not clear to me how this compound could be used therapeutically against PEDV, as it is a non-selective binder of G4 structures. Their results are good support for the presence and functionality of a G4 structure in PEDV, but I don't see any strategy outlined in the manuscript on how this could be specifically targeted with Braco-19.

      While Braco-19 is indeed a broad-spectrum G4 ligand, our data demonstrate that not all PQSs in the viral genome can form G4 structures under physiological conditions. Our results specifically show that Braco-19 exerts its anti-PEDV activity by targeting PQS1, which is located in a high SHAPE-high Shannon entropy region. This target specificity was further confirmed by the complete resistance of the PQS1mut strain (lacking G4-forming ability) to Braco-19 treatment in our in vitro assays. 

      Additionally, previous studies have reported that during rapid viral replication, viral RNA accumulates to levels that significantly exceed host RNA concentrations. This "concentration advantage" suggests that G4 ligands like Braco-19 would preferentially bind viral G4 structures over host targets, thereby enhancing their antiviral specificity in vivo. In summary, our data provide proof-of-concept that viral genomic regions with high G4-forming potential - particularly those in high SHAPE-high Shannon entropy regions - represent promising targets for antiviral therapy.

      (3) The section where they proposed 3D RNA structures based on sequence similarity feels "tacked on" and I don't see how it adds to the overall story. The authors identify a short RNA hairpin in the PEDV genome with some sequence similarity to the CPEB3 nuclease P4 hairpin. However, they don't provide any evidence that this motif functions in a similar way or that it's important for the virus's life cycle. They also don't explain how this similarity could be exploited for antiviral drug development. It's not clear whether targeting this motif would have any effect on the virus. It's interesting that these two sequences share nucleotides, but it's unlikely that they share any homology...perhaps they convergently evolved (or were captured), but the similarity could also be coincidental.

      We appreciate the reviewer's insightful observation regarding this section. While our intention was to demonstrate that flexible conformations in high SHAPE-high Shannon regions could potentially be targeted, we acknowledge that extensive discussion of these motifs' functions would exceed the scope of this study, resulting in some disconnection from the main narrative. In response to this valuable feedback, we have removed this section from the manuscript.

      (4) The authors should consider the optimality of the synonymous mutation (G3109A) that they introduced, as G3109A could swap a rare codon for a more optimal one. Even though the protein sequence is unaffected, the translation rate (and ability to proliferate) could be very different due to altered codon optimality. Additionally, to show the inactivity of the PQS3 mutant, the Braco-19 treatment studies performed on the PQS1 mutants could be repeated with PQS3 - using this as a control for these experiments.

      We appreciate the reviewer's insightful comment regarding codon optimization. In the PEDV genome, the PQS1 mutation site encodes lysine (AAG). Since lysine has only two codons (AAG and AAA), the G3109A synonymous mutation was our only viable option. Published literature (Ding et al. 2024) confirms that neither AAG nor AAA are classified as either preferred or rare codons in mammalian cells. Therefore, this substitution should have minimal direct impact on translation efficiency. Compared to nonsynonymous mutations that would alter amino acid sequences, we believe this synonymous mutation represents the optimal approach for maintaining native protein function while introducing the desired structural modification.

      REFERENCES:

      Ding W, Yu W, Chen Y, et al. Rare codon recoding for efficient noncanonical amino acid incorporation in mammalian cells. Science. 2024;384(6700):1134-1142.

      In the revised version, we have added control experiments showing the inhibitory activity of Braco-19 against the PQS3 mutant strain (Figure 4—figure supplement 3C) and discussed it in the results section.

      “Furthermore, as a control, we observed nearly identical inhibitory activity of Braco-19 against both the PQS3 mutant strain (AJ1102-PQS3mut) and wild-type virus (Figure 4—figure supplement 3C), demonstrating the specificity of Braco-19's action on PQS1.”

      Minor Comments:

      (5) The authors' description of the Shannon Entropy could be improved. The current description makes it seem like the Shannon Entropy only provides information on base pairing, however, the Shannon entropy quantifies the uncertainty of structural states at each position and is calculated based on the probabilities of the different states (paired or unpaired) that a nucleotide can adopt.

      We have revised the description of Shannon entropy in the manuscript:

      "The pairing probability of each nucleotide derived from SHAPE reactivities was subsequently used to calculate Shannon entropy. Regions with high Shannon entropy may adopt alternative conformations, while those with low Shannon entropy correspond to either well-defined RNA structures or persistently single-stranded regions (Mathews, 2004; Siegfried et al., 2014)."
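
      As an illustration of the quantity discussed in this comment, the following minimal sketch computes a per-nucleotide Shannon entropy from the probabilities of the states a nucleotide can adopt. The function name, the base-10 logarithm and the example probabilities are assumptions made for illustration; the manuscript's exact formulation (for example, whether the unpaired state is included in the sum) is not specified here.

```python
import math


def shannon_entropy(state_probabilities, base=10):
    """Entropy of one nucleotide given the probabilities of its possible states.

    `state_probabilities` is a hypothetical input: for example, the probability
    of pairing with each candidate partner, optionally plus the probability of
    being unpaired. States with zero probability contribute nothing.
    """
    return sum(-p * math.log(p, base) for p in state_probabilities if p > 0)


# Illustrative (made-up) values:
well_defined = [0.98, 0.02]      # almost always in one state -> low entropy
dynamic = [0.40, 0.35, 0.25]     # several competing states -> high entropy

print(shannon_entropy(well_defined))
print(shannon_entropy(dynamic))
```

      A nucleotide that is almost always in one state, whether paired or unpaired, scores near zero, while a nucleotide that splits its probability among several states scores higher. This is why low entropy alone cannot distinguish a stable helix from a persistently single-stranded region, and why it is read together with SHAPE reactivity.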

      (6) The overall writing of the manuscript is very good, but there are some minor grammatical issues throughout, e.g., here are some of the ones that I caught:

      a) Lines 71-3: "various types of RNA structures such as hairpin structure, RNA single-strand, RNA pseudoknot and RNA G-quadruplex (G4)" - the examples should be plural and, rather than "hairpins" (or in addition), perhaps add "helices" to be more generically correct(?).

      We have revised the relevant description: 

      "various types of RNA structures such as stem-loop structures (with double-helical stems), RNA single-strand, RNA pseudoknot and RNA G-quadruplex (G4)"

      b) Lines 74-5: "Of these, RNA G4 has shown considerable promise because of the high stability and modulation by small molecules" should be "Of these, RNA G4 has shown considerable promise because of its high stability and ability for modulation by small molecules."

      We have revised the sentence:

      “Of these, RNA G4 has shown considerable promise because of its high stability and ability for modulation by small molecules.”

      c) Line 76: "have" should be "has".

      We have revised the sentence.

      d) Lines 104-5 (and elsewhere): "frameshift stimulation element (FSE)" should be "frameshift stimulatory element (FSE)".

      We have revised the sentence.

      e) Lines 428-9: "following the Manfredonia's methods" should be "following Manfredonia's method" or "following the Manfredonia method".

      We have made the appropriate edit.

      These edits ensure grammatical accuracy and consistency with standard scientific terminology. We appreciate the reviewer's attention to detail, which has significantly improved the clarity of our manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) Some important references are missing, on SHAPE-Seq from Julius Lucks.

      We have added citations to the foundational work by Lucks et al. (2011, PNAS) that pioneered in vitro RNA structure probing using SHAPE-seq.

      (2) Describe the acronym "SHAPE".

      We have now included the full name of SHAPE: “Selective 2’-Hydroxyl Acylation analyzed by Primer Extension”.

      (3) Line 81: 2"-hydroxyl-selective - the prime is incorrect.

      We thank the reviewer for catching this technical error. We have corrected "2"-hydroxyl" to "2'-hydroxyl".

      (4) Explaining a bit better how shape reagent works would be beneficial (one sentence should suffice).

      We have revised the Introduction section:

      “SHAPE reagents like NAI selectively modify flexible, unpaired 2′-OH groups in RNA, and these modifications are detected as mutations during reverse transcription, enabling precise mapping of RNA secondary structures through sequencing.”

      (5) Line 128: cite the paper that introduced NAI.

      We have now properly cited the original publication introducing NAI (Spitale et al., 2012).

      (6) Line 243: Can you describe what the compound is?

      The compound is Braco-19. This has now been included in the methods section. 

      (7) Line 272: describe what 3Dpol is and the source of it.

      We have supplemented the relevant information as follows:

      "3Dpol (recombinant RNA-dependent RNA polymerase; Abcam, ab277617, 0.02 mg/reaction)"

      (8) Figure 1 legend: For both C and D, the explanation of the G4 structure and the RISC complex should be added, otherwise, it becomes unclear why they are there.

      We have revised the captions for Figure 1 as follows:

      "(A) Well-folded regions (low SHAPE reactivity and low Shannon entropy; 26.40% of genome). These regions represent stably folded RNA structures with minimal conformational flexibility, likely serving as structural scaffolds or functional elements in viral replication. (B) Dynamic structured regions (low SHAPE reactivity and high Shannon entropy; 11.70% of genome). These conformationally plastic domains likely mediate regulatory switches between alternative secondary structures during infection. (C) Dynamic unpaired regions (high SHAPE reactivity and high Shannon entropy; 26.90% of genome). These regions are prone to form non-canonical nucleic acid structures (e.g., G-quadruplexes), which can be stabilized by small-molecule ligands to inhibit viral replication. (D) Persistent unpaired regions (high SHAPE reactivity and low Shannon entropy; 9.67% of genome). These regions are more accessible for siRNA binding, facilitating recruitment of Argonaute proteins and Dicer to form the RNA-induced silencing complex (RISC) for targeted cleavage."
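
      The four-way scheme in this caption amounts to a two-threshold decision rule, sketched below. The cutoff values, the example numbers and the function name are placeholders chosen for illustration only; the thresholds actually used to call "high" and "low" SHAPE reactivity or Shannon entropy are not stated in this response.

```python
def classify_region(shape, entropy, shape_cutoff, entropy_cutoff):
    """Assign a region to one of the four caption categories.

    `shape` and `entropy` are summary values for a region (for example, window
    medians); both cutoffs are hypothetical parameters supplied by the caller.
    """
    high_shape = shape > shape_cutoff
    high_entropy = entropy > entropy_cutoff
    if not high_shape and not high_entropy:
        return "well-folded"            # panel A
    if not high_shape and high_entropy:
        return "dynamic structured"     # panel B
    if high_shape and high_entropy:
        return "dynamic unpaired"       # panel C: candidate G4 / small-molecule sites
    return "persistent unpaired"        # panel D: candidate siRNA sites


# Made-up region summaries and cutoffs, for illustration only.
regions = {"r1": (0.1, 0.02), "r2": (0.9, 0.30), "r3": (0.8, 0.01)}
for name, (shape, entropy) in regions.items():
    print(name, classify_region(shape, entropy, shape_cutoff=0.4, entropy_cutoff=0.1))
```

      Under this rule a region with high reactivity and low entropy lands in the "persistent unpaired" class that the response proposes as the preferred siRNA target set, whereas high reactivity with high entropy marks the "dynamic unpaired" class discussed for PQS1.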

      (9) Figure S2 panel A should be in Figure 1. This is a nice picture showing the backbone of the research.

      In the revised manuscript, we have reorganized Figure 1 and Figure S2 by incorporating the SHAPE-MaP workflow diagram (previously Figure S2A) into Figure 1 as panel (A).

      (10) Please add the citation to Braco-19.

      We have now added the appropriate citation for Braco-19 (Gowan et al., 2002) in the revised manuscript.

      (11) Figure 5 legend: could you add in parentheses what ds means (and call Figure S28).

      We appreciate the reviewer's attention to detail. In the revised manuscript, we have clarified the abbreviations in the Figure 5 legend: ss (single-stranded targeting siRNAs); ds (dual-stranded targeting siRNAs). 

      (12) Line 107: I would argue that the "stabilization of a G4" inhibited viral proliferation. And that supports the point of the paper, that a small molecule that stabilizes the G4 can be used to reduce viral replication. I suggest emphasizing this throughout the paper.

      We fully concur with the reviewer's insightful perspective. In the revised manuscript, we have comprehensively strengthened the point of 'G4 stabilization' as an antiviral mechanism through the following enhancements:

      (1) In the Results section: We present Thioflavin T (ThT) fluorescence assays demonstrating the G4-forming capability of PQSs in the full-length PEDV genomic RNA context:

      “These findings indicate that although most PQSs can form G4 structures in vitro, PQS1—located in the high SHAPE-high Shannon entropy region—demonstrates the most robust G4-forming capability when competing with local secondary structures in the genomic context.”

      (2) In the Results section: The inclusion of Braco-19 inhibition assays using PQS3 mutant virus as control provides robust evidence that Braco-19 exerts its antiviral effects specifically through PQS1 stabilization:

      “Furthermore, as a control, we observed nearly identical inhibitory activity of Braco-19 against both the PQS3 mutant strain (AJ1102-PQS3mut) and wild-type virus, demonstrating the specificity of Braco-19's action on PQS1.”

      (3) In the Discussion section: We have rewritten the mechanistic interpretation to emphasize: 

      "Crucially, Braco-19 showed no inhibitory activity against the PQS1-mutant strain while maintaining potent activity against the PQS3-mutant strain (Figure 4E, Figure 4—figure supplement 3C). This suggests that the compound can selectively target the PQS1 of the high SHAPE-high Shannon region in cells." 

      (13) For PQS1, it's suggested that it is indeed a competing and transient conformation that forms the G4. I wonder if using an extended PQS1 (perhaps what is shown in Figure 3E) and using fluorescence, and/or K+ vs Li+, and/or in-vitro SHAPE could tell us more about this dynamic structure. Thioflavin T or any other fluorescent molecule that binds to G4s could be easily used to show how the formation of G4 may happen or not. In addition, how Braco-19 could really lock the dynamic structure in-vitro as well. I think the field would benefit from a deeper investigation of it.

      To address the dynamic competition between G4 and alternative RNA conformations, we performed Thioflavin T (ThT) fluorescence turn-on assays (now in Figure 3D-E and Figure 3—figure supplement 6) under physiological K+ conditions (100 mM), with PRRSV-G4 RNA as a positive control. The revised text reads:

      “To validate whether SHAPE analysis could reflect the competitive conformational folding of PQSs in the PEDV genome, we performed in vitro transcription to obtain local intact structures containing PQSs within dynamic single-stranded regions and stable double-stranded regions (Table S6). Thioflavin T (ThT) fluorescence turn-on assays were conducted under physiological K+ conditions (100 mM), with the G4 sequence of porcine reproductive and respiratory syndrome virus (PRRSV) serving as a positive control (Control-G4) (Fang et al., 2023). The results demonstrated that for short PQS sequences containing only G4-forming motifs (Table S7), PQS1, PQS3, PQS4, and PQS6 all induced significant ThT fluorescence enhancement (Figure 3D-E, Figure 3—figure supplement 6), confirming their ability to form G4 structures. However, in long RNA fragments encompassing PQSs and their flanking sequences, only PQS1 and PQS4 exhibited pronounced ThT fluorescence responses (Figure 3D-E), whereas PQS2, PQS3, and PQS6 showed negligible signals (Figure 3E, Figure 3—figure supplement 6). Notably, the PQS1-long chain displayed the strongest fluorescence signal, while its mutant counterpart (PQS1mut-long chain) exhibited the lowest background fluorescence (Figure 3D). These findings indicate that although most PQSs can form G4 structures in vitro, PQS1—located in the high SHAPE-high Shannon entropy region—demonstrates the most robust G4-forming capability when competing with local secondary structures in the genomic context. Therefore, PQS1 was selected for further structural and functional validation.”

      (14) Figure S29 is nice and informative. Consider moving it to the main text.

      We appreciate the reviewer's positive assessment of Figure S29. Now we have renamed this figure as "Figure 5—Supplement 2".

    1. Reviewer #1 (Public review):

      This is a very interesting paper addressing the hierarchical nature of the mammalian auditory system. The authors use an unconventional technique to assess brain responses -- functional ultrasound imaging (fUSI). This measures blood volume in the cortex at a relatively high spatial resolution. They present dynamic and stationary sounds in isolation and together, and show that the effect of the stationary sounds (relative to the dynamic sounds) on blood volume measurements decreases as one ascends the auditory hierarchy. Since the dynamic/stationary nature of sounds is related to their perception as foreground/background sounds (see below for more details), this suggests that neurons in higher levels of the cortex may be increasingly invariant to background sounds.

      The study is interesting, well conducted, and well written. I am broadly convinced by the results. However, I do have some concerns about the validity of the results, given the unconventional technique. fUSI is convenient because it is much less invasive than electrophysiology, and can image a large region of the cortex in one go. However, the relationship between blood volume and neuronal activity is unclear, and blood volume measurements are heavily temporally averaged relative to the underlying neuronal responses. I am particularly concerned about the implications of this for a study on dynamic/stationary stimuli in auditory cortical hierarchy, because the time scale of the dynamic sounds is such that much of the dynamic structure may be affected by this temporal averaging. Also, there is a well-known decrease in temporal following rate that is exhibited by neurons at higher levels of the auditory system. This means that results in different areas will be differently affected by the temporal averaging. I would like to see additional control models to investigate the impact of this.

      I also think that the authors should address several caveats: the fact that their measurements heavily spatially average neuronal responses, and therefore may not accurately reflect the underlying neuronal coding; that the perceptual background/foreground distinction is not identical to the dynamic/stationary distinction used here; and that ferret background/foreground perception may be very different from that in humans.

      Major points

      (1) Changes in blood volume due to brain activity are indirectly related to neuronal responses. The exact relationship is not clear; however, we do know two things for certain: (a) each measurable unit of blood volume change depends on the response of hundreds or thousands of neurons, and (b) the time course of the volume changes is slow compared to the potential time course of the underlying neuronal responses. Both of these mean that important variability in neuronal responses will be averaged out when measuring blood changes. For example, if two neighbouring neurons have opposite responses to a given stimulus, this will produce opposite changes in blood volume, which will cancel each other out in the blood volume measurement due to (a). This is important in the present study because blood volume changes are implicitly being used as a measure of coding in the underlying neuronal population. The authors need to acknowledge that this is a coarse measure of neuronal responses and that important aspects of neuronal responses may be missing from the blood volume measure.

      (2) More importantly for the present study, however, the effect of (b) is that any rapid changes in the response of a single neuron will be cancelled out by temporal averaging. Imagine a neuron whose response is transient, consisting of rapid excitation followed by rapid inhibition. Temporal averaging of these two responses will tend to cancel out both of them. As a result, blood volume measurements will tend to smooth out any fast, dynamic responses in the underlying neuronal population. In the present study, this temporal averaging is likely to be particularly important because the authors are comparing responses to dynamic (nonstationary) stimuli with responses to more constant stimuli. To a first approximation, neuronal responses to dynamic stimuli are themselves dynamic, and responses to constant stimuli are themselves constant. Therefore, the averaging will mean that the responses to dynamic stimuli are suppressed relative to the real responses in the underlying neurons, whereas the responses to constant stimuli are more veridical. On top of this, temporal following rates tend to decrease as one ascends the auditory hierarchy, meaning that the comparison between dynamic and stationary responses will be differently affected in different brain areas. As a result, the dynamic/stationary balance is expected to change as you ascend the hierarchy, and I would expect this to directly affect the results observed in this study.

      It is not trivial to extrapolate from what we know about temporal following in the cortex to know exactly what the expected effect would be on the authors' results. As a first-pass control, I would strongly suggest incorporating into the authors' filterbank model a range of realistic temporal following rates (decreasing at higher levels), and spatially and temporally average these responses to get modelled cerebral blood flow measurements. I would want to know whether this model showed similar effects as in Figure 2. From my guess about what this model would show, I think it would not predict the effects shown by the authors in Figure 2. Nevertheless, this is an important issue to address and to provide control for.
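
      As a toy illustration of the cancellation argument in point (2), the sketch below averages two made-up response time courses over the same window. The numbers, the sampling, and the simple boxcar average are all assumptions chosen for clarity; this is not a haemodynamic model and does not use data from the study.

```python
def window_mean(signal, start, stop):
    """Average a sampled response over samples [start, stop)."""
    return sum(signal[start:stop]) / (stop - start)

# Hypothetical responses at arbitrary time steps (not data from the study).
transient = [1.0] * 5 + [-1.0] * 5 + [0.0] * 10  # rapid excitation, then inhibition
sustained = [0.5] * 20                            # constant response

# Averaging over the whole window cancels the biphasic transient response
# but leaves the sustained response unchanged.
transient_avg = window_mean(transient, 0, 20)
sustained_avg = window_mean(sustained, 0, 20)
print(transient_avg, sustained_avg)
```

      In this toy case the transient response averages to zero while the sustained one is preserved, which is the asymmetry the control model suggested above would need to quantify with realistic following rates.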

      (3) I do not agree with the equivalence that the authors draw between the statistical stationarity of sounds and their classification as foreground or background sounds. It is true that, in a common foreground/background situation - speech against a background of white noise - the foreground is non-stationary and the background is stationary. However, it is easy to come up with examples where this relationship is reversed. For example, a continuous pure tone is perfectly stationary, but will be perceived as a foreground sound if played loudly. Background music may be very non-stationary but still easily ignored as a background sound when listening to overlaid speech. Ultimately, the foreground/background distinction is a perceptual one that is not exclusively determined by physical characteristics of the sounds, and certainly not by a simple measure of stationarity. I understand that the use of foreground/background in the present study increases the likely reach of the paper, but I don't think it is appropriate to use this subjective/imprecise terminology in the results section of the paper.

      (4) Related to the above, I think further caveats need to be acknowledged in the study. We do not know what sounds are perceived as foreground or background sounds by ferrets, or indeed whether they make this distinction reliably to the degree that humans do. Furthermore, the individual sounds used here have not been tested for their foreground/background-ness. Thus, the analysis relies on two logical jumps - first, that the stationarity of these sounds predicts their foreground/background perception in humans, and second, that this perceptual distinction is similar in ferrets and humans. I don't think it is known to what degree these jumps are justified. These issues do not directly affect the results, but I think it is essential to address these issues in the Discussion, because they are potentially major caveats to our understanding of the work.

    2. Author response:

      Reviewer #1:

      (1) Changes in blood volume due to brain activity are indirectly related to neuronal responses. The exact relationship is not clear; however, we do know two things for certain: (a) each measurable unit of blood volume change depends on the response of hundreds or thousands of neurons, and (b) the time course of the volume changes is slow compared to the potential time course of the underlying neuronal responses. Both of these mean that important variability in neuronal responses will be averaged out when measuring blood changes. For example, if two neighbouring neurons have opposite responses to a given stimulus, this will produce opposite changes in blood volume, which will cancel each other out in the blood volume measurement due to (a). This is important in the present study because blood volume changes are implicitly being used as a measure of coding in the underlying neuronal population. The authors need to acknowledge that this is a coarse measure of neuronal responses and that important aspects of neuronal responses may be missing from the blood volume measure.

      The reviewer is correct: we do not measure neuronal firing, but use blood volume as a proxy for bulk local neuronal activity, which does not capture the richness of single neuron responses. We will highlight this point in the manuscript. This is why the paper focuses on large-scale spatial representations as well as cross-species comparison. For this latter purpose, fMRI responses are on par with our fUSI data, with both neuroimaging techniques showing the same weakness.

      (2) More importantly for the present study, however, the effect of (b) is that any rapid changes in the response of a single neuron will be cancelled out by temporal averaging. Imagine a neuron whose response is transient, consisting of rapid excitation followed by rapid inhibition. Temporal averaging of these two responses will tend to cancel out both of them. As a result, blood volume measurements will tend to smooth out any fast, dynamic responses in the underlying neuronal population. In the present study, this temporal averaging is likely to be particularly important because the authors are comparing responses to dynamic (nonstationary) stimuli with responses to more constant stimuli. To a first approximation, neuronal responses to dynamic stimuli are themselves dynamic, and responses to constant stimuli are themselves constant. Therefore, the averaging will mean that the responses to dynamic stimuli are suppressed relative to the real responses in the underlying neurons, whereas the responses to constant stimuli are more veridical. On top of this, temporal following rates tend to decrease as one ascends the auditory hierarchy, meaning that the comparison between dynamic and stationary responses will be differently affected in different brain areas. As a result, the dynamic/stationary balance is expected to change as you ascend the hierarchy, and I would expect this to directly affect the results observed in this study.

      It is not trivial to extrapolate from what we know about temporal following in the cortex to know exactly what the expected effect would be on the authors' results. As a first-pass control, I would strongly suggest incorporating into the authors' filterbank model a range of realistic temporal following rates (decreasing at higher levels), and spatially and temporally average these responses to get modelled cerebral blood flow measurements. I would want to know whether this model showed similar effects as in Figure 2. From my guess about what this model would show, I think it would not predict the effects shown by the authors in Figure 2. Nevertheless, this is an important issue to address and to provide control for.

      We understand the reviewer’s concern about potential differences in response dynamics between stationary and non-stationary sounds. In particular, it seems that the reviewer is concerned that responses to foregrounds may be suppressed in non-primary fields because foregrounds are not stationary, and non-primary regions could struggle to track and respond to these sounds. Nevertheless, we observed the contrary, with non-primary regions over-representing non-stationary (dynamic) sounds relative to stationary ones. For this reason, we are inclined to think that this explanation cannot account for our findings.

      Furthermore, background sounds are not completely constant: they are still dynamic sounds, but their temporal modulation rates are usually faster (see Figure 3B). Similarly, neural responses to these two types of sounds are dynamic (see, for example, Hamersky et al., 2025, Figure 1). Thus, we are not sure that blood volume would transform the responses to these types of sounds non-linearly.

      We understand the comment that temporal following rates might differ across regions in the auditory hierarchy and agree. In fact, we show that tuning to temporal rates differs across regions and partly explains the differences in background invariance we observe. We think the reviewer’s suggestion is already implemented by our spectrotemporal model, which incorporates the full range of realistic temporal following rates (up to 128 Hz). The temporal averaging is done as we take the output of the model (which varies continuously through time) and average it in the same window as we used for our fUSI data. When we fit this model to the ferret data, we find that voxels in non-primary regions, especially VP (tertiary auditory cortex), tend to be more tuned to low temporal rates (Figure 2F, G), and that background invariance is stronger in voxels tuned to low rates. This is, however, not true in humans, suggesting that background invariance in humans relies on different computational mechanisms.

      (3) I do not agree with the equivalence that the authors draw between the statistical stationarity of sounds and their classification as foreground or background sounds. It is true that, in a common foreground/background situation - speech against a background of white noise - the foreground is non-stationary and the background is stationary. However, it is easy to come up with examples where this relationship is reversed. For example, a continuous pure tone is perfectly stationary, but will be perceived as a foreground sound if played loudly. Background music may be very non-stationary but still easily ignored as a background sound when listening to overlaid speech. Ultimately, the foreground/background distinction is a perceptual one that is not exclusively determined by physical characteristics of the sounds, and certainly not by a simple measure of stationarity. I understand that the use of foreground/background in the present study increases the likely reach of the paper, but I don't think it is appropriate to use this subjective/imprecise terminology in the results section of the paper.

      We appreciate the reviewer’s comment that the classification of our sounds into foregrounds and backgrounds is not verified by any perceptual experiments. We use those terms to be consistent with the literature, including the paper we derived this definition from (Kell et al., 2019). These terms are widely used in studies where no perceptual or behavioral experiments are included, and even when animals are anesthetized. However, we will emphasize the limits of this definition when introducing it, as well as in the discussion.

      (4) Related to the above, I think further caveats need to be acknowledged in the study. We do not know what sounds are perceived as foreground or background sounds by ferrets, or indeed whether they make this distinction reliably to the degree that humans do. Furthermore, the individual sounds used here have not been tested for their foreground/background-ness. Thus, the analysis relies on two logical jumps - first, that the stationarity of these sounds predicts their foreground/background perception in humans, and second, that this perceptual distinction is similar in ferrets and humans. I don't think it is known to what degree these jumps are justified. These issues do not directly affect the results, but I think it is essential to address these issues in the Discussion, because they are potentially major caveats to our understanding of the work.

      We agree with the reviewer that the foreground-background distinction might be different in ferrets. In anticipation of that issue, we had enriched the sound set with more ecologically relevant sounds, such as ferret and other animal vocalizations. Nevertheless, the point remains valid and is already raised in the discussion. We will emphasize this limitation in addition to the limitation of our definition of foregrounds and backgrounds.

      Reviewer #2:

      (1) Interpretation of the cerebral blood volume signal: While the results are compelling, more caution should be exercised by the authors in framing their results, given that they are using an indirect measure of neural activity. This is the difference between stating "CBV in area MEG was less background invariant than in higher areas" and saying "MEG was less background invariant than other areas". Beyond framing, the basic properties of the CBV signal should be better explored:

      a) Cortical vasculature is highly structured (e.g. Kirst et al. (2020) Cell). One potential explanation for the results is simply differences in vasculature and blood flow between primary and secondary areas of auditory cortex, even if fUS is sensitive to changes in blood flow, changes in capillary beds, etc. (Mace et al. (2011) Nat. Methods). This concern could be addressed by either analyzing spontaneous fluctuations in the CBV signal during silent periods or computing a signal-to-noise ratio of voxels across areas across all sound types. This is especially important given the complex 3D geometry of gyri and sulci in the ferret brain.

      We agree with the reviewers that there could be differences in vasculature across subregions of the auditory cortex. We will run analyses providing comparisons of basic signal properties across our different regions of interest. We note that this point would also be valid for the human fMRI data, for which we cannot run these controls. Nevertheless, this should not affect our analyses and results, which should be independent of local vascular density. First, we normalize the signal in each voxel before any analysis, so that the absolute strength of the signal, or blood volume in a given voxel, does not matter. Second, we do see sound-evoked responses in all regions (Figure S2) and only focus on reliable voxels in each region. Third, our analysis mostly relies on voxel-based correlation across sounds, which is independent of the mean and variance of the voxel responses. Thus, we believe that differences in vascular architecture across regions are unlikely to affect our results.
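
      The third argument above, that correlation across sounds does not depend on a voxel's mean or variance, can be checked with a small sketch. The response values and the gain and offset below are hypothetical and stand in for a regional difference in signal strength; the property holds for any positive gain.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical responses of one voxel to five sounds in two conditions.
responses_a = [0.2, 1.5, 0.7, 2.1, 1.0]
responses_b = [0.1, 1.2, 0.9, 1.8, 1.1]

# Mimic a voxel with a stronger signal: multiply by a gain and add an offset.
rescaled_a = [3.0 * r + 10.0 for r in responses_a]

r_original = pearson(responses_a, responses_b)
r_rescaled = pearson(rescaled_a, responses_b)
print(r_original, r_rescaled)  # the two values agree up to rounding error
```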

      b) Figure 1 leaves the reader uncertain what exactly is being encoded by the CBV signal, as temporal responses to different stimuli look very similar in the examples shown. One possibility is that the CBV is an acoustic change signal. In that case, sounds that are farther apart in acoustic space from previous sounds would elicit larger responses, which is straightforward to test. Another possibility is that the fUS signal reflects time-varying features in the acoustic signal (e.g. the low-frequency envelope). This could be addressed by cross-correlating the stimulus envelope with fUS waveform. The third possibility, which the authors argue, is that the magnitude of the fUS signal encodes the stimulus ID. A better understanding of the justification for only looking at the fUS magnitude in a short time window (2-4.8 s re: stimulus onset) would increase my confidence in the results.

      We thank the reviewer for raising that point as it highlights that the layout of Figure 1 is misleading. While Figure 1B shows an example snippet of our sound streams, Figure 1D shows the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds, and the point is just to illustrate the dynamics for the three broad categories. In Figure 1E however, we show the cross-validated cross-correlation of CBV across sounds (and different time lags). To obtain this, we compute for each voxel the response to each sound at each time lag, thus obtaining two vectors of size number of sounds per lag, one per repeat. Then, we correlate all these vectors across the two repeats, obtaining one cross-correlation matrix per neuron. We finally average these matrices across all neurons. The fact that you see red squares demonstrates that the signal encodes sound identity, since CBV is more similar across two repeats of the same sound (e.g., in the foreground only matrix, 0-5 s vs 0-5 s), than two different sounds (0-5 s vs. 7-12 s). We will modify the figure layout as well as the legend to improve clarity.
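
      The procedure described for Figure 1E can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the nested-list layout `[unit][lag][sound]`, the function name `cross_validated_xcorr`, and the toy numbers are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cross_validated_xcorr(rep1, rep2):
    """Correlate response vectors across two repeats for every pair of lags.

    rep1, rep2: responses laid out as [unit][lag][sound] (assumed layout).
    Returns a [lag][lag] matrix averaged over units.
    """
    n_units, n_lags = len(rep1), len(rep1[0])
    avg = [[0.0] * n_lags for _ in range(n_lags)]
    for u in range(n_units):
        for i in range(n_lags):
            for j in range(n_lags):
                avg[i][j] += pearson(rep1[u][i], rep2[u][j]) / n_units
    return avg

# Toy data: 2 units, 2 lags, 4 sounds; repeat 2 is a noisy copy of repeat 1.
rep1 = [[[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]],
        [[4.0, 3.0, 2.0, 1.0], [3.0, 4.0, 1.0, 2.0]]]
rep2 = [[[1.1, 1.9, 3.2, 3.9], [2.1, 0.9, 4.1, 2.8]],
        [[3.9, 3.2, 1.9, 1.1], [2.8, 4.1, 0.9, 2.1]]]

matrix = cross_validated_xcorr(rep1, rep2)
for row in matrix:
    print([round(value, 2) for value in row])
```

      In this toy example the diagonal entries (same lag in both repeats) are larger than the off-diagonal ones, which corresponds to the red squares described above: responses are more similar across repeats of the same sound than across different sounds.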

      (2) Interpretation of the human data: The authors acknowledge in the discussion that there are several differences between fMRI and fUS. The results would be more compelling if they performed a control analysis where they downsampled the Ferret fUS data spatially and temporally to match the resolution of fMRI and demonstrated that their ferret results hold with lower spatiotemporal resolution.

      We agree with the reviewer that the use of different techniques might get in the way of cross-species comparison. We will add additional discussion on this point. We already control for the temporal aspect by using the average of stimulus-evoked activity across time (note that due to scanner noise, sounds are presented cut into small pieces in the fMRI experiments). Regarding the spatial aspect, there are several things to consider. First, both species have brains of very different sizes, a factor that is conveniently compensated for by the higher spatial resolution of fUSI compared to fMRI (0.1 vs 2 mm). Downsampling to fMRI resolution would lead to having one voxel per region per slice, which is not feasible. We also summarize results with one value per region, which is a form of downsampling that is fairer across species. Furthermore, we believe that we already established in a previous study (Landemard et al., 2021, eLife) that fUSI and fMRI data are comparable signals. Indeed, we could predict human fMRI responses to most sounds from ferret fUSI responses to the same sounds.

      Reviewer #3:

      As mentioned above, interpretation of the invariance analyses using predictions from the spectrotemporal modulation encoding model hinges on the model's ability to accurately predict neural responses. Although Figure S5 suggests the encoding model was generally able to predict voxel responses accurately, the authors note in the introduction that, in human auditory cortex, this kind of tuning can explain responses in primary areas but not in non-primary areas (Norman-Haignere & McDermott, PLOS Biol. 2018). Indeed, the prediction accuracy histograms in Figure S5C suggest a slight difference in the model's ability to predict responses in primary versus non-primary voxels. Additional analyses should be done to a) determine whether the prediction accuracies are meaningfully different across regions and b) examine whether controlling for prediction accuracy across regions (i.e., sub-selecting voxels across regions with matched prediction accuracy) affects the outcomes of the invariance analyses.

      The reviewer is correct: the spectrotemporal model tends to perform less well in human non-primary cortex. We believe this does not contradict our results but goes in the same direction: while there is a gradient in invariance in both ferrets and humans, this gradient is predicted by the spectrotemporal model in ferrets, but not in humans (possibly indeed because predictions are less good in human non-primary auditory cortex). Regardless of the mechanism, this result points to a difference across species. We will clarify these points by quantifying potential differences in prediction accuracy in both species and comment on those in the manuscript.

      A related concern is the procedure used to train the encoding model. From the methods, it appears that the model may have been fit using responses to both isolated and mixture sounds. If so, this raises questions about the interpretability of the invariance analyses. In particular, fitting the model to all stimuli, including mixtures, may inflate the apparent ability of the model to "explain" invariance, since it is effectively trained on the phenomenon it is later evaluated on. Put another way, if a voxel exhibits invariance, and the model is trained to predict the voxel's responses to all types of stimuli (both isolated sounds and mixtures), then the model must also show invariance to the extent it can accurately predict voxel responses, making the result somewhat circular. A more informative approach would be to train the encoding model only on responses to isolated sounds (or even better, a completely independent set of sounds), as this would help clarify whether any observed invariance is emergent from the model (i.e., truly a result of low-level tuning to spectrotemporal features) or simply reflects what it was trained to reproduce.

      We thank the reviewer for this suggestion and will run an additional prediction using only the sounds presented in isolation. This will be included in the next version of the manuscript.

      Finally, the interpretation of the foreground invariance results remains somewhat unclear. In ferrets (Figure 2I), the authors report relatively little foreground invariance, whereas in humans (Figure 5G), most participants appear to show relatively high levels of foreground invariance in primary auditory cortex (around 0.6 or greater). However, the paper does not explicitly address these apparent cross-species differences. Moreover, the findings in ferrets seem at odds with other recent work in ferrets (Hamersky et al. 2025 J. Neurosci.), which shows that background sounds tend to dominate responses to mixtures, suggesting a prevalence of foreground invariance at the neuronal level. Although this comparison comes with the caveat that the methods differ substantially from those used in the current study, given the contrast with the findings of this paper, further discussion would nonetheless be valuable to help contextualize the current findings and clarify how they relate to prior work.

      We thank the reviewer for this point. We will indeed add further discussion of the difference between ferrets and humans in foreground invariance in primary auditory cortex. In addition, while we found a trend for higher background invariance than foreground invariance in ferret primary auditory cortex, this difference was not significant and many voxels exhibit similar levels of background and foreground invariance (for example in Figure 2D, G). Thus, we do not think our results are inconsistent with Hamersky et al., 2025, though we agree the bias towards background sounds is not as strong in our data. This might indeed reflect differences in methodology, both in the signal that is measured (blood volume vs spikes), and the sound presentation paradigm. We will add this point to our discussion.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      This is a very interesting paper investigating the fitness and cellular effects of mutations that drive dihedral protein complexes into forming filaments. The Levy group have previously shown that this can happen relatively easily in such complexes and this paper now investigates the cellular consequences of this phenomenon. The study is very rigorous biophysically and very surprisingly comes up empty in terms of an effect: apparently this kind of self-assembly can easily be tolerated in yeast, which was certainly not my expectation. This is a very interesting result, because it implies that such assemblies may evolve neutrally because they fulfill the two key requirements for such a trajectory: They are genetically easily accessible (in as little as a single mutation), and they have perhaps no detrimental effect on fitness. This immediately poses two very interesting questions: Are some natural proteins that are known to form filaments in the cell perhaps examples of such neutral trajectories? And if this trait is truly neutral (as long as it doesn't affect the base biochemical function of the protein in question), why don't we observe more proteins forming these kinds of ordered assemblies?

      I have no major comments about the experiments as I find them in general very carefully carried out. I have three more general comments:

      1. The fitness effect of these assemblies, if one exists, seems very small. I think it's worth remembering that even very small fitness effects beyond even what competition experiments can reveal could in principle be enough to keep assembly-inducing alleles at very low frequencies in natural populations. Perhaps this could be acknowledged in the paper somewhere.
      2. The proteins used in this study were, I think, chosen such that they do not have an important function in yeast that could be disrupted by assembly. This allows the effect of the large-scale assemblies to be measured in isolation. If I deduced this correctly, this should probably be pointed out again in this paper (I apologise if I missed this).
      3. The model system in which these effects were tested for is yeast. This organism has a rigid cell wall and I was wondering if this makes it more tolerant to large scale assemblages than wall-less eukaryotes. Could the authors comment on this?

      Minor points:

      In Figure 2D, what are the fits? And is there any analysis that rules out expression effects on the mutant caused by higher levels of the wild-type? The error bars in Figure 2E are not defined.

      Significance

      This is a remarkably rigorous paper that investigates whether self-assembly into large structures has any fitness effect on a single-celled organism. This is very relevant, because a landmark paper from the Levy group showed that many proteins are very close in genetic terms to forming such assemblies. The general expectation I think would have been that this phenomenon is pretty harmful. This would have explained why such filaments are relatively rare as far as we know. This paper now does a large number of highly rigorous experiments to first prove beyond doubt that a range of model proteins really can be coaxed into forming such filaments in yeast cells through a very small number of mutations. Its perhaps most surprising result is that this does not negatively affect yeast cells.

      From an evolutionary perspective, this is a very interesting and highly surprising result. It forces us to rethink why such filaments are not more common in Nature. Two possible answers come to mind: First, it's possible that filamentation is not directly harmful to the cell, but that assembling proteins into filaments can interfere with their basic biochemical function (which was not tested for here).

      Second, perhaps assembly does cause a fitness defect, but one so small that it is hard to measure experimentally. Natural selection is very powerful, and even fitness coefficients we struggle to measure in the laboratory can have significant effects in the wild. If this is true, we might expect such filaments to be more common in organisms with small effective population sizes, in which selection is less effective.
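
      To make the population-size argument concrete, the sketch below uses Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population, u = (1 - e^(-2s)) / (1 - e^(-4 N_e s)), and compares it with the neutral expectation 1/(2 N_e). It assumes that census and effective population sizes are equal; the selection coefficient and the two population sizes are hypothetical values chosen for illustration, not estimates from the paper.

```python
import math

def fixation_probability(s, n_e):
    """Kimura's approximation for a new mutation, assuming N = N_e (diploid)."""
    if s == 0:
        return 1.0 / (2 * n_e)
    # expm1 keeps precision when s is very small.
    return math.expm1(-2 * s) / math.expm1(-4 * n_e * s)

def relative_to_neutral(s, n_e):
    """Fixation probability divided by the neutral value 1 / (2 * N_e)."""
    return fixation_probability(s, n_e) * 2 * n_e

s = -1e-5  # hypothetical, very slightly deleterious allele
small_population = relative_to_neutral(s, 1_000)
large_population = relative_to_neutral(s, 1_000_000)
print(small_population, large_population)
```

      Under these assumptions the allele behaves almost neutrally in the small population but is strongly disfavoured in the large one, even though its fitness cost would be far too small to detect in a laboratory competition assay.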

      A third possibility is of course that the prevalence of such self-assembly is under-appreciated. Perhaps more proteins than we currently know assemble into these structures under some conditions without any benefit or detriment to the organism.

      These are all fascinating implications of this work that straddle the fields of evolutionary genetics and biochemistry and are therefore relevant to a very wide audience. My own expertise is in these two fields. I also think that this work will be exciting for synthetic biologists, because it proves that these kinds of assemblies are well tolerated inside cells.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper is an elegant, mostly observational work, detailing observations that polysome accumulation appears to drive nucleoid splitting and segregation. Overall I think this is an insightful work with solid observations.

      Thank you for your appreciation and positive comments. In our view, an appealing aspect of this proposed biophysical mechanism for nucleoid segregation is its self-organizing nature and its ability to intrinsically couple nucleoid segregation to biomass growth, regardless of nutrient conditions.

      Strengths:

      The strengths of this paper are the careful and rigorous observational work that leads to their hypothesis. They find the accumulation of polysomes correlates with nucleoid splitting, and that the nucleoid segregation occurring right after splitting correlates with polysome segregation. These correlations are also backed up by other observations:

      (1) Faster polysome accumulation and DNA segregation at faster growth rates.

      (2) Polysome distribution negatively correlating with DNA positioning near asymmetric nucleoids.

      (3) Polysomes form in regions inaccessible to similarly sized particles.

      The above points are observational; I have no comments on these observations leading to their hypothesis.

      Thank you!

      Weaknesses:

      It is hard to state weaknesses in any of the observational findings, and furthermore, their two tests of causality, while not being completely definitive, are likely the best one could do to examine this interesting phenomenon.

      It is indeed difficult to prove causality in a definitive manner when the proposed coupling mechanism between nucleoid segregation and gene expression is self-organizing, i.e., does not involve a dedicated regulatory molecule (e.g., a protein, RNA, metabolite) that we could have eliminated through genetic engineering to establish causality. We are grateful to the reviewer for recognizing that our two causality tests are the best that can be done in this context.

      Points to consider / address:

      Notably, demonstrating causality here is very difficult (given the coupling between transcription, growth, and many other processes) but an important part of the paper. They do two experiments toward demonstrating causality that help bolster - but not prove - their hypothesis. These experiments have minor caveats, my first two points.

      (1) First, "Blocking transcription (with rifampicin) should instantly reduce the rate of polysome production to zero, causing an immediate arrest of nucleoid segregation". Here they show that adding rifampicin does indeed lead to polysome loss and an immediate halting of segregation - data that does fit their model. This is not definitive proof of causation, as rifampicin also (a) stops cell growth, and (b) stops the translation of secreted proteins. Neither of these two possibilities is ruled out fully.

      That’s correct; cell growth also stops when gene expression is inhibited, which is consistent with our model in which gene expression within the nucleoid promotes nucleoid segregation and biomass growth (i.e., cell growth), inherently coupling these two processes. This said, we understand the reviewer’s point: the rifampicin experiment doesn’t exclude the possibility that protein secretion and cell growth drive nucleoid segregation. We are assuming that the reviewer is envisioning an alternative model in which sister nucleoids would move apart because they would be attached to the membrane through coupled transcription-translation-protein secretion (transertion) and the membrane would expand between the separating nucleoids, similar to the model proposed by Jacob et al. in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several observations arguing against cell elongation/transertion acting as a predominant mechanism of nucleoid segregation.

      (1) For this alternative mechanism to work, membrane growth must be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth was localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To circumvent the membrane fluidity issue, one could potentially evoke an additional connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid. However, peptidoglycan growth is dispersed early in the cell division cycle when the nucleoid splitting happens in fast growing cells and only appears to be zonal after the onset of cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In our revised manuscript, we clarify this point and provide confirmatory data showing that the cell elongation rate is indeed slower than the nucleoid segregation rate (Figure 1H and Figure 1 - figure supplement 5A), indicating that it cannot be the main driver.

      (3) The asymmetries in nucleoid compaction that we described in our paper are predicted by our model. We do not see how they could be explained by cell growth or protein secretion.

      (4) We also show that polysome accumulation at ectopic sites (outside the nucleoid) results in correlated nucleoid dynamics, consistent with our proposed mechanism. It is not clear to us how such nucleoid dynamics could be explained by cell growth or protein secretion (transertion).
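
      The rate comparison in point (2) above can be sketched as a per-cell subtraction. This is only a minimal illustration of the logic, not the analysis pipeline used in the manuscript: the function names are hypothetical, and the numbers are placeholders chosen for the example, not measurements.

```python
def rate_differences(segregation_rates, elongation_rates):
    """Per-cell difference: nucleoid segregation rate minus cell elongation rate.

    Both inputs hold one value per cell, in the same (arbitrary) units.
    """
    if len(segregation_rates) != len(elongation_rates):
        raise ValueError("need exactly one pair of rates per cell")
    return [s - e for s, e in zip(segregation_rates, elongation_rates)]


def fraction_positive(differences):
    """Fraction of cells in which segregation outpaces elongation."""
    return sum(d > 0 for d in differences) / len(differences)


# Placeholder values for four hypothetical cells (NOT data from the study).
segregation = [0.030, 0.025, 0.040, 0.020]
elongation = [0.020, 0.020, 0.025, 0.022]

diffs = rate_differences(segregation, elongation)
print(fraction_positive(diffs))
```

      If cell elongation were the main driver, the differences would be expected to cluster around zero or below; predominantly positive values indicate that elongation alone is too slow to account for the separation.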

      (1a) As rifampicin also stops all translation, it also stops translational insertion of membrane proteins, which in many old models has been put forward as a possible driver of nucleoid segregation, and perhaps independent of growth. This should at least be mentioned in the discussion, or if there are past experiments that rule this out it would be great to note them.

      It is not clear to us how the attachment of the DNA to the cytoplasmic membrane could alone create a directional force to move the sister nucleoids. We agree that old models have proposed a role for cell elongation (providing the force) and transertion (providing the membrane tether). Please see our response above for the evidence (from the literature and our work) against it. This was mentioned in the Introduction and Results section, but we agree that this was not well explained. We have now put emphasis on the related experimental data (Figure 1H, Figure 1 – figure supplement 5A) and revised the text (lines 199 - 210) to clarify these points.

      (1b) They address at great length in the discussion the possibility that growth may play a role in nucleoid segregation. However, this is testable - by stopping surface growth with antibiotics. Cells should still accumulate polysomes for some time, it would be easy to see if nucleoids are still segregated, and to what extent, thereby possibly decoupling growth and polysome production. If successful, this or similar experiments would further validate their model.

      We reviewed the literature and could not find a drug that stops cell growth without stopping gene expression. Any drug that affects the integrity or potential of the membrane depletes cells of ATP; without ATP, gene expression is inhibited. However, our experiment in which we drive polysome accumulation at ectopic sites decouples polysome accumulation from cell growth. In this experiment, by redirecting most of chromosome gene expression to a single plasmid-encoded gene, we reduce the rate of cell growth but still create a large accumulation of polysomes at an ectopic location. This ectopic polysome accumulation is sufficient to affect nucleoid dynamics in a correlated fashion. In the revised manuscript, we have clarified this point and added model simulations (Figure 7 – figure supplement 2) to show that our experimental observations are predicted by our model.

      (2) In the second experiment, they express excess TagBFP2 to delocalize polysomes from midcell. Here they again see the anticorrelation of the nucleoid and the polysomes, and in some cells, it appears similar to normal (polysomes separating the nucleoid) whereas in others the nucleoid has not separated. The one concern about this data - and the differences between the "separated" and "non-separated" nuclei - is that the over-expression of TagBFP2 has a huge impact on growth, which may also have an indirect effect on DNA replication and termination in some of these cells. Could the authors demonstrate these cells contain 2 fully replicated DNA molecules that are able to segregate?

      We have included new flow cytometry data of fluorescently labeled DNA to show that DNA replication is not impacted.

      (3) What is not clearly stated and is needed in this paper is to explain how polysomes do (or could) "exert force" in this system to segregate the nucleoid: what a "compaction force" is by definition, and what mechanism causes this to arise (what causes the "force") as the "compaction force" arises from new polysomes being added into the gaps between them caused by thermal motions.

      They state, "polysomes exert an effective force", and they note their model requires "steric effects (repulsion) between DNA and polysomes" for the polysomes to segregate, which makes sense. But this makes it unclear to the reader what is giving the force. As written, it is unclear if (a) these repulsions alone are making the force, or (b) is it the accumulation of new polysomes in the center by adding more "repulsive" material, the force causes the nucleoids to move. If polysomes are concentrated more between nucleoids, and the polysome concentration does not increase, the DNA will not be driven apart (as in the first case). However, in the second case (which seems to be their model), the addition of new material (new polysomes) into a sterically crowded space is not exerting force - it is filling in the gaps between the molecules in that region, space that needs to arise somehow (like via Brownian motion). In other words, if the polysome region is crowded with polysomes, space must be made between these polysomes for new polysomes to be inserted, and this space must be made by thermal (or ATP-driven) fluctuations of the molecules. Thus, if polysome accumulation drives the DNA segregation, it is not "exerting force", but rather the addition of new polysomes is iteratively rectifying gaps being made by Brownian motion.

      We apologize for the understandable confusion. In our picture, the polysomes and DNA (conceptually considered as small plectonemic segments) basically behave as dissolved particles. If these particles were noninteracting, they would simply mix. However, both polysomes and DNA segments are large enough to interact sterically. So as density increases, steric avoidance implies a reduced conformational entropy and thus a higher free energy per particle. We argue (based on Miangolarra et al. 2021 PMID: 34675077 and Xiang et al. 2021 PMID: 34186018) that the demixing of polysomes and DNA segments occurs because DNA segments pack better with each other than they do with polysomes. This raises the free energy cost associated with DNA-polysome interactions compared to DNA-DNA interactions. We model this effect by introducing a term in the free energy, χ_np, which we refer to as a repulsion between DNA and polysomes, though as explained above it arises from entropic effects. At realistic cellular densities of DNA and polysomes, this repulsive interaction is strong enough to cause the DNA and polysomes to phase separate.

      This same density-dependent free energy that causes phase separation can also give rise to forces, just in the way that a higher pressure on one side of a wall can give rise to a net force on the wall. Indeed, the “compaction force” we refer to is fundamentally an osmotic pressure difference. At some stages during nucleoid segregation, the region of the cell between nucleoids has a higher polysome concentration, and therefore a higher osmotic pressure, than the regions near the poles. This results in a net poleward force on the sister nucleoids that drives their migration toward the poles. This migration continues until the osmotic pressure equilibrates. Therefore, both phase separation (due to the steric repulsion described above) and nonequilibrium polysome production and degradation (which creates the initial accumulation of polysomes around midcell) are essential ingredients for nucleoid segregation.

      This has been clarified in the revised text, with the support of additional simulation results showing how the asymmetry in polysome distribution causes a compaction force (Figure 4A).
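
      The pressure-on-a-wall analogy above can be made concrete with a short sketch. It assumes the dilute, non-interacting (van 't Hoff) limit, in which osmotic pressure is simply number density times kT. This is a deliberate simplification and not the free energy used in the model, which includes steric interactions; the function names, the temperature, and the example densities are all hypothetical and serve only to show how a concentration difference translates into a net force and its sign.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def ideal_osmotic_pressure(number_density_per_m3, temperature_k=310.0):
    """van 't Hoff osmotic pressure (Pa) of a dilute, non-interacting solute."""
    return number_density_per_m3 * K_B * temperature_k


def net_poleward_force(density_mid, density_pole, cell_radius_m,
                       temperature_k=310.0):
    """Net force (N) on a nucleoid that spans the cell cross-section.

    Positive values push the nucleoid away from midcell, toward the pole.
    """
    delta_p = (ideal_osmotic_pressure(density_mid, temperature_k)
               - ideal_osmotic_pressure(density_pole, temperature_k))
    cross_section = math.pi * cell_radius_m ** 2
    return delta_p * cross_section


# Hypothetical densities (particles per cubic meter) and cell radius (m).
force = net_poleward_force(density_mid=2e21, density_pole=1e21,
                           cell_radius_m=0.5e-6)
print(f"net poleward force: {force:.2e} N")
```

      The force vanishes when the two densities are equal, which corresponds to the equilibration of osmotic pressure at which migration stops.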

      The authors use polysome accumulation and phase separation to describe what is driving nucleoid segregation. Both terms are accurate, but it might help the less physically inclined reader to have one term, or have what each of these means explicitly defined at the start. I say this most especially in terms of "phase separation", as the currently huge momentum toward liquid-liquid interactions in biology causes the phrase "phase separation" to often evoke a number of wider (and less defined) phenomena and ideas that may not apply here. Thus, a simple clear definition at the start might help some readers.

      In our case, phase separation means that the DNA-polysome steric repulsion is strong enough to drive their demixing, which creates a compact nucleoid. As mentioned in a previous point, this effect is captured in the free energy by the χ_np term, which is an effective repulsion between DNA and polysomes, though it arises from entropic effects.

      In the revised manuscript, we now illustrate this with our theoretical model by initializing a cell with a diffuse nucleoid and low polysome concentration. For the sake of simplicity, we assume that the cell does not elongate. We observe that the DNA-polysome steric repulsion is sufficient to compact the nucleoid and place it at mid-cell (new Figure 4A).
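
      To make the meaning of "phase separation" in this sense more tangible, the following sketch uses a textbook two-component regular-solution free energy with a single repulsion parameter. It is an assumption-laden stand-in, not the model of the manuscript: here `chi` plays the role of χ_np, the two components are taken to have equal size, there is no solvent, and production and degradation of polysomes are ignored. The function names are hypothetical.

```python
import math


def mixing_free_energy(phi, chi):
    """Free energy per site (units of kT) at volume fraction phi, 0 < phi < 1.

    The first two terms are the entropy of mixing; the last is the repulsion.
    """
    return (phi * math.log(phi)
            + (1 - phi) * math.log(1 - phi)
            + chi * phi * (1 - phi))


def curvature(phi, chi):
    """Second derivative of mixing_free_energy with respect to phi."""
    return 1 / phi + 1 / (1 - phi) - 2 * chi


def demixes(phi, chi):
    """True if the uniform mixture is linearly unstable (inside the spinodal)."""
    return curvature(phi, chi) < 0


for chi in (1.5, 2.5):
    print(chi, demixes(0.5, chi))
```

      In this toy model, an equal mixture becomes unstable once the repulsion parameter exceeds 2, the point at which the repulsion term outweighs the entropy of mixing. Below that threshold the components stay mixed; above it they demix spontaneously.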

      (4) Line 478. "Altogether, these results support the notion that ectopic polysome accumulation drives nucleoid dynamics". Is this right? Should it not read "results support the notion that ectopic polysome accumulation inhibits/redirects nucleoid dynamics"?

      We think that the ectopic polysome accumulation drives nucleoid dynamics. In our theoretical model, we can introduce polysome production at fixed sources to mimic the experiments where ectopic polysome production is achieved by high plasmid expression. The model is able to recapitulate the two main phenotypes observed in experiments (Figure 7). These new simulation results have been added to the revised manuscript (Figure 7 – figure supplement 2).

      (5) It would be helpful to clarify what happens as the RplA-GFP signal decreases at midcell in Figure 1- is the signal then increasing in the less "dense" parts of the cell? That is, (a) are the polysomes at midcell redistributing throughout the cell? (b) is the total concentration of polysomes in the entire cell increasing over time?

      It is a redistribution—the RplA-GFP signal remains constant in concentration from cell birth to division (Figure 1 – Figure Supplement 1E). This is now clarified in the revised text.

      (6) Line 154. "Cell constriction contributed to the apparent depletion of ribosomal signal from the mid-cell region at the end of the cell division cycle (Figure 1B-C and Movie S1)" - It would be helpful if when cell constriction began and ended was indicated in Figures 1B and C.

      Good idea. We have added markers in Figure 1C to indicate the average start of cell constriction. This relative time from birth to division was estimated as described in the new Figure 1 – figure supplement 2. We have also indicated that cell birth and division correspond to the first and last images/timepoint in Figure 1B and C, respectively. The two-dimensional average cell projections presented in Figure 3D also indicate the average timing of cell constriction, consistent with our analysis in Figure 1 – figure supplement 2.

      (7) In Figure 7 they demonstrate that radial confinement is needed for longitudinal nucleoid segregation. It should be noted (and cited) that past experiments of Bacillus l-forms in microfluidic channels showed a clear requirement for rod shape (and a given width) in the positioning and the spacing of the nucleoids.

      Wu et al, Nature Communications, 2020. "Geometric principles underlying the proliferation of a model cell system" https://dx.doi.org/10.1038/s41467-020-17988-7

      Good point! We have revised the text to mention this work. Thank you.

      (8) "The correlated variability in polysome and nucleoid patterning across cells suggests that the size of the polysome-depleted spaces helps determine where the chromosomal DNA is most concentrated along the cell length. This patterning is likely reinforced through the displacement of the polysomes away from the DNA dense region"

      It should be noted this likely functions not just in one direction (polysomes dictating DNA location), but also in the reverse - as the footprint of compacted DNA should also exclude (and thus affect) the location of polysomes

      We agree that the effects could go both ways at this early stage of the story. We have revised the text accordingly.

      (9) Line 159. "Rifampicin is a transcription inhibitor that causes polysome depletion over time. This indicates that all ribosomal enrichments consist of polysomes and therefore will be referred to as polysome accumulations hereafter". Here and throughout this paper they use the term polysome, but cells also have monosomes (and 2 somes, etc). Rifampicin stops the assembly of all of these, and thus the loss of localization could occur from both. Thus, is it accurate to state that all transcription events occur in polysomes? Or are they grouping all of the n-somes into one group?

      In the original discussion, we noted that our term “polysomes” also includes monosomes for simplicity, but we agree that the term should have been defined much earlier. The manuscript has been revised accordingly. Furthermore, in the revised manuscript, we have included additional simulation results with three different diffusion coefficients that reflect different polysome sizes to show that different polysome species with fewer or more ribosomes give similar results (Figure 4 – figure supplement 4). This shows that the average polysome description in our model is sufficient.

      Thank you for the valuable comments and suggestions!

      Reviewer #2 (Public review):

      Summary:

      The authors perform a remarkably comprehensive, rigorous, and extensive investigation into the spatiotemporal dynamics between ribosomal accumulation, nucleoid segregation, and cell division. Using detailed experimental characterization and rigorous physical models, they offer a compelling argument that nucleoid segregation rates are determined at least in part by the accumulation of ribosomes in the center of the cell, exerting a steric force to drive nucleoid segregation prior to cell division. This evolutionarily ingenious mechanism means cells can rely on ribosomal biogenesis as the sole determinant for the growth rate and cell division rate, avoiding the need for two separate 'sensors,' which would require careful coupling.

      Terrific summary! Thank you for your positive assessment.

      Strengths:

      In terms of strengths; the paper is very well written, the data are of extremely high quality, and the work is of fundamental importance to the field of cell growth and division. This is an important and innovative discovery enabled through a combination of rigorous experimental work and innovative conceptual, statistical, and physical modeling.

      Thank you!

      Weaknesses:

      In terms of weaknesses, I have three specific thoughts.

      Firstly, my biggest question (and this may or may not be a bona fide weakness) is how unambiguously the authors can be sure their ribosomal labeling is reporting on polysomes, specifically. My reading of the work is that the loss of spatial density upon rifampicin treatment is used to infer that spatial density corresponds to polysomes, yet this feels like a relatively indirect way to get at this question, given rifampicin targets RNA polymerase and not translation. It would be good if a more direct way to confirm polysome dependence were possible.

      The heterogeneity of ribosome distribution inside E. coli cells has been attributed to polysomes by many labs (PMID: 25056965, 38678067, 22624875, 31150626, 34186018, 10675340). The attribution is also consistent with single-molecule tracking experiments showing that slow-moving ribosomes (polysomes) are excluded by the nucleoid whereas fast-diffusing ribosomes (free ribosomal subunits) are distributed throughout the cytoplasm (PMID: 25056965, 22624875). These points are now mentioned in the revised manuscript.

      Second, the authors invoke a phase separation model to explain the data, yet it is unclear whether there is any particular evidence supporting such a model, whether they can exclude simpler models of entanglement/local diffusion (and/or perhaps this is what is meant by phase separation?) and it's not clear if claiming phase separation offers any additional insight/predictive power/utility. I am OK with this being proposed as a hypothesis/idea/working model, and I agree the model is consistent with the data, BUT I also feel other models are consistent with the data. I also very much do not think that this specific aspect of the paper has any bearing on the paper's impact and importance.

      We appreciate the reviewer’s comment, but the output of our reaction-diffusion model is a bona fide phase separation (spinodal decomposition). So, we feel that we need to use the term when reporting the modeling results. Inside the cell, the situation is more complicated. As the reviewer points out, there are likely entanglements (not considered in our model) and other important factors (please see our discussion on the model limitations). This said, we have revised our text to clarify our terms and proposed mechanism.

      Finally, the writing and the figures are of extremely high quality, but the sheer volume of data here is potentially overwhelming. I wonder if there is any way for the authors to consider stripping down the text/figures to streamline things a bit? I also think it would be useful to include visually consistent schematics of the question/hypothesis/idea each of the figures is addressing to help keep readers on the same page as to what is going on in each figure. Again, there was no figure or section I felt was particularly unclear, but the sheer volume of text/data made reading this quite the mental endurance sport! I am completely guilty of this myself, so I don't think I have any super strong suggestions for how to fix this, but just something to consider.

      We agree that there is a lot to digest. We could not come up with great ideas for visuals other than the schematics we already provide. However, we have revised the text to clarify our points and added a simulation result (Figure 4A) to help explain biophysical concepts.

      Reviewer #3 (Public review):

      Summary:

      Papagiannakis et al. present a detailed study exploring the relationship between DNA/polysome phase separation and nucleoid segregation in Escherichia coli. Using a combination of experiments and modelling, the authors aim to link physical principles with biological processes to better understand nucleoid organisation and segregation during cell growth.

      Strengths:

      The authors have conducted a large number of experiments under different growth conditions and physiological perturbations (using antibiotics) to analyse the biophysical factors underlying the spatial organisation of nucleoids within growing E. coli cells. A simple model of ribosome-nucleoid segregation has been developed to explain the observations.

      Weaknesses:

      While the study addresses an important topic, several aspects of the modelling, assumptions, and claims warrant further consideration.

      Thank you for your feedback. Please see below for a response to each concern.

      Major Concerns:

      Oversimplification of Modelling Assumptions:

      The model simplifies nucleoid organisation by focusing on the axial (long-axis) dimension of the cell while neglecting the radial dimension (cell width). While this approach simplifies the model, it fails to explain key experimental observations, such as:

      (1) Inconsistencies with Experimental Evidence:

      The simplified model presented in this study predicts that translation-inhibiting drugs like chloramphenicol would maintain separated nucleoids due to increased polysome fractions. However, experimental evidence shows the opposite: separated nucleoids condense into a single lobe post-treatment (Bakshi et al 2014), indicating limitations in the model's assumptions/predictions. For the nucleoids to coalesce into a single lobe, polysomes must cross the nucleoid zones via the radial shells around the nucleoid lobes.

      We do not think that the results from chloramphenicol-treated cells are inconsistent with our model. Our proposed mechanism predicts that nucleoids will condense in the presence of chloramphenicol, consistent with experiments. It also predicts that nucleoids that were still relatively close at the time of chloramphenicol treatment could fuse if they eventually touched through diffusion (thermal fluctuation) to reduce their interaction with the polysomes and minimize their conformational energy. Fusion is, however, not expected for well-separated nucleoids since their diffusion is slow in the crowded cytoplasm. This is consistent with our experimental observations: In the presence of a growth-inhibitory concentration of chloramphenicol (70 μg/mL), nucleoids in relatively close proximity can fuse, but well-separated nucleoids condense and do not fuse. Since the growth rate inhibition is not immediate upon chloramphenicol treatment, many cells with well-separated condensed nucleoids divide during the first hour. As a result, the non-fusion phenotype is more obvious in non-dividing cells, achieved by pre-treating cells with the cell division inhibitor cephalexin (50 μg/mL). In these polyploid elongated cells, well-separated nucleoids condensed but did not fuse, not even after an hour in the presence of chloramphenicol. We have revised the manuscript to add these data (illustrative images + a quantitative analysis) in Figure 4 – figure supplement 1.

      (2) The peripheral localisation of nucleoids observed after A22 treatment in this study and others (e.g., Japaridze et al., 2020; Wu et al., 2019), which conflicts with the model's assumptions and predictions. The assumption of radial confinement would predict nucleoids to fill up the volume or ribosomes to go near the cell wall, not the nucleoid, as seen in the data.

      The reviewer makes a good point that DNA attachment to the membrane through transertion could contribute to the nucleoid being peripherally localized in A22 cells. We have revised the text to add this point. However, we do not think that this contradicts the proposed nucleoid segregation mechanism described in our model. On the contrary, by attaching the nucleoid to the cytoplasmic membrane along the cell width, transertion might help reduce the diffusion and thus exchange of polysomes across nucleoids. We have revised the text to discuss transertion over radial confinement.

      (3) The radial compaction of the nucleoid upon rifampicin or chloramphenicol treatment, as reported by Bakshi et al. (2014) and Spahn et al. (2023), also contradicts the model's predictions. This is not expected if the nucleoid is already radially confined.

      We originally invoked radial confinement to explain the observation that polysome accumulations do not equilibrate between DNA-free regions. We agree that transertion is an alternative explanation. Thank you for bringing it to our attention. However, please note that this does not contradict the model. In our view, it actually supports the 1D model by providing a reasonable explanation for the slow exchange of polysomes across DNA-free regions. The attachment of the nucleoid to the membrane along the cell width may act as a diffusion barrier. We have revised the text and the title of the manuscript accordingly.

      (4) Radial Distribution of Nucleoid and Ribosomal Shell:

      The study does not account for well-documented features such as the membrane attachment of chromosomes and the ribosomal shell surrounding the nucleoid, observed in super-resolution studies (Bakshi et al., 2012; Sanamrad et al., 2014). These features are critical for understanding nucleoid dynamics, particularly under conditions of transcription-translation coupling or drug-induced detachment. Work by Yongren et al. (2014) has also shown that the radial organisation of the nucleoid is highly sensitive to growth and the multifork nature of DNA replication in bacteria.

      We have revised the manuscript to discuss the membrane attachment. Please see the previous response.

      The omission of organisation in the radial dimension and the entropic effects it entails, such as ribosome localisation near the membrane and nucleoid centralisation in expanded cells, undermines the model's explanatory power and predictive ability. Some observations have been previously explained by the membrane attachment of nucleoids (a hypothesis proposed by Rabinovitch et al., 2003, and supported by experiments from Bakshi et al., 2014, and recent super-resolution measurements by Spahn et al.).

      We agree—we have revised the text to discuss membrane attachment in the radial dimension. See previous responses.

      Ignoring the radial dimension and membrane attachment of nucleoid (which might coordinate cell growth with nucleoid expansion and segregation) presents a simplistic but potentially misleading picture of the underlying factors.

      Please see above.

      This reviewer suggests that the authors consider an alternative mechanism, supported by strong experimental evidence, as a potential explanation for the observed phenomena:

      Nucleoids may transiently attach to the cell membrane, possibly through transertion, allowing for coordinated increases in nucleoid volume and length alongside cell growth and DNA replication. Polysomes likely occupy cellular spaces devoid of the nucleoid, contributing to nucleoid compaction due to mutual exclusion effects. After the nucleoids separate following ter separation, axial expansion of the cell membrane could lead to their spatial separation.

      This “membrane attachment/cell elongation” model is reminiscent of the hypothesis proposed by Jacob et al. in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several lines of evidence arguing against it as the major driver of nucleoid segregation:

      (Below is a slightly modified version of our response to a comment from Reviewer 1—see page 3)

      (1) For this alternative model to work, axial membrane expansion (i.e., cell elongation) would have to be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth was localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To get around this fluidity issue, one could invoke a connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid to “push” the sister nucleoids apart from each other. However, peptidoglycan growth is dispersed prior to cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In the revised manuscript, we confirm that the cell elongation rate is indeed overall slower than the nucleoid segregation rate (see Figure 1 - figure supplement 5A, where subtracting the cell elongation rate from the nucleoid segregation rate at the single-cell level leads to positive values).

      (3) Furthermore, our correlation analysis comparing the rate of nucleoid segregation to the rate of either cell elongation or polysome accumulation argues that polysome accumulation plays a larger role than cell elongation in nucleoid segregation. These data were already shown in the original manuscript (Figure 1I and Figure 1 – figure supplement 5B) but were not highlighted in this context. We have revised the text to clarify this point.

      (4) The membrane attachment/cell elongation model does not explain the nucleoid asymmetries described in our paper (Figure 3), whereas they can be recapitulated by our model.

      (5) The cell elongation/transertion model cannot predict the aberrant nucleoid dynamics observed when chromosomal expression is largely redirected to plasmid expression (Figure 7). In the revised manuscript, we have added simulation results showing that these nucleoid dynamics are predicted by our model (Figure 7 – figure supplement 2).

      Based on these arguments, we do not believe that a mechanism based on membrane attachment and cell elongation is the major driver of nucleoid segregation. However, we do believe that it may play a complementary role (see “Nucleoid segregation likely involves multiple factors” in the Discussion). We have revised the text to clarify our thoughts and mention the potential role of transertion.

      Incorporating this perspective into the discussion or future iterations of the model may provide a more comprehensive framework that aligns with the experimental observations in this study and previous work.

      As noted above, we have revised the text to mention transertion.

      Simplification of Ribosome States:

      Combining monomeric and translating ribosomes into a single 'polysome' category may overlook spatial variations in these states, particularly during ribosome accumulation at the mid-cell. Without validating uniform mRNA distribution or conducting experimental controls such as FRAP or single-molecule measurements to estimate the proportions of ribosome states based on diffusion, this assumption remains speculative.

      Indeed, for simplicity, we adopt an average description of all polysomes with an average diffusion coefficient and interaction parameters, which is sufficient for capturing the fundamental mechanism underlying nucleoid segregation. To illustrate that considering multiple polysome species does not change the physical picture, we have considered an extension of our model, which contains three polysome species, each with a different diffusion coefficient (D_P = 0.018, 0.023, or 0.028 μm²/s), reflecting that polysomes with more ribosomes will have a lower diffusion coefficient. Simulation of this model reveals that the different polysome species have essentially the same concentration distribution, suggesting that the average description in our minimal model is sufficient for our purposes. We present these new simulation results in Figure 4 – figure supplement 4 of the revised manuscript.
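
      A back-of-the-envelope calculation gives some intuition for why the three species behave similarly. The sketch below does not reproduce the simulations; it only applies the standard one-dimensional relation between mean squared displacement and time, ⟨x²⟩ = 2Dt, to the three diffusion coefficients quoted above. The 1 μm travel distance and the function name are assumptions made for illustration.

```python
# Diffusion coefficients quoted in the response, in square micrometers per second.
DIFFUSION_COEFFICIENTS_UM2_PER_S = (0.018, 0.023, 0.028)

# Hypothetical travel distance, chosen only to set a scale (micrometers).
DISTANCE_UM = 1.0


def diffusion_time_1d(distance_um, d_um2_per_s):
    """Characteristic time (s) to diffuse a given distance in 1D: t = x^2 / (2 D)."""
    return distance_um ** 2 / (2 * d_um2_per_s)


times = [diffusion_time_1d(DISTANCE_UM, d)
         for d in DIFFUSION_COEFFICIENTS_UM2_PER_S]

for d, t in zip(DIFFUSION_COEFFICIENTS_UM2_PER_S, times):
    print(f"D = {d} um^2/s -> about {t:.1f} s to diffuse {DISTANCE_UM} um")

# Ratio between the slowest and fastest species; independent of the distance.
print(f"slowest/fastest time ratio: {times[0] / times[-1]:.2f}")
```

      Because the time scales as 1/D, the slowest and fastest species differ by a factor of 0.028/0.018, i.e., less than twofold, regardless of the distance chosen.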

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Does the polysome density correlate with the origins? If the majority of ribosomal genes are expressed near the origins,

      This is indeed an interesting point that we mention in the discussion. The fact that the chromosomal origin is surrounded by highly expressed genes (PMID: 30904377) and is located near the middle of the nucleoid prior to DNA replication (PMID: 15960977, 27332118, 34385314, 37980336) can only help the model that we propose by increasing the polysome density at the mid-nucleoid position.

      (2) Red lines in 3C are hard to resolve - can the authors make them darker?

      Absolutely. Sorry about that.

      Reviewer #2 (Recommendations for the authors):

      The authors use rifampicin treatment as a mechanism to trigger polysome disassembly and show this leads to homogenous RplA distribution. This is a really important experiment as it is used to link RplA localization to polysomes, and to argue that RplA density is reporting on polysomes. Given that rifampicin inhibits RNA polymerase, and given that the only reference of the three linking rifampicin to polysome disassembly is the 1971 Blundell and Wild reference, it would perhaps be useful to show polysome depletion more conclusively (as opposed to inhibition of mRNA synthesis, which is upstream of polysome assembly) by using an alternative compound more commonly linked to polysome disassembly (e.g., puromycin) and to show timelapse loss of density as a function of treatment time. This is not a required experiment, but given the idea that RplA density reports on polysomes is central to the authors' interpretation, it feels like this would be a thing worth being certain of. An alternative model is that ribosomes undergo self-assembly into local storage depots when not being used, but those depots are not translationally active/lack polysomes. I don't know if I think this is likely, but I'm not convinced the rifampicin treatment + waiting for a relatively long period of time unambiguously excludes other possible mechanisms given the large scale remodeling of the intracellular environment upon mRNA inhibition. I 100% buy the relationship between ribosomal distribution and nucleoid segregation (and the ectopic expression experiments are amazing in this regard), so my own pause for thought here is "do we know those ribosomes are in polysomes in the ribosome-dense regions". I'm not sure the answer to this question has any bearing on the impact and importance of this work (in my mind, it doesn't, but perhaps there's a reason it does?). 
The way to unambiguously show this would really be to do CryoET and show polysomes in the dense ribosomal regions, but I would never suggest the authors do that here (that's an entire other paper!).

      We agree that mRNAs play a role, as mRNAs are major components of polysomes and most mRNAs are expected to be in the form of polysomes (i.e., in complex with ribosomes). In addition, as mentioned above, the enrichments of ribosome distribution are known to be associated with polysomes (PMID: 25056965, 38678067, 22624875, 31150626, 34186018, 10675340). The attribution is consistent with single-molecule tracking experiments showing that slow-moving ribosomes (polysomes) are excluded by the nucleoid whereas fast-diffusing ribosomes (free ribosomal subunits) are distributed throughout the cytoplasm (PMID: 25056965, 22624875). This is also consistent with cryo-ET results that we actually published (see Figure S5, PMID: 34186018). We have added this information to the revised manuscript. Thank you for alerting us to this oversight.

      On line 320 the authors state "Our single-cell studies provided experimental support that phase separation between polysomes and DNA contributes to nucleoid segregation." - this comes pretty out of left field? I didn't see any discussion of this hypothesis leading up to this sentence, nor is there evidence I can see that necessitates phase separation as a mechanistic explanation unless we are simply using phase separation to mean cellular regions with distinct cellular properties (which I would advise against). If the authors really want to pursue this model I think much more support needs to be provided here, including (1) defining what the different phases are, (2) providing explicit description of what the attractive/repulsive determinants of these different phases could be/are, and (3) ruling out a model where the behavior observed is driven by a combination of DNA / polysome entanglement + steric exclusion; if this is actually the model, then being much more explicit about this being a locally arrested percolation phenomenon would be essential. Overall, however, I would probably dissuade the authors from pursuing the specific underlying physics of what drives the effects they're seeing in a Results section, solely because I think ruling in/out a model unambiguously is very difficult. Instead, this would be a useful topic for a Discussion, especially couched under a "our data are consistent with..." if they cannot exclude other models (which I think is unreasonably difficult to do).

      Thank you for your advice. We have revised the text to more carefully choose our words and define our terms.

      Minor comments:

      The results in "Cell elongation may also contribute to sister nucleoid migration near the end of the division cycle" are really interesting, but this section is one big paragraph, and I might encourage the authors to divide this paragraph up to help the reader parse this complex (and fascinating) set of results!

      We have revised this section to hopefully make it more accessible.

      Reviewer #3 (Recommendations for the authors):

      Technical Controls:

      The authors should conduct a photobleaching control to confirm that the perceived 'higher' brightness of new ribosomes at the mid-cell position is not an artefact caused by older ribosomes being photobleached during the imaging process. Comparing results at various imaging frequencies and intensities is necessary to address this issue.

      The ribosome localization data across 30 nutrient conditions (Figure 2, Figure 1 – figure supplement 6, Figure 2 – Figure supplement 1, Figure 2 – Figure supplement 3 and Figure 5) are from snapshot images, which do not have any photobleaching issue. They confirm the mid-cell accumulation seen by time-lapse microscopy. We have revised the text to clarify this point.

      Novelty of Experimental Measurements:

      While the scale of the study is unprecedented, claims of novelty (e.g., line 142) regarding ribosome-nucleoid segregation tracking are overstated. Similar observations have been made previously (e.g., Bakshi et al., 2012; Bakshi et al., 2014; Chai et al., 2014).

      Our apologies. The text in line 142 oversimplified our rationale. This has been corrected in the revised manuscript.

    1. As I was preparing to present the first iteration of this paper, I worried I might be attributing inaccurate feelings to her so I asked her how she felt about being labeled as a child with special needs. She fired back with no hesitation, "I hate it!"

      I think this paragraph is really true and powerful. We often think that "identity" is a label that others put on us, but in fact, we ourselves are constantly participating in, responding to, and even internalizing these labels to some extent. Lydia's sentence "I hate it!" really made me feel the conflict - she was given a label that "helped" her, but her feelings were not really understood. It is too easy for us to use words like "special needs" as neutral words, but ignore the oppression that may be brought to the person involved. I like the author's reminder that identity is complex and dynamic, not a fixed definition.

    2. I also want to point out that despite the many challenges we face, our lives are no doubt much easier than those without our many privileges of skin color, social class, and language:

      Sometimes the advantages we have are "invisible". Things like skin color, social class and language, which we may not pay much attention to in our daily life, do quietly influence our experiences at school, such as whether we are misunderstood or easily understood and supported by teachers. The author's admission of her privilege is not to deny the difficulties she is facing, but to present a more comprehensive and honest educational perspective. I think this kind of self-awareness is also very important in the school environment, especially for us students. Only by learning to recognize our own position can we better understand the situation of others.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Public review:

      In this study, Porter et al report on outcomes from a small, open-label, pilot randomized clinical trial comparing dornase-alfa to the best available care in patients hospitalized with COVID-19 pneumonia. As the number of randomized participants is small, investigators describe also a contemporary cohort of controls and the study concludes about a decrease of inflammation (reflected by CRP levels) after 7 days of treatment but no other statistically significant clinical benefit.

      Suggestions to the authors:

      • The RCT does not follow CONSORT statement and reporting guidelines

      We thank you for this suggestion and have now amended the order and content of the manuscript to follow the CONSORT statement as closely as possible.

      • The authors have chosen a primary outcome that cannot be at least considered as clinically relevant or interesting. After 3 years of the pandemic with so much research, why investigate if a drug reduces CRP levels as we already have marketed drugs that provide beneficial clinical outcomes such as dexamethasone, anakinra, tocilizumab and baricitinib?

      We thank the reviewer for bringing up this central topic. The answer to this question has both a historical and practical component. This trial was initiated in June of 2020 and was completed in June of 2021. At that time there were no known treatments for the severe immune pathology of COVID19 pneumonia. In June 2020, dexamethasone data came out and we incorporated dexamethasone into the study design. It took much longer for all other anti-inflammatories to be tested. Hence, our decision to trial an approved endonuclease was based purely on basic science work on the pathogenic role of cell-free chromatin and NETs in murine sepsis and flu models and the ability of DNase I to clear them and reduce pathology in these animal models. In addition, evidence for the presence of cell-free chromatin components in COVID-19 patient plasma had already been communicated in a pre-print. Finally, several studies had reported the anti-inflammatory effects of dornase treatment in CF patients. Hence there was a strong case for a cheap, safe, pulmonary noninvasive treatment that could be self-administered outside the clinical setting.

      The identification of novel/repurposed treatments effective for COVID-19 was hampered by patient recruitment to competing studies during a pandemic. This resulted in small studies with inconclusive or contrary findings. In general, effective treatments were only picked up in very large RCTs. For example, demonstrating dexamethasone as effective in COVID-19 required recruitment of 6,425 patients into the RECOVERY study. Multiple trials with anti-IL-6 gave conflicting evidence until RECOVERY recruited 4,116 adults with COVID-19 (2,022 tocilizumab and 2,094 control); the same held for baricitinib (4,148 randomised to treatment and 4,008 to standard care). Anakinra is approved for patients with elevated suPAR, based on data from one randomized clinical trial of 594 patients, of whom 405 had active treatment (PMID: 34625750). However, a systematic review analysing over 1,627 patients (anakinra 888, control 739) with COVID-19 showed no benefit (PMID: 36841793). Regarding the choice of the primary endpoint, there is a wealth of clinical evidence to support the relevance of CRP as a prognostic marker for COVID-19 pneumonia patients and it is a standard diagnostic and prognostic clinical parameter in infectious disease wards. This choice in March 2020 was supported by evidence of the prognostic value of IL-6; CRP is a surrogate of IL-6. We also provide our own data from a large study of severe COVID-19 pneumonia in figure 1, showing how well CRP correlates with survival.

      In summary, our data suggest that Dornase yields an anti-inflammatory effect that is comparable or potentially superior to cytokine-blocking monotherapies at a fraction of the cost and potentially without additional adverse effects such as an increased risk of co-infections.

      We now provide additional justification on these points in the introduction on pg.4 as follows:

      “The trial was initiated in June 2020 and was completed in September of 2021. At the start of the trial only dexamethasone had been proven to benefit hospitalized COVID-19 pneumonia patients and was thus included in both arms of the trial. To increase the chance of reaching significance under challenging constraints in patient access, we opted to increase our sample size by using a combination of randomized individuals and available CRP data from matched contemporary controls (CC) hospitalized at UCL but not recruited to a trial. These approaches demonstrated that when combined with dexamethasone, nebulized DNase treatment was an effective anti-inflammatory treatment in randomized individuals with or without the implementation of CC data.”

      We also added the following explanation in the discussion on pg. 16:

      “Our study design offered a solution to the early screening of compounds for inclusion in larger platform trials. The study took advantage of frequent repeated measures of quantifiable CRP in each patient, to allow a smaller sample size to determine efficacy/futility than if powered on clinical outcomes. We applied a CRP-based approach that was similar to the CATALYST and ATTRACT studies. CATALYST showed in much smaller groups (usual care, 54, namilumab, 57 and infliximab, 35) that namilumab that is an antibody that blocks the cytokine GM-CSF reduced CRP even in participants treated with dexamethasone whereas infliximab that targets TNF-α had no significant effect on CRP. This led to a suggestion that namilumab should be considered as an agent to be prioritised for further investigation in the RECOVERY trial. A direct comparison of our results with CATALYST is difficult due to the different nature of the modelling employed in the two studies. However, in general Dornase alfa exhibited comparable significance in the reduction in CRP compared to standard of care as described for namilumab at a fraction of the cost. Furthermore, endonuclease therapies may prove superior to cytokine blocking monotherapies, as they are unlikely to increase the risk for microbial co-infections that have been reported for antibody therapies that neutralize cytokines that are critical for immune defence such as IL-1β, IL-6 or GM-CSF.”

      • Please provide in Methods the timeframe for the investigation of the primary endpoint

      This information is provided in the analysis on pg. 8:

      “The primary outcome was the least square (LS) mean CRP up to 7 days or at hospital discharge whichever was sooner.”

      • Why was day 35 chosen for the read-out of the endpoint?

      We now state on pg. 8 that “Day 35 was chosen as being likely to include most early mortality due to COVID-19 being 4 weeks after completion of a week of treatment. ( i.e. d7 of treatment +28 (4 x 7 days))”

      • The authors performed an RCT but in parallel chose to compare also controls. They should explain their rationale as this is not usual. I am not very enthusiastic to see mixed results like Figures 2c and 2d.

      We initially aimed at a fully randomized trial. However, the swift implementation of trial prioritization strategies towards large and pre-established trial platforms in the UK made the recruitment of COVID-19 patients to small studies extremely challenging. Thus, we struggled to gain access to patients. Our power calculations suggested that a mixed trial with randomized and contemporary controls was the best way forward under these restrictions in patient access that could provide sufficient power.

      That being said, we also provide the primary endpoint (CRP) results in Fig. 3B as well as the results for the length of hospitalization (Fig. S3D) for the randomized subjects only.

      • Analysis is performed in mITT; this is a major limitation. The authors should provide at least ITT results. And they should describe in the main manuscript why they chose mITT analysis.

      We apologize if this point was confusing. We performed the analysis on the ITT as defined in our SAP: “The primary analysis population will be all evaluable patients randomised to BAC + dornase alfa or BAC only who have at least one post-baseline CRP measurement, as well as matched historical comparators.”

      We understand that the reason this might be mistaken as an mITT is because the N in the ITT (39) doesn’t match the number randomised and because we had stated on pg. 8 that “Efficacy assessments of primary and secondary outcomes in the modified intention-to-treat population were performed.”

      However, we did randomise 41 participants, but:

      One participant in the DA arm never received treatment. The individual withdrew consent and was replaced. We also have no CRP data for this participant in the database, so they were unevaluable, and we couldn’t include them in the baseline table even if we wanted to. In addition, one participant in the BAC arm had only a baseline CRP measurement available and was therefore not evaluable.

      We have corrected the confusing statement on pg. 8 and added an additional explanation.

      “Efficacy assessments of primary and secondary outcomes in the intention-to-treat (ITT) population were performed on all randomised participants who had received at least one dose of dornase alfa if randomized to treatment. For full details see Statistical Analysis Plan. The ITT was adjusted to mitigate the following protocol violations where one participant in the BAC arm and one in the DA arm withdrew before they received treatment and provided only a baseline CRP measurement available. The participant in the DA arm was replaced with an additional recruited patient. Exploratory endpoints were only available in randomised participants and not in the CC. In this case, a post hoc within group analysis was conducted to compare baseline and post-baseline measurements.”
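
      The evaluability rule quoted from the SAP (at least one post-baseline CRP measurement) can be illustrated with a minimal sketch. The participant records, identifiers and CRP values below are hypothetical and are not trial data; the field names and the convention that day 0 is baseline are assumptions made only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    pid: str                      # hypothetical identifier
    arm: str                      # "BAC" or "BAC+DA"
    # Maps study day to CRP (mg/L); day 0 is assumed to be the baseline draw.
    crp_by_day: dict = field(default_factory=dict)


def is_evaluable(p: Participant) -> bool:
    """True if the participant has at least one post-baseline CRP value."""
    return any(day > 0 for day in p.crp_by_day)


# Hypothetical records, invented for illustration only.
participants = [
    Participant("P01", "BAC+DA", {0: 120.0, 1: 95.0, 3: 60.0}),
    Participant("P02", "BAC", {0: 140.0}),       # baseline only: not evaluable
    Participant("P03", "BAC+DA", {}),            # no CRP data: not evaluable
    Participant("P04", "BAC", {0: 110.0, 2: 100.0}),
]

evaluable = [p.pid for p in participants if is_evaluable(p)]
print(evaluable)
```

      Under this rule, a participant with no CRP data and a participant with only a baseline value both drop out of the primary analysis population, which is how 41 randomised participants would reduce to 39 evaluable ones.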

      • It is also not usual to exclude patients from analysis because investigators just do not have serial measurements. This is lost to follow up and investigators should have pre-decided what to do with lost-to-follow-up.

      Our protocol pre-specified that the primary analysis population should have at least one post-baseline CRP measurement (pg. 13 of protocol). The patient that was excluded was one that initially joined the trial but withdrew consent after the first treatment but before the first post-treatment blood sample could be drawn. Hence, the pre-treatment CRP of this patient alone provided no useful information.

      • In Table 1 I would like to see all randomized patients (n=39), which is missing. There are also baseline characteristics that are missing, like which other treatments as BAT received by those patients except for dexamethasone.

      Table 1 includes all 39 patients plus 60 CCs. Table 2 shows additional treatments given for COVID-19 as part of BAC.

      • In the first paragraph of clinical outcomes, the authors refer to a cohort that is not previously introduced in the manuscript. This is confusing. And I do not understand why this analysis is performed in the context of this RCT although I understand its pilot nature.

      One of the main criticisms we have encountered in this study has been the choice of the primary endpoint. The best way to respond to these questions was to provide data to support the prognostic relevance of CRP in COVID-19 pneumonia from a separate independent study where no other treatments such as dexamethasone, anakinra or anti-IL6 therapies were administered. We think this is a very useful analysis that provides essential context for the trial and the choice of the primary endpoint, indicating that CRP has good enough resolution to predict clinical outcomes.

      • Propensity-score selected contemporary controls may introduce bias in favor of the primary study analysis, since controls are already adjusted for age, sex and comorbidities.

      The contemporary controls were selected to best match the characteristics of the randomized patients including that the first CRP measurement upon admission surpassed the trial threshold, so we do not see how this selection process introduces biases, as it was blinded with regards to the course of the CRP measurements. Given that this was a small trial, matching for baseline characteristics is necessary to minimize confounding effects.

      • The authors do not clearly present numerically survivors and non-survivors at day 34, even though this is one of the main secondary outcomes.

      We now provide the mortality numbers in the following paragraph on pg. 13.

      “Over 35 days follow up, 1 person in the BAC + dornase-alfa group died, compared to 8 in the BAC group. The hazard ratio observed in the Cox proportional hazards model (95% CI) was 0.47 (0.06, 3.86), which estimates that throughout 35 days follow-up, there was a 53% reduced chance of death at any given timepoint in the BAC + dornase-alfa group compared to the BAC group, though the confidence intervals are wide due to a small number of events. The p-value from a log-rank test was 0.460, which does not reach statistical significance at an alpha of 0.05.”
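
      The arithmetic behind the quoted "53% reduced chance" can be sketched as follows. This only restates the reported hazard ratio and confidence interval; it does not refit the Cox model, and the helper function names are hypothetical.

```python
def percent_hazard_reduction(hazard_ratio: float) -> float:
    """Relative reduction in hazard implied by a hazard ratio, in percent."""
    return (1.0 - hazard_ratio) * 100.0


def ci_excludes_no_effect(lower: float, upper: float) -> bool:
    """True if the confidence interval does not contain HR = 1 (no effect)."""
    return not (lower <= 1.0 <= upper)


# Values as reported in the quoted paragraph.
hr, ci_lower, ci_upper = 0.47, 0.06, 3.86

print(round(percent_hazard_reduction(hr)))        # 53
print(ci_excludes_no_effect(ci_lower, ci_upper))  # False
```

      Because the interval (0.06, 3.86) contains 1, the point estimate is compatible with both a large benefit and a harm, which matches the non-significant log-rank p-value reported in the same paragraph.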

      • It is unclear why another cohort (Berlin) was used to associate CRP with mortality. CRP association with mortality should (also) be performed within the current study.

      As we explained above, the Berlin cohort CRP data serve to substantiate the relevance of CRP as a primary endpoint in a cohort that experienced sufficient mortality as this cohort did not receive any approved anti-inflammatory therapy. Mortality in our COVASE trial was minimal, since all patients were on dexamethasone and did not reach the highest severity grade, since we opted to treat patients before they deteriorated further. The overall mortality was 8% across all arms of our study, which does not provide enough events for mortality measurements. In contrast the Berlin cohort did not receive dexamethasone and all patients had reached a WHO severity grade 7 category with mortality at 30%.

      My other concerns are:

      • This report is about an RCT and the authors should follow the CONSORT reporting guidelines. Please amend the manuscript and Figure 1b accordingly and provide a CONSORT checklist.

      We now provide a CONSORT checklist and have amended the CONSORT diagram accordingly.

      • Please provide in brief the exclusion criteria in the main manuscript

      We have now included the exclusion criteria in the manuscript on pg. 6.

      “1.1.1 Exclusion criteria

      1. Females who are pregnant, planning pregnancy or breastfeeding

      2. Concurrent and/or recent involvement in other research or use of another experimental investigational medicinal product that is likely to interfere with the study medication within (specify time period e.g. last 3 months) of study enrolment

      3. Serious condition meeting one of the following:

      a. Respiratory distress with respiratory rate >=40 breaths/min

      b. oxygen saturation<=93% on high-flow oxygen

      4. Require mechanical invasive or non-invasive ventilation at screening

      5. Concurrent severe respiratory disease such as asthma, COPD and/or ILD

      6. Any major disorder that in the opinion of the Investigator would interfere with the evaluation of the results or constitute a health risk for the trial participant

      7. Terminal disease and life expectancy <12 months without COVID-19

      8. Known allergies to dornase alfa and excipients

      9. Participants who are unable to inhale or exhale orally throughout the entire nebulisation period

      So briefly, patients were excluded if they were:

      1. pregnant, planning pregnancy or breastfeeding

      2. Serious condition meeting one of the following:

      a. Respiratory distress with respiratory rate >=40 breaths/min

      b. oxygen saturation<=93% on high-flow oxygen

      3. Require ventilation at screening

      4. Concurrent severe respiratory disease such as asthma, COPD and/or ILD

      5. Terminal disease and life expectancy <12 months without COVID-19

      6. Known allergies to dornase alfa and excipients

      7. Participants who are unable to inhale or exhale orally throughout the entire nebulisation period”

      • "The final trial visit occurred at day 35." "Analysis included mortality at day 35". I am not sure I understand why. In clinicaltrials.gov all endpoints are meant to be studied at day 7 except for mortality rate day 28. Why was day 35 chosen? Please be consistent.

      Thank you for identifying this inconsistency. We have amended the record on clinicaltrials.gov to read “the time to event data was censored at 28 days post last dose (up to d35) for the randomised participants and at the date of the last electronic record for the CC.”

      • Please provide in Methods the timeframe for the investigation of the primary endpoint

      • The authors performed an RCT but in parallel chose to compare also controls. They should explain their rationale as this is not usual. I am not very enthusiastic to see mixed results like Figures 2c and 2d.

      • Analysis is performed in mITT; this is a major limitation. The authors should provide at least ITT results. And they should describe in the main manuscript why they chose mITT analysis.

      • It is also not usual to exclude patients from analysis because investigators just do not have serial measurements. This is lost to follow up and investigators should have pre-decided what to do with lost-to-follow-up.

      • Figure 1b as in CONSORT statement, please provide reasons why screened patients were not enrolled.

      • In Table 1 I would like to see all randomized patients (n=39), which is missing. There are also baseline characteristics that are missing, like which other treatment as BAT received those patients except for dexamethasone.

      • In the first paragraph of clinical outcomes, the authors refer to a cohort that is not previously introduced in the manuscript. This is confusing. And I do not understand why this analysis is performed in the context of this RCT although I understand its pilot nature.

      • In Figure 2 the authors draw results about ITT although in methods describe that they performed an mITT analysis. Please be consistent.

      Please see answers provided to these queries above.

      Reviewer #2 (Recommendations For The Authors):

      1) Suppl Figure 2B would be more informative if presented as a Table with N of patients with per day sampling

      We now provide the primary end point daily sampling table in Table 3.

      2) The numbers at risk should figure under the KM curves

      The numbers at risk for figures 1E, 2C, 2D have been added as graphs either in the main figures or in the supplement.

      3) HD in Supplementary figure 3 should be explained

      We apologize for this omission. We now provide a description for the healthy donor samples that we used in the cell-free DNA measurements in figure S3B on pg. 14:

      “Compared to the plasma of anonymized healthy donor volunteers at the Francis Crick Institute (HD), plasma cf-DNA levels were elevated in both BAC and DA-treated COVASE participants.”

      4) Presentation is inappropriate for Table S4

      We thank the reviewer for pointing out this issue. We have now formatted Table S4 to be consistent with all other tables.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript is a focused investigation of the phospho-regulation of a C. elegans kinesin-2 motor protein, OSM-3. In C. elegans sensory cilia, the kinesin-2 motor proteins, the Kinesin-II complex and the OSM-3 homodimer, transport IFT trains anterogradely to the ciliary tip. Kinesin-II carries OSM-3 as an inactive passenger from the ciliary base to the middle segment, where kinesin-II dissociates from IFT trains and OSM-3 gets activated and transports IFT trains to the distal segment. Therefore, activation/inactivation of OSM-3 plays an essential role in its ciliary function.

      Strengths:

      In this study, using mass spectrometry, the authors have shown that the NEKL-3 kinase phosphorylates a serine/threonine patch at the hinge region between coiled coils 1 and 2 of an OSM-3 dimer, referred to as the elbow region in ubiquitous kinesin-1. Phosphomimic mutants of these sites inhibit OSM-3 motility both in vitro and in vivo, suggesting that this phosphorylation is critical for the autoinhibition of the motor. Conversely, phospho-dead mutants of these sites hyperactivate OSM-3 motility in vitro and affect the localization of OSM-3 in C. elegans. The authors also showed that an alanine-to-threonine mutation (A489T) at one of the phosphorylation sites rescues OSM-3 function in live worms.

      Weaknesses:

      Collectively, this study presents evidence for the physiological role of OSM-3 elbow phosphorylation in its autoregulation, which affects ciliary localization and function of this motor. Overall, the work is well performed, and the results mostly support the conclusions of this manuscript. However, the work will benefit from additional experiments to further support conclusions and rule out alternative explanations, filling some logical gaps with new experimental evidence and in-text clarifications, and improving writing before I can recommend publication.

      We appreciate Reviewer #1’s comments and suggestions. We have now provided additional evidence and discussion to further support our conclusions and fill the logical gaps. We have also provided alternative explanations for our data and improved the writing.

      Reviewer #2 (Public review):

      Summary:

      The regulation of kinesin is fundamental to cellular morphogenesis. Previously, it has been shown that OSM-3, a kinesin required for intraflagellar transport (IFT), is regulated by autoinhibition. However, it remains totally elusive how the autoinhibition of OSM-3 is released. In this study, the authors have shown that NEKL-3 phosphorylates OSM-3 and releases its autoinhibition.

      The authors found that NEKL-3 directly phosphorylates OSM-3 (although the method is not described clearly) (Figure 1). The phosphorylated residue is in the "elbow" of OSM-3. The authors introduced phospho-dead (PD) and phospho-mimic (PM) mutations by genome editing and found that the OSM-3(PD) protein does not form cilia and instead accumulates at the axonal tips. The phenotype is similar to that of another constitutively active mutant of OSM-3, OSM-3(G444E) (Imanishi et al., 2006; Xie et al., 2024). osm-3(PM) has shorter cilia, which resembles loss-of-function mutants of osm-3 (Figure 3). The authors did structural prediction and showed that G444E and PD mutations change the conformation of OSM-3 protein (Figure 3). In the single-molecule assays, G444E and PD mutations exhibited increased landing rate (Figure 4). By unbiased genetic screening, the authors identified a suppressor mutant of osm-3(PD), in which A489T occurs. The result confirms the importance of this residue. Based on these results, the authors suggest that NEKL-3 induces phosphorylation of the elbow domain and inactivates OSM-3 motor when the motor is synthesized in the cell body. This regulation is essential for proper cilia formation.

      Strengths:

      The finding is interesting and gives new insight into how the IFT motor is regulated.

      Weaknesses:

      The methods section has not presented sufficient information to reproduce this study.

      We appreciate that Reviewer #2 is also positive about our study. We have now provided sufficient information in the revised Methods section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Concerns

      (1) Why do the authors think that NEKL-3 phosphorylates OSM-3 in the first place? This seems to come out of nowhere and prior evidence indicating that NEKL-3 may be phosphorylating OSM-3 is not even mentioned in the Introduction.

      We thank the Reviewer for raising this important point. Our hypothesis that NEKL-3 phosphorylates OSM-3 stems from prior findings in our lab. In a previous study (Yi et al., Traffic, 2018, PMID: 29655266), we identified NEKL-4, a member of the NIMA kinase family, as a suppressor of the OSM-3(G444E) hyperactive mutation. This discovery prompted us to explore the broader role of NIMA kinases in regulating OSM-3. Subsequent genetic screens (Xie et al., EMBO J, 2024, PMID: 38806659) revealed that both NEKL-3 and NEKL-4 suppress multiple OSM-3 mutations, further supporting their functional interaction. Given the established role of NIMA kinases in phosphorylation-dependent processes (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), we hypothesized that NEKL-3/4 may directly phosphorylate OSM-3 to modulate its activity.

      To test this hypothesis, we expressed recombinant C. elegans NEKL-3 and OSM-3 proteins and conducted in vitro phosphorylation assays. While we were unable to obtain active recombinant NEKL-4 (limitations noted in the revised text), our experiments with NEKL-3 revealed phosphorylation at residues 487-490 (YSTT motif) in OSM-3’s tail region, as confirmed by mass spectrometry. These findings are now explicitly contextualized in the Introduction and Results sections of the revised manuscript.

      Page #4, Line #11:

      “...In our previous study (Yi et al., Traffic, 2018, PMID: 29655266), a genetic screen targeting the OSM-3(G444E) hyperactive mutation identified NEKL-4, a member of the NIMA kinase family, as a suppressor of this phenotype. This finding, combined with reports that NIMA kinases regulate ciliary processes independently of their canonical mitotic roles (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), prompted us to investigate whether NIMA kinases modulate OSM-3-driven intraflagellar transport. We hypothesized that NEKL-3/4, as paralogs within this family, might directly phosphorylate OSM-3 to regulate its motility...”

      Page #4, line #26:  

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region ...”

      (2) The authors need to characterize the proteins they expressed and purified for in vitro ATPase and motility assays. Are these proteins monomers or dimers?

      For our in vitro ATPase and motility assays, OSM-3 was expressed in E. coli BL21(DE3) and purified using established protocols (Xie et al., EMBO J, 2024, PMID: 38806659; Imanishi et al., JCB, 2006, PMID: 17000874). To confirm its oligomeric state, we analyzed recombinant OSM-3 by size-exclusion chromatography coupled with multiangle light scattering (SEC-MALS). As reported in Xie et al. (2024), OSM-3 (~80 kDa monomer) elutes with a molecular weight of 173–193 kDa under physiological buffer conditions, consistent with a homodimeric assembly. These findings confirm that the functional unit used in our assays is the biologically relevant dimer. This characterization has been added to the revised manuscript on Page #35, Line #7.

      “…OSM-3 was expressed in E. coli BL21(DE3) and purified for in vitro assays using established protocols (REFs). Size-exclusion chromatography coupled with multiangle light scattering (SEC-MALS) (Xie et al., EMBO J., 2024) confirmed that recombinant OSM-3 forms a homodimer (173–193 kDa) under physiological conditions, ensuring its dimeric state remained intact....” 
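
      The dimer assignment above rests on simple arithmetic: the measured SEC-MALS mass divided by the monomer mass should land near a whole number. A minimal sketch of that check follows, using only the approximate values quoted in this response (~80 kDa monomer, 173–193 kDa measured); the function name and the rounding rule are illustrative assumptions, not part of the authors' analysis.

```python
def estimate_subunit_count(measured_kda: float, monomer_kda: float) -> int:
    """Return the nearest whole number of subunits for a measured mass.

    Hypothetical helper for illustration; it simply rounds the mass ratio.
    """
    if measured_kda <= 0 or monomer_kda <= 0:
        raise ValueError("masses must be positive")
    return max(1, round(measured_kda / monomer_kda))


# Approximate values quoted in the response (assumed, not re-measured here).
MONOMER_KDA = 80.0
SEC_MALS_RANGE_KDA = (173.0, 193.0)

for mass in SEC_MALS_RANGE_KDA:
    ratio = mass / MONOMER_KDA
    count = estimate_subunit_count(mass, MONOMER_KDA)
    print(f"{mass:.0f} kDa / {MONOMER_KDA:.0f} kDa = {ratio:.2f} -> {count} subunit(s)")
```

      Both ends of the reported range give a ratio between 2 and 2.5, which rounds to two subunits. The ratio exceeds 2 slightly because the monomer mass is only approximate.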

      (3) The authors primarily used PD and PM mutations, which affect all four amino acids in the region. This may or may not be physiologically relevant. Figure 5 indicates that T489 is a critical regulatory site. However, this conclusion is undermined by reliance on PD mutations, which affect all four amino acids. Creating PM (T489E) and PD (T489A) mutations based on WT OSM-3 would better reflect physiological relevance. In vitro assays with a single phosphomimic or phospho-dead mutation at residue 489 are missing at the end of this story. This would better link Figure 5 with the rest of the manuscript.

      We thank the reviewer for this constructive critique. Below, we address the concerns and integrate new data to strengthen the link between T489 and autoinhibition:

      To probe the regulatory role of T489 phosphorylation, we generated osm-3(T489E) (phosphomimetic, PM) and osm-3(T489A) (phospho-dead, PD) mutant animals. Strikingly, both mutants formed axonal puncta (Figure S7), recapitulating the hyperactive phenotype of the OSM-3(G444E) mutant. While the similar puncta formation in PM and PD mutants initially appeared paradoxical, this observation underscores the necessity of dynamic phosphorylation cycling at T489 for proper autoinhibition. Specifically, the PD mutant (T489A) likely disrupts phosphorylation-dependent stabilization of autoinhibition, leading to constitutive activation, whereas the PM mutant (T489E) may mimic a "locked" phosphorylated state, preventing dephosphorylation-dependent release of autoinhibition in cilia and trapping OSM-3 in an aggregation-prone conformation. These results highlight T489 as a structural linchpin whose post-translational modification dynamically regulates motor activity. While the precise molecular mechanism (such as how phosphorylation modulates tail-motor domain interactions) remains to be elucidated, our data conclusively demonstrate that perturbing T489, even in isolation, destabilizes autoinhibition, driving puncta formation and constitutive activity.

      We have integrated the above paragraph in the revised manuscript on page #8, line #27.

      (4) There seems to be a disconnect between the MT gliding assays in Figure 4C and single molecule motility assays in Figure 4E. The gliding assays show that all constructs can glide microtubules at near WT speeds. Yet, the motility assays show that WT and PM cannot land or walk on MTs. The authors need to explain why this is the case. Is this because surface immobilization of kinesin from its tail disrupts autoinhibition? Alternatively, the protein preparation may include monomers that cannot be autoinhibited and cannot land and processively walk on surface-immobilized microtubules (because they only have one motor domain) but can glide microtubules when immobilized on the surface from their tail.

      The surface immobilization of OSM-3 via its tail domain disrupts autoinhibition, a phenomenon previously observed in other kinesins such as kinesin-1 (Nitzsche et al, Methods Cell Biol., 2010, PMID: 20466139). In our assays, OSM-3 was nonspecifically immobilized on glass surfaces, enabling microtubule gliding by motors whose autoinhibition was relieved through tail anchoring. Critically, the PD and PM mutations reside in the tail region and do not alter the intrinsic properties of the motor head domain. Consequently, once autoinhibition is released via immobilization, the gliding velocities reflect the conserved motor head activity, which is expected to remain comparable across all constructs. While we cannot entirely rule out the presence of monomeric OSM-3 in solution, several lines of evidence argue against this possibility. First, the mutations are located in the elbow region, which is dispensable for motor dimerization. Second, SEC-MALS analysis from prior studies confirms that purified OSM-3 exists predominantly as dimers in solution. 

      We have discussed these issues in the revised text on page #10, line #18: 

      “…In our gliding assays, OSM-3-PM has an increased gliding speed of 0.69 ± 0.07 μm/s (Fig. 4C-D), similar to the PD mutant. PD and PM mutations are confined to the elbow region, leaving the motor head’s mechanochemical properties intact. Upon tail immobilization, which releases autoinhibition, the gliding speeds reflect motor head activity. Single-molecule assays, however, directly resolve their native regulatory states: PD mutants are constitutively active, whereas PM mutants persist in an autoinhibited state (Fig. 4E-G). Although monomeric OSM-3 could theoretically mediate single-motor gliding, the previous SEC-MALS data demonstrate that OSM-3 purifies as stable dimers (Xie et al., EMBO J, 2024, PMID: 38806659). Thus, dimeric OSM-3 is likely the predominant functional species in our assays…”

      (5) An alternative explanation for the data is that both PD and PM mutations result in loss-of-function effects, disrupting OSM-3 activity. For instance:

      a) In Figure 2C, both mutations cause shorter cilia than the wild type (WT).

      b) In Figure 4A, both mutations result in higher ATPase activity than WT.

      c) In Figure 4D, both mutations show increased gliding velocity compared to WT. These results suggest the observed effects could stem from loss of function rather than phosphorylation-specific regulation.

      Although PD and PM mutations exhibit superficially similar "loss-of-function" phenotypes in certain assays, they mechanistically disrupt motor regulation in distinct ways:

      a) Ciliary Length (Figure 2C)

      PD Mutants: Hyperactivation causes OSM-3-PD to prematurely aggregate into axonal puncta, preventing ciliary entry. Consequently, cilia are built solely by the weaker Kinesin-II motor, which only constructs shorter middle segments.

      PM Mutants: OSM-3-PM retains autoinhibition during transport (enabling ciliary entry) but cannot be dephosphorylated in cilia. This blocks activation, leaving OSM-3-PM partially functional and resulting in cilia intermediate in length between WT and PD.

      We have discussed this issue in the revised text on page #5, line #30:

      “…These findings indicate that OSM-3-PM is in an autoinhibited state capable of ciliary delivery, yet fails to achieve full activation due to defective dephosphorylation. This incomplete activation results in suboptimal motor function and intermediate ciliary length phenotypes (Fig. 2B-C). In contrast, OSM-3-PD exhibits constitutive activation leading to aggregation into axonal puncta, which completely abolishes its ciliary entry capacity (Fig. 2A-B)...”

      b) ATPase Activity (Figure 4A)

      PD Mutants: Fully autoinhibition-released (98.15% of KHC ATPase activity), consistent with constitutive activation.

      PM Mutants: Show partial ATPase activity (34.28% of KHC), reflecting imperfect phosphomimicry. While the DDEE substitution introduces negative charges, it fails to fully replicate the steric/kinetic effects of phosphorylated tyrosine (Y487; phenyl ring absent), resulting in incomplete autoinhibition stabilization. Despite this, the residual inhibition is sufficient to phenocopy shorter cilia in vivo.

      We have discussed this issue in the revised text on page #7, line #19:

      “…The PM mutant’s partial ATPase activity (34.28% of KHC) might arise from imperfect phosphomimicry: while the DDEE substitution introduces negative charges, it lacks the steric bulk of phosphorylated tyrosine (pY487). This incomplete mimicry allows residual autoinhibition, sufficient to limit ciliary construction in vivo...”

      c) Microtubule Gliding Velocity (Figure 4D)

      Gliding Assay Limitation: Tail immobilization artificially releases autoinhibition, masking regulatory differences. Thus, all constructs (PD, PM) exhibit similar velocities (~0.7 µm/s), reflecting conserved motor head activity.

      Single-Molecule Assay (Figure 4E): Directly resolves native autoinhibition states:

      PD mutants show robust motility (autoinhibition released).

      PM mutants remain largely inactive (autoinhibition retained).

      We have discussed this issue in the revised text on page #10, line #18:

      “…In our gliding assays, OSM-3-PM has an increased gliding speed of 0.69 ± 0.07 μm/s (Fig. 4C-D), similar to the PD mutant. PD and PM mutations are confined to the elbow region, leaving the motor head’s mechanochemical properties intact. Upon tail immobilization, which releases autoinhibition, the gliding speeds reflect motor head activity. Single-molecule assays, however, directly resolve their native regulatory states: PD mutants are constitutively active, whereas PM mutants persist in an autoinhibited state (Fig. 4E-G)...”

      Minor Suggestions and Concerns

      (1) Lines 60-66: References that support these observations are missing from this section.

      We have added the relevant references.

      (2) Lines 66-67: I would revise this sentence as "It remains unclear how OSM-3 becomes enriched...".

      We have made the changes.

      (3) Line 85: The authors should describe how they perform these assays (i.e. recombinantly expressed NEKL-3 and OSM-3, are these C. elegans proteins, and which expression system was used...).

      We have described them in the main text and Methods.

      Page #4 line #26

      “...To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays...”

      Page #35 line#12

      “...Briefly, point mutations were introduced into the pET.M.3C OSM-3-eGFP-His6 plasmid for prokaryotic expression. Plasmid-transformed E. coli (BL21) was cultured at 37°C and induced overnight at 23°C with 0.2 mM IPTG. Cells were lysed in lysis buffer (50 mM NaPO4 pH 8.0, 250 mM NaCl, 20 mM imidazole, 10 mM bME, 0.5 mM ATP, 1 mM MgCl2, Complete Protease Inhibitor Cocktail (Roche)) and Ni-NTA beads were applied for affinity purification. After incubation, beads were washed with wash buffer (50 mM NaPO4 pH 6.0, 250 mM NaCl, 10 mM bME, 0.1 mM ATP, 1 mM MgCl2) and eluted with elution buffer (50 mM NaPO4 pH 7.2, 250 mM NaCl, 500 mM imidazole, 10 mM bME, 0.1 mM ATP, 1 mM MgCl2). Protein concentration was determined by standard Bradford assay. C. elegans nekl-3 cDNA was cloned into the pGEX-6P GST vector, expressed in E. coli BL21 (DE3), and purified for in vitro phosphorylation assays. Plasmid-transformed E. coli (BL21) was cultured at 37°C and induced overnight at 18°C with 0.5 mM IPTG. Cells were lysed in lysis buffer (50 mM NaPO4 pH 8.0, 250 mM NaCl, 1 mM DTT, Complete Protease Inhibitor Cocktail (Roche)) and GST beads were applied for affinity purification. After incubation, beads were washed with wash buffer (50 mM NaPO4 pH 6.0, 250 mM NaCl, 1 mM DTT) and eluted with elution buffer (50 mM NaPO4 pH 7.2, 150 mM NaCl, 10 mM GSH, 1 mM DTT). Purified proteins were dialyzed against storage buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl). Protein concentration was determined by standard Bradford assay...”

      (4) Line 141: The first sentence of this paragraph lacks motivation. I would start this sentence with "To directly observe the effects of phospho-mutants in the elbow region on microtubule binding and motility of OSM-3, we...".

      We have made the change.

      (5) Figure 1B: The mass spectrometry data in Figure 1B lacks adequate explanation. The Methods section should detail the experimental protocol, data interpretation, and any databases used. Additionally, the manuscript should list all identified phosphorylation sites on OSM-3 to provide context, including whether Y487_T490 is the major site.

      We have provided the detailed experimental protocol, data interpretation, and databases used in methods. We have provided all identified sites as Appendix table S1.

      (6) Figure 1C: Is it possible to model the effect of PM and PD mutations using AlphaFold? The authors should also show PAE or pLDDT scores of their model.

      AlphaFold does not model the effects of point mutations well, so we ran Rosetta relax to capture the possible conformational changes, as shown in the revised Figure 3. We have provided the PAE and pLDDT scores in a new figure, Figure S2.

      (7) Figure 2D: The unit for speed should use a lowercase "s" for seconds.

      We have fixed it.

      (8) Figure 3: I am not sure whether this figure stands for a main text figure on its own, as it is only a Rosetta prediction and is not supported by any experimental data. In addition, it remains unclear what the labels on the x-axis mean.

      We have updated the figure and explained the x-axis labels in Figure S4 to make it more reader-friendly.

      (9) Figure 4: NEKL-3-treated OSM-1 should be included as a positive control in the in vitro experiments.

      We assume that the Reviewer meant NEKL-3-treated OSM-3.

      In another study from our lab, which has just been accepted by the Journal of Cell Biology, NEKL-3 treatment significantly reduced the affinity of the OSM-3 motor for microtubules and resulted in very low ATPase activity. We have cited and discussed this in the revised text on page #10, line #28:

      “…As demonstrated in our recent study (Huang et al., JCB, 2025, In press, attached), phosphorylation of OSM-3 by NEKL-3 at two distinct regions (Ser96 and the conserved "elbow" motif) differentially regulates its activity and localization. Phosphorylation at Ser96 reduces OSM-3’s ATPase activity and alters its ciliary distribution from the distal segment to a uniform localization, while elbow phosphorylation induces autoinhibition, retaining OSM-3 in the cell body. Strikingly, in vitro phosphorylation of OSM-3 by NEKL-3 significantly reduces its microtubule-binding affinity, likely arising from combined modifications at both sites. We propose a model wherein elbow phosphorylation ensures anterograde ciliary transport, while Ser96 phosphorylation fine-tunes distal segment targeting. This multistep regulation may involve distinct phosphatases to reverse phosphorylation at specific sites, a hypothesis warranting further investigation….”

      (10) Figure 4C, D, and F: The unit of velocity is wrong. The authors should use the same units they used in the table shown in Figure 4B.

      We have fixed these errors.

      (11) Figure 4F: The velocity of PD is a lot lower than G444E. Therefore, it would be more appropriate to refer to PD as partially active, rather than hyperactive.

      We have made the change. 

      (12) Figure 5: There is too much genetics jargon on this figure (EMF, F2, 100%Dyf,...). How are the alleles numbered? Is it OK to refer to them as Alleles 1 and 2 for simplicity?

      According to the established C. elegans allele nomenclature, each worm allele has a unique number named after the lab code for identification. We have simplified the labels and updated the figure to make it more reader-friendly.

      (13) Figure 5E: A plot would be more reader-friendly than a table. Additionally, the legend for Fig. 5E mistakenly refers to it as "D."

      We have changed the table to a plot and fixed the mistakes. We thank the Reviewer for pointing them out.

      Reviewer #2 (Recommendations for the authors):

      (1) The model appears as if NEKL-3 induces dephosphorylation of OSM-3 (Figure 6). This is not consistent with the conclusions described in the Discussion and is confusing.

      We have updated the model figure and fixed the error.

      (2) It should be described why the authors hypothesized NEKL-3 phosphorylates OSM-3. Was there genetic evidence? Did the authors screen cilia-related kinases? Or did the authors identify it incidentally? Providing this information would help readers to understand the context of the research.

      We thank both Reviewers for pointing out this issue.

      Our hypothesis that NEKL-3 phosphorylates OSM-3 stems from prior findings in our lab. In a previous study (Yi et al., Traffic, 2018, PMID: 29655266), we identified NEKL-4, a member of the NIMA kinase family, as a suppressor of the OSM-3(G444E) hyperactive mutation. This discovery prompted us to explore the broader role of NIMA kinases in regulating OSM-3. Subsequent genetic screens (Xie et al., EMBO J, 2024, PMID: 38806659) revealed that both NEKL-3 and NEKL-4 suppress multiple OSM-3 mutations, further supporting their functional interaction. Given the established role of NIMA kinases in phosphorylation-dependent processes (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), we hypothesized that NEKL-3/4 may directly phosphorylate OSM-3 to modulate its activity.

      To test this hypothesis, we expressed recombinant C. elegans NEKL-3 and OSM-3 proteins and conducted in vitro phosphorylation assays. While we were unable to obtain active recombinant NEKL-4 (limitations noted in the revised text), our experiments with NEKL-3 revealed phosphorylation at residues 487-490 (YSTT motif) in OSM-3’s tail region, as confirmed by mass spectrometry. These findings are now explicitly contextualized in the Introduction and Results sections of the revised manuscript.

      Page #4, Line #11:

      “... In our previous study (Yi et al., Traffic, 2018, PMID: 29655266), a genetic screen targeting the OSM-3(G444E) hyperactive mutation identified NEKL-4, a member of the NIMA kinase family, as a suppressor of this phenotype. This finding, combined with reports that NIMA kinases regulate ciliary processes independently of their canonical mitotic roles (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), prompted us to investigate whether NIMA kinases modulate OSM-3-driven intraflagellar transport. We hypothesized that NEKL-3/4, as paralogs within this family, might directly phosphorylate OSM-3 to regulate its motility...”

      Page #4, line #26: 

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region...”

      (3) It is curious the authors have not addressed the cilia phenotype and the localization of OSM-3 in nekl-3 mutants. Regardless of whether these observations agree with the proposed mechanisms, it is essential for the authors to show and discuss the cilia phenotype and OSM-3 localization in nekl-3 mutants.

      We thank the Reviewer for highlighting this critical point. Indeed, nekl-3 null mutants are inviable due to essential mitotic roles (Barstead et al., 2012, PMID: 23173093), precluding direct analysis of ciliary phenotypes. To bypass this limitation, we recently generated nekl-3 conditional knockouts (cKOs) in ciliated neurons (Huang et al., JCB, 2025 in press, attached). In these mutants, OSM-3, which is normally enriched in the ciliary distal segment, becomes uniformly distributed along the cilium. This redistribution correlates with premature activation of OSM-3-driven anterograde motility in the ciliary middle region, consistent with our proposed model in which NEKL-3 phosphorylation suppresses OSM-3 activity. We have now integrated this result and discussion into the revised manuscript, reinforcing the physiological relevance of NEKL-3-mediated regulation in ciliary transport.

      Page #6 line #10

      “… While nekl-3 null mutants are inviable due to essential mitotic roles (Barstead et al., 2012, PMID: 23173093), conditional knockout (cKO) of nekl-3 in ciliated neurons (Huang et al., JCB, 2025 in press, attached) revealed its critical role in regulating OSM-3 dynamics. In nekl-3 cKO animals, OSM-3, normally enriched in the ciliary distal segment, redistributed uniformly along the cilium, concomitant with premature activation of anterograde motility in the middle ciliary region. This phenotype aligns with our model wherein NEKL-3 phosphorylation suppresses OSM-3 activity, ensuring spatiotemporal regulation of IFT…”

      (4) The methods section lacks some information, which is critical to reproducing this study.

      We have now provided detailed information in the methods section in the revised manuscript.

      (a) It is not described how the authors determined phosphorylation of OSM-3 by NEKL-3. In methods, nothing is described about the assay.

      We performed in vitro phosphorylation assays using recombinant OSM-3 and NEKL-3 purified from bacteria. We then used LC-MS/MS to identify the phosphorylation sites. We have now updated the Methods section to include all of this information.

      Page #4 line #26

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region...”

      Page #36, line #19

      “In vitro phosphorylation assay

      20 μM purified OSM-3 was incubated with 1 μM GST-NEKL-3 at 30 °C in 100 μL reaction buffer (50 mM Tris-HCl pH 8.0, 10 mM MgCl2, 150 mM NaCl, and 2 mM ATP) for 30 min. The reaction was terminated by boiling for 5 min with an SDS-sample buffer.

      Mass spectrometry

      Following NEKL-3 treatment, OSM-3 proteins were resolved by SDS-PAGE and visualized with Coomassie Brilliant Blue staining. Protein bands corresponding to OSM-3 were excised and subjected to digestion using the following protocol: reduction with 5 mM TCEP at 56°C for 30 min; alkylation with 10 mM iodoacetamide in darkness for 45 min at room temperature, and tryptic digestion at 37°C overnight with a 1:20 enzyme-to-protein ratio. The resulting peptides were subjected to mass spectrometry analysis. Briefly, the peptides were analyzed using an UltiMate 3000 RSLCnano system coupled to an Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific). We applied an in-house proteome discovery searching algorithm to search the MS/MS data against the C. elegans database. Phosphorylation sites were determined using PhosphoRS algorithm with manual validation of MS/MS spectra.”
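
      To make the stoichiometry of the in vitro phosphorylation reaction explicit, the short sketch below converts the quoted concentrations into molar amounts and ratios. It assumes the stated values (20 μM OSM-3, 1 μM GST-NEKL-3, 2 mM ATP) are final concentrations in the 100 μL reaction; the helper name and dictionary layout are illustrative only.

```python
def nanomoles(conc_um: float, volume_ul: float) -> float:
    """Convert a concentration (uM) and a volume (uL) into an amount in nmol."""
    # uM x uL gives pmol; divide by 1000 to express the amount in nmol.
    return conc_um * volume_ul / 1000.0


# Values quoted in the Methods excerpt, assumed to be final concentrations.
REACTION_VOLUME_UL = 100.0
CONCENTRATIONS_UM = {
    "OSM-3": 20.0,
    "GST-NEKL-3": 1.0,
    "ATP": 2000.0,  # 2 mM expressed in uM
}

amounts_nmol = {
    name: nanomoles(conc, REACTION_VOLUME_UL)
    for name, conc in CONCENTRATIONS_UM.items()
}
substrate_per_kinase = CONCENTRATIONS_UM["OSM-3"] / CONCENTRATIONS_UM["GST-NEKL-3"]
atp_per_substrate = CONCENTRATIONS_UM["ATP"] / CONCENTRATIONS_UM["OSM-3"]

for name, amount in amounts_nmol.items():
    print(f"{name}: {amount:g} nmol")
print(f"OSM-3 per kinase molecule: {substrate_per_kinase:g}")
print(f"ATP per OSM-3 molecule: {atp_per_substrate:g}")
```

      Under these assumptions the substrate is in 20-fold molar excess over the kinase, and ATP is in 100-fold molar excess over OSM-3.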

      (b) The method of structural prediction by AlphaFold2 and LocalColabFold needs clarification. In general, the prediction gives several candidates. How did the authors choose one of these candidates?

      We generated five candidate models, all of which showed similar conformations. We therefore chose the model with the highest confidence. We have provided PAE and pLDDT as additional data in Figure S2 and discussed them in the revised text on Page #4, line #32:

      “...To gain structural insights from this motif, we employed LocalColabFold based on AlphaFold2 to predict the dimeric structure of OSM-3 (Evans et al., 2022; Jumper et al., 2021; Mirdita et al., 2022). The highest-confidence model was selected for further analysis (Fig. 1C, Fig. S2)...”

      (c) The methods to predict conformational changes by introducing various point mutations are interesting (Figure 3). However, the methods require more detailed descriptions. In the current form, the manuscript only lists the tools used. The pipelines and parameters need to be described. This information is important because AlphaFold-based predictions often give folded conformations because the training data are mainly composed of folded proteins. It is surprising that the methods applied here give open conformations induced by point mutations.

      We have described the pipelines in the revised Methods section on page #34, line #25:

      “…The OSM-3 model was predicted using LocalColabFold (Evans et al., 2022; Jumper et al., 2021; Mirdita et al., 2022). Mutated proteins were built in PyMOL 2.6, choosing the rotamer of the mutated residues in the G444E, PM and PD models with the least clash as the initial conformation. To predict mutation-induced conformational changes, the initial models were subjected to PyRosetta (Chaudhury et al., 2010). The energies of the pre-relaxed models were evaluated with the Rosetta Energy Function 2015 (Alford et al., 2017), and the classic relax mover was then applied with default settings to minimize the energy of the models. The relaxed models were visualized in PyMOL. The relax script has been uploaded to GitHub: https://github.com/young55775/RosettaRelax_for_OSM3...”

      (5) The authors have purified proteins. Do they show different properties in gel filtration that are consistent with the structural prediction? It is anticipated that open-form mutants are eluted earlier than closed forms.

      We thank the reviewer for this insightful suggestion. Indeed, our recent study showed that the open form of the active OSM-3(G444E) mutant eluted earlier than the wild-type closed form (Xie et al., EMBO J., 2024). While the current study did not perform gel filtration chromatography (SEC) to directly compare the hydrodynamic properties of the OSM-3 mutants, our functional assays provide robust evidence for conformational changes predicted by structural modeling. For example, ATPase activity assays revealed that the open-state mutants (e.g., G444E and PD mutants) exhibited significantly enhanced enzymatic activity (Figure 4A), consistent with structural predictions of an active, destabilized autoinhibitory interface (Figure 3A). These functional readouts collectively validate the predicted structural states. While SEC could further corroborate these findings by distinguishing compact (closed) versus extended (open) conformations, we prioritized assays that directly link structural predictions to in vitro enzymatic activity and in vivo ciliary transport dynamics. Future studies incorporating SEC or cryo-EM will provide additional biophysical validation of these states.

      We have revised the text in the manuscript (Page #7, Line #22):

      “…Notably, the open-state OSM-3 mutants (e.g., G444E) displayed elevated ATPase activity, consistent with structural predictions of autoinhibition release (Fig. 3A, Fig. 4A) (Xie et al., 2024). While hydrodynamic profiling (e.g., SEC) could further resolve conformational states, our functional assays directly connect predicted structural changes to altered biochemical and cellular activity...”

      Minor point

      (1) Line 85 "MIMA kinase family" should be "NIMA kinase family".

      We have corrected the typo and thank the reviewer for pointing it out.

      (2) M.S. and D.S. need to be defined in Figure 2D.

      We have updated the figures.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary: The global decline of amphibians is primarily attributed to deadly disease outbreaks caused by the chytrid fungus, Batrachochytrium dendrobatidis (Bd). It is unclear whether and how skin-resident immune cells defend against Bd. Although it is well known that mammalian mast cells are crucial immune sentinels in the skin and play a pivotal role in the immune recognition of pathogens and orchestrating subsequent immune responses, the roles of amphibian mast cells during Bd infections are largely unknown. The current study developed a novel way to enrich X. laevis skin mast cells by injecting the skin with recombinant stem cell factor (SCF), a KIT ligand required for mast cell differentiation and survival. The investigators found an enrichment of skin mast cells provides X. laevis substantial protection against Bd and mitigates the inflammation-related skin damage resulting from Bd infection. Additionally, the augmentation of mast cells leads to increased mucin content within cutaneous mucus glands and shields frogs from the alterations to their skin microbiomes caused by Bd.

      Strengths: This study underscores the significance of amphibian skin-resident immune cells in defenses against Bd and introduces a novel approach to examining interactions between amphibian hosts and fungal pathogens.

      Weaknesses: The main weakness of the study is the lack of functional analysis of X. laevis mast cells. Upon activation, mast cells have the characteristic feature of degranulation to release histamine, serotonin, proteases, cytokines, and chemokines, etc. The study should determine whether X. laevis mast cells can be degranulated by two commonly used mast cell activators, IgE and compound 48/80, for IgE-dependent and independent pathways. This can be easily done in vitro. It is also important to assess whether in vivo these mast cells are degranulated upon Bd infection using avidin staining to visualize vesicle releases from mast cells. Figure 3 only showed rSCF injection caused an increase in mast cells in naïve skin. They need to present whether Bd infection can induce mast cell increase and rSCF injection under Bd infection causes a mast cell increase in the skin. In addition, it is unclear how the enrichment of mast cells provides protection against Bd infection and alterations to skin microbiomes after infection. It is important to determine whether skin mast cells release any contents mentioned above.

      We would like to thank the reviewer for taking the time to review our work and for providing us with valuable feedback.

      Please note that amphibians do not possess the IgE antibody isotype [1].

      To our knowledge there have been no published studies using approaches for studying mammalian mast cell degranulation to examine amphibian mast cells. Notably, several studies suggest that amphibian mast cells lack histamine [2, 3, 4, 5] and serotonin [2, 6]. While there are commercially available kits and reagents for examining mammalian mast cell granule content, most of these reagents may not cross-react with their amphibian counterparts. This is especially true of cytokines and chemokines, which diverged quickly with evolution and thus do not share substantial protein sequence identity across species as divergent as frogs and mammals. Respectfully, while following up on these findings is possible, it would involve considerable additional work to find reagents that would detect amphibian mast cell contents.

      We would also like to respectfully point out that while mast cell degranulation is a feature most associated with mammalian mast cells, this is not the only means by which mammalian mast cells confer their immunological effects. While we agree that defining the biology of amphibian mast cell degranulation is important, we anticipate that since the anti-Bd protection conferred by enriching frog mast cells is seen after 21 days of enrichment, it is quite possible that degranulation may not be the central mechanism by which the mast cells are mediating this protection.

      As noted in our manuscript, frog mast cells upregulate their expression of interleukin-4 (IL4), which is a hallmark cytokine associated with mammalian mast cells [7]. We are presently exploring the role of the frog IL4 in the observed mast cell anti-Bd protection. Should we generate meaningful findings in this regard, we will add them to the revised version of this manuscript.

      We are also exploring the heparin content of frog mast cells and capacities of these cells to degranulate in vitro in response to compound 48/80. In addition, we are exploring in vivo mast cell degranulation via histology and avidin-staining. Should these studies generate significant findings, we will include them in the revised version of this manuscript.

      Per the reviewer’s suggestion, in our revised manuscript we also plan to include data showing whether Bd infections affect skin mast cell numbers and how rSCF injection impacts skin mast cell numbers in the context of Bd infections.

      In regard to how mast cells impact Bd infections and skin microbiomes, our data indicate that mast cells are augmenting skin integrity during Bd infections and promoting mucus production, as indicated by the findings presented in Figure 4A-C and Figure 5A-C, respectively. There are several mammalian mast cell products that elicit mucus production. In mammals, this mucus production is mediated by goblet cells while the molecular control of amphibian skin mucus gland content remains incompletely understood. Interleukin-13 (IL13) is the major cytokine associated with mammalian mucus production [8], while to our knowledge this cytokine is either not encoded by amphibians or else has yet to be identified and annotated in these animals’ genomes. IL4 signaling also results in mucus production [9] and we are presently exploring the possible contribution of the X. laevis IL4 to skin mucus gland filling. Any significant findings on this front will be included in the revised manuscript. Histamine release contributes to mast cell-mediated mucus production [10], but as we outline above, several studies indicate that amphibian mast cells may lack histamine [2, 3, 4, 5]. Mammalian mast cell-produced lipid mediators also play a critical role in eliciting mucus secretion [11] and our transcriptomic analysis indicates that frog mast cells express several enzymes associated with production of such mediators. We will highlight this observation in our revised manuscript.

      We anticipate that X. laevis mast cells influence skin integrity, microbial composition and Bd susceptibility in a myriad of ways. Considering the substantial differences between amphibian and mammalian evolutionary histories and physiologies, we anticipate that many of the mechanisms by which X. laevis mast cells confer anti-Bd protection will prove to be specific to amphibians and some even unique to X. laevis. We are most interested in deciphering what these mechanisms are but foresee that they will not necessarily reflect what one would expect based on what we know about mammalian mast cells in the context of mammalian physiologies.

      Reviewer #2 (Public Review):

      Summary: In this study, Hauser et al investigate the role of amphibian (Xenopus laevis) mast cells in cutaneous immune responses to the ecologically important pathogen Batrachochytrium dendrobatidis (Bd) using novel methods of in vitro differentiation of bone marrow-derived mast cells and in vivo expansion of skin mast cell populations. They find that bone marrow-derived myeloid precursors cultured in the presence of recombinant X. laevis Stem Cell Factor (rSCF) differentiate into cells that display hallmark characteristics of mast cells. They inject their novel (r)SCF reagent into the skin of X. laevis and find that this stimulates the expansion of cutaneous mast cell populations in vivo. They then apply this model of cutaneous mast cell expansion in the setting of Bd infection and find that mast cell expansion attenuates the skin burden of Bd zoospores and pathologic features including epithelial thickness and improves protective mucus production and transcriptional markers of barrier function. Utilizing their prior expertise with expanding neutrophil populations in X. laevis, the authors compare mast cell expansion using (r)SCF to neutrophil expansion using recombinant colony-stimulating factor 3 (rCSF3) and find that neutrophil expansion in Bd infection leads to greater burden of zoospores and worse skin pathology.

      Strengths: The authors report a novel method of expanding amphibian mast cells utilizing their custom-made rSCF reagent. They rigorously characterize expanded mast cells in vitro and in vivo using histologic, morphologic, transcriptional, and functional assays. This establishes solid footing with which to then study the role of rSCF-stimulated mast cell expansion in the Bd infection model. This appears to be the first demonstration of the exogenous use of rSCF in amphibians to expand mast cell populations and may set a foundation for future mechanistic studies of mast cells in the X. laevis model organism.

      We thank the reviewer for recognizing the breadth and extent of the undertaking that culminated in this manuscript. Indeed, this manuscript would not have been possible without considerable reagent development and adaptation of techniques that had previously not been used for amphibian immunity research. In line with the reviewer’s sentiment, to our knowledge this is the first report of using molecular approaches to augment amphibian mast cells, which we hope will pave the way for new areas of research within the fields of comparative immunology and amphibian disease biology.

      Weaknesses: The conclusions regarding the role of mast cell expansion in controlling Bd infection would be stronger with a more rigorous evaluation of the model, as there are some key gaps and remaining questions regarding the data. For example:

      1. Granulocyte expansion is carefully quantified in the initial time courses of rSCF and rCSF3 injections, but similar quantification is not provided in the disease models (Figures 3E, 4G, 5D-G). A key implication of the opposing effects of mast cell vs neutrophil expansion is that mast cells may suppress neutrophil recruitment or function. Alternatively, mast cells also express notable levels of csf3r (Figure 2) and previous work from this group (Hauser et al, Facets 2020) showed rG-CSF-stimulated peritoneal granulocytes express mast cell markers including kit and tpsab1, raising the question of what effect rCSF3 might have on mast cell populations in the skin. Considering these points, it would be helpful if both mast cells and neutrophils were quantified histologically (based on Figure 1, they can be readily distinguished by SE or Giemsa stain) in the Bd infection models.

      We thank the reviewer for this insightful suggestion. We are performing a further examination of skin granulocyte content during Bd infections and plan on including any significant findings in our revised manuscript.

      We predict that rSCF administration results in the accumulation of mast cells that are polarized such that they ablate the inflammatory response elicited by Bd infection. Mammalian mast cells, including peritonea-resident mast cells, express csf3r [12, 13]. Although the X. laevis animal model does not permit nearly the degree of immune cell resolution afforded by mammalian animal models, we do know that the adult X. laevis peritonea contain heterogeneous leukocyte populations. We anticipate that the high kit expression reported by Hauser et al., 2020 in the rCSF3-recruited peritoneal leukocytes reflects the presence of mast cells therein. As such, and in acknowledgement of the reviewer’s suggestion, we also think that the cells recruited by rCSF3 into the skin may include not only neutrophils but also mast cells. Possibly, these mast cells have distinct polarization states from those enriched by rSCF. While the lack of antibodies against frog neutrophils or mast cells has limited our capacity to address this question, we will attempt to reexamine by histology the proportions of skin neutrophils and mast cells in the skins of frogs under the conditions described in our manuscript. Any new findings in this regard will be included in the revised version of this work.

      2. Epithelial thickness and inflammation in Bd infection are reported to be reduced by rSCF treatment (Figure 3E, 5A-B) or increased by rCSF3 treatment (Figure 4G) but quantification of these critical readouts is not shown.

      We thank the reviewer for this suggestion. We will score epithelial thickness under the distinct conditions described in our manuscript and present the quantified data in the revised paper.

      3. Critical time points in the Bd model are incompletely characterized. Mast cell expansion decreases zoospore burden at 21 dpi, while there is no difference at 7 dpi (Figure 3E). Conversely, neutrophil expansion increases zoospore burden at 7 dpi, but no corresponding 21 dpi data is shown for comparison (Figure 4G). Microbiota analysis is performed at a third time point, 10 dpi (Figure 5D-G), making it difficult to compare with the data from the 7 dpi and 21 dpi time points. Reporting consistent readouts at these three time points is important to draw solid conclusions about the relationship of mast cell expansion to Bd infection and shifts in microbiota.

      Because there were no significant effects of mast cell enrichment at 7 days post Bd infection, we chose to look at the microbiome composition in a subsequent experiment at 10 days and 21 days post Bd infection, with 10 days being a bit more of a midway point between the initial exposure and day 21, when we see the effect on Bd loads. We will clarify this rationale in the revised manuscript.

      The enrichment of neutrophils in frog skins resulted in prompt (12 hours post enrichment) skin thickening (in absence of Bd infection) and increased frog Bd susceptibility by 7 days of infection. Conversely, mast cell enrichment stabilized the skin mucosal and symbiotic microbial environment, presumably accounting at least in part for the lack of further Bd growth on mast cell-enriched animals by 21 days of infection. Our question regarding the roles of inflammatory granulocytes/neutrophils during Bd infections was that of ‘how’ rather than ‘when’ these cells affect Bd infections. Because the central focus of this work was mast cells and not other granulocyte subsets, when we saw that rCSF3-recruited granulocytes adversely affected Bd infections at 7 days post infection, we did not pursue the kinetics of these responses further. We plan to explore the roles of inflammatory mediators and disparate frog immune cell subsets during the course of Bd infections, but we feel that these future studies are more peripheral to the central thesis of the present manuscript regarding the roles of frog mast cells during Bd infections.

      4. Although the effect of rSCF treatment on Bd zoospores is significant at 21 dpi (Figure 3E), bacterial microbiota changes at 21 dpi are not (Figure S3B-C). This discrepancy, how it relates to the bacterial microbiota changes at 10 dpi, and why 7, 10, and 21 dpi time points were chosen for these different readouts (Figure 5F-G), is not discussed.

      Our results indicate that after 10 days of Bd infection, control Bd-challenged animals exhibited reduced microbial richness, while skin mast cell-enriched Bd-infected frogs were protected from this disruption of their microbiome. The amphibian microbiome serves as a major barrier to these fungal infections [14], and we anticipate that Bd-mediated disruption of microbial richness and composition facilitates host skin colonization by this pathogen. Control and mast cell-enriched animals had similar skin Bd loads at 10 days post infection. However, by 21 days of Bd infection the mast cell-enriched animals maintained their Bd loads at the levels observed at 10 days post infection, whereas the control animals had significantly greater Bd loads. Thus, we anticipate that frog mast cells are conferring the observed anti-Bd protection in part by preventing microbial disassembly and thus interfering with optimal Bd colonization and growth on frog skins. In other words, maintained microbial composition at 10 days of infection may be preventing additional Bd colonization/growth, as seen when comparing skins of control and mast cell-enriched frogs at 21 days post infection. By 21 days of infection, control animals rebounded from the Bd-mediated reduction in bacterial richness seen at 10 days. That control animals nonetheless had significantly greater Bd loads than mast cell-enriched animals after 21 days of infection suggests that there may be a critical earlier window during which microbial composition is able to counteract Bd growth.

      While the current draft of our manuscript has a paragraph to this effect (see below), we appreciate the reviewer conveying to us that our perspective on the relationship between skin mast cells and the kinetics of microbial composition and Bd loads could be better emphasized. We plan to revise our manuscript to include the above discussion points.

      Bd infections caused major reductions in bacterial taxa richness, changes in composition and substantial increases in the relative abundance of Bd-inhibitory bacteria early in the infection. Similar changes to microbiome structure occur during experimental Bd infections of red-backed salamanders and mountain yellow-legged frogs [15, 16]. In turn, progressing Bd infections corresponded with a return to baseline levels of Bd-inhibitory bacteria abundance and rebounding microbial richness, albeit with dissimilar communities to those seen in control animals. These temporal changes indicate that amphibian microbiomes are dynamic, as are the effects of Bd infections on them. Indeed, Bd infections may have long-lasting impacts on amphibian microbiomes [15]. While Bd infections manifested in these considerable changes to frog skin microbiome structure, mast cell enrichment appeared to counteract these deleterious effects to their microbial composition. Presumably, the greater skin mucosal integrity and mucus production observed after mast cell enrichment served to stabilize the cutaneous environment during Bd infections, thereby ameliorating the Bd-mediated microbiome changes. While this work explored the changes in established antifungal flora, we anticipate the mast cell-mediated inhibition of Bd may be due to additional, yet unidentified bacterial or fungal taxa. Intriguingly, while mammalian skin mast cell functionality depends on microbiome-elicited SCF production by keratinocytes [17], our results indicate that frog skin mast cells in turn impact skin microbiome structure and likely their function. It will be interesting to further explore the interdependent nature of amphibian skin microbiomes and resident mast cells.

      5. The time course of rSCF or rCSF3 treatments relative to Bd infection in the experiments is not clear. Were the treatments given 12 hours prior to the final analysis point to maximize the effect? For example, in Figure 3E, were rSCF injections given at 6.5 dpi and 20.5 dpi? Or were treatments administered on day 0 of the infection model? If the latter, how do the authors explain the effects at 7 dpi or 21 dpi given mast cell and neutrophil numbers return to baseline within 24 hours after rSCF or rCSF3 treatment, respectively?

      Please find the schematic of the immune manipulation, Bd infection, and sample collection times below. We will include a figure like this in our revised manuscript.

      The title of the manuscript may be mildly overstated. Although Bd infection can indeed be deadly, mortality was not a readout in this study, and it is not clear from the data reported that expanding skin mast cells would ultimately prevent progression to death in Bd infections.

      We acknowledge this point. The revised manuscript will be titled: “Amphibian mast cells: barriers to chytrid fungus infections”.

      Reviewer #3 (Public Review):

      Summary: Hauser et al. provide an exceptional study describing the role of resident mast cells in amphibian epidermis that produce anti-inflammatory cytokines that prevent Batrachochytrium dendrobatidis (Bd) infection from causing harmful inflammation, and also protect frogs from changes in skin microbiomes and loss of mucin in glands and loss of mucus integrity that otherwise cause changes to their skin microbiomes. Neutrophils, in contrast, were not protective against Bd infection. Beyond the beautiful cytology and transcriptional profiling, the authors utilized elegant cell enrichment experiments to enrich mast cells by recombinant stem cell factor, or to enrich neutrophils by recombinant colony-stimulating factor-3, and examined respective infection outcomes in Xenopus.

      Strengths: Through the use of recombinant IL4, the authors were able to test and eliminate the hypothesis that mast cell production of IL4 was the mechanism of host protection from Bd infection. Instead, impacts on the mucus glands and interaction with the skin microbiome are implicated as the protective mechanism. These results will press disease ecologists to examine the relative importance of this immune defense among species, the influence of mast cells on the skin microbiome and mucosal function, and open the potential for modulating mucosal defense.

      We thank the reviewer for recognizing the significance and utility of the findings presented in our manuscript.

      Weaknesses: A reduction of bacterial diversity upon infection, as described at the end of the results section, may not always be an "adverse effect," particularly given that anti-Bd function of the microbiome increased. Some authors (see Letourneau et al. 2022 ISME, or Woodhams et al. 2023 DCI) consider these short-term alterations as encoding ecological memory, such that continued exposure to a pathogen would encounter an enriched microbial defense. Regardless, mast cell-initiated protection of the mucus layer may negate the need for this microbial memory defense.

      We thank the reviewer for their insightful comment. We will revise our discussion to include this possible interpretation.

      While the description of the mast cell location in the epidermal skin layer in amphibians is novel, it is not known how representative these results are across species ranging in chytridiomycosis susceptibility. No management applications are provided such as methods to increase this defense without the use of recombinant stem cell factor, and more discussion is needed on how the mast cell component (abundance, distribution in the skin) of the epidermis develops or is regulated.

      We appreciate the reviewer’s comment and would like to point out that the work presented in our manuscript was driven by comparative immunology questions more than by conservation biology.

      We thank the reviewer for suggesting expanding our discussion to include potential management applications and potential mechanisms for regulating frog skin mast cells. While any content to these effects would be highly speculative, we agree that it may spark new interest and pave new avenues for research. To this end, our revised manuscript will include a paragraph to this effect.

      References:

      1. Flajnik, M.F. A cold-blooded view of adaptive immunity. Nat Rev Immunol 18, 438-453 (2018).

      2. Mulero, I., Sepulcre, M.P., Meseguer, J., Garcia-Ayala, A. & Mulero, V. Histamine is stored in mast cells of most evolutionarily advanced fish and regulates the fish inflammatory response. Proc Natl Acad Sci U S A 104, 19434-19439 (2007).

      3. Reite, O.B. A phylogenetical approach to the functional significance of tissue mast cell histamine. Nature 206, 1334-1336 (1965).

      4. Reite, O.B. Comparative physiology of histamine. Physiol Rev 52, 778-819 (1972).

      5. Takaya, K., Fujita, T. & Endo, K. Mast cells free of histamine in Rana catasbiana. Nature 215, 776-777 (1967).

      6. Galli, S.J. New insights into "the riddle of the mast cells": microenvironmental regulation of mast cell development and phenotypic heterogeneity. Lab Invest 62, 5-33 (1990).

      7. Babina, M., Guhl, S., Artuc, M. & Zuberbier, T. IL-4 and human skin mast cells revisited: reinforcement of a pro-allergic phenotype upon prolonged exposure. Archives of dermatological research 308, 665-670 (2016).

      8. Lai, H. & Rogers, D.F. New pharmacotherapy for airway mucus hypersecretion in asthma and COPD: targeting intracellular signaling pathways. J Aerosol Med Pulm Drug Deliv 23, 219-231 (2010).

      9. Rankin, J.A. et al. Phenotypic and physiologic characterization of transgenic mice expressing interleukin 4 in the lung: lymphocytic and eosinophilic inflammation without airway hyperreactivity. Proc Natl Acad Sci U S A 93, 7821-7825 (1996).

      10. Church, M.K. Allergy, Histamine and Antihistamines. Handb Exp Pharmacol 241, 321-331 (2017).

      11. Nakamura, T. The roles of lipid mediators in type I hypersensitivity. J Pharmacol Sci 147, 126-131 (2021).

      12. Aponte-Lopez, A., Enciso, J., Munoz-Cruz, S. & Fuentes-Panana, E.M. An In Vitro Model of Mast Cell Recruitment and Activation by Breast Cancer Cells Supports Anti-Tumoral Responses. Int J Mol Sci 21 (2020).

      13. Jamur, M.C. et al. Mast cell repopulation of the peritoneal cavity: contribution of mast cell progenitors versus bone marrow derived committed mast cell precursors. BMC Immunol 11, 32 (2010).

      14. Walke, J.B. & Belden, L.K. Harnessing the Microbiome to Prevent Fungal Infections: Lessons from Amphibians. PLoS Pathog 12, e1005796 (2016).

      15. Jani, A.J. et al. The amphibian microbiome exhibits poor resilience following pathogen-induced disturbance. ISME J 15, 1628-1640 (2021).

      16. Muletz-Wolz, C.R., Fleischer, R.C. & Lips, K.R. Fungal disease and temperature alter skin microbiome structure in an experimental salamander system. Mol Ecol 28, 2917-2931 (2019).

      17. Wang, Z. et al. Skin microbiome promotes mast cell maturation by triggering stem cell factor production in keratinocytes. J Allergy Clin Immunol 139, 1205-1216 e1206 (2017).

    1. When looking at who contributes in crowdsourcing systems, or with social media in general, we almost always find that we can split the users into a small group of power users who do the majority of the contributions, and a very large group of lurkers who contribute little to nothing. For example

      It is interesting to see that the majority of people are lurkers, because as someone who uses social media it seems like there are many people who contribute. This actually shocks me, but I guess you have to look at it in terms of scale: if an app has 100 million users, even 10% of that is 10 million, which is a lot of users. That's why people may think there are a lot of engagers when in fact a good chunk of us are just lurkers.

    1. Author response:

      The following is the authors’ response to the original reviews

      Thank you for your valuable comments, which helped us improve our manuscript. We will make the following modifications in the revised manuscript:

      (1) In the first paragraph of the Result section, we will provide a summary of trimeric G proteins in Ciona and explain how we focused on Gαs and Gαq in the initial phase of this study.

      We added a summary of trimeric G proteins in Ciona in the initial part of the Results section (page 6, line 23 to page 8, line 5). In this summary, we added the following sentence explaining why we focused on Gαs and Gαq in the initial phase of this study: "Among them, we prioritized examining the Gα proteins having an excitatory function (Gαq and Gαs) rather than inhibitory roles since previous studies suggested that excitatory events like Ca²⁺ transient and neuropeptide secretion occur when Ciona metamorphose."

      (2) As Reviewer 1 suggests, the polymodal roles of papilla neurons are interesting. Although we could not address this through functional analyses in this study, we will add a discussion regarding this aspect. The sentences will be something like the following:

      “The recent study (Hoyer et al., 2024) provided several lines of evidence suggesting that PSNs can serve as the sensors of several chemicals in addition to the mechanical stimuli. This finding and our model could be mutually related because these chemicals could modify Ca²⁺ and cAMP production. The use of G protein signaling allows Ciona to reflect various environmental stimuli to initiate metamorphosis in the appropriate situation, both mechanically and chemically.”

      We added a discussion related to the recent publication by Hoyer and colleagues on page 23, lines 13-18: "A recent study[19] provided several lines of evidence suggesting that PNs can serve as the sensors of several chemicals in addition to mechanical stimuli. This finding and our model could be mutually related because these chemicals could modify Ca²⁺ and cAMP production. G protein signaling allows Ciona to reflect various environmental stimuli to initiate metamorphosis either mechanically or chemically according to the situation."

      (3) As both reviewers suggested, imaging cAMP on the backgrounds of some G protein knockdowns is essential, and we will conduct the experiments.

      We added the data on cAMP imaging in Gas, Gaq, and dvGai_Chr2 knockdown larvae in Supplementary Figure S4C-D and Figure 6E.

      (4) We will carefully modify the text throughout the manuscript so that the descriptions suitably reflect the results.

      We modified the descriptions of experimental results so that the text reflects the results more precisely.

      Reviewer #1:

      Pg1 - need to add an additional '6' to the author list to clarify which two or more authors contributed equally.

      We added a 6 as suggested. Thank you for pointing this out.

      Pg3 - Note that the larval adhesive organ applies not to all benthic adults but to benthic sessile adults. This makes it sound like the adhesive organ can trigger metamorphosis, but has that been shown? In Ciona or others? Need to specify the role of cells secreting adhesive vs. sensory cells that trigger metamorphosis.

      We divided the corresponding sentence into two to clearly state that adhesion and triggering metamorphosis are related but could be different events. Moreover, we modified the sentence to state that physical contact is one example of a cue triggering metamorphosis. We then added another example of a factor triggering metamorphosis—i.e., chemicals from the organisms surrounding the adherence site (page 3, lines 16-20 of the revised version):

      "Many marine invertebrates exhibit a benthic lifestyle at the adult stage[4]. Their planktonic larvae have an adhesive organ that secretes adhesives and adheres to a substratum. The cues associated with the adhesion, such as the physical contact with the substratum and a chemical from organisms surrounding the adherence site, can trigger their metamorphosis."

      Pg 4 - although mechanosensation is the focus here, could there also be chemoreception/chemoreceptors involved in Ciona metamorphosis? For example, Hoyer et al. 2024 (Current Biology 34(6):1168-1182) concluded that some palp sensory neurons were multimodal and could be both chemo- and mechano-sensory.

      We added statements about this recent finding in the Introduction and Discussion sections. In the Introduction (page 4, lines 16-18), however, we also stated that a mechanical stimulus can trigger metamorphosis in the lab without the need to supply these chemicals. This is to emphasize that the mechanical stimulus is the focus of this study. In the Discussion, we added a statement that G-protein signaling could also be used to receive the chemical stimuli (page 23, lines 13-18).

      Pg 6 - Before starting functional characterizations, it would be useful to give an overview (table?) of the G proteins found in papillae, and what receptor they are suspected of binding to, or if this is completely unknown, and which downstream pathways they likely activate. That is, to show some results about which G proteins are found in Ciona, and which are found in papillae. In this way, it will make more sense for readers when the Gai is suddenly introduced later, following the sections of Gaq and Gas.

      Thank you for your idea to improve the readability of this manuscript. In the initial part of the Results section (page 6, line 22 to page 8, line 5), we added descriptions of the repertoire of trimeric G-proteins in Ciona, including phylogenetic analyses, and expression in the papillae based on RNA-seq data, followed by the reason why we initially focused on Gaq and Gas. The data are displayed in Supplementary Figure S1. The phylogenetic analyses were modified from those shown in Supplementary Figure S5 of the previous version. We also added the general downstream activities of Gas, Gai and Gaq in the Introduction section (page 6, lines 10-12). Considering the contents, the general function of Ga12/13 was stated in the Results section (page 8, lines 2-3).

      We did not add the information about their partner receptors in this early section. This is because there are many candidates, and we could not pick some of them. Instead, we described our current suppositions about their possible partners in the Discussion (page 23, line 22 to page 24, line 19). However, we suspect that there are more candidates, and we wish to promote unbiased research in the future.

      Pg 9 - would be good to know the timing of this PF fluorescence increase and the timing of stimulation in the text here, relevant to the 30-min gap before metamorphosis initiation

      We added the start times for the cAMP reduction and re-upregulation in the following sentence (page 11, lines 17-18): "The cAMP reduction and increase respectively started at 35 seconds and 4 min 40 seconds after stimulation on average."

      Pg 28 - Phylogenetic analysis: Given that the results may be of interest to metamorphosis in other marine invertebrates as discussed in the last paragraph of the paper, it would be useful to include G proteins from these other animal phyla where available in the phylogenetic tree. Similarly, in Figure S5A it would be useful to highlight further all the different Ciona G proteins, and the different protein families, through the use of additional colour/labelling (regardless of whether this remains Fig S5A, or becomes part of the main figures)

      We drew a phylogenetic tree of G-proteins including those in some sessile and benthic animals (barnacle, sea anemone, hydra, sponge, sea urchin and shell). However, we decided not to add the tree in the revised version because, unfortunately, the bootstrap values of many branches were not high enough to have confidence in the results. We hope you understand our decision. Ciona divergent G-proteins are likely to be specific to Ciona.

      According to your comment, we highlighted all Ciona G alpha proteins in red in Figure S5A, which is now Figure S1A in the revised version.

      Figure 3E and Figure S3 - is the data shown as an average of all larvae measured (n=5 and n=4) or is it data from one representative larva out of the 4-5 measured? This needs clarification.

      The original graphs in Figure 3E and Figure S3 are typical examples. We added the graphs summarizing data of all larvae in each experimental condition in Supplementary Figure S4 (corresponding to Supplementary Figure S3 of the original version). Figure 3E remains as a typical example of the result of a single larva to explain our data analysis in detail.

      Experimental suggestion - As mentioned above, one missing detail seems to be the need for evidence that cAMP is elevated in the papillae directly as a result of Gs activation- this could be shown with measurement of cAMP via PF in Gs knockdown larvae that are mechanically stimulated compared to wildtype stimulated and non-stimulated?

      Thank you for your suggestion. The experiments are indeed important. We added the data of Pink Flamindo imaging in the Gas, Gaq and dvGai_Chr2 knockdown conditions. The results of Gas and Gaq knockdowns are described in page 11, line 24 to page 12, line 5, and are displayed in Supplementary Figure S4C-D. The result of dvGai_Chr2 knockdown is given on page 16, lines 20-22 and shown in Figure 6E.

      In order to insert the data of cAMP imaging of dvGai_Chr2 knockdown larvae, we transferred some panels of Figure 6 to Supplementary Figure S6. In addition, the knockdown data of dvGαi_Chr4 and double knockdowns of Gai genes are also included in Supplementary Figure S6.

      Reviewer #2:

      Page 6, line 3-4 in the first paragraph of the "Results"; the authors state "Neither morphant showed any signature of metamorphosis even though both were allowed to adhere to the base of culture dishes...". However, judging from Fig. 1E, "the percentage of metamorphosis initiation" (indicated by the initiation of tail regression) in Gαq morphants is not close to 0 (average about 40%), thus I am not convinced this observation can be described as "Neither morphant showed any signature of metamorphosis..." in this sentence.

      Thank you for your suggestion. In writing the original text, we oversimplified some of the descriptions when trying to improve the readability. We agree this resulted in imprecision in places. We have revised all these passages in our revision. In this particular case, we softened the overly emphatic statement to better reflect the results, changing “... any signature of metamorphosis...” to “... reduced rate of metamorphosis initiation...” In addition, we stated that the effect of Gαq MO was weaker than that of Gαs MO on page 8, lines 10-12. The weaker effect of Gαq MO was due to the redundant role of the Gi pathway, which is shown on page 17, lines 10-17, and in Figure 6G-H.

      Similarly, in the next paragraph describing the knockdown of PLCβ1/2/3, PLCβ4, and IP3R genes, the authors appear to neglect that there is a weaker effect of the PLCβ4 MO, and simply described the results as "The knockdown larvae of these three genes failed to start metamorphosis". Based on Fig. 1H, about 30% of the PLCβ4 MO-injected animals still initiated tail regression. This difference may have some biological meaning and thus should be described more precisely.

      We added the following sentence on page 8, lines 18-19 of the revised version: “The effect of PLCβ4 MO was weaker than those of the other MOs, suggesting that this PLC plays an auxiliary role.”

      Page 7, second paragraph, on the description of GCaMP8 fluorescence and also at the end of Fig. 1O legend, the citation to "Figure S1" is confusing; Fig. S1 is the phylogenetic tree of PLCβ proteins. Is there additional data regarding this Gαq MO plus GCaMP8 mRNA injection experiment?

      Figure S1 of the original version corresponds to Figure S2 of the revised version. To avoid confusion, we deleted this citation from the legend of Figure 1O. By this modification, the sentence stating the repertoire of PLCβ and IP3R in Ciona (page 8, lines 15-16) is the only sentence citing Figure S2 in the revised version.

      Page 8, first sentence; The purpose of theophylline treatment is not to prevent larvae from adhesion, thus I would suggest modifying this sentence to: "We treated wild-type larvae with theophylline after tail amputation, and we observed that most theophylline-treated larvae completed tail regression without adhesion (Figure 2D-F)".

      We modified the sentence according to your comment. Thank you for your suggestion.

      Page 9, second paragraph; judging from the data presented in Fig. 3C, I think this description: "when papillae were removed from larvae, theophylline failed to induce metamorphosis" is not accurate, because about 30% of the Papilla cut + Theophylline-treated larvae still initiated their tail regression. This needs to be explained clearly.

      We modified the sentence (page 11, lines 2-3) as follows: “...the average rate of metamorphosis induction by theophylline was reduced from 100% to 30%...”

      Similarly, in the next few sentences regarding the results presented in Fig. 3D, the effects of overexpressing those genes are not uniform. While amputation of papillae in larvae overexpressing caPLCβ1/2/3 could inhibit metamorphosis almost completely, papilla cut seems to have a weaker effect on caGαq, caGαs, and bPAC-overexpressing larvae.

      We added a description explaining that caPLCβ1/2/3 was the most sensitive to papilla amputation, and the possibility that PLCβ1/2/3 works specifically in the papillae (page 11, lines 9-11): “Among these experiments, caPLCβ1/2/3 overexpression was the most sensitive to papilla amputation, suggesting that PLCβ1/2/3 acts specifically in the papillae during metamorphosis.”

      Page 9, the paragraph on using the fluorescent cAMP indicator; there is a discrepancy between the described developmental time when the authors conducted this experiment and the metamorphosis competent timing (after 24hpf) described on page 7. On page 26, the authors describe "The Pink Flamindo mRNA-injected larvae were immobilized on Poly L lysine-coated glass bottom dishes at 20-21 hpf...". Did the authors start stimulating the larvae to observe the fluorescent signal soon after immobilization, or wait several hours until the larvae passed 24hpf and then conduct the experiment?

      The latter is the case. The immobilized larvae were kept until they acquired the competence for metamorphosis and then stimulation/recording was carried out. This point is described in the Materials and Methods section of the revised version (page 29, lines 16-18):

      "The Pink Flamindo mRNA-injected larvae were immobilized on Poly L lysine-coated glass-bottom dishes at 20-21 hpf, and stimulated their adhesive papillae around 25 hpf."

      Page 10, the description "...Gαq morphants initiated metamorphosis when caGαs was overexpressed in the nervous system (Figure 4F)". It should be noted that the result is only a partial rescue. To be precise, this description needs to be modified.

      We changed the sentence to reflect the results more precisely (page 14, lines 2-3): “Moreover, caGαs overexpression in the nervous system significantly, although not perfectly, ameliorated the effect of Gαq MO (Figure 4F).”

      Page 12-13, This description and the figure 5E presented is a bit confusing to me. The figure legend for 5E: "GABA is necessary for Ca2+ transient in the adhesive papillae (arrow)". But the arrow in this image points to a place with no fluorescent signal, and in the upper corner it is labeled as "29% (n=17)". Does that mean the proportion of "no Ca2+ increase after stimulation" was 29% among the 17 samples examined? Or is it actually the other way around, that 71% of the examined larvae did not show a Ca2+ signal increase after stimulation?

      The latter is the case. We added a caption explaining this clearly in the Figure legend: “The percentage and number exhibit the rate of animals showing Ca2+ transient in the papillae.”

      Page 13, second paragraph; I do not agree with the overly simplified description that "GABA significantly ameliorated the metamorphosis-failed phenocopies of Gαq, PLCβ, and Gαs morphants". As shown in Fig. 5F-H, adding GABA exerts different levels of partial rescue effect on each morphant, and thus should be described clearly.

      When the outliers are neglected, the effect of GABA is most evident in Gαs knockdowns. This suggests that the target(s) of GABA signaling is more likely to be Gq pathway components. We added the following sentence to the revised version (page 15, lines 14-16):

      “Among the three morphants, GABA exhibited the most effective rescues in Gαs knockdowns than Gαq and PLCβ.”

      In addition, we think this sentence establishes a more logical connection with the sentence that follows it: “These results could be explained by assuming enhancement of the Gq pathway by GABA through PLCβ and another GABA-mediated metamorphic pathway bypassing Gq components.” Thank you for your suggestion.

      The section "Contribution of Gi to metamorphosis" confirmed the possibility that GABA signaling targets Gq pathway components.

      Page 13, the first paragraph on "Contribution of Gi to metamorphosis"; the description that "The knockdown of this gene (Gαi) exhibited a significantly reduced rate of metamorphosis;..." is misleading. I would suggest modifying the entire sentence as "The knockdown of this gene (Gαi) exhibited a moderate (although statistically significant) reduction of metamorphosis rate, suggesting the presence of another Gαi regulating metamorphosis".

      Thank you for your suggestion. We modified the sentence (page 16, lines 2-4 in the revised version) as recommended. We believe the description is much improved.

      Page 20, the last sentence about Ciona papilla neurons expressing transcription factor Islet; the authors seem to attempt to make some comparison with the vertebrate pancreatic beta cells in this paragraph, but the comparison and the argument are not fully developed in this current format.

      To deepen this discussion, we added the following sentence (page 23, lines 10-12): “The atypical secretion of GABA might depend on the transcription factor like Islet shared between Ciona papilla neurons and vertebrate beta cells.”

      However, we would like to limit the depth of our discussion on this point, as we hope to expand on it further in future studies.

      Other suggestions:

      Page 3, second paragraph: as they become unable to "move" after metamorphosis -> "relocate"

      We corrected the word as suggested.

      Page 4, second paragraph: In the first sentence, the author states the current understanding of chordate phylogeny and cites Delsuc et al. 2006 Nature paper at the end of this sentence. However, in this paper cephalochordates were erroneously grouped with echinoderms, and thus chordates did not form a monophyletic clade. A later paper by Bourlat et al. (Nature 444:85-88, 2006) corrected this problem, and subsequently Delsuc et al. also published another paper (genesis, 46:592-604, 2008) with broader sampling to overcome this problem. These later publications need to be included for the sake of correctness.

      We added this reference.

      Page 14, regarding the redundant function of the typical Gαi protein in the papillae; the authors may try double KD of Gαi and dvGαi_Chr2 in their experimental system to test this idea.

      We carried out double knockdown of typical Gai and dvGαi_Chr2. However, we could not address their redundant role sufficiently because most of the double knockdown larvae exhibited severe shape malformation.

      dvGαi_Chr4 is also expressed in the papillae. We carried out knockdown of this gene, to find that the knockdown resulted in very minor but statistically significant reduction of the metamorphosis rate, suggesting that this Gai also plays a supportive role in metamorphosis. We also carried out double knockdown of dvGαi_Chr2 and dvGαi_Chr4. The double KD larvae exhibited responsiveness to GABA, probably because of the presence of typical Gai.

      These results are described on page 16, lines 2-18, and the data are shown in Supplementary Figure S6A-D of the revised version.

      Responses to the Reviewing editor's comments:

      "Larvae of the ascidian Ciona initiate metamorphosis tens of minutes after adhesion to a substratum via its adhesive organ." - Larvae is plural so change to 'via their adhesive organ'

      The sentence was corrected as suggested.

      "Metamorphosis is a widespread feature of animal development that allows them" - revise the sentence, e.g. "Metamorphosis is a widespread feature of development that allows animals"

      The sentence was corrected as suggested.

      "GABA synthase (GAD)" GAD is not called GABA synthase but glutamate decarboxylase - clarify, e.g. encoding the enzyme synthesizing GABA called glutamate decarboxylase (GAD)

      This part was corrected exactly as suggested. Thank you.

      "IP3 is received by its receptor on the endoplasmic reticulum (ER) and releases calcium ion (Ca2+ )" revise to "IP3 is received by its receptor on the endoplasmic reticulum (ER) that releases calcium ion (Ca2+ )"

      The sentence was corrected as suggested.

      "Moreover, GPCR is implicated as the mediator of settlement" - GPCRs are implicated

      This sentence was modified as suggested.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for providing valuable comments and suggestions for improving the manuscript.

      Response to reviewer comments:

      Reviewer-1

      Comment 1: The major concern is that the study lacks rigor in several areas where n=2 and results are not quantified with statistics. They need to run power analysis and increase their sample sizes. Please include statistics on all measurements. Filamentous actin staining and alpha-sma is used to visualize mechanosensing but also in other cell activities such as cell contractility for movement, cell to substrate adhesion, cell division, etc. They need to query more mechanosensing related pathways (Piezo1/2, Yap/taz-Hippo, integrin-Focal Adhesion Kinase, etc) to show that mechanosensing changed.

      Response: We have increased the sample size to a minimum of n=3 in most cases. However, a few experiments will require more time to increase sample size, as mentioned below.
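
      For reference, the kind of power analysis requested in Comment 1 can be sketched as follows. This is a minimal illustration using the standard normal approximation for a two-sided, two-sample comparison of means; the function name, the effect sizes, and the default alpha and power are assumptions chosen for illustration and are not values taken from this study.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (Cohen's d).
    The effect size passed in is a hypothetical input, not a measured value.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    for d in (1.0, 2.0):  # hypothetical standardized effect sizes
        print(d, samples_per_group(d))
```

      Because the normal approximation slightly underestimates the sample size needed for a t-test at small n, the result is best read as a lower bound.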

      Our data emphasized the role of Rac1 and SRF. We understand that other molecular players may also be involved in sensing or responding to mechanical forces, but surveying multiple families of candidates without a specific hypothesis or functional experiment is beyond the scope of this study.

      Comment 2: Fig. 1: In panel E, the cranial bone area measurement is not normalized to mitigate the possible effect of individual differences.

      Response: We have re-quantified the data with normalization to the length of the skull.

      Comment 3: In Fig. 2 the authors mentioned many phenotypical changes (bone length changes, gap thickness change, apex thickness change, etc.) based on histology stain, none of them are quantified to show a significant difference between Rac1-WT and Rac1-KO.

      Response: In Fig. 2A, we present the gross morphology of the Rac1-KO embryos and only discuss the tissue defects like edema, hematoma, and hypoplasia, confirmed through H&E as shown in Fig. 2C. We also show the apical limits of the intact calvaria in Fig. 2D, consistent with the calvaria defects observed at birth. In fact, we do not discuss any “bone length changes, gap thickness, or apex thickness change” in this section as suggested by the reviewer. To address the request for more quantification we have added measurement of the edematous area of the apical mesenchyme at E14.5 (Fig. 2C), and this is now shown in Suppl. Fig. 1E. We also added quantification of embryo genotypes and Chi-square tests, now shown in Suppl. Fig. 1D.

      Comment 4: Fig. 2 In panel D, with only 2 embryos per group is not enough for quantitation

      Response: We plan to increase the number of embryos during the revision period.

      Comment 5: Fig. 2 In panel D, the two arrows in the Rac1-KO mutants are not easy to catch.

      Response: We made the arrows bigger and bolder.

      Comment 6: Fig. 3 The thickness quantification is not performed.

      Response: We added quantification in Fig. 3D.

      Comment 7: Fig. 3 The images show an obvious curve change of the apex between the control and mutant. Such change is not discussed in the results. Is it due to histology issue?

      Response: We do not think it is due to technical issues but reflects a real change in the shape of the apex of the head. We modified the graphical representation in Figure 3E to reflect this change in curvature. We also added the following sentence to the results on page 7: “We also noted a loss of curvature in the apex of the Rac1-KO head at E13.5, which correlated with loss of aSMA+ mesenchymal cells and thinning of the EMM (Fig. 3E).”

      Comment 8: The merged layer did not show S100a6. While the authors are showing apical expansion of the mesenchyme toward the dermis and meninges, it is hard to track where they are without a merged image.

      Response: We added merged images.

      Comment 9: Fig. 4 In panel B, 2 biological replicates per genotype are very low.

      Response: The effect of Rac1-KO on cell cycle is already known (Moore et al. 1997; Nikolova et al. 2007; Gahankari et al. 2021), and our result is supported by in vivo quantification of Tom+Edu+ cells in different regions of the embryonic head shown in Fig. 4A. We prefer not to repeat this assay.

      Comment 10: Fig. 4 There is no cell death data.

      Response: We will generate data on cell death during the revision period.

      Comment 11: Fig. 5 In panel B, the GAPDH western blot bands in the mutants seem to be thinner than those of controls.

      Response: We verified equal loading with a Ponceau stain, so this minor change in the GAPDH level could be due to biological differences in the protein level. Nevertheless, by our estimation this minor difference does not explain away the major difference in Rac1 and Srf levels.

      Comment 12: Though the immunostain showed a decrease in signal intensity, it is hard to know whether the decrease is significant enough across all Rac1-KO mutants. They need to measure the fluorescence intensity and perform statistics.

      Response: We will generate better images of SRF staining and quantify the difference between Rac1-WT and Rac1-KO during the revision period.

      Comment 13: Fig. 6: Similar to Fig. 2, there is no quantification and n=1 per genotype is not enough

      Response: During the revision period we will increase the number of E12.5 Srf-KO and Srf-WT embryos to n=3 for Figure 6G. All other panels currently have n=7 or greater.

      Comment 14: Fig. 7: Need quantification between Srf-KO and Rac1-KO with statistics to show they are not different, but both significantly different from WTs

      Response: In Figure 7D we have added quantification of aSMA area in Srf-KO and Rac1-KO. These results show that both mutants have a similar phenotype with reduced aSMA expression compared to their respective WT littermates, which supports the conclusion that they work in the same pathway. We do not agree with the reviewer that the two mutants should show no statistical difference, because Rac1 and Srf are different genes with overlapping but also non-overlapping functions. During the revision period we will add more Srf-KO embryos and repeat the statistical analysis.

      Comment 15: Supplement Fig.2: No image showing the time point before E11.5.

      Response: We will add an E10.5 time point during the revision period.

      Comment 16: Supplement Fig.3: The ventral view of Rac1-WT does not have the same angle as it shows in Rac1-KO. This makes it harder to see the difference between control and mutant.

      Response: We adjusted the brightness/contrast to make the difference clearer.

      Comment 17: Supplement Fig.4 &7: The alkaline phosphatase stained area needs to be normalized to some other metric because the embryos could be different size.

      Response: We normalized the stained area to the width of the eye; this is now shown in Suppl. Fig. 4 and 7.

      Comment 18: Supplement Fig 6 A: The legend and figure don't match. Is it E13.5 or E14.5? Panel 6B needs better images without curling of the tissue.

      Response: This has been fixed. The immunostaining images in Suppl. Fig. 6A are from E14.5. Panel B is now replaced with better images in the revised manuscript.


      Reviewer-2

      Comment 1.1: In Fig. 5, links between Rac1, SRF, αSMA, and contractility in mesenchymal cells are shown. Molecular analyses (Western blot and qPCR) were performed using primary cultured mesenchymal cells (prepared after freed from the epidermal population). Although use of cells prepared from E18.5 embryos may have been chosen by the authors for the safe isolation of the mesenchymal population without contamination of epidermal cells, this reviewer finds that anti-SRF immunoreactivity is weaker at E13.5 than at E12.5 (throughout the section including the mesencephalic wall) and therefore wonder whether SRF expression changes in a stage-dependent manner. So, simply borrowing results obtained from E18.5-derived cells for describing the scenario around E12.5 and E13.5 is a little disappointing point found only here in this study.

      Response: In fact, the reason we chose E18.5 was to get enough cells to do the experiments in Figure 5A-D without extensive passaging and/or immortalization, which would undoubtedly cause the cells to deviate from their in vivo character as they become adapted to growing on plastic with 10% serum. Therefore, we prefer not to change the cells as suggested by the reviewer.

      Comment 1.2: In Fig. 5F, it is difficult to clearly see "reduction" of SRF immunoreactivity in Rac1-KO. Therefore, quantification of %SRF+/totalTomato+ would be desired.

      Response: We will generate better images of SRF staining and quantify the difference between Rac1-WT and Rac1-KO during the revision period.

      Comment 1.3: Separately, direct comparison of spontaneous centripetal shrinkage of the apical/dorsal scalp tissues, which will occur in 30 min when prepared at E12.5 or E13.5 (Tsujikawa et al., 2022), between WT and Rac1-KO would strengthen the results in Fig. 5D. As KO is specific to the mesenchyme, the authors do not have to worry about removal of the epidermal layer (which would be much more difficult at E12.5-13.5 than E18.5). If the degree of centripetal shrinkage of the "epidermis plus mesenchyme" layers were smaller in Rac1-KO, it would be interpreted to be mainly due to poorer recoiling activity and contractility of the Rac1-KO mesenchymal tissue.

      Response: We will try to perform the centripetal shrinkage assays as shown by Tsujikawa et al., during the revision period.

      Comment 2: The authors favor "apical" vs. "basolateral" to tell the relative positions in the embryonic head, not only in the adult head. But "apical" vs. "basolateral" should be accompanied with dorsal vs. ventral at least at the first appearance. Apical-to-basal axis or apex vs. basolateral by itself can provide, in many contexts, impressions that epithelial layers/cells are being discussed. Please note that the authors also use "caudal" (in the embryonic head). Usually, a universally defined anatomical axis perpendicular to the rostral-to-caudal axis is the dorsal-to-ventral axis.

      Response: Apologies for confusing terminology. The terminology is now defined uniformly according to the anatomical axis.

      Comment 3: One of the authors' statements in ABSTRACT "In control embryos, α-smooth muscle actin (αSMA) expression was spatially restricted to the apical mesenchyme, suggesting a mechanical interaction between the growing brain and the overlying mesenchyme" and a similar one in RESULTS "αSMA was not detected in the basolateral mesenchyme of either genotype from E12.5-E14.5 (Suppl. Fig. 4A), suggesting restriction of the mechanosensitive cell state to the apical mesenchyme" need to be at least partly revised, taking previous publication about the normal αSMA pattern in the embryonic head into account more carefully. Tsujikawa et al. (2022) described "Low-magnification observations showed superficial immunoreactivity for alpha smooth muscle actin (αSMA), which has been suggested to function in cells playing force-generating and/or constricting roles; this immunoreactivity was continuously strong throughout the dorsal (calvarial) side of the head but not ventrally toward the face, producing a staining pattern similar to a cap (Figure 2A)" . Therefore, in this new paper, descriptions like "we observed ...., consistent with ....(2022)" or "we confirmed .... (2022)" would be more accurate and appropriate regarding this specific point. Such a minor change does not reduce this study's overall novelty at all.

      Response: Thank you for the correction. We have replaced the terminology and cited the article (Tsujikawa et al., 2022) appropriately, crediting their finding.

      Comment 4: It would be very helpful if the authors provide a schematic illustration in which physiological and pathological scenarios (at the molecular, cellular, and tissue levels found or suggested by this study) are shown.

      Response: We have added a schematic representation of the molecular changes happening in the apical head development because of Rac1- and Srf-KO, and it is represented in Suppl. Fig. 7C.


      Comment 5: Despite being put in the title, "mechanosensing" by mesenchymal cells is not directly assessed in this study. If appropriate, something like "mechano-functioning" would be closer to what the authors demonstrated.

      Response: We changed the title to refer to “mechano-responsive mesenchyme”. We think this is appropriate because the cells of interest have reduced aSMA and reduced proliferation, both of which are known to occur, at least in part, as responses to mechanical inputs.

      Reviewer-3

      Comment 1: Prrx1-Cre targets calvarial mesenchyme and Suzuki et al., 2009 showed that Prrx1-Cre mediated loss of Rac1 led to a calvarial bone phenotype due to incomplete fusion of the skull. While this phenotype was not studied in detail, the statement in the intro and discussion that the calvarial phenotype has not been recapitulated in mice is incorrect.

      Response: Suzuki et al. showed incomplete fusion of the skull. Although the skull is a tissue that is affected in AOS, it is not akin to the scalp and calvaria aplasia that typifies AOS. Our result stands apart from this. We clarified our position as such:

      Introduction (page 4): “Nevertheless, the calvaria phenotype seen in AOS individuals has not been explored in detail or fully recapitulated in mice.”

      Discussion (page 11): “Previous studies have demonstrated the role of Rac1 in mesenchyme-derived tissues, but they did not recapitulate AOS phenotypes.”

      Comment 2: The authors show that Pdgfra-Cre induced knockout of Rac1 leads to lower-than-expected numbers of Rac1-cKO embryos at E18.5 and P1. Phenotypic analysis shows that the earliest phenotype is blebbing and hematoma in the nasal region at E11.5/12.5. It is stated that this was resolved at E18.5. It is unclear if this is truly a resolution of the phenotype or that these embryos fail to survive until E18.5. Do 100% of the Rac1-cKO embryos exhibit the blebbing/hematoma at E11.5/12.5? What is the observed number/percentage of Rac1-cKO embryos at E11.5/12.5? If the observed percentage of Rac1-cKO is similar to that at E18.5 (lower than the expected 25%), this would support resolution. If the observed ratio is as expected at E11.5/12.5, then this would support embryonic loss before E18.5 rather than phenotypic resolution.

      Response: Please note that 100% (n=12) of E12.5 Rac1-KO embryos displayed nasal and mild caudal edema as exhibited in Fig. 2A, but none (n=16) had blebbing/hematoma by E18.5. We added tables for the number of embryos recovered at E12.5 and E18.5 to Supplemental Figure 1. These results show that the percentage of mutants at E12.5 was 21.42%, not significantly different from the expected frequency (p = 0.5371). At E18.5, the percentage dropped slightly to 18.3%, but still not significantly different from expected (p = 0.1545). The significant change in frequency of blebbing/hematoma from E12.5 to E18.5, without any significant change in the frequency of mutants, supports phenotypic resolution of the early blebbing/hematoma.
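
      The comparison of observed and expected genotype frequencies above can be illustrated with a one-degree-of-freedom chi-square goodness-of-fit test. The sketch below is illustrative only: the function name is hypothetical, and the count of 12 mutants among 56 embryos is an assumption chosen to be consistent with the reported 21.42% at E12.5; the actual counts are those in the tables added to Supplemental Figure 1.

```python
import math


def chi_square_two_classes(observed_mutants, total, expected_fraction=0.25):
    """Chi-square goodness-of-fit test for mutant vs. non-mutant counts (df = 1).

    Returns (chi2, p). For one degree of freedom the upper-tail probability
    of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    expected_mutants = total * expected_fraction
    expected_others = total - expected_mutants
    observed_others = total - observed_mutants
    chi2 = ((observed_mutants - expected_mutants) ** 2 / expected_mutants
            + (observed_others - expected_others) ** 2 / expected_others)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Assumed counts: 12 mutants out of 56 embryos (about 21.4%), expected 25%.
    chi2, p = chi_square_two_classes(12, 56)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

      With these assumed counts the test gives a p-value of about 0.537, in line with the value reported for E12.5.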

      Comment 3: It is stated that brain shape is altered in Rac1-cKO embryos at E14.5 and E18.5 and concluded that these shape differences are secondary to the cranial defects. Pdgfra+ cells give rise to the meninges and if the Pdgfra-Cre line recapitulates this expression, then loss of the ubiquitously expressed Rac1 in the meninges could lead to a primary defect in the brain, which may lead to secondary defects in the calvarium and scalp. Their conclusion should recognize other possibilities.

      Response: We agree it is possible that there are meninges defects that secondarily change the shape of the brain, and we added a mention of this possibility. It is highly unlikely that scalp defects are only secondary to brain changes because the first observable phenotypes are in the EMM that gives rise to the scalp.

      Comment 4: The TdTom staining in wholemount at E13.5 (Supplemental Figure 2B) is difficult to appreciate in the image shown.

      Response: At E11.5 there is good contrast between labeled cranial structures and non-labeled body. At E13.5, Tomato appears in most of the mesenchymal cells in the embryo, so there is not as much contrast. The lack of contrast at E13.5 may cause the reviewer to think there is something wrong with the image.

      Comment 5: The idea that the EMM laminates into the meninges and scalp layers is not new and should be properly cited (Vu et al., 2021, Scientific Reports). The following paper should also be cited on the use of alpha-SMA (Acta2) as a marker of the anterior calvaria mesenchyme: Holms et al., 2020 Cell Reports.

      Response: Thank you. We are happy to add those citations.

      Comment 6: It is concluded that meningeal development is maintained in the cKO; however, this conclusion was based on a single marker (S100a6) that is both expressed in the presumptive meninges and dermis and greatly reduced overall in the cKO. This conclusion should be softened or other markers used to show that the meninges is indeed normal.

      Response: We softened the conclusion on the meninges in the revised manuscript, as this part of the phenotype was not our focus, but it would be worth examining in the future.

      Comment 7: The overlap of S100a6 and alpha-SMA is difficult to appreciate in the images shown in Figure 3. Since this is important to the conclusion, co-staining should be done. If co-staining cannot be done due to the primary antibodies' origins, then ISH should be done.

      Response: We added merged images.

      Comment 8: It is concluded that reduced alpha-SMA suggests an early failure of Rac-cKO cells to respond to the mechanical environment. While this is one possibility, the reduction of alpha-SMA may simply be due to a reduction of these cells resulting from failed differentiation, decreased proliferation, or increased apoptosis.

      Response: We think the fact that aSMA is downregulated in cultured cells strongly argues against it being a trivial consequence of reduced proliferation, etc. Nevertheless, we softened our conclusion to allow for some of these things to also contribute to the reduced aSMA expression. We will check apoptosis during the revision period.

      Comment 9: The conclusion that alpha-SMA is a transient population only present in apical cranial mesenchyme between E12.5-14.5 is not consistent with prior studies: Holms et al., 2020 Cell Reports; Holms et al., 2021 Nature Communications; Farmer et al., 2021 Nature Communications; Takeshita et al., 2016 JBMR.

      Response: There is no contradiction. Our statements are based on antibody staining where it is very evident that aSMA-expressing cells are detectable throughout the apical mesenchyme between E12.5 and E14.5. But at E18.5 we do not see this kind of broad aSMA expression in the apical head, suggesting a transient and spatially restricted population of cells in the apical mesenchyme. This is consistent with the studies from Tsujikawa et al., 2022 and Angelozzi et al., 2022. The papers mentioned by the reviewer are only focused on the suture mesenchyme. They do not claim there is broad aSMA/Acta2 expression in the apical head, but only in a spatially restricted subpopulation of suture mesenchymal cells.

      Comment 10: In the SRF immunostaining results in control and Rac1-cKO embryos, it is difficult to appreciate the nuclear localization at E12.5 in Figure 5E, as the DAPI is over saturated, and the image quality is poor. The image quality is also poor in Figure 5F.

      Response: We will generate better images of SRF staining and quantify the difference between Rac1-WT and Rac1-KO during the revision period.

      Comment 11: To what extent is the expression/localization of MRTF, the transcriptional co-activator of SRF, altered in the calvarial mesenchyme of Rac1-cKO embryos? Changes in MRTF would strengthen the link between Rac1 and SRF.

      Response: We do not know how MRTF expression/localization changes in the embryo tissue, but western blot data on Rac1-KO fibroblasts revealed a reduction in expression/nuclear localization of MRTF-A/B that mirrored the changes in SRF. We added these blots to Figure 5A. However, as noted at the end of the discussion, MRTF is not always required for SRF function in vivo (Dinsmore, eLife 2022). The MRTFA/B-KO is a possibility for future work.

      Comment 12: Hypoplasia of the apical mesenchyme (Figure 6G, inset 1) in Srf-cKO is difficult to see.

      Response: During the revision period we will increase the number of E12.5 Srf-KO and Srf-WT embryos to n=3 for Figure 6G and replace the picture with a better one.

      Comment 13: Generally, the organization of the data into many main and supplemental Figures makes the flow difficult to follow.

      Response: We understand the concern, but we have tried our best to organize the most important data into main figures and the relevant but less essential data into supplemental figures.

      Comment 14: Pdgfra interacts genetically with Srf in neural crest cells during craniofacial development, with Srf being a target of PDGFRa signaling (Vasudevan and Soriano, 2015, Dev Cell). Since the Pdgfra-Cre line used here is hemizygous, it is important that the control used to look at SRF expression in the Rac1-cKO is Pdgfra-Cre+.

      Response: It is standard practice to include some Cre+ mice in the control set to reveal whether Cre has toxic effects in the cells of interest. To the reviewer’s concern about genetic interactions between the Pdgfra gene and Srf, this should not be relevant here because the Pdgfra-Cre used in our study is a transgene and does not affect the endogenous Pdgfra gene.

      Comment 15: The text size in all figures is too small and varies throughout, making it difficult to read.

      Response: To fit the panel in the Word document, the figure is resized. This should not be an issue in the final manuscript.

      Comment 16: Details about the pulse-chase timing of the EdU experiments should be included in the results. Also, does n = 3 for each stage and each genotype? It would be helpful to include a representative section for a control and cKO littermate pair.

      Response: The details are now included in the methods section. Yes, n=3 in each stage and genotype (Fig. 4A). The representative images are also included.

      Comment 17: The relative sizing of the panels within and between figures is haphazard. Some are very large and others very small (Figure 2, 6, Supplemental Figure 1, 2, 6, 7).

      Response: The image panels are fixed in the revised manuscript.

      Comment 18: In Figure 5A and F, the titles "E12.5" and "E13.5" are in italics.

      Response: The fonts for the figures are fixed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      We appreciate the reviewers' thoughtful comments and suggestions. Below, we provide point-by-point responses to the recommendations and outline the updates made to the manuscript.

      (1) Discussion, "the obvious experiment is to manipulate a neuron's anatomical embedding while leaving stimulus information intact." The epiphenomenon can arise from the placement and types of a neuron's neurotransmitters and neuromodulators, too.

      The content of vesicles released by a neuron is obviously of great importance in determining postsynaptic impact. However, we’re suggesting that (assuming vesicular content is held constant) the anatomically-relevant patterning of spiking might additionally affect the postsynaptic neuron’s integration of the presynaptic input. To avoid confusion, we updated the text accordingly: “the obvious experiment is to manipulate a neuron's anatomical embedding while minimally impacting external and internal variables, such as stimulus information and levels of neurotransmitters or neuromodulators” (Line 594 - 596).

      (2) “In all conditions, the slope of the input duration versus sensitivity line was still positive at 1,800 seconds (Fig. 3B)". This may suggest that the estimate of the calculated statistics (ISI, PSTH) is more reliable with more data, rather than (or in addition to) specific information being extracted from faraway time points. Another potential confound is the training statistics were calculated from all training data, so the test data is a better match to training data when test statistics are calculated from more data. Overall, the validity of the conclusions following this observation is not clear to me.

      This is a great point. Accordingly, we revised the text to include this possibility: “Because the training data were of similar duration, this could be explained by either of two possibilities. First, the signal is relatively short, but noisy—in this case, extended sampling will increase reliability. Second, the anatomical signal is, itself, distributed over time scales of tens to hundreds of seconds.” (Line 252 - 255).

      (3) "This further suggests that there is a latent neural code for anatomical location embedded within the spike train, a feature that could be practically applied to determining the brain region of a recording electrode without the need for post-hoc histology". The performance of the model at the subregion level, which is a typical level of desired precision in locating cells, does not seem to support such a practical application. Please clarify to avoid confusion.

      The current model should not be considered a replacement for traditional methods, such as histology. Our intention is to convey that, with the inclusion of multimodal data and additional samples, a computational approach to anatomical localization has great promise. We updated the manuscript to clarify this point: “While significantly above chance, the structure-level model still lacks the accuracy for immediate practical application. However, it is highly likely that the incorporation of datasets with diverse multi-modal features and alternative regions from other research groups will increase the accuracy of such a model. In addition, a computational approach can be combined with other methods of anatomical reconstruction.” (Line 355 - 359).

      Additionally, we directly addressed this point in our original manuscript (Discussion section: Line 498 - 505 in the current version). Furthermore, following the release of our preprint, independent efforts have adopted a multimodal strategy with qualitatively similar results (Yu et al., 2024). Other recent work expands on the idea of utilizing single-neuron features for brain region/structure characterization (Le Merre et al., 2024).

      Yu, H., Lyu, H., Xu, E. Y., Windolf, C., Lee, E. K., Yang, F., ... & Hurwitz, C. (2024). In vivo cell-type and brain region classification via multimodal contrastive learning. bioRxiv, 2024-11.

      Le Merre, P., Heining, K., Slashcheva, M., Jung, F., Moysiadou, E., Guyon, N., ... & Carlén, M. (2024). A Prefrontal Cortex Map based on Single Neuron Activity. bioRxiv, 2024-11.

      (4) "These results support the notion the meaningful computational division in murine visuocortical regions is at the level of VISp versus secondary areas.". The use of the word "meaningful" is vague and this conclusion is not well justified because it is possible that subregions serve different functional roles without having different spiking statistics.

      Precisely! It is well established that different subregions serve different functional purposes - but they do not necessitate different regional embeddings. It is important to note the difference between stimulus encoding and the embedding that we are describing. As a rough analogy, the regional embedding might be considered a language, while the stimulus is the content of the spoken words. However, to avoid vague words, we revised the sentence to “These results suggest that the computational differentiability of murine visuocortical regions is at the level of VISp versus secondary areas.” (Line 380 - 381)

      (5) Figure 3D left/right halves look similar. A measure of the effect size needs to accompany these p-values.

      We assume the reviewer is referring to Figure 3E. Although some of the violin plots in Figure 3E look similar, they are not identical. In the revision, we include effect sizes in the caption.

      (6) Figure 3A, 3F: Could uncertainty estimates be provided?

      Yes. We added uncertainty estimates to the text (Line 272 - 294) and to the caption of Figure S2, which displays confusion matrices corresponding to Figure 3A. The inclusion of similar estimates for 3F would be so unwieldy as to be a disservice to the reader—there are 240 unique combinations of stimulus parameters and structures. In the context of the larger figure, 3F serves to illustrate a relationship between stimulus, region, and the anatomical embedding.

      (7) Page 21. "semi-orthogonal". Please reword or explain if this usage is technical.

      We replaced “semi-orthogonal” with “dissociable” (Line 549).

      (8) Page 11, "This approach tested whether...". Unclear sentence. Please reword.

      We changed “This approach tested whether the MLP’s performance depended on viewing the entire ISI distribution or was enriched in a subset of patterns” to “This approach identified regions of the ISI distribution informative for classification” (Line 261).

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s comments and summary of the results. We agree that the introductory results (Figs. 1-3) are not particularly compelling when considered in isolation. They provide a baseline of comparison for the subsequent results. Our intention was to approach the problem systematically, progressing from well-established, basic methods to more advanced approaches. This allows us to clearly test a baseline and avoid analytical leaps or untested assumptions. Specifically:

      ● Figure 1 provides an evaluation of the standard dimensionality reduction methods. As expected, these methods yield minimal results, serving as a clear baseline. This is consistent, for example, with an understanding of single units as rate-varying Poisson processes.

      ● Figures 2 and 3 then build upon these results in two steps: linear supervised methods applied to spiking features common in the neuroscience literature (firing rate, coefficient of variation, etc.), followed by nonlinear supervised machine learning methods applied to more detailed spiking features such as the ISI distribution.

      By starting from the standpoint of the status quo, we are better able to contextualize the significance of our later findings in Figures 4–6.

      Response to Specific Points in the Summary

      (6) Separability of VISp vs. Secondary Visual Areas

      I found the entire argument about visual areas somewhat messy and unclear. The stimuli used might not drive the secondary visual areas particularly well and might necessitate task engagement.

      We appreciate your feedback that the dissection of visual cortical structures is unclear. To summarize, as shown in the bottom three rows of Figure 6, there is a notable lack of diagonality in visuocortical structures. This means that our model was unable to learn signatures to reliably predict these classes. In contrast, visuocortical layer is returned well above chance, and superstructures (primary and secondary areas) are moderately well identified, albeit still well above chance.

      Consider a thought experiment, if Charlie Gross had not shown faces to monkeys to find IT, or Newsome and others shown motion to find MT and Zeki and others color stimuli to find V4, we would conclude that there are no differences.

      The thought experiment is misleading. The results specifically do not arise from stimulus selectivity—much of Newsome’s own work suggests that the selectivity of neurons in IT etc. is explained by little more than rate varying Poisson processes. In this case, there should be no fundamental anatomical difference in the “language” of the neurons in V4 and IT, only a difference in the inputs driving those neurons. In contrast, our work suggests that the “language” of neurons varies as a function of some anatomical divisions. In other words, in contrast to a Poisson rate code, our results predict that single neuron spike patterns might be remarkably different in MT and IT— and that this is not a function of stimulus selectivity. Notably, the anatomical (and functional) division between V1 and secondary visual areas does not appear to manifest in a different “language”, thus constituting an interesting result in and of itself.

      We regret a failure to communicate this in a tight and compelling fashion on the first submission, but hope that the revision is limpid and accessible.

      Barberini, C. L., Horwitz, G. D., & Newsome, W. T. (2001). A comparison of spiking statistics in motion sensing neurones of flies and monkeys. Motion Vision: Computational, Neural, and Ecological Constraints, 307-320.

      Bair, W., Zohary, E., & Newsome, W. T. (2001). Correlated firing in macaque visual area MT: time scales and relationship to behavior. Journal of Neuroscience, 21(5), 1676-1697.

      Similarly, why would drifting gratings be a good example of a stimulus for the hippocampus, an area thought to be involved in memory/place fields?

      The results suggest that anatomical “language” is not tied to stimuli. It is imperative to recall that neurons are highly active absent experimentally imposed stimuli, such as when an animal is at rest, when an animal is asleep, and when an animal is in the dark (relevant to visual cortices). With this in mind, also recall that, despite the lack of stimuli tailored to the hippocampus, neurons therein were still reliably separable from neurons in seven nuclei in the thalamus, 6 of which are not classically considered visual regions. Should these regions (including hippocampus) have been inert during the presentation of visual stimuli, there would have been very little separability.

      (7) Generalization across laboratories

      “[C]omparison across laboratories was somewhat underwhelming. It does okay but none of the results are particularly compelling in terms of performance.”

      Any result above chance is a rejection of the null hypothesis: that a model trained on a set of animals in Laboratory A will be ineffective in identifying brain regions when tested on recordings collected in Laboratory B (in different animals and under different experimental conditions). As an existence proof, the results suggest conserved principles (however modest) that constrain neuronal activity as a function of anatomy. That models fail to achieve high accuracy (in this context) is not surprising (given the limitations of available recordings)---that models achieve anything above chance, however, is.

      Thus, after reading the paper many times, I think part of the problem is that the study is not cohesive, and the authors need to either come up with a tool or demonstrate a scientific finding.

      We demonstrate that neuronal spike trains carry robust anatomical information. We developed an ML architecture for this and that architecture is publicly available.

      They try to split the middle and I am left somewhat perplexed about what exact scientific problem they or other researchers are solving.

      We humbly suggest that the question of a neuron’s “language” is highly important and central to an understanding of how brains work. From a computational perspective, there is no reason for a vast diversity of cell types, nor a differentiation of the rules that dictate neuronal activity in one region versus another. A Turing Complete system can be trivially constructed from a small number of simple components, such as an excitatory and inhibitory cell type. This is the basis of many machine learning tools.

      Please do not confuse stimulus specificity with the concept of a neuron’s language. Neurons in VISp might fire more in response to light, while those in auditory cortex respond to sound. This does not mean that these neurons are different - only that their inputs are. Given the lack of a literature describing our main effect—that single neuron spiking carries information about anatomical location—it is difficult to conclude that our results are either commonplace or to be expected.

      I am also unsure why the authors think some of these results are particularly important.

      See above.

      For instance, has anyone ever argued that brain areas do not have different spike patterns?

      Yes. In effect, by two avenues. The first is a lack of any argument otherwise (please do not conflate spike patterns with stimulus tuning), and the second is the preponderance of, e.g., rate codes across many functionally distinct regions and circuits.

      Is that not the premise for all systems neuroscience?

      No. The premise for all systems neuroscience (from our perspective) is that the brain is a) a collection of interacting neurons and b) the collective system of neurons gives rise to behavior, cognition, sensation, and perception. As stated above, these axiomatic first principles fundamentally do not require that neurons, as individual entities, obey different rules in different parts of the brain.

      I could see how one could argue no one has said ISIs matter but the premise that the areas are different is a fundamental part of neuroscience.

      Based on logic and the literature, we fundamentally disagree. Consider: while systems neuroscience operates on the principle that brain regions have specialized functions, there is no a priori reason to assume that these functions must be reflected in different underlying computational rules. The simplest explanation is that a single language of spiking exists across regions, with functional differences arising from processing distinct inputs rather than fundamentally different spiking rules. For example, an identical spike train in the amygdala and Layer 5 of M1 would have profoundly different functional impacts, yet the spike timing itself could be identical (even as stimulus response). Until now, evidence for region-specific spiking patterns has been lacking, and our work attempts to begin addressing this gap. There is extensive further work to be conducted in this space, and it is certain that models will improve, rules will be clarified, and mechanisms will be identified.

      Detailed major comments

      (1) Exploratory trends in spiking by region and structure across the population:

      The argument in this section is that unsupervised analyses might reveal subtle trends in the organization of spiking patterns by area. The authors show 4 plots from t-SNE and claim to see subtle organization. I have concerns. For Figure 1C, it is nearly impossible to see if a significant structure exists that differentiates regions and structures. So this leads certain readers to conclude that the authors are looking at the artifactual structure (see Chari et al. 2024) - likely to contribute to large Twitter battles. Contributing to this issue is that the hyperparameter for tSNE was incorrectly chosen. I do think that a different perplexity should be used for the visualization in order to better show the underlying structure; the current visualization just looks like a single "blob". The UMAP visualizations in the supplement make this point more clearly. I also think the authors should include a better plot with appropriate perplexity or not include this at all. The color map of subtle shades of green and yellow is hard to see as well in both Figure S1 and Figure 1.

      In response to the feedback, we replaced t-SNE/UMAP with LDA, while keeping PCA for dimensionality reduction.

      As stated in the original methods, t-SNE/UMAP hyperparameters were chosen based on the combination that led to the greatest classifiable separability of the regions/structures in the space (across a broad range of possible combinations). It just so happens that the maximally separable structure from a regions/structures perspective is the “blob”. This suggests that perhaps the predominant structure the t-SNE finds in the data is not driven by anatomy. If we selected hyperparameters in some other way that was not based specifically on regions/structures (e.g. simple visual inspection of the plots) the conformation would of course be different and not blob-like. However, we removed the t-SNE and UMAP to avoid further confusion.

      The “muddy appearance” is not an issue with the color map. As seen in Figure 1B, the chosen colors are visibly distinct. Figure 1C (previous version) appeared muddy yellow/green because of points that overlap with transparency, resulting in a mix of clearly defined classes (e.g., a yellow point on top of a blue point creating green). This overlap is a meaningful representation of the separability observed in this analysis. We also tried using 2D KDE for visualization, but it did not improve the impression of visual separability.

      We are removing p-values from the figures because they lead to the impression that we over-interpret these results quantitatively. However, we calculated p-values based on label permutation similar to the way R2 suggests (see previous methods). The conflation with the Wasserstein distances is an understandable misunderstanding. These are unrelated to p-values and used for the heatmaps in S1 only (see previous methods).
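
      For readers unfamiliar with label-permutation p-values, the general procedure can be sketched as follows. This is a generic illustration under stated assumptions, not the analysis from the previous methods: the statistic (absolute difference in group means), the toy values, and the names `permutation_p_value` and `abs_mean_difference` are all hypothetical.

```python
import random


def permutation_p_value(values, labels, statistic, n_permutations=999, seed=0):
    """One-sided permutation p-value for `statistic` under shuffled labels."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between values and labels
        if statistic(values, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


def abs_mean_difference(values, labels):
    """Hypothetical statistic: absolute difference between two group means."""
    group0 = [v for v, lab in zip(values, labels) if lab == 0]
    group1 = [v for v, lab in zip(values, labels) if lab == 1]
    return abs(sum(group0) / len(group0) - sum(group1) / len(group1))


# Toy data (made up for illustration): two clearly separated groups.
values = [1.0, 1.1, 0.9, 1.2, 0.8, 5.0, 5.1, 4.9, 5.2, 4.8]
labels = [0] * 5 + [1] * 5
p = permutation_p_value(values, labels, abs_mean_difference)
print(f"permutation p-value: {p:.3f}")
```

      Any statistic that takes values and labels can be passed in, so the same skeleton applies to a separability score computed in a dimensionally-reduced space.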

      Instead of p-values, we now use the adjusted rand index, which measures how accurately neurons within the same region are clustered together (see Line 670 - 671, Figure 1C, and Figure S1) (Hubert & Arabie 1985). This quantifies the extent to which the distribution of points in dimensionally-reduced space is shaped by region/structure.

      Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218. https://doi.org/10.1007/BF01908075
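
      As an illustration of the metric, the adjusted Rand index of Hubert & Arabie (1985) can be computed from two label vectors as sketched below. This is a self-contained reference implementation for clarity, not the code used in the manuscript; the region names and cluster assignments are made up, and the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Pairs of items placed together in both partitions, and in each one.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs  # chance agreement
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)


# Illustrative labels only: anatomical region per neuron vs. cluster assignment.
regions = ["VISp", "VISp", "CA1", "CA1", "LGd", "LGd"]
perfect_clusters = [2, 2, 0, 0, 1, 1]   # same grouping, different names
mixed_clusters = [0, 1, 0, 1, 0, 1]     # grouping unrelated to region
print(adjusted_rand_index(regions, perfect_clusters))
print(adjusted_rand_index(regions, mixed_clusters))
```

      A value of 1 means the clustering reproduces the anatomical labels exactly (up to renaming), values near 0 are what chance grouping would give, and negative values indicate agreement below chance.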

      (2) Logistic classifiers:

      The results in this section are somewhat underwhelming. Accuracy is around 40% and yes above chance but I would be very surprised if someone is worried about separating visual structures from the thalamus. Such coarse brain targeting is not difficult. If the authors want to include this data, I recommend they show it as a control in the ISI distribution section. The entire argument here is that perhaps one should not use derived metrics and a nonlinear classifier on more data is better, which is essentially the thrust of the next section.

      As outlined above, our work systematically increases in model complexity. The logistic result is an intermediate model, and it returns intermediate results. This is an important stepping stone between the lack of a result based on unsupervised linear dimensionality reduction and the performance of supervised nonlinear models.

      From a purely utilitarian perspective, the argument could be framed as “one should not use derived metrics, and a nonlinear classifier on more data is better.” However, please see all of our notes above.

      (3) MLP classifiers:

      Even in this section, I was left somewhat underwhelmed that a nonlinear classifier with large amounts of data outperforms a linear classifier with small amounts of data. I found the analysis of the ISIs and which timescales are driving the classifier interesting but I think the classifier with smoothing is more interesting. So with a modest chance level decodability of different brain areas in the visual system, I found it somewhat grandiose to claim a "conserved" code for anatomy in the brain. If there is conservation, it seems to be at the level of the coarse brain organization, which in my opinion is not particularly compelling.

      The sample size used for both the linear and nonlinear classifiers is the same; however, the nonlinear classifier leverages the detailed spiking time information from ISIs. Our goal here was to systematically evaluate how classical spike metrics compare to more detailed temporal features in their ability to decode brain areas. We chose a linear classifier for spike metrics because, with fewer features, nonlinear methods like neural networks often offer very modest advantages over linear methods, less interpretability, and are prone to overfitting.
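
      The distinction between classical spike metrics and more detailed temporal features can be made concrete with a short sketch that derives both from the same spike timestamps. The bin range (1 ms to 10 s), the bin count, the log spacing, and all function names are assumptions chosen for illustration; they are not the parameters used in the manuscript.

```python
import math


def interspike_intervals(spike_times):
    """Differences between consecutive spike times (seconds)."""
    times = sorted(spike_times)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def firing_rate(spike_times, duration_s):
    """Mean firing rate in spikes per second."""
    return len(spike_times) / duration_s


def isi_cv(isis):
    """Coefficient of variation of the ISIs (0 for a perfectly regular train)."""
    mean = sum(isis) / len(isis)
    variance = sum((x - mean) ** 2 for x in isis) / len(isis)
    return math.sqrt(variance) / mean


def log_isi_histogram(isis, min_isi=1e-3, max_isi=10.0, n_bins=20):
    """Normalized ISI histogram on log-spaced bins (ASSUMED bin settings)."""
    log_min, log_max = math.log10(min_isi), math.log10(max_isi)
    width = (log_max - log_min) / n_bins
    counts = [0] * n_bins
    for isi in isis:
        if isi <= 0:
            continue  # skip duplicate timestamps
        index = int((math.log10(isi) - log_min) / width)
        counts[min(max(index, 0), n_bins - 1)] += 1  # clamp out-of-range ISIs
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Toy spike train: one spike every 100 ms for 1 s.
spikes = [0.1 * i for i in range(1, 11)]
isis = interspike_intervals(spikes)
summary_features = [firing_rate(spikes, duration_s=1.0), isi_cv(isis)]
detailed_features = log_isi_histogram(isis)
print(summary_features, len(detailed_features))
```

      The two summary metrics form a short feature vector of the kind suited to a linear classifier, whereas the histogram keeps the shape of the ISI distribution and is the kind of higher-dimensional input a nonlinear model can exploit.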

      Respectfully, we stand by our word choice. The term “conserved” is appropriate given that our results hold appreciably, i.e., statistically above chance, across animals.

      (4) Generalization section:

      The authors suggest that a classifier learned from one set of data could be used for new data. I was unsure if this was a scientific point or the fact that they could use it as a tool.

      It can be both. We are more driven by the scientific implications of a rejection of the null.

      Is the scientific argument that ISIs are similar across areas even in different tasks?

      It appears so - despite heterogeneity in the tuning of single neurons, their presynaptic inputs, and stimuli, there is identifiable information about anatomical location in the spike train.

      Why would one not learn a classifier from every piece of available data: like LFP bands, ISI distributions, and average firing rates, and use that to predict the brain area as a comparison?

      Because this would obfuscate the ability to conclude that spike trains embed information about anatomy.

      Considering all features simultaneously and adding additional data modalities—such as LFP bands and spike waveforms—has potential to improve classification accuracy at the cost of understanding the contribution of each feature. The spike train as a time series is the most fundamental component of neuronal communication. As a result, this is the only feature of neuronal activity of concern for the present investigation.

      Or is the argument that the ISIs are a conserved code for anatomy? Unfortunately, even in this section, the data are underwhelming.

      We appreciate the reviewer’s comments, but arrive at a very different conclusion. We were quite surprised to find any generalizability whatsoever.

      Moreover, for use as a tool, I think the authors need to seriously consider a control that is either waveforms from different brain areas or the local field potentials. Without that, I am struggling to understand how good this tool is. The authors said "because information transmission in the brain arises primarily from the timing of spiking and not waveforms (etc)., our studies involve only the timestamps of individual spikes from well-isolated units ". However, we are not talking about information transmission and actually trying to identify and assess brain areas from electrophysiological data.

      While we are not blind to the “tool” potential that is suggested by our work, this is not the primary motivation or content in any section of the paper. As stated clearly in the abstract, our motivation is to ask “whether individual neurons [...] embed information about their own anatomical location within their spike patterns”. We go on to say “This discovery provides new insights into the relationship between brain structure and function, with broad implications for neurodevelopment, multimodal integration, and the interpretation of large-scale neuronal recordings. Immediately, it has potential as a strategy for in-vivo electrode localization.” Crucially, the last point we make is a nod to application. Indeed, our results suggest that in-vivo electrode localization protocols may benefit from the incorporation of such a model.

      In light of the reviewer’s concerns, we have further dampened the weight of statements about our model as a consumer-ready tool.

      Example 1: The final sentence of the abstract now reads: “Computational approximations of anatomy have potential to support in-vivo electrode localization.”

      Example 2: The Results section now contains the following text: “While significantly above chance, the structure-level model still lacks the accuracy for immediate practical application. However, it is highly likely that the incorporation of datasets with diverse multi-modal features and alternative regions from other research groups will increase the accuracy of such a model. In addition, a computational approach can be combined with other methods of anatomical reconstruction.” (Line 355 - 359).

      Example 3: We replaced the phrase "because information transmission in the brain arises primarily from the timing of spiking and not waveforms (etc) " with the phrase “because information is primarily encoded by the firing rate or the timing of spiking and not waveforms (etc)” (Line 116 - 118).

      (5) Discussion section:

      In the discussion, beginning with "It is reasonable to consider . . ." all the way to the penultimate paragraph, I found the argumentation here extremely hard to follow. Furthermore, the parts of the discussion here I did feel I understood, I heavily disagreed with. They state that "recordings are random in their local sampling" which is almost certainly untrue when it comes to electrophysiology which tends to oversample task-modulated excitatory neurons (https://elifesciences.org/articles/69068). I also disagree that "each neuron's connectivity is unique, and vertebrate brains lack 'identified neurons' characteristic of simple organisms". While brains are eutelic and "nameable" in only the simplest organisms (C. elegans), cell types are exceedingly stereotyped in their connectivity even in mammals and such connectivity defines their computational properties. Thus I don't find the premise the authors state in the next sentence to be undermined ("it seems unlikely that a single neuron's happenstance imprinting of its unique connectivity should generalize across stimuli and animals"). Overall, I found this subsection to rely on false premises and in my opinion it should be removed.

      At the suggestion of R2, we removed the paragraph in question. However, we would like to address some points of disagreement:

      We agree that electrophysiology, along with spike-sorting, quality metrics, and filtering of low-firing neurons, leads to oversampling of task-modulated neurons. However, when we stated that recordings are random in their local sampling, we were referring to structural (anatomical) randomness, not functional randomness. In other words, the recorded neurons were not specifically targeted (see below).

      Electrode arrays, such as Neuropixels, record from hundreds of neurons within a small volume relative to the total number of neurons and the volume of a given brain region. For instance, the paper R2 referenced includes a statement supporting this: “... assuming a 50-μm ‘listening radius’ for the probes (radius of half-cylinder around the probe where the neurons’ spike amplitude is sufficiently above noise to trigger detection) …, the average yield of 116 regular-spiking units/probe (prior to QC filtering) would imply a density of 42,000 neurons/mm³, much lower than the known density of ~90,000 neurons/mm³ for excitatory cells in mouse visual cortex….”

      If we take the estimated volume of V1 to be approximately 3 mm³, this region could theoretically be subdivided into multiple cylinders with a 100-μm diameter. While stereotaxic implantation of the probe mitigates some variability, the natural anatomical variability across individual animals introduces spatially random sampling. This was the randomness we were referring to, and thus, we disagree with the assertion that our claim is “almost certainly untrue.”
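
      The scale of this sampling can be made concrete with a short back-of-the-envelope calculation. The sketch below uses only the figures quoted in this response (a V1 volume of roughly 3 mm³, ~90,000 excitatory neurons/mm³, and 116 regular-spiking units per probe); the function name is illustrative, and the calculation assumes, generously, that every unit on a probe lies within V1.

```python
# Figures quoted in the response above; the 3 mm^3 volume is a rough estimate.
V1_VOLUME_MM3 = 3.0
EXCITATORY_DENSITY_PER_MM3 = 90_000
UNITS_PER_PROBE = 116  # average regular-spiking units per probe, prior to QC


def sampled_fraction(units, volume_mm3, density_per_mm3):
    """Fraction of a region's neurons captured by a single probe."""
    total_neurons = volume_mm3 * density_per_mm3
    return units / total_neurons


fraction = sampled_fraction(
    UNITS_PER_PROBE, V1_VOLUME_MM3, EXCITATORY_DENSITY_PER_MM3
)
print(f"Fraction of V1 excitatory neurons sampled: {fraction:.4%}")
```

      Under these assumptions a single probe samples well under one tenth of one percent of the excitatory population, which is the sense in which placement within the region is effectively random.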

      Additionally, each cortical pyramidal neuron is understood to have ~ 10,000 presynaptic partners. It is highly unlikely that these connections are entirely pre-specified, perfectly replicated within the same animal, and identical across all members of a species. Further, there is enormous diversity in the activity properties of even neighboring cells of the same type. Consider pyramidal neurons in V1. Single neuron firing rates are log-normally distributed, there are many combinations of tuning properties (i.e., direction, orientation) that must occupy each point in retinotopic space, and there is powerful experience-dependent change in the connectivity of these cells. We suggest that it is inconceivable that any two neurons, even within a small region of V1, have identical connectivity.

      Minor Comments:

      (1) Although the description of confusion matrices is good from a didactic perspective, some of this could be moved to methods to simplify the paper.

      We thank the reviewer for the suggestion. However, given the broad readership of eLife, we gently suggest that confusion matrices are not a trivial and universally appreciated plotting format. For the purpose of accessibility, a brief and didactic 2-sentence description will make the paper far more comprehensible to many readers at little cost to experts.
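
      For readers unfamiliar with the format, a confusion matrix can be built in a few lines. The sketch below is illustrative only: the region labels and the toy predictions are invented for the example and are not data from the study. Rows are true labels, columns are predicted labels, and the diagonal holds correct classifications.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, classes):
    """Count how often each true class (row) is assigned each predicted class (column)."""
    index = {name: i for i, name in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t], index[p]] += 1
    return counts


# Hypothetical labels and predictions, for illustration only.
classes = ["hippocampus", "thalamus", "visual cortex"]
true = ["hippocampus"] * 4 + ["thalamus"] * 4 + ["visual cortex"] * 4
pred = (
    ["hippocampus", "hippocampus", "hippocampus", "thalamus"]
    + ["thalamus", "thalamus", "visual cortex", "visual cortex"]
    + ["visual cortex"] * 4
)

cm = confusion_matrix(true, pred, classes)
# Normalising each row shows, per true class, the proportion assigned to each label.
row_normalised = cm / cm.sum(axis=1, keepdims=True)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(row_normalised)
print(accuracy)
```

      Off-diagonal entries show which classes are mistaken for which, information that a single accuracy value hides.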

      (2) Figure 3A: It is concluded in their subsequent figure that the longer the measured amount of time, the better the decoding performance. Thus it makes sense why the average PSTHs do not show significant decoding of areas or structures

      That is a good observation. However, all features were calculated from the same duration of data, except in Figure 3B, where we tested the effect of duration. The averaged PSTH was calculated from the same length of data as the ISI distribution and binned to have the same number of feature lengths as the ISI distribution (refer to Methods section). Therefore, we interpreted this as an indication of information degradation through averaging, rather than an effect of data length (Line 234 - 237).

      (3) Figure 3D: A Gaussian is used to fit the ISI distributions here but ISI distributions do not follow a normal distribution, they follow an inverse gamma distribution.

      We agree with the reviewer and we are familiar with the literature showing that the ISI distribution is best fitted by a gamma family distribution (as a recent, but not earliest, example: Li et al. 2018). However, we did not fit a Gaussian (or any distribution) to the data; we simply calculated the sample mean and variance. Reporting the sample mean and variance (or standard deviation) is not restricted to Gaussian distributions. They are broadly used metrics that simply have additional intrinsic meaning for Gaussian distributions. We used the schematic illustration in Fig 3D because mean and variance are much more familiar in the context of a Gaussian distribution, but ultimately that does not affect our analyses in Fig 3 E-F. Alternatively, the alpha and beta intrinsic parameters of a gamma distribution could have been used, but they are known by a much smaller portion of neuroscientists.

      Li, M., Xie, K., Kuang, H., Liu, J., Wang, D., Fox, G. E., ... & Tsien, J. Z. (2018). Spike-timing pattern operates as gamma-distribution across cell types, regions and animal species and is essential for naturally-occurring cognitive states. bioRxiv, 145813.
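
      The point that sample mean and variance are distribution-agnostic, yet map onto gamma parameters, can be sketched as follows. This is a minimal illustration and not the analysis code from the study: the spike train is synthetic, the function name is hypothetical, and the method-of-moments conversion (alpha = mean²/variance, beta = mean/variance, with beta as a rate) is an assumption about how one might relate the two descriptions.

```python
import numpy as np


def isi_summary(spike_times):
    """Sample mean/variance of inter-spike intervals, plus moment-matched gamma parameters."""
    isis = np.diff(np.sort(spike_times))
    mean = isis.mean()
    variance = isis.var(ddof=1)
    return {
        "mean": mean,
        "variance": variance,
        # Method-of-moments estimates; no distribution is fitted to obtain mean/variance.
        "alpha": mean**2 / variance,  # gamma shape
        "beta": mean / variance,      # gamma rate (1 / scale)
    }


# Synthetic spike train with gamma-distributed ISIs (shape 2, scale 0.05 s).
rng = np.random.default_rng(0)
synthetic_isis = rng.gamma(shape=2.0, scale=0.05, size=20_000)
spike_times = np.cumsum(synthetic_isis)

summary = isi_summary(spike_times)
print(summary)
```

      Either pair of numbers summarises the same two moments; mean and variance simply require no assumption about the shape of the distribution.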

      (4) Figure 3G: Something is wrong with this figure as each vertical bar is supposed to represent a drifting grating onset but yet, they are all at 5 hz despite the PSTH being purportedly shown at many different frequencies from 1 to 15 hz.

      We appreciate your attention to detail, but we are not representing the onset of individual drifting gratings in this figure. We only meant to represent the overall start/end of the drifting grating session. We did not intend to signal the temporal frequency of the drifting gratings (or the spatial frequency, orientation, or contrast).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      The mechanism, as I understand it, is different from what the authors described before in the RNN with tonic gain changes. As uncertainty increases, the network enters a regime in which the two excitatory populations start to oscillate. My intuition is that this oscillation arises from the feedback loop created by the new gain control mechanism. If my intuition is correct, I think it would be worth to explain this mechanism in the paper more explicitly.

      While interesting, this intuition is not correct. The oscillations are generated by the interaction between excitatory and inhibitory nodes in the network and occur in the model even with stationary gain. All of the plots in figure 3 exploring the dynamical regime of the network at different input x gain combinations (i.e., where the oscillatory regime is characterised) are simulations run with stationary gain.

      To ensure that this intuition is more clearly presented in the manuscript, we have edited the description in the text.

      P. 12: “Because of the large size of the network, we could not solve for the fixed points or study their stability analytically. Instead, we opted for a numerical approach and characterised the dynamical regime (i.e. the location and existence of approximate fixed-point attractors) across all combinations of (static) gain and  visited by the network.”
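
      The numerical strategy described in this passage can be illustrated on a toy system. The sketch below is not the RNN from the manuscript: it uses a single self-exciting unit, dx/dt = -x + tanh(gain * x + drive), chosen only because its attractors are easy to verify, and all names and parameter values are assumptions made for the example. For each static gain and drive, it integrates from several initial conditions and counts the distinct states the trajectories settle into.

```python
import numpy as np


def approximate_attractors(gain, drive, n_starts=10, dt=0.05, n_steps=4000, tol=1e-3):
    """Integrate a toy 1-D rate unit from several starts; return the distinct end states."""
    # An even number of starts avoids beginning exactly on the fixed point at 0.
    x = np.linspace(-1.0, 1.0, n_starts)
    for _ in range(n_steps):
        x = x + dt * (-x + np.tanh(gain * x + drive))  # forward Euler step
    finals = np.sort(x)
    # Merge end states that lie within `tol` of each other.
    keep = np.concatenate(([True], np.diff(finals) > tol))
    return finals[keep]


# Sweep a small grid of static gain and drive values.
gains = [0.5, 2.0]
drives = [0.0, 1.0]
regime = {(g, d): approximate_attractors(g, d) for g in gains for d in drives}
for (g, d), attractors in regime.items():
    print(f"gain={g}, drive={d}: {len(attractors)} attractor(s) at {attractors}")
```

      In this toy case, low gain yields a single attractor, high gain with no drive yields two, and a strong drive collapses the system back to one. A sweep of this kind only finds stable states reachable from the chosen starting points, which is why the text refers to approximate fixed-point attractors.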

      Reviewer #2 (Public review):

      - The demonstration of the causal role of gain modulation in perceptual switches is partial. This causality is clearly demonstrated in the simulation work with the RNN. However, it is not fully demonstrated in the pupil analysis and the fMRI analysis. One reason is that this work is correlative (which is already very informative). An analysis of the timing of the effect might have overcome this limitation. For example, in a previous study, the same group showed that fMRI activity in the LC region precedes changes in the energy landscape of fMRI dynamics, which is a step towards investigating causal links between gain modulation, changes in the energy landscape and perceptual switches.

      Thank you for the suggestion, which we considered in detail. Unfortunately, the temporal and spatial resolution of the fMRI data collected for this study precluded the same analyses we’ve run in previous work. However, this is an important question for future work.

      - Some effects may reflect the expectation of a perceptual switch rather than the perceptual switch itself. To mitigate this risk, the design of the fMRI task included catch trials, in which no switch occurs, to reduce the expectation of a switch. The pupil study, however, did not include such catch trials.

      We agree that this is a limitation of the current study, which we previously highlighted in the methods section.

      - The paper uses RNN-based modelling to provide mechanistic insight into the role of gain modulation in perceptual switches. However, the RNN solves a task that differs markedly from that performed by human participants, which may limit the explanatory value of the model. The RNN is provided with two inputs characterising the sensory evidence supporting the first and last image category in the sequence (e.g. plane and shark). In contrast, observers in the task were naïve as to the identity of the last image at the beginning of the sequence. The brain first receives sensory evidence about the image category (e.g. plane) with which the sequence begins, which is very easy to recognise, then it sees a sequence of morphed images and has to discover what the final image category will be. To discover the final image category, the brain has to search a vast space of possible second images (is it a shark?, a frog?, a bird?, etc.), rather than comparing the likelihood of just two categories. This search process and the perceptual switch in the task appear to be mechanistically different from the competition between two inputs in the RNN.

      We appreciate the critical analysis of the experimental paradigm but disagree with the reviewer’s conclusions for two key reasons: 1) participants had prior exposure to the images, such that they could create an expectation about what stimulus category a particular image would transition into (i.e., the image could not switch into any possible category); and 2) even if the reviewer’s concern were founded, models of K winner-take-all decision making are structured identically irrespective of whether there are 2 or K options. All that changes is the simulated reaction times, which depend linearly on K (for an example model see Hugh Wilson’s textbook Spikes, Decisions, and Actions, 1999, p.89-91). For these reasons, we maintain that the RNN is a sensible representation of the behavioural task.
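
      The structural point, that the same competitive circuit handles 2 or K options, can be sketched with a generic rate model. The code below is not the model from Wilson's textbook nor the RNN in the manuscript: it is a minimal winner-take-all network with rectified units and uniform mutual inhibition, and the parameter values are assumptions chosen for illustration. It makes no claim about how reaction times scale with K.

```python
import numpy as np


def winner_take_all(inputs, inhibition=2.0, tau=1.0, dt=0.01, n_steps=5000):
    """Simulate K mutually inhibiting rate units; K is set only by len(inputs)."""
    inputs = np.asarray(inputs, dtype=float)
    rates = np.zeros_like(inputs)
    for _ in range(n_steps):
        # Each unit is inhibited by the summed activity of all its competitors.
        competitors = rates.sum() - rates
        net = np.maximum(inputs - inhibition * competitors, 0.0)  # rectification
        rates = rates + (dt / tau) * (-rates + net)  # forward Euler step
    return rates


two_options = winner_take_all([1.0, 0.8])
five_options = winner_take_all([0.5, 0.7, 1.0, 0.6, 0.8])
print(np.argmax(two_options), two_options)
print(np.argmax(five_options), five_options)
```

      The same function is called for both cases; only the length of the input vector changes, and in each case the unit receiving the strongest input ends up as the sole active unit.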

      - Another aspect of the motivation for the RNN model remains unclear. The authors introduce dynamic gain modulation in the RNN, but it is not clear what the added value of dynamic gain modulation is. Both static (Fig. S1) and dynamic (Fig. 2F) gain modulation lead to the predicted effect: faster switching when the gain is larger.

      While we agree that the effect is observable with both static and dynamic gain, the dynamic approach has stronger construct validity, including a stronger link with the observed pupil dynamics and a rich literature on modelling the behavioural consequences of surprise/uncertainty. This led us to conclude that the dynamic approach was a better representation of our hypothesis.

      - Fig 1C: I don't see a "top grey bar" indicating significance.

      Thank you for catching this; the caption has been amended. The text was from an older version of the manuscript.

      - p. 10, reference to fig 3F seems incorrect: there is Fig 3F upper and Fig 3F lower, and nothing on Fig 3 and its legend mention the lesion of units

      This has been amended. We meant to refer to 2F.

      - In the response letter you mention a MATLAB tutorial, but I could not find it.

      This has been amended. The GitHub repository can be found at https://github.com/ShineLabUSYD/AmbiguousFigures

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame, that authors have discovered and named toiL, located in the leader region upstream of the operon. Authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, prompted in this work by some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. The model is appealing and several of the experimental data mainly support it. However, it remains unanswered what is the true trigger of the translation arrest at toiL and what is the physiological role of the induced expression of the topAI/yjhQ/yjhP operon.

      Reviewer #2 (Public review):

      Summary:

      Baniulyte and Wade describe how translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      The authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation.

      Weaknesses:

      Future experiments will be needed to better understand the physiological role of the toiL-mediated regulation and elucidate the mechanism of specific antibiotic sensing.

      The results are clearly described, and the revisions have helped to improve the presentation of the data.

      Reviewer #3 (Public review):

      In this revised manuscript, the authors provide convincing data to support an elegant model in which ribosome stalling by ToiL promotes downstream topAI translation and prevents premature Rho-dependent transcription termination. However, the physiological consequences of activating topAI-yjhQP expression upon exposure to various ribosome-targeting antibiotics remain unresolved. The authors have satisfactorily addressed all major concerns raised by the reviewers, particularly regarding the SHAPE-seq data. Overall, this study underscores the diversity of regulatory ribosome-stalling peptides in nature, highlighting ToiL's uniqueness in sensing multiple antibiotics and offering significant insights into bacterial gene regulation coordinated by transcription and translation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - Showing the ribosome density profiles of topAI/yjhQP and toiL in control and tetracycline treated cells is necessary to support that ribosome arrest at toiL increases translation of topAI/yjhQP.

      Figure 7B shows ribosome density around the start of toiL. Ribosome density increases across topAI in the presence of tetracycline, but we have opted not to show this region because we cannot say whether the increase in ribosome occupancy (represented in Figure 7A) is due to an increase in translation efficiency, RNA level, or both.

      - The subinhibitory antibiotic concentrations used in the reporter assays were based on MICs reported in the literature. This is not appropriate since MICs can greatly vary between strains, antibiotic solution stocks, and experimental conditions.

      Reported MICs were used as an initial guide for selecting antibiotic concentrations to test in our reporter assays. We have added text to indicate this, and to highlight that MICs vary considerably between strains.

      - toiL sequence may have evolved to maintain base-pairing with the topAI upstream region rather than, as authors suggest in Discussion, to respond to antibiotic-mediated arrest in an amino acid sequence specific manner.

      We have chosen to frame this as speculation.

      - Authors may consider commenting on the possibility that chloramphenicol does not induce because ToiL lacks alanine residues, whose presence at specific places of a nascent protein have been shown to promote chloramphenicol action (2016 PNAS 113:12150; 2022 NSMB 29:152).

      This is a great point as none of our stalling reporters included an ORF with alanine. We now include a short paragraph in the Discussion section to raise this possibility.

      - Tetracycline was added at the "subinhibitory concentration" of 8 ug/mL for the reporter assays but at 1 ug/mL for the ribosome profiling experiments. Authors should explain what the rationale for this was.

      We think the reviewer is mixing up the epidemiological cut-off value of 8 ug/mL with the concentration used in experiments (0.5-1 ug/mL for reporter assays and ribosome profiling). The text was confusing, so we have added a sentence to the Methods section to indicate that epidemiological cut-off values and MICs were only a guide for selecting antibiotic concentrations to test.

      Reviewer #2 (Recommendations for the authors):

      I wish the authors had been slightly less dismissive of the reviewers' comments. At a minimum, it would be nice if the authors could be consistent about the ribosome representation throughout the manuscript;

      We apologize if our previous responses gave the impression of being dismissive. That was certainly not our intention. We greatly value the reviewers' feedback, and we appreciate the opportunity to clarify any misunderstandings. We believe the reviewer is referring to the different shape and color of the ribosome in Figures 8 and 9, and Figure 8 figure supplement 2, which we have now corrected.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This important and creative study finds that the uplift of the Qinghai-Tibet Plateau-via its resultant monsoon system rather than solely its high elevation-has shifted avian migratory directions from a latitudinal to a longitudinal orientation. However, the main claims are incomplete and only partially supported, as the reliance on eBird data-which lacks the resolution to capture population-specific teleconnections-combined with a limited tracking dataset covering only seven species leaves key aspects of the argument underdetermined, and the critical assumption of niche conservatism is not sufficiently foregrounded in the manuscript. More clearly communicating these limitations would significantly enhance the interpretability of the results, ensuring that the major conclusions are presented in the context of these essential caveats.

      We appreciate your positive comments and constructive suggestions. We fully acknowledge your concerns about clearly communicating the limitations associated with the data used and analytical assumptions. We will try to get more satellite tracking data of birds migrating across the plateau. We will carefully consider the insights that our paper can deliver and make sure the limitations of our datasets and the critical assumption of niche conservatism are clearly presented. By explicitly clarifying these caveats, we believe the transparency and interpretability of the findings will be much improved.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have done a good job of responding to the reviewer's comments, and the paper is now much improved.

      Again, we thank the reviewer for constructive comments during review.

      Reviewer #2 (Public review):

      I would like to thank the authors for the revision and the input they invested in this study.

      We are grateful for your thoughtful feedback and enthusiasm, which will help us improve our manuscript.

      With the revised text of the study, my earlier criticism holds, and your arguments about the counterfactual approach are irrelevant to that. The recent rise of the counterfactual approach might likely mirror the fact that there are too many scientists behind their computers, and few go into the field to collect in situ data. Studies like the one presented here are a good intellectual exercise but the real impact is questionable.

      We understand your question about the relevance of the counterfactual approach used in our study. Our intent in using a counterfactual scenario (reconstructing migration patterns assuming pre-uplift conditions on the QTP) was to isolate the potential influence of the plateau’s geological history on current migration routes. We agree that such an approach must be used properly. In the revision, we will explicitly clarify why this counterfactual comparison is useful – namely, it provides a theoretical baseline to test how much the QTP’s uplift (and the associated monsoon system) might have redirected migration paths. We acknowledge that the counterfactual results are theoretical and will explicitly emphasise the assumptions involved (e.g. species–environment relationships hold between pre- and post-uplift environments) in the main text. Nonetheless, we defend the approach as a valuable study design: it helps generate testable hypotheses about migration (for instance, that the plateau’s monsoon-driven climate, rather than just its elevation, introduces an east–west shift en route). We will also tone down the language around this analysis to avoid overstating its real-world relevance. In summary, we will clarify that the counterfactual analysis is meant to complement, not replace, empirical observations, and we will discuss its limitations so that its role is appropriately bounded in the paper.

      All your main conclusions are inferred from published studies on 7! bird species. In addition, spatial sampling in those seven species was not ideal in relation to your target questions. Thus, no matter how fancy your findings look, the basic fact remains that your input data were for 7 bird species only! Your conclusion, “our study provides a novel understanding of how QTP shapes migration patterns of birds” is simply overstretching.

      Thank you for your comments. We apologise for any confusion regarding the scope of our dataset. Our main conclusions are not solely derived from seven bird species. Rather, we integrated a full list of 50 bird species that migrate across the QTP and analysed their migratory patterns with eBird data. We studied the factors influencing their choices of migratory routes with seven species that were among the few with available tracking data across the QTP. In this revision, we will clarify the role of these seven species and the rationale for their selection. Additionally, we will attempt to include more satellite tracking data to improve spatial coverage, as recommended by the reviewer and editor. Based on discussions with potential collaborators, we hope to include at least 10 more species with available tracking data.

      The way you respond to my criticism on L 81-93 is something different than what you admit in the rebuttal letter. The text of the ms is silent about the drawbacks and instead highlights your perspective. I understand you; you are trying to sell the story in a nice wrapper. In the rebuttal you state: “we assume species' responses to environments are conservative and their evolution should not discount our findings.” But I do not see that clearly stated in the main text.

      Thanks, as suggested we will clearly state the assumptions of niche conservatism in the Introduction.

      In your rebuttal, you respond to my criticism of "No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites" when you responded: ... "we can track the movement of species every week, and capture the breeding and wintering areas for specific populations" I am having a feeling that you either play with words with me or do not understand that from eBird data nobody will be ever able to estimate population-specific teleconnections between breeding and wintering areas. It is simply impossible as you do not track individuals. eBird gives you a global picture per species but not for particular populations. You cannot resolve this critical drawback of your study.

      We agree that inferring population-specific migratory connections (teleconnections) from eBird data is challenging and inherently limited. eBird provides occurrence records for species, but it generally cannot distinguish which breeding population an individual bird came from or exactly where it goes for winter. However, in this study we intend to infer broad-scale movement patterns (e.g. general directions and stopover regions) rather than precise one-to-one population linkages. In the revision, we will carefully rephrase those sections to make clear that our inferences are at the species level and at large spatial scales. We will also explicitly state in the Discussion that confirming population connectivity would require targeted tracking or genetic studies, and that our eBird-based analysis can only suggest plausible routes and region-to-region linkages. We will contrast migratory routes identified by using eBird data and satellite tracking for the same species to check their similarity. We argue that, even with its limits, the eBird dataset can still yield useful insights (such as identifying major flyway corridors over the QTP).

      I am sorry that you invested so much energy into this study, but I see it as a very limited contribution to understanding the role of a major barrier in shaping migration.

      Thank you for recognising our efforts in the study. By integrating both satellite tracking and community-contributed data, we explored how the uplift of the QTP could shape avian migration across the area. We believe our findings provide important insights into how birds balance their responses to large-scale climate change and geological barriers, which yields the most comprehensive picture to date of how the QTP uplift shapes migratory patterns of birds. We will also acknowledge the study’s limitations to ensure that readers understand the context and constraints of our findings.

      My modest suggestion for you is: go into the field. Ideally use bird radars along the plateau to document whether the birds shift the directions when facing the barrier.

      We appreciate your suggestions to incorporate field tracking or radar studies to strengthen our results. All coauthors have years of field experience, including on the QTP and in the Arctic. For example, the tracking data of peregrine falcons (Falco peregrinus) that we will incorporate in the revision were collected during our own fieldwork in the Arctic over more than six years. We agree that more direct tracking (through GPS tagging or radar) would be an ideal way to validate migration pathways and population connectivity. In this revision, as stated above, we will try to include more species with satellite tracking data. We will also note that future studies should build on our findings by using dedicated tracking of more individual birds and radar monitoring of migration over the QTP. We will cite recent advances in these techniques and suggest that incorporating more tracking data could further test the hypotheses generated by our analyses.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      L55 "an important animal movement behaviour is.." Is there any unimportant animal movement? I mean this sentence is floppy, empty.

      We will rewrite this sentence to remove any ambiguous phrasing.

      L 152-154 This sentence is full of nonsense or your misinterpretation. First of all, the issue of inflexible initiation of migration was related to long-distance migrants only! The way you present it mixes apples and oranges (long- and short-distance migrants). It is not "owing to insufficient responses" but due to inherited patterns of when to take off, photoperiod and local conditions.

      We will remove the sentence to avoid misinterpretation.

      L 158 what is a migration circle? I do not know such a term.

      We will amend it as “annual migration cycle”, which is a more common way to describe the yearly round-trip journey between breeding and wintering grounds of birds.

      L 193 The way you present and mix capital and income breeding theory with your simulation study is quite tricky and super speculative.

      We will present this idea as an inference rather than a conclusion: “This pattern could be consistent with a ‘capital breeding’ strategy — where birds rely on energy reserves acquired before breeding — rather than an ‘income’ strategy that depends on food acquired during breeding. However, we note that this interpretation would require further study.” By adding this caution, we will make it clear that we are not asserting this link as proven fact, only suggesting it as one possible explanation. We will also double-check that the rest of the discussion around this point is framed appropriately.


      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study addresses a novel and interesting question about how the rise of the Qinghai-Tibet Plateau influenced patterns of bird migration, employing a multi-faceted approach that combines species distribution data with environmental modeling. The findings are valuable for understanding avian migration within a subfield, but the strength of evidence is incomplete due to critical methodological assumptions about historical species-environment correlations, limited tracking data, and insufficient clarity in species selection criteria. Addressing these weaknesses would significantly enhance the reliability and interpretability of the results.

      We would like to thank you and the two anonymous reviewers for your careful, thoughtful, and constructive feedback on our manuscript. These reviews made us revisit many of our assumptions, and we believe the paper is much improved as a result. In addition to minor points, we have made three main changes to our manuscript in response to the reviews. First, we addressed the concerns about the assumptions of historical species-environment correlations from the perspectives of both theoretical and empirical evidence. Second, we discussed the benefits and limitations of using tracking data in our study and demonstrated how the findings of our study are consolidated by the results of previous studies. Third, we clarified our criteria for selecting species in terms of both eBird and tracking data.

      Below, we respond to each comment in turn. Once again, we thank you all for your feedback.

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      We are appreciative of the reviewer’s careful reading of our manuscript, encouraging comments and constructive suggestions.

      Weaknesses:

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. This relates to the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section reads as quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in its current form. For example, how the different species factor in, etc.

      Yes, the journal requires this format (Methods after the Results and Discussion) for the Short Report article type. As suggested, in the revision we have elaborated on the details of our findings, in terms of (i) shifts in the distribution of avian breeding and wintering areas under the influence of the uplift of the Qinghai-Tibet Plateau (Lines 102-116), and (ii) the major factors that shape current migration patterns of birds in the plateau (Lines 118-138). We have also better referenced the approaches we used in the study.

      Reviewer #2 (Public review):

      Summary:

      The study tries to assess how the rise of the Qinghai-Tibet Plateau affected patterns of bird migration between their breeding and wintering sites. They do so by correlating the present distribution of the species with a set of environmental variables. The data on species distributions come from eBird. The main issue lies in the problematic assumption that species correlations between their current distribution and environment were about the same before the rise of the Plateau. There is no ground truthing and the study relies on Movebank data of only 7 species which are not even listed in the study. Similarly, the study does not outline the boundaries of breeding sites NE of the Plateau. Thus it is absolutely unclear potentially which breeding populations it covers.

      We are very grateful for the careful review and helpful suggestions. We have revised the manuscript carefully in response to the reviewer’s comments and believe that it is much improved as a result. Below are our point-by-point replies to the comments.

      Strengths:

      I like the approach for how you combined various environmental datasets for the modelling part.

      We appreciate the reviewer’s encouragement.

      Weaknesses:

      The major weakness of the study lies in the assumption that species correlations between their current distribution and environments found today are back-projected to the far past before the rise of the Q-T Plateau. This would mean that species responses to the environmental cues do not evolve which is clearly not true. Thus, your study is a very nice intellectual exercise of too many ifs.

      This is a valid concern. We have addressed this from both the perspectives of the theoretical design of our study and empirical evidence.

      First, we agree with the reviewer that species responses to environmental cues might vary over time. Nonetheless, the simulated environments before the uplift of the plateau serve as a counterfactual state in our study. The counterfactual is an important concept for supporting causal claims by comparing what happened with what would have happened in a hypothetical situation: “If event X had not occurred, event Y would not have occurred” (Lewis 1973). Recent years have seen increasing application of the counterfactual approach to detect biodiversity change, i.e., comparing diversity between the counterfactual state and real estimates to attribute the factors causing such changes (e.g., Gonzalez et al. 2023). Whilst we do not aim to provide causal inferences for avian distributional change, the counterfactual approach allows us to estimate the influence of the plateau uplift by detecting changes in avian distributions, i.e., by comparing where the birds would have been distributed without the plateau with where they are currently distributed. We regard the counterfactual environments as a powerful tool for eliminating vagueness, to the extent possible, as opposed to a simple description of the current distributions of birds. Therefore, we assume that species’ responses to environments are conservative and that their evolution should not discount our findings. We have clarified this in the Introduction (Lines 81-93).

      Second, we used species distribution modelling to contrast the distributions of birds before and after the uplift of the plateau under the assumption that species tend to keep their ancestral ecological traits over time (i.e., niche conservatism). This indicates a high probability for species to distribute in similar environments wherever suitable. Particularly, considering bird distributions are more likely to be influenced by food resources and vegetation distributions (Qu et al. 2010, Li et al. 2021, Martins et al. 2024), and the available food and vegetation before the uplift can provide suitable habitats for birds (Jia et al. 2020), we believe the findings can provide valuable insights into the influence of the plateau rise on avian migratory patterns. Having said that, we acknowledge other factors, e.g., carbon dioxide concentrations (Zhang et al. 2022), can influence the simulations of environments and our prediction of avian distribution. We have clarified the assumptions and evidence we have for the modelling in Methods (Lines 362-370).

      The second major drawback lies in the way you estimate the migratory routes of particular birds. No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites. Some might overwinter in India, some populations in Africa and you will never know the teleconnections between breeding and wintering sites of particular species. The few available tracking studies (seven!) are too coarse and with limited aspects of migratory connectivity to give answer on the target questions of your study.

      We agree with the reviewer that establishing interconnections between sites is important for estimating the migration patterns of birds. We employed a dynamic model to assess weekly distributions. Thus, we can track the movement of species every week and capture the breeding and wintering areas for specific populations. That being said, we acknowledge that our approach can be affected by the patchy sampling of eBird data. In contrast, tracking data can provide detailed information on the movement patterns of species but are limited to small numbers of species due to the considerable costs and time needed. We aimed to use tracking data to examine the influence of focal factors on avian migration patterns but, despite our best efforts, could acquire data for only seven species. Moreover, similar results were found in studies that used tracking data to estimate the distribution of breeding and wintering areas of birds in the plateau (e.g., Prosser et al. 2011, Zhang et al. 2011, Zhang et al. 2014, Liu et al. 2018, Kumar et al. 2020, Wang et al. 2020, Pu and Guo 2023, Yu et al. 2024, Zhao et al. 2024). We believe the conclusions based on seven species are rigorous, but their implications could be restricted by the number of tracked species we obtained. We have better demonstrated how our findings on breeding and wintering areas of birds are reinforced by other studies reporting the locations of those areas. We have also added a separate caveat section to discuss the limitations stated above (Lines 202-215).

      Your set of species is unclear, selection criteria for the 50 species are unknown and variability in their migratory strategies is likely to affect the direction of the effects.

      In this revision, we have clarified the selection criteria for the 50 species and outlined the boundaries of the breeding areas of all birds (Lines 243-249). Briefly, we first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted species identified as full migrants in Birdlife International (https://datazone.birdlife.org/species/spcdistPOS) from the full list. Migratory birds may follow a capital or income migratory strategy depending on how much they rely on endogenous energy reserves gained prior to reproduction. We have added discussions on how these migratory strategies might influence the effects of environment on migratory direction (Lines 183-200).

      In addition, the position of the breeding sites relative to the Q-T plate will affect the azimuths and resulting migratory flyways. So in fact, we have no idea what your estimates mean in Figure 2.

      We calculated the azimuths not only from the angles between breeding sites and wintering sites but also from the angles between the stopovers of birds. Therefore, the azimuths are influenced by the relative positions of breeding, wintering and stopover sites. This minimizes the possible errors that would arise from using breeding areas alone, such as the biases caused by the location of breeding areas relative to the QTP, as the reviewer pointed out. We have better explained this in the Introduction, Methods and legend of Figure 2.

      There is no way one can assess the performance of your statistical exercises, e.g. performances of the models.

      As suggested, we have reported the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) to assess the performance of the models (Table S1). AUC is a threshold-independent measure of the ability to discriminate between presence and random points (Phillips et al. 2006). A model was considered good when its AUC value was higher than 0.75 (Elith et al. 2006) (Lines 379-383).
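
      To illustrate the metric described above, the following is a minimal sketch of a rank-based AUC, i.e., the probability that a randomly chosen presence point receives a higher predicted suitability than a randomly chosen background point. The function name, the example scores and the constant name are hypothetical and are not taken from the study; only the 0.75 threshold comes from the response above.

```python
def auc_from_scores(presence_scores, background_scores):
    """Rank-based AUC: P(presence score > background score), ties count 0.5."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Threshold quoted in the response (Elith et al. 2006).
GOOD_AUC = 0.75

# Hypothetical predicted suitabilities, for illustration only.
presence = [0.9, 0.8, 0.7, 0.4]
background = [0.6, 0.3, 0.2, 0.1]

auc = auc_from_scores(presence, background)
is_good = auc > GOOD_AUC
```

      Because the measure depends only on the ranking of scores, it does not require choosing a presence/absence threshold, which is the sense in which AUC is threshold-independent.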

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The Methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. With the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section read quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in its current form. For example, how the different species factor in, etc.

      Please see our responses above.

      Reviewer #2 (Recommendations for the authors):

      Methodological issues:

      Line 219 Why have you selected only 64 species and what were the selection criteria?

      We have clarified the selection criteria (Lines 243-248). Briefly, we first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted species identified as full migrants in Birdlife International (https://datazone.birdlife.org/species/spcdistPOS) from the full list.

      Minor:

      Line 219 eBird has very uneven distribution, especially in vast areas of Russia. How can your exercise on Lines 232-238 overcome this issue?

      Yes, eBird data can be biased due to patchy sampling and variation in observers’ skills in identifying species. To address this issue, we developed an adaptive spatio-temporal modelling package (stemflow; Chen et al. 2024) to correct the imbalanced distribution of data, and we modelled observer experience to address bias in species recognition. stemflow was developed based on a machine learning modelling framework (AdaSTEM), which leverages the spatio-temporal adjacency information of sample points to model the occurrence or abundance of species at different scales. AdaSTEM has been frequently used in modelling eBird data (Fink et al. 2013, Johnston et al. 2015, Fink et al. 2020) and has proven to be efficient in multi-scale spatio-temporal data modelling. We have better explained this (Lines 251-270; Lines 307-321).

      Line 54 This sentence sounds very empty and in fact does not tell us much.

      We have adjusted this sentence to “Animal movement underpins species’ spatial distributions and ecosystem processes”.

      Line 55 Again a sentence that implies a causality of the annual cycle to make the species migrate. It does not make sense.

      We have revised this sentence as “An important animal movement behaviour is migrating between breeding and wintering grounds”.

      Line 58 How is our fascination with migratory journeys related to the present article? I think this line is empty.

      We have changed this sentence to “Those migratory journeys have intrigued a body of different approaches and indicators to describe and model migration, including migratory direction, speed, timing, distance, and staging periods”.

      Figure 1 - ABC insets are OK, but a combination of lati- and longitudinal patterns is possible, e.g. in species with conservative strategies or for whatever other reason.

      Thank you for the suggestion. We kept the ABC insets rather than combining them together as we believe this can deliver a clear structure of influence of QTP uplift under different scenarios.

      The legend to Figure 2 is not self-explanatory. Please make it clear what the response variable is and its units. The first line of the legend should read something like The influence of environmental factors on the direction of avian migration.

      Thank you. We have amended the legends of Figure 2 as suggested:

      “Figure 2. The influence of environmental factors on the direction of avian migration. Migratory directions are calculated based on the azimuths between each adjacent stopover, breeding and wintering areas for each species. We employ multivariate linear regression models under the Bayesian framework to measure the correlation between environmental factors and avian migratory directions. Wind represents the wind cost calculated by wind connectivity. Vegetation is measured by the proportion of average vegetation cover in each pixel (~1.9° in latitude by 2.5° in longitude). Temperature is the average annual temperature. Precipitation is the average yearly precipitation. All environmental layers are obtained using the Community Earth System Model. West QTP, Central QTP, and East QTP denote the areas west (longitude < 73°E), central (73°E ≤ longitude < 105°E), and east (longitude ≥ 105°E) of the Qinghai-Tibet Plateau, respectively.”
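
      The azimuth calculation and the longitude bands named in the legend can be sketched as follows. This is an illustrative sketch only: it assumes the azimuth is the great-circle initial bearing between consecutive sites, which the response does not specify, and the function names and example coordinates are hypothetical. The longitude thresholds are those given in the legend.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Great-circle initial bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


def leg_azimuths(track):
    """Azimuth of every leg between adjacent (lat, lon) sites along a route."""
    return [initial_bearing(a[0], a[1], b[0], b[1])
            for a, b in zip(track, track[1:])]


def qtp_region(lon):
    """Assign a longitude (degrees east) to the bands used in the Figure 2 legend."""
    if lon < 73.0:
        return "West QTP"
    if lon < 105.0:
        return "Central QTP"
    return "East QTP"


# Hypothetical route: breeding site -> stopover -> wintering site, as (lat, lon).
route = [(50.0, 90.0), (40.0, 90.0), (25.0, 90.0)]
azimuths = leg_azimuths(route)  # one azimuth per adjacent pair of sites
regions = [qtp_region(lon) for _, lon in route]
```

      Computing one azimuth per leg, rather than a single breeding-to-wintering angle, is what lets stopover positions contribute to the estimated migratory direction.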

      References

      Chen, Y., Z. Gu, and X. Zhan. 2024. stemflow: A Python Package for Adaptive Spatio-Temporal Exploratory Model. Journal of Open Source Software 9:6158.

      Elith, J., C. H. Graham, R. P. Anderson, M. Dudík, S. Ferrier, A. Guisan, R. J. Hijmans, F. Huettmann, J. R. Leathwick, A. Lehmann, J. Li, L. G. Lohmann, B. A. Loiselle, G. Manion, C. Moritz, M. Nakamura, Y. Nakazawa, J. McC. M. Overton, A. Townsend Peterson, S. J. Phillips, K. Richardson, R. Scachetti-Pereira, R. E. Schapire, J. Soberón, S. Williams, M. S. Wisz, and N. E. Zimmermann. 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29:129-151.

      Fink, D., T. Auer, A. Johnston, V. Ruiz-Gutierrez, W. M. Hochachka, and S. Kelling. 2020. Modeling avian full annual cycle distribution and population trends with citizen science data. Ecological Applications 30:e02056.

      Fink, D., T. Damoulas, and J. Dave. 2013. Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. Pages 1284-1290 in Proceedings of the AAAI Conference on Artificial Intelligence.

      Gonzalez, A., J. M. Chase, and M. I. O'Connor. 2023. A framework for the detection and attribution of biodiversity change. Philosophical Transactions of the Royal Society B: Biological Sciences 378.

      Jia, Y., H. Wu, S. Zhu, Q. Li, C. Zhang, Y. Yu, and A. Sun. 2020. Cenozoic aridification in Northwest China evidenced by paleovegetation evolution. Palaeogeography, Palaeoclimatology, Palaeoecology 557:109907.

      Johnston, A., D. Fink, M. D. Reynolds, W. M. Hochachka, B. L. Sullivan, N. E. Bruns, E. Hallstein, M. S. Merrifield, S. Matsumoto, and S. Kelling. 2015. Abundance models improve spatial and temporal prioritization of conservation resources. Ecological Applications 25:1749-1756.

      Kumar, N., U. Gupta, Y. V. Jhala, Q. Qureshi, A. G. Gosler, and F. Sergio. 2020. GPS-telemetry unveils the regular high-elevation crossing of the Himalayas by a migratory raptor: implications for definition of a “Central Asian Flyway”. Scientific Reports 10:15988.

      Lewis, D. 1973. Counterfactuals. Oxford: Blackwell.

      Li, S.-F., P. J. Valdes, A. Farnsworth, T. Davies-Barnard, T. Su, D. J. Lunt, R. A. Spicer, J. Liu, W.-Y.-D. Deng, J. Huang, H. Tang, A. Ridgwell, L.-L. Chen, and Z.-K. Zhou. 2021. Orographic evolution of northern Tibet shaped vegetation and plant diversity in eastern Asia. Science Advances 7:eabc7741.

      Liu, D., G. Zhang, H. Jiang, and J. Lu. 2018. Detours in long-distance migration across the Qinghai-Tibetan Plateau: individual consistency and habitat associations. PeerJ 6:e4304.

      Martins, L. P., D. B. Stouffer, P. G. Blendinger, K. Böhning-Gaese, J. M. Costa, D. M. Dehling, C. I. Donatti, C. Emer, M. Galetti, R. Heleno, Í. Menezes, J. C. Morante-Filho, M. C. Muñoz, E. L. Neuschulz, M. A. Pizo, M. Quitián, R. A. Ruggera, F. Saavedra, V. Santillán, M. Schleuning, L. P. da Silva, F. Ribeiro da Silva, J. A. Tobias, A. Traveset, M. G. R. Vollstädt, and J. M. Tylianakis. 2024. Birds optimize fruit size consumed near their geographic range limits. Science 385:331-336.

      Phillips, S. J., R. P. Anderson, and R. E. Schapire. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190:231-259.

      Prins, H. H. T., and T. Namgail. 2017. Bird migration across the Himalayas: wetland functioning amidst mountains and glaciers. Cambridge University Press, Cambridge.

      Prosser, D. J., P. Cui, J. Y. Takekawa, M. Tang, Y. Hou, B. M. Collins, B. Yan, N. J. Hill, T. Li, Y. Li, F. Lei, S. Guo, Z. Xing, Y. He, Y. Zhou, D. C. Douglas, W. M. Perry, and S. H. Newman. 2011. Wild Bird Migration across the Qinghai-Tibetan Plateau: A Transmission Route for Highly Pathogenic H5N1. Plos One 6:e17622.

      Pu, Z., and Y. Guo. 2023. Autumn migration of black-necked crane (Grus nigricollis) on the Qinghai-Tibetan and Yunnan-Guizhou plateaus. Ecology and Evolution 13:e10492.

      Qu, Y., F. Lei, R. Zhang, and X. Lu. 2010. Comparative phylogeography of five avian species: implications for Pleistocene evolutionary history in the Qinghai-Tibetan plateau. Molecular Ecology 19:338-351.

      Wang, Y., C. Mi, and Y. Guo. 2020. Satellite tracking reveals a new migration route of black-necked cranes (Grus nigricollis) in Qinghai-Tibet Plateau. PeerJ 8:e9715.

      Yu, X., G. Song, H. Wang, Q. Wei, C. Jia, and F. Lei. 2024. Migratory flyways and connectivity of Brown Headed Gulls (Chroicocephalus brunnicephalus) revealed by GPS tracking. Global Ecology and Conservation 56:e03340.

      Zhang, G.-G., D.-P. Liu, Y.-Q. Hou, H.-X. Jiang, M. Dai, F.-W. Qian, J. Lu, T. Ma, L.-X. Chen, and Z. Xing. 2014. Migration routes and stopover sites of Pallas’s Gulls Larus ichthyaetus breeding at Qinghai Lake, China, determined by satellite tracking. Forktail 30:104-108.

      Zhang, G.-G., D.-P. Liu, Y.-Q. Hou, H.-X. Jiang, M. Dai, F.-W. Qian, J. Lu, Z. Xing, and F.-S. Li. 2011. Migration Routes and Stop-Over Sites Determined with Satellite Tracking of Bar-Headed Geese (Anser indicus) Breeding at Qinghai Lake, China. Waterbirds 34:112-116.

      Zhang, R., D. Jiang, C. Zhang, and Z. Zhang. 2022. Distinct effects of Tibetan Plateau growth and global cooling on the eastern and central Asian climates during the Cenozoic. Global and Planetary Change 218:103969.

      Zhao, T., W. Heim, R. Nussbaumer, M. van Toor, G. Zhang, A. Andersson, J. Bäckman, Z. Liu, G. Song, M. Hellström, J. Roved, Y. Liu, S. Bensch, B. Wertheim, F. Lei, and B. Helm. 2024. Seasonal migration patterns of Siberian Rubythroat (Calliope calliope) facing the Qinghai–Tibet Plateau. Movement Ecology 12:54.

    1. Author response:

      We thank the reviewers for their thoughtful and generous assessment of our work. Overall, the reviewers found our work to be novel and relevant. In particular, Reviewer #1 wrote of our manuscript, “It is timely and highly valuable for the telomere field”; Reviewer #2 stated, “Overall, I find this manuscript worthy of publication, as the optimized END-seq methods described here will likely be widely utilized in the telomere field”; and Reviewer #3 stated that “The study is original, the experiments were well-controlled and excellently executed.”

      We are extremely grateful for these comments and want to thank all the reviewers and the editors for their time and effort in reviewing our work.

      The reviewers had a number of suggestions to improve our work. We have addressed all the points as highlighted in the point-by-point responses below.

      Reviewer 1:

      One minor question would be whether the authors could expand more on the application of END-Seq to examine the processive steps of the ALT mechanism? Can they speculate if the ssDNA detected in ALT cells might be an intermediate generated during BIR (i.e., is the ssDNA displaced strand during BIR) or a lesion? Furthermore, have the authors assessed whether ssDNA lesions are due to the loss of ATRX or DAXX, either of which can be mutated in the ALT setting?

      We appreciate the reviewer’s insightful questions regarding the application of our assays to investigate the nature of the ssDNA detected in ALT telomeres. Our primary aim in this study was to establish the utility of END-seq and S1-END-seq in telomere biology and to demonstrate their applicability across both ALT-positive and -negative contexts. We agree that exploring the mechanistic origins of ssDNA would be highly informative, and we anticipate that END-seq–based approaches will be well suited for such future studies. However, it remains unclear whether the resolution of S1-END-seq is sufficient to capture transient intermediates such as those generated during BIR. We have now included a brief speculative statement in the revised discussion addressing the potential nature of ssDNA at telomeres in ALT cells.

      Reviewer #2:

      How can we be sure that all telomeres are equally represented? The authors seem to assume that END-seq captures all chromosome ends equally, but can we be certain of this? While I do not see an obvious way to resolve this experimentally, I recommend discussing this potential bias more extensively in the manuscript.

      We thank the reviewer for raising this important point. END-seq and S1-END-seq are unbiased methods designed to capture either double-stranded or single-stranded DNA that can be converted into blunt-ended double-stranded DNA and ligated to a capture oligo. As such, if a subset of telomeres cannot be processed using this approach, it is possible that these telomeres may be underrepresented or lost. However, to our knowledge, there are no proposed telomeric structures that would prevent capture using this method. For example, even if a subset of telomeres possesses a 5′ overhang, it would still be captured by END-seq. Indeed, we observed the consistent presence of the 5′-ATC motif across multiple cell lines and species (human, mouse, and dog). More importantly, we detected predictable and significant changes in sequence composition when telomere ends were experimentally altered, either in vivo (via POT1 depletion) or in vitro (via T7 exonuclease treatment). Together, these findings support the robustness of the method in capturing a representative and dynamic view of telomeres across different systems.

      That said, we have now included a brief statement in the revised discussion acknowledging that we cannot fully exclude the possibility that a subset of telomeres may be missed due to unusual or uncharacterized structures.

      I believe Figures 1 and 2 should be merged.

      We appreciate the reviewer’s suggestion to merge Figures 1 and 2. However, we feel that keeping them as separate figures better preserves the logical flow of the manuscript and allows the validation of END-seq and its application to be presented with appropriate clarity and focus. We hope the reviewer agrees that this layout enhances the clarity and interpretability of the data.

      Scale bars should be added to all microscopy figures.

      We thank the reviewer for pointing this out. We have now added scale bars to all the microscopy panels in the figures and included the scale details in the figure legends.

      Reviewer #3:

      Overall, the discussion section is lacking depth and should be expanded and a few additional experiments should be performed to clarify the results.

      We thank the reviewer for the suggestions. Based on this reviewer’s comments and comments from the other reviewers, we incorporated several points into the discussion. We hope that these additions give our conclusions more depth.

      (1) The finding that the abundance of variant telomeric repeats (VTRs) within the final 30 nucleotides of the telomeric 5' ends is similar in both telomerase-expressing and ALT cells is intriguing, but the authors do not address this result. Could the authors provide more insight into this observation and suggest potential explanations? As the frequency of VTRs does not seem to be upregulated in POT1-depleted cells, what then drives the appearance of VTRs on the C-strand at the very end of telomeres? Is CST-Pola complex responsible?

      The reviewer raises a very interesting and relevant point. We are hesitant at this point to speculate on why we do not see a difference in variant repeats in ALT versus non-ALT cells, since additional data would be needed. One possibility is that variant repeats in ALT cells accumulate stochastically within telomeres but are selected against when they are present at the terminal portion of chromosome ends. However, to prove this hypothesis, we would need error-free long-read technology combined with END-seq. We feel that developing this approach would be beyond the scope of this manuscript.

      (2) The authors also note that, in ALT cells, the frequency of VTRs in the first 30 nucleotides of the S1-END-SEQ reads is higher compared to END-SEQ, but this finding is not discussed either. Do the authors think that the presence of ssDNA regions is associated with the VTRs? Along this line, what is the frequency of VTRs in the END-SEQ analysis of TRF1-FokI-expressing ALT cells? Is it also increased? Has TRF1-FokI been applied to telomerase-expressing cells to compare VTR frequencies at internal sites between ALT and telomerase-expressing cells?

      Similarly to what is discussed above, short reads have the advantage of being very accurate but do not provide sufficient length to establish the relative frequency of VTRs across the whole telomere sequence. The TRF1-FokI experiment is a good suggestion, but it would still be biased toward non-variant repeats due to the TRF1-binding properties. We plan to address these questions in a future study involving long-read sequencing and END-seq capture of telomeres.

      Finally, in these experiments (S1-END-SEQ or END-SEQ in TRF1-Fok1), is the frequency of VTRs the same on both the C- and the G-rich strands? It is possible that the sequences are not fully complementary in regions where G4 structures form.

      We thank the reviewer for this observation. While we do observe a higher frequency of variant telomeric repeats (VTRs) in the first 30 nucleotides of S1-END-seq reads compared to END-seq in ALT cells, we are currently unable to determine whether this difference is significant, as an appropriate control or matched normalization strategy for this comparison is lacking. Therefore, we refrain from overinterpreting the biological relevance of this observation.

      (3) Based on the ratio of C-rich to G-rich reads in the S1-END-SEQ experiment, the authors estimate that ALT cells contain at least 3-5 ssDNA regions per chromosome end. While the calculation is understandable, this number could be discussed further to consider the possibility that the observed ratios (of roughly 0.5) might result from the presence of extrachromosomal DNA species, such as C-circles. The observed increase in the ratio of C-rich to G-rich reads in BLM-depleted cells supports this hypothesis, as BLM depletion suppresses C-circle formation in U2OS cells. To test this, the authors should examine the impact of POLD3 depletion on the C-rich/G-rich read ratio. Alternatively, they could separate high-molecular-weight (HMW) DNA from low-molecular-weight DNA in ALT cells and repeat the S1-END-SEQ in the HMW fraction.

      The reviewer is absolutely correct. Our calculation did not exclude the possibility of extrachromosomal DNA as a source of telomeric ssDNA. We have now addressed this point in our discussion.

      (4) What is the authors' perspective on the presence of ssDNA at ALT telomeres? Do they attribute this to replication stress? It would be helpful for the authors to repeat the S1-END-SEQ in telomerase-expressing cells with very long telomeres, such as HeLa1.3 cells, to determine if ssDNA is a specific feature of ALT cells or a result of replication stress. The increased abundance of G4 structures at telomeres in HeLa1.3 cells (as shown in J. Wong's lab) may indicate that replication stress is a factor. Similar to Wong's work, it would be valuable to compare the C-rich/G-rich read ratios in HeLa1.3 cells to those in ALT cells with similar telomeric DNA content.

      The reviewer is correct in pointing out that we still do not know what causes ssDNA at telomeres in ALT cells. Replication stress seems the most logical explanation based on the work of many labs in the field. However, our data did not reveal any significant difference in the levels of ssDNA at telomeres in non-ALT cells based on telomere length. We used the HeLa1.2.11 cell line (now clarified in the Materials section), which is the parental line of HeLa1.3 and has similarly long telomeres (~20 kb vs. ~23 kb). Despite their long telomeres and potential for replication-associated challenges such as G-quadruplex formation, HeLa1.2.11 cells did not exhibit the elevated levels of telomeric ssDNA that we observed in ALT cells (Figure 4B). Additional experiments are needed to map the occurrence of ssDNA at telomeres in relation to progression toward ALT.

      Finally, Reviewer #3 raises a list of minor points:

      (1) The Y-axes of Figure 4 have been relabeled to account for the G-strand reads.

      (2) Statistical analyses have been added to the figures where applicable.

      (3) The manuscript has been carefully proofread to improve clarity and consistency throughout the text and figure legends.

      (4) We have revised the text to address issues related to the lack of cross-referencing between the supplementary figures and their corresponding legends.

    1. The rubric is intended to be used as a stand-alone resource. The following is an explanation of each category and how we framed it to meet our development goals.

      This rubric is supposed to highlight the basic needs of all students and how well a certain e-learning tool fits into these needs. I think this rubric is a good start for analyzing digital tools that you may bring into the classroom, but the true test is seeing how much your students have learned from these tools after they are used. Just because a tool may fit perfectly in this rubric does not mean it will educate students perfectly in the classroom.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the interaction between two key cortical regions in the mouse brain involved in goal-directed movements, the rostral forelimb area (RFA) - considered a premotor region involved in movement planning, and the caudal forelimb area (CFA) - considered a primary motor region that more directly influences movement execution. The authors ask whether there exists a hierarchical interaction between these regions, as previously hypothesized, and focus on a specific definition of hierarchy - examining whether the neural activity in the premotor region exerts a larger functional influence on the activity in the primary motor area than vice versa. They examine this question using advanced experimental and analytical methods, including localized optogenetic manipulation of neural activity in either region while measuring both the neural activity in the other region and EMG signals from several muscles involved in the reaching movement, as well as simultaneous electrophysiology recordings from both regions in a separate cohort of animals.

      The findings presented show that localized optogenetic manipulation of neural activity in either RFA or CFA resulted in similarly short-latency changes in the muscle output and in firing rate changes in the other region. However, perturbation of RFA led to a larger absolute change in the neural activity of CFA neurons. The authors interpret these findings as evidence for reciprocal, but asymmetrical, influence between the regions, suggesting some degree of hierarchy in which RFA has a greater effect on the neural activity in CFA. They go on to examine whether this asymmetry can also be observed in simultaneously recorded neural activity patterns from both regions. They use multiple advanced analysis methods that either identify latent components at the population level or measure the predictability of firing rates of single neurons in one region using firing rates of single neurons in the other region. Interestingly, the main finding across these analyses seems to be that both regions share highly similar components that capture a high degree of variability of the neural activity patterns in each region. Single units' activity from either region could be predicted to a similar degree from the activity of single units in the other region, without a clear division into a leading area and a lagging area, as one might expect to find in a simple hierarchical interaction. However, the authors find some evidence showing a slight bias towards leading activity in RFA. Using a two-region neural network model that is fit to the summed neural activity recorded in the different experiments and to the summed muscle output, the authors show that a network with constrained (balanced) weights between the regions can still output the observed measured activities and the observed asymmetrical effects of the optogenetic manipulations, by having different within-region local weights. These results call into question whether previous and current findings that demonstrate asymmetry in the output of regions can be interpreted as evidence for asymmetrical (and thus hierarchical) inputs between regions, emphasizing the challenges in studying interactions between any brain regions.

      Strengths:

      The experiments and analyses performed in this study are comprehensive and provide a detailed examination and comparison of neural activity recorded simultaneously using dense electrophysiology probes from two main motor regions that have been the focus of studies examining goal-directed movements. The findings showing reciprocal effects from each region to the other, similar short-latency modulation of muscle output by both regions, and similarity of neural activity patterns without a clear lead/lag interaction, are convincing and add to the growing body of evidence that highlights the complexity of the interactions between multiple regions in the motor system and go against a simple feedforward-like network and dynamics. The neural network model complements these findings and adds an important demonstration that the observed asymmetry can, in theory, also arise from differences in local recurrent connections and not necessarily from different input projections from one region to the other. This sheds important light on the multiple factors that should be considered when studying the interaction between any two brain regions, with a specific emphasis on the role of local recurrent connections, which should be of interest to the general neuroscience community.

      Weaknesses:

      While the similarity of the activity patterns across regions and lack of a clear leading/lagging interaction are interesting observations that are mostly supported by the findings presented (however, see comment below for lack of clarity in CCA/PLS analyses), the main question posed by the authors - whether there exists an endogenous hierarchical interaction between RFA and CFA - seems to be left largely open. 

      The authors note that there is currently no clear evidence of asymmetrical reciprocal influence between naturally occurring neural activity patterns of the two regions, as previous attempts have used non-natural electrical stimulation, lesions, or pharmacological inactivation. The use of acute optogenetic perturbations does not seem to be vastly different in that aspect, as it is a non-natural stimulation of inhibitory interneurons that abruptly perturbs the ongoing dynamics.

      We do believe that our optogenetic inactivation identifies a causal interaction between the endogenous activity patterns in the excitatory projection neurons, which we have largely silenced, and the downstream endogenous activity that is perturbed. The effect in the downstream region results directly from the silencing of activity in the excitatory projection neurons that mediate each region’s interaction with other regions. Here we have performed a causal intervention common in biology: a loss-of-function experiment. Such experiments generally reveal that a causal interaction of some sort is present, but often do not clarify much about the nature of the interaction, as is true in our case. By showing that a silencing of endogenous activity in one motor cortical region causes a significant change to the endogenous activity in another, we establish a causal relationship between these activity patterns. This is analogous to knocking out the gene for a transcription factor and observing causal effects on the expression of other genes that depend on it. 

      Moreover, our experiments are, to our knowledge, the first that localize a causal relationship to endogenous activity in motor cortex at a particular point during a motor behavior. Lesion and pharmacological or chemogenetic inactivation have long-lasting effects, and so their consequences on firing in other regions cannot be attributed to a short-latency influence of activity at a particular point during movement. Moreover, the involvement of motor cortex in motor learning and movement preparation/initiation complicates the interpretation of these consequences in relation to movement execution, as disturbance to processes on which execution depends can impede execution itself. Stimulation experiments generate spiking in excitatory projection neurons that is not endogenous.

      That said, we would agree that the form of the causal interaction between RFA and CFA remains unaddressed by our results. These results do not expose how the silenced activity patterns affect activity in the downstream region, just as knocking out a transcription factor gene does not expose how the transcription factor influences the expression of other genes. To show evidence for a specific type of interaction dynamics between RFA and CFA, a different sort of experiment would be necessary. See Jazayeri and Afraz, Neuron, 2017 for more on this issue.

      Furthermore, the main finding that supports a hierarchical interaction is a difference in the absolute change of firing rates as a result of the optogenetic perturbation, a finding that is based on a small number of animals (N = 3 in each experimental group), and one which may be difficult to interpret. 

      Though N = 3, we do show statistical significance. Moreover, using three replicates is not uncommon in biological experiments that require a large technical investment.

      As the authors nicely demonstrate in their neural network model, the two regions may differ in the strength of local within-region inhibitory connections. Could this theoretically also lead to a difference in the effect of the artificial light stimulation of the inhibitory interneurons on the local population of excitatory projection neurons, driving an asymmetrical effect on the downstream region? 

      We (Miri et al., Neuron, 2017) and others (Guo et al., Neuron, 2014) have shown that the effect of this inactivation on excitatory neurons in CFA is a near-complete silencing (90-95% within 20 ms). There thus is not much room for the effects on projection neurons in RFA to be much larger. We have measured these local effects in RFA as part of other work (Kristl et al., bioRxiv, 2025), verifying that the effects on RFA projection neuron firing are not larger.

      Moreover, the manipulation was performed upon the beginning of the reaching movement, while the premotor region is often hypothesized to exert its main control during movement preparation, and thus possibly show greater modulation during that movement epoch. It is not clear if the observed difference in absolute change is dependent on the chosen time of optogenetic stimulation and if this effect is a general effect that will hold if the stimulation is delivered during different movement epochs, such as during movement preparation.

      We agree that the dependence of RFA-CFA interactions on movement phase would be interesting to address in subsequent experiments. While a strong interpretation of lesion results might lead to a hypothesis that premotor influence on primary motor cortex is local to, or stronger during, movement preparation as opposed to execution, at present there is to our knowledge no empirical support from interventional experiments for this hypothesis. Moreover, existing analyses of activity in these two regions have produced conflicting results on the strength of interaction between them during preparation. Compare for example Bachschmid-Romano et al., eLife, 2023 to Kaufman et al., Nature Neuroscience, 2014.

      That said, this lesion interpretation would predict the same asymmetry we have observed from perturbations at the beginning of a reach - a larger effect of RFA on CFA than vice versa.

      Another finding that is not clearly interpretable is in the analysis of the population activity using CCA and PLS. The authors show that shifting the activity of one region compared to the other, in an attempt to find the optimal leading/lagging interaction, does not affect the results of these analyses. Assuming the activities of both regions are better aligned at some unknown ground-truth lead/lag time, I would expect to see a peak somewhere in the range examined, as is nicely shown when running the same analyses on a single region's activity. If the activities are indeed aligned at zero, without a clear leading/lagging interaction, but the results remain similar when shifting the activities of one region compared to the other, the interpretation of these analyses is not clear.

      Our results in this case were definitely surprising. Many share the intuition that there should be a lag at which the correlations in activity between regions may be strongest. The similarity in alignment across lags we observed might be expected if communication between regions occurs over a range of latencies as a result of dependence on a broad diversity of synaptic paths that connect neurons. In the Discussion, we offer an explanation of how to reconcile these findings with the seemingly different picture presented by DLAG.
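
      The lag-shifting logic discussed above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the analysis code used in the study: the function names `canonical_correlations` and `lagged_cca`, the QR/SVD formulation of CCA, and the simulated 3-sample lead are all assumptions introduced here. With temporally uncorrelated latents the profile peaks sharply at the built-in lag; temporally smooth signals give broader profiles.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between X (time x n) and Y (time x m)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def lagged_cca(X, Y, lags):
    """Mean canonical correlation for each lag.

    A positive lag pairs X[t] with Y[t + lag], i.e. X leads Y.
    """
    T = X.shape[0]
    profile = {}
    for lag in lags:
        if lag >= 0:
            Xs, Ys = X[:T - lag], Y[lag:]
        else:
            Xs, Ys = X[-lag:], Y[:T + lag]
        profile[lag] = canonical_correlations(Xs, Ys).mean()
    return profile

# Synthetic example (hypothetical data): X leads Y by 3 samples
rng = np.random.default_rng(0)
T = 500
z = rng.standard_normal((T + 3, 3))  # shared latent activity
X = z[3:] @ rng.standard_normal((3, 5)) + 0.1 * rng.standard_normal((T, 5))
Y = z[:T] @ rng.standard_normal((3, 4)) + 0.1 * rng.standard_normal((T, 4))

profile = lagged_cca(X, Y, range(-10, 11))
best_lag = max(profile, key=profile.get)  # lag with the highest mean correlation
```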

      Reviewer #2 (Public review):

      Summary:

      While technical advances have enabled large-scale, multi-site neural recordings, characterizing inter-regional communication and its behavioral relevance remains challenging due to intrinsic properties of the brain such as shared inputs, network complexity, and external noise. This work by Saiki-Ishikawa et al. examines the functional hierarchy between premotor (PM) and primary motor (M1) cortices in mice during a directional reaching task. The authors find some evidence consistent with an asymmetric reciprocal influence between the regions, but overall, activity patterns were highly similar and equally predictive of one another. These results suggest that motor cortical hierarchy, though present, is not fully reflected in firing patterns alone.

      Strengths:

      Inferring functional hierarchies between brain regions, given the complexity of reciprocal and local connectivity, dynamic interactions, and the influence of both shared and independent external inputs, is a challenging task. It requires careful analysis of simultaneous recording data, combined with cross-validation across multiple metrics, to accurately assess the functional relationships between regions. The authors have generated a valuable dataset simultaneously recording from both regions at scale from mice performing a cortex-dependent directional reaching task.

      Using electrophysiological and silencing data, the authors found evidence supporting the traditionally assumed asymmetric influence from PM to M1. While earlier studies inferred a functional hierarchy based on partial temporal relationships in firing patterns, the authors applied a series of complementary analyses to rigorously test this hierarchy at both individual neuron and population levels, with robust statistical validation of significance.

      In addition, recording combined with brief optogenetic silencing of the other region allowed the authors to infer the asymmetric functional influence in a more causal manner. This experiment is well designed to focus on the effect of inactivation manifesting through oligosynaptic connections to support the existence of a premotor to primary motor functional hierarchy.

      Subsequent analyses revealed a more complex picture. CCA, PLS, and three measures of predictivity (Granger causality, transfer entropy, and convergent cross-mapping) emphasized similarities in firing patterns and cross-region predictability. However, DLAG suggested an imbalance, with RFA capturing CFA variance at a negative time lag, indicating that RFA 'leads' CFA. Taken together these results provide useful insights for current studies of functional hierarchy about potential limitations in inferring hierarchy solely based on firing rates.

      While I detail some questions and issues regarding specifics of the data analyses and modeling below, I appreciate the authors' effort in training RNNs that match some behavioral and recorded neural activity patterns, including the inactivation result. The authors point out two components that can determine the across-region influence - 1) the amount of inputs received and 2) the dependence on across-region input, i.e., the relative importance of local dynamics, providing useful insights for inferring functional relationships across regions.

      Weaknesses:

      (1) Trial-averaging was applied in CCA and PLS analyses. While trial-averaging can be appropriate in certain cases, it leads to the loss of trial-to-trial variance, potentially inflating the perceived similarities between the activity in the two regions (Figure 4). Do the authors observe comparable degrees of similarity, e.g., variance explained by canonical variables? Also, the authors report conflicting findings regarding the temporal relationship between RFA and CFA when using CCA/PLS versus DLAG. Could this discrepancy be due to the use of trial-averaging in the former analyses but not in the latter?

      We certainly agree that the similarity in firing patterns is higher in trial averages than on single trials, given the variation in single-neuron firing patterns across trials. Here, we were trying to examine the similarity of activity variance that is clearly movement dependent, as trial averages are, and to use an approach aligned with those applied in the existing literature. We would also agree that there is more that can be learned about interactions from trial-by-trial analysis. It is possible that the activity components identified by DLAG as being asymmetric somehow are not reflected strongly in trial averages. In our Discussion we offer another potential explanation that is based on other differences in what is calculated by DLAG and CCA/PLS.

      We also note here that all of the firing pattern predictivity analysis we report (Figure 6) was done on single trial data, and in all cases the predictivity was symmetric. Thus, our results in aggregate are not consistent with symmetry purely being an artifact of trial averaging.
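
      The general point that trial averaging raises apparent similarity can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of the recorded data: the two "regions" are hypothetical signals built from one shared movement-locked component plus independent trial-to-trial noise, and all variable names are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_time = 100, 200
t = np.linspace(0, 1, n_time)
signal = np.sin(2 * np.pi * 2 * t)  # shared movement-locked component

# Each hypothetical region = shared signal + independent trial-to-trial noise
region_a = signal + rng.standard_normal((n_trials, n_time))
region_b = signal + rng.standard_normal((n_trials, n_time))

# Similarity computed trial by trial, then averaged across trials
single_trial_r = np.mean(
    [np.corrcoef(a, b)[0, 1] for a, b in zip(region_a, region_b)]
)

# Similarity computed on the trial averages
trial_avg_r = np.corrcoef(region_a.mean(axis=0), region_b.mean(axis=0))[0, 1]

print(f"single-trial r: {single_trial_r:.2f}, trial-averaged r: {trial_avg_r:.2f}")
```

      Averaging suppresses the independent noise but not the shared component, so the trial-averaged correlation is the larger of the two. This is why symmetry seen only in trial averages would be weaker evidence than symmetry that also holds in single-trial analyses.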

      (2) A key strength of the current study is the precise tracking of forelimb muscle activity during a complex motor task involving reaching for four different targets. This rich behavioral data is rarely collected in mice and offers a valuable opportunity to investigate the behavioral relevance of the PM-M1 functional interaction, yet little has been done to explore this aspect in depth. For example, single-trial time courses of inter-regional latent variables acquired from DLAG analysis can be correlated with single-trial muscle activity and/or reach trajectories to examine the behavioral relevance of inter-regional dynamics. Namely, can trial-by-trial change in inter-regional dynamics explain behavioral variability across trials and/or targets? Does the inter-areal interaction change in error trials? Furthermore, the authors could quantify the relative contribution of across-area versus within-area dynamics to behavioral variability. It would also be interesting to assess the degree to which across-area and within-area dynamics are correlated. Specifically, can across-area dynamics vary independently from within-area dynamics across trials, potentially operating through a distinct communication subspace?

      These are all very interesting questions. Our study does not attempt to parse activity into components predictive of muscle activity and others that may reflect other functions. Distinct components of RFA and CFA activity may very well rely on distinct interactions between them.

      (3) While network modeling of RFA and CFA activity captured some aspects of behavioral and neural data, I wonder if certain findings such as the connection weight distribution (Figure 7C), across-region input (Figure 7F), and the within-region weights (Figure 7G), primarily resulted from fitting the different overall firing rates between the two regions with CFA exhibiting higher average firing rates. Did the authors account for this firing rate disparity when training the RNNs?

      The key comparison in Figure 7 is shown in 7F, where the firing rates are accounted for in calculating the across-region input strength. Equalizing the firing rates in RFA and CFA would effectively increase RFA rates. If the mean firing rates in each region were appreciably dependent on across-region inputs, we would then expect an off-setting change in the RFA→CFA weights, such that the RFA→CFA distributions in 7F would stay the same. We would also expect the CFA→RFA weights would increase, since RFA neurons would need more input. This would shift the CFA→RFA (blue) distributions up. Thus, if anything, the key difference in this panel would only get larger. 

      We also generally feel that it is a better approach to fit the actual firing rates, rather than normalizing, since normalizing the firing rates would take us further from the actual biology, not closer.
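
      The offsetting-change argument for Figure 7F can be made concrete with a toy calculation. This sketch assumes, purely for illustration, that the across-region input to each CFA unit is the product of the RFA→CFA weights and the RFA firing rates; the array names, sizes, and scaling factor `k` are hypothetical and not taken from the model in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_rfa, n_cfa = 40, 50

# Hypothetical RFA->CFA weights (rows: CFA units, columns: RFA units)
w_rfa_to_cfa = rng.standard_normal((n_cfa, n_rfa))
# Hypothetical mean firing rates of RFA units
rates_rfa = rng.uniform(1.0, 10.0, size=n_rfa)

# Assumed definition: across-region input = weights times presynaptic rates
input_to_cfa = w_rfa_to_cfa @ rates_rfa

# Raise RFA rates by a factor k and shrink the weights by the same factor
k = 1.5
scaled_input = (w_rfa_to_cfa / k) @ (k * rates_rfa)

# The rate-weighted input to CFA is unchanged by the offsetting rescaling
unchanged = np.allclose(input_to_cfa, scaled_input)
```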

      (4) Another way to assess the functional hierarchy is by comparing the time courses of movement representation between the two regions. For example, a linear decoder could be used to compare the amount of information about muscle activity and/or target location as well as time courses thereof between the two regions. This approach is advantageous because it incorporates behavior rather than focusing solely on neural activity. Since one of the main claims of this study is the limitation of inferring functional hierarchy from firing rate data alone, the authors should use the behavior as a lens for examining inter-areal interactions.

      As we state above, we agree that examining interactions specific to movement-related activity components could reveal interesting structure in interregional interactions. Since it remains a challenge to rigorously identify a subset of neural activity patterns specifically related to driving muscle activity, any such analysis would involve an additional assumption. It remains unclear how well the activity that decoders use for predicting muscle activity matches the activity that actually drives muscle activity in situ.

      To address this issue, which relates to one raised by Reviewer #3 below, we have added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      Reviewer #3 (Public review):

      This study investigates how two cortical regions that are central to the study of rodent motor control (rostral forelimb area, RFA, and caudal forelimb area, CFA) interact during directional forelimb reaching in mice. The authors investigate this interaction using

      (1) optogenetic manipulations in one area while recording extracellularly from the other, (2) statistical analyses of simultaneous CFA/RFA extracellular recordings, and (3) network modeling.

      The authors provide solid evidence that asymmetry between RFA and CFA can be observed, although such asymmetry is only observed in certain experimental and analytical contexts.

      The authors find asymmetry when applying optogenetic perturbations, reporting a greater impact of RFA inactivation on CFA activity than vice-versa. The authors then investigate asymmetry in endogenous activity during forelimb movements and find asymmetry with some analytical methods but not others. Asymmetry was observed in the onset timing of movement-related deviations of local latent components with RFA leading CFA (computed with PCA) and in a relatively higher proportion and importance of cross-area latent components with RFA leading than CFA leading (computed with DLAG). However, no asymmetry was observed using several other methods that compute cross-area latent dynamics, nor with methods computed on individual neuron pairs across regions. The authors follow up this experimental work by developing a two-area model with asymmetric dependence on cross-area input. This model is used to show that differences in local connectivity can drive asymmetry between two areas with equal amounts of across-region input.

      Overall, this work provides a useful demonstration that different cross-area analysis methods result in different conclusions regarding asymmetric interactions between brain areas and suggests careful consideration of methods when analyzing such networks is critical. A deeper examination of why different analytical methods result in observed asymmetry or no asymmetry, analyses that specifically examine neural dynamics informative about details of the movement, or a biological investigation of the hypothesis provided by the model would provide greater clarity regarding the interaction between RFA and CFA.

      Strengths:

      The authors are rigorous in their experimental and analytical methods, carefully monitoring the impact of their perturbations with simultaneous recordings, and providing valid controls for their analytical methods. They cite relevant previous literature that largely agrees with the current work, highlighting the continued ambiguity regarding the extent to which there exists an asymmetry in endogenous activity between RFA and CFA.

      A strength of the paper is the evidence for asymmetry provided by optogenetic manipulation. They show that RFA inactivation causes a greater absolute difference in muscle activity than CFA inactivation (deviations begin 25-50 ms after laser onset, Figure 1) and that RFA inactivation causes a relatively larger decrease in CFA firing rate than CFA inactivation causes in RFA (deviations begin <25 ms after laser onset, Figure 3). The timescales of these changes provide solid evidence for an asymmetry in the impact of inactivating RFA/CFA on the other region that could not be driven by differences in feedback from disrupted movement (which would appear with a ~50 ms delay).

      The authors also utilize a range of different analytical methods, showing an interesting difference between some population-based methods (PCA, DLAG) that observe asymmetry, and single neuron pair methods (Granger causality, transfer entropy, and convergent cross mapping) that do not. Moreover, the modeling work presents an interesting potential cause of "hierarchy" or "asymmetry" between brain areas: local connectivity that impacts dependence on across-region input, rather than the amount of across-region input actually present.

      Weaknesses:

      There is no attempt to examine neural dynamics that are specifically relevant/informative about the details of the ongoing forelimb movement (e.g., kinematics, reach direction). Thus, it may be premature to claim that firing patterns alone do not reflect functional influence between RFA/CFA. For example, given evidence that the largest component of motor cortical activity doesn't reflect details of ongoing movement (reach direction or path; Kaufman, et al. PMID: 27761519) and that the analytical tools the authors use likely isolate this component (PCA, CCA), it may not be surprising that CFA and RFA do not show asymmetry if such asymmetry is related to the control of movement details.

      An asymmetry may still exist in the components of neural activity that encode information about movement details, and thus it may be necessary to isolate and examine the interaction of behaviorally-relevant dynamics (e.g., Sani, et al. PMID: 33169030).

      To clarify, we are not claiming that firing patterns in no way reflect the asymmetric functional influence that we demonstrate with optogenetic inactivation. Instead, we show that certain types of analysis that we might expect to reflect such influence, in fact, do not. Indeed, DLAG did exhibit asymmetries that matched those seen in functional influence (at least qualitatively), though other methods we applied did not.

      As we state above, we do think that there is more that can be gleaned by looking at influence specifically in terms of activity related to movement. However, if we did find that movement-related activity exhibited an asymmetry following functional influence, our results imply that the remaining activity components would exhibit an opposite asymmetry, such that the overall balance is symmetric. This would itself be surprising. We also note that the components identified by CCA and PLS do show substantial variation across reach targets, indicating that they are not only reflecting condition-invariant components. These analyses were performed on components accounting for well over 90% of the total activity variance, suggesting that both condition-dependent and condition-invariant components should be included.

      To address the concern about condition-dependent and condition-invariant components, we have added a sentence to the Results section reporting our CCA and PLS results: “Because our results here involve the vast majority of trial-averaged activity variance, we expect that they encompass both components of activity that vary for different movement conditions (condition-dependent), and those that do not (condition-invariant).” To address the general concerns about potential differences in activity components specifically related to muscle activity, we have also added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      The idea that local circuit dynamics play a central role in determining the asymmetry between RFA and CFA is not supported by experimental data in this paper. The plausibility of this hypothesis is supported by the model but is not explored in any analyses of the experimental data collected. Given the focus on this idea in the discussion, further experimental investigation is warranted.

      While we do not provide experimental support for this hypothesis, the data we present also do not contradict this hypothesis. Here we used modeling as it is often used - to capture experimental results and generate hypotheses about potential explanations. We do feel that our Discussion makes clear where the hypothesis derives from and does not misrepresent the lack of experimental support. We expect readers will take our engagement with this hypothesis with the appropriate grain of salt. The imaginable experiments to support such a hypothesis would constitute another substantial study, requiring numerous controls - a whole other paper in itself.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) There are a few small text/figure caption modifications that can be made for clarity of reading:

      (2) Unclear sentence in the second paragraph of the introduction: "For example, stimulation applied in PM has been shown to alter the effects on muscles of stimulation in M1 under anesthesia, both in monkeys and rodents."

      This sentence has been rephrased for clarity: “For example, in anesthetized monkeys34 and rodents35, stimulation in PM alters the effects of stimulation in M1 on muscles.”

      (3) The first section of the results presents the optogenetic manipulation. However, the critical control that tests whether this was strictly a local manipulation that did not affect cells in the other region is introduced only much later. It may be helpful to add a comment in this section noting that such a control was performed, even if it is explained in detail later when introducing the recordings.

      We have added the following to the first Results section: “we show below that direct optogenetic effects were only seen in the targeted forelimb area and not the other.”

      (4) Figure 1D - I imagine these averages are from a single animal, but this is not stated in the figure caption.

      “For one example mouse,” has been added to the beginning of the Figure 1D legend.

      (5) Figure 2F - N=6 is not stated in the panel's caption (though it can make it clearer), while it is stated in the caption of 2H.

      “n = 6 mice” has been added to the Figure 2F legend.

      (6) There's some inconsistency with the order of RFA/CFA in the figures, sometimes RFA is presented first (e.g., Figure 1D and 1F), and sometimes CFA is presented first (e.g., panels of Figure 2).

      We do not foresee this leading to confusion.

      (7) "As expected, the majority of recorded neurons in each region exhibited an elevated average firing rate during movement as compared to periods when forelimb muscles were quiescent (Figure 2D,E; Figure S1A,B)" - Figure S1A,B show histograms of narrow vs. wide waveforms, is this the relevant figure here?

      We apologize for the cryptic reference. The waveform width histograms were referred to here because they enabled the separation of narrow- and wide-waveform cells shown in Figure 2D,E. We have added the following clause to the referenced sentence to make this explicit:  “, both for narrow-waveform, putative interneurons and wide-waveform putative pyramidal neurons.”

      (8) Figure 2I caption - "The fraction of activity variance from 150 ms before reach onset to 150 ms after it that occurs before reach onset" - this sentence is not clear.

      The Figure 2I legend has been updated to “The activity variance in the 150 ms before muscle activity onset, defined as a fraction of the total activity variance from 150 ms before to 150 ms after muscle activity onset, for each animal (circles) and the mean across animals (black bars, n = 6 mice).”
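
      The quantity described in the updated legend can be illustrated with a minimal Python sketch. It assumes that "activity variance" means the summed squared deviation of each neuron's trial-averaged firing rate from its mean over the full ±150 ms window; this definition, the function name and the argument names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def pre_onset_variance_fraction(rates, times_ms, window_ms=150.0):
    """Fraction of activity variance that falls before onset (time 0).

    rates    : array, neurons x time bins, trial-averaged firing rates
    times_ms : array, time of each bin relative to muscle activity onset
    Assumption: variance is measured as squared deviation from each
    neuron's mean over the whole [-window_ms, +window_ms] window.
    """
    rates = np.asarray(rates, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)

    # Restrict to the window from 150 ms before to 150 ms after onset.
    in_window = (times_ms >= -window_ms) & (times_ms <= window_ms)
    windowed = rates[:, in_window]
    t = times_ms[in_window]

    # Squared deviation of every bin from the neuron's window mean.
    sq_dev = (windowed - windowed.mean(axis=1, keepdims=True)) ** 2
    total = sq_dev.sum()
    if total == 0:
        return float("nan")  # no variance at all in the window

    # Share of the total contributed by bins before onset.
    return float(sq_dev[:, t < 0].sum() / total)
```

      Under this definition, a population whose firing changes only after onset still yields a nonzero value, because pre-onset bins differ from the window mean; other variance definitions would behave differently.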

      (9) Figure 4B-G - is this showing results across the 6 animals? Not stated clearly.

      Yes - the 21 sessions we had referred to are drawn from all six mice. We have updated the legend here to make this explicit.

      (10) DLAG analysis - is there any particular reasoning behind choosing four across-region and four within-region components?

      In actuality, we completed this analysis for a broad range of component numbers and obtained similar results in all cases. Four fell in the center of our range, and so we focused the illustrations shown in the figure on this value. In general, the number of components is arbitrary. The original paper from Gokcen et al. describes a method for identifying a lower bound on the number of distinct components the method can identify. However, this method yields different results for each individual recording session. For the comparisons we performed, we needed to use the same range of values for each session.

      (11) Figure 5A seems to show 11 across-session components, it's unclear from the caption but I imagine this should show 12 (4 components times 3 sessions?)

      As we state in the Methods, any across-region latent variable with a lag that failed to converge between the boundary values of ±200 ms was removed from the analysis. In the case illustrated in this panel, the lag for one of the components failed to converge and is not shown. We have now clarified this both in the relevant Results paragraph and in the figure legend.

      (12) Figure 5B - is each marker here the average variance explained by all across/within components that were within the specified lag criteria across sessions per mouse? In other words, what does a single marker here stand for?

      We apologize for the lack of clarity here. These values reflect the average across sessions for each mouse. We have updated the legend to make this explicit.

      Reviewer #2 (Recommendations for the authors):

      As I have addressed most of my major recommendations in the public review, I will use this section to include relatively minor points for the authors to consider.

      (1) The EMG data in Figure 1C shows distinct patterns across spouts, both in the magnitude and complexity of muscle activations. It would be interesting to investigate whether these differences in muscle activity lead to behavioral variations (e.g., reaction time, reach duration) and how they relate to the relative involvement of the two areas.

      We agree that it would be interesting to examine how the interactions between areas vary as behavior varies. While the differences between reaches here are limited, we have addressed this question for two substantially different motor behaviors (reaching and climbing) in a follow-up study that was recently preprinted (Kristl et al., bioRxiv, 2025).

      (2) How do the authors account for the lingering impact of RFA inactivation on muscle activity, which persists for tens of milliseconds after laser offset? Could this effect be due to compensatory motor activity following the perturbation? A further illustration of how the raw limb trajectories and/or muscle activity are perturbed and recovered would help readers better understand the impact of motor cortical inactivation.

      To clarify the effects of inactivation on a longer timescale, we have added a new supplemental figure showing the plots from Figure 1D over a longer time window extending to 500 ms after trial onset (new Figure S1). Lingering effects do persist, at least in certain cases. In general, we find it hard to ascertain the source of optogenetic effects on longer timescales like this. On the shortest timescales, effects will be mediated by relatively direct connections between regions. However, on these longer timescales, effects could be due to broader changes in brain and behavioral state that can influence muscle activity. For example, attempts to compensate for the initial disturbance to muscle activity could cause divergence from controls on these longer timescales. Muscle tissue itself is also known to have long-timescale relaxation dynamics, and it would not be surprising if the relevant control circuits here also had long-timescale dynamics, such that we would not expect an immediate return to control when the light pulse ends. Because of this ambiguity, we generally avoid interpretation of optogenetic effects on these longer timescales.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 9: ". We measured the time at which the activity state deviated from baseline preceding reach onset," - I cannot find how this deviation was defined (neither the baseline nor the threshold).

      We have added text to the Figure 2G legend that explicitly states how the baseline and activity onset time were defined.

      (2) Given the shape of the curves in Figure 2G, the significance of this result seems susceptible to slight modifications of what defines a baseline or a deviation threshold. For example, it looks like the circle for CFA has a higher y-axis value, suggesting the baseline deviance is higher, but it is unclear why that would be from the plot. If the threshold for deviation in neural activity state were held uniform between CFA and RFA is the difference still significant across animals?

      We have repeated the analysis using the same absolute threshold for each region. We used the higher of the two thresholds from each region. The difference remains significant. This is now described in the last paragraph of the Results section for Figure 2.

      (3) Since summed deviation of the top 3 PCs is used to show a difference in activity onset between CFA/RFA, but only a small proportion of variance is explained pre-movement (<2% in most animals), it seems relevant to understand what percentage of CFA/RFA neuron activity actually is modulated and deviates from baseline prior to movement and to show the distribution of activity onsets at the single neuron level in CFA/RFA. Can an onset difference only be observed using PCA? 

      Because many neurons have low firing rates, estimating the time at which their firing rate begins to rise near reach onset is difficult to do reliably. It is also true that not all neurons show an increase around onset - some show a decrease and others show no discernible change. Using PCs to measure onset avoids both of these problems, since they capture both increases and decreases in individual neuron firing rates and are much less noisy than individual neuron firing rates. 

      However, based on this comment, we have repeated this analysis on a single-neuron level using only neurons with relatively high average firing rates. Specifically, we analyzed neurons with mean firing rates above the 90th percentile across all sessions within an animal. Neurons whose activity never crossed threshold were excluded. Results matched those using PCs, with RFA neurons showing an earlier average activity onset time. This is now described in the last paragraph of the Results section for Figure 2.
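
      The single-neuron procedure described above can be sketched in Python as follows. The selection of neurons above the 90th percentile of mean firing rate and the exclusion of neurons that never cross threshold follow the text; the threshold itself (a multiple of the baseline standard deviation), the baseline mask and all names are hypothetical choices made only for illustration.

```python
import numpy as np

def onset_times_high_rate(rates, times_ms, baseline_mask, pct=90.0, n_sd=3.0):
    """First threshold crossing for high-firing-rate neurons.

    rates         : array, neurons x time bins, trial-averaged rates
    times_ms      : array, time of each bin relative to reach onset
    baseline_mask : boolean array marking the baseline bins
    n_sd          : assumed threshold, in baseline standard deviations
    Returns a dict mapping neuron index -> onset time in ms.
    """
    rates = np.asarray(rates, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    baseline_mask = np.asarray(baseline_mask, dtype=bool)

    # Keep only neurons whose mean rate exceeds the chosen percentile.
    mean_rate = rates.mean(axis=1)
    keep = mean_rate > np.percentile(mean_rate, pct)

    onsets = {}
    for idx in np.flatnonzero(keep):
        base = rates[idx, baseline_mask]
        # Absolute deviation captures both increases and decreases.
        deviation = np.abs(rates[idx] - base.mean())
        threshold = n_sd * base.std()
        crossed = np.flatnonzero((deviation > threshold) & ~baseline_mask)
        if crossed.size == 0:
            continue  # never crossed threshold: excluded from the analysis
        onsets[int(idx)] = float(times_ms[crossed[0]])
    return onsets
```

      Comparing the resulting onset times between simultaneously recorded regions, with the same `n_sd` applied to both, corresponds to the uniform-threshold comparison the reviewer requested.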

      (4) It is stated that to study the impact of inactivation on CFA/RFA activity, only the 50 highest average firing rate neurons were used (and maybe elsewhere too, e.g., convergent cross mapping). It is unclear why this subselection is necessary. It is justified by stating that higher firing rate neurons have better firing rate estimates. This may be supportable for very low firing rate units that spike sorting tools have a hard time tracking, but I don't think this is supported by data for most of the distribution of firing rates. It therefore seems like the results might be biased by a subselection of certain high firing rate neuron populations. It would be useful to also compute and mention if the results for all neurons/neuron pairs are the same. If there is worry about low-quality units being those with low firing rates, a threshold for firing rate as used elsewhere in the paper (at least 1 spike / 2 trials) seems justified.

      The issue here is that as firing rates decrease and firing rate estimates get noisier, estimates of the change in firing rate get more variable. Here we are trying to estimate the fraction of neurons for which firing rates decreased upon inactivation of the other region. Variability in estimates of the firing rate change will bias this estimate toward 50%, since in the limit when the change estimates are entirely based on noise, we expect 50% to be decreases. As expected, when we use increasingly liberal thresholds for this analysis, the fraction of decreases trends closer to 50%. 

      As a consequence of this, we cannot easily distinguish whether higher firing rate neurons might for some reason have a greater tendency to exhibit decreases in firing compared to lower firing rate neurons. However, we see no positive reason to expect such a difference. We have added a sentence noting this caveat in interpreting our findings to the relevant paragraph of the Results.
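
      The bias toward 50% described in this response can be illustrated with a small simulation. This is a toy sketch under stated assumptions (every neuron has a true change of fixed magnitude, and estimation noise is Gaussian); the function and parameter names are hypothetical and the numbers are not taken from the recordings.

```python
import numpy as np

def observed_decrease_fraction(true_fraction, effect_size, noise_sd,
                               n_neurons=20000, seed=0):
    """Fraction of neurons whose *estimated* firing rate change is negative.

    true_fraction : true proportion of neurons that decrease their firing
    effect_size   : magnitude of each neuron's true change (assumed equal)
    noise_sd      : standard deviation of the estimation noise
    """
    rng = np.random.default_rng(seed)
    decreases = rng.random(n_neurons) < true_fraction
    true_change = np.where(decreases, -effect_size, effect_size)
    # Noisy estimate of each neuron's change.
    estimated = true_change + rng.normal(0.0, noise_sd, n_neurons)
    return float(np.mean(estimated < 0))

# With little noise the estimate stays near the true fraction; with noise
# far larger than the effect it approaches one half.
low_noise = observed_decrease_fraction(0.8, effect_size=1.0, noise_sd=0.1)
high_noise = observed_decrease_fraction(0.8, effect_size=1.0, noise_sd=50.0)
```

      In this toy setting, noisier estimates, as expected for lower firing rate neurons, pull the measured fraction toward one half even though the true fraction is unchanged.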

      The lack of min/max axis values in Figure 3B-F makes it hard to interpret - are these neurons almost silent when near the bottom of the plot or are they still firing a substantial # of spikes?

      To aid interpretation of the relative magnitude of firing rate changes, we have added minimum firing rates for the averages depicted in Figure 3B,C,E and F to the legend. Our original thinking was that the plots in Figure 3G and H would provide an indication of the relative changes in firing.

      It would be interesting to know if the impact of optogenetic stimulation changed with exposure to the manipulation. Are all results presented only from the first X number of sessions in each animal? Or is the effect robust over time and (within the same animal) you can get the same results of optogenetic inactivation over time? This information seems critical for reproducibility.

      We have now performed brief optogenetic inactivations in several brain areas in several different behavioral paradigms, and have found that inactivation effects are stable both within and across sessions, almost surprisingly so. This includes cases where the inactivations were more frequent (every ~1.25 s on average) and more numerous (>15,000 trials per animal) than in the present manuscript. Thus we did not restrict our analysis here to the first X sessions or trials within a session. We have added additional plots as Figure S3T-AA showing the stability of optogenetic effects both within and across sessions.

      Given that it can be difficult to record from interneurons (as the proportion of putative interneurons in Figure S1 attests), the SALT analyses would be more convincing if a few recordings had been performed in the same region as optogenetic stimulation to show a "positive control" of what direct interneuron stimulation looks like. Could also use this to validate the narrow/wide waveform classification.

      We have verified that using SALT as we have in the present manuscript does detect vGAT+ interneurons directly responding to light. This is included in a recent preprint from the lab (Kristl et al., bioRxiv, 2025). We (Warriner et al., Cell Reports, 2022) and others (Guo et al., Neuron, 2014) have previously used direct ChR2 activation to validate waveform-based classification.

      Simultaneous CFA/RFA recordings during optogenetic perturbation would also allow for time courses of inhibition to be compared in RFA/CFA. Does it take 25ms to inhibit locally, and the cross-area impact is fast, or does it inactivate very fast locally and takes ~25ms to impact the other region?

      Latencies of this sort are difficult to precisely measure given the statistical limits of this sort of data, but there does appear to be some degree of delay between local and downstream effects. We do not have a statistical foundation as of yet for concluding that this is the case. It will be interesting to examine this issue more rigorously in the future.

      Given the difference in the analytical methods, the authors should share data in a relatively unprocessed format (e.g., spike times from sorted units relative to video tracking + behavioral data), along with analysis code, to allow others to investigate these differences.

      We plan to post the data and code to our lab’s GitHub site once the Version of Record is online.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public Review):

      Summary:

      This paper reports an intracranial SEEG study of speech coordination, where participants synchronize their speech output with a virtual partner that is designed to vary its synchronization behavior. This allows the authors to identify electrodes throughout the left hemisphere of the brain that have activity (both power and phase) that correlates with the degree of synchronization behavior. They find that high-frequency activity in the secondary auditory cortex (superior temporal gyrus) is correlated to synchronization, in contrast to primary auditory regions. Furthermore, activity in the inferior frontal gyrus shows a significant phase-amplitude coupling relationship that is interpreted as compensation for deviation from synchronized behavior with the virtual partner.

      Strengths:

      (1) The development of a virtual partner model trained for each individual participant, which can dynamically vary its synchronization to the participant's behavior in real-time, is novel and exciting.

      (2) Understanding real-time temporal coordination for behaviors like speech is a critical and understudied area.

      (3) The use of SEEG provides the spatial and temporal resolution necessary to address the complex dynamics associated with the behavior.

      (4) The paper provides some results that suggest a role for regions like IFG and STG in the dynamic temporal coordination of behavior both within an individual speaker and across speakers performing a coordination task.

      We thank the Reviewer for their positive comments on our manuscript.

      Weaknesses:

      (1) The main weakness of the paper is that the results are presented in a largely descriptive and vague manner. For instance, while the interpretation of predictive coding and error correction is interesting, it is not clear how the experimental design or analyses specifically support such a model, or how they differentiate that model from the alternatives. It's possible that some greater specificity could be achieved by a more detailed examination of this rich dataset, for example by characterizing the specific phase relationships (e.g., positive vs negative lags) in areas that show correlations with synchronization behavior. However, as written, it is difficult to understand what these results tell us about how coordination behavior arises.

      We understand the reviewer’s comment. It is true that this work, being the first in the field to use real-time adapting synchronous speech and intracerebral neural data, is descriptive, and we hope it will pave the way for further studies. We have now added more statistical analyses (see point 2) to go beyond a descriptive approach, and we have also rewritten the discussion to clarify how this work can contribute to disentangling different models of language interaction. Most importantly, we have also run new analyses taking into account the specific phase relationship, as suggested.

      We already had an analysis using instantaneous phase difference in the phase-amplitude coupling approach, that bridges phase of behaviour to neural responses (amplitude in the high-frequency range). However, this analysis, as the reviewer noted, does not distinguish between positive and negative lags, but rather uses the continuous fluctuations of coordinative behaviour. Following the reviewer’s suggestion, we have now run a new analysis estimating the average delay (between virtual partner speech and patient speech) in each trial, using a cross-correlation approach. This gives a distribution of delays across trials that can then be “binned” as positive or negative. We have thus rerun the phase-amplitude coupling analyses on positive and negative trials separately, to assess whether the phase amplitude relationship depends upon the anticipatory (negative lags) or compensatory (positive lags) behaviour. Our new analysis (now in the supplementary, see figure below) does not reveal significant differences between positive and negative lags. This lack of difference, although not easy to interpret, is nonetheless interesting because it seems to show that the IFG does not have a stronger coupling for anticipatory trials. Rather the IFG seems to be strongly involved in adjusting behaviour, minimizing the error, independently of whether this is early or late.

      We have updated the “Coupling behavioural and neurophysiological data” section in Materials and methods as follows:  

      “In the third approach, we assessed whether the phase-amplitude relationship (or coupling) depends upon the anticipatory (negative delays) or compensatory (positive delays) behaviour between the VP and the patients’ speech. We computed the average delay in each trial using a cross-correlation approach on speech signals (between patient and VP) with the MATLAB function xcorr. A median split (patient-specific; average median split = 0 ms, average sd = 24 ms) was applied to conserve a sufficient amount of data, classifying trials below the median as “anticipatory behaviour” and trials above the median as “compensatory behaviour”. Then we conducted the phase-amplitude coupling analyses on positive and negative trials separately.”
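
      The delay estimate and median split described in this passage can be sketched as follows. The authors report using the MATLAB function xcorr; the version below substitutes NumPy's `np.correlate` as an analogue and is not their code. The sign convention (positive delay means the patient lags the VP), the treatment of trials falling exactly on the median, and all function names are assumptions.

```python
import numpy as np

def trial_delay_ms(patient, partner, fs_hz):
    """Lag (ms) at which the patient signal best matches the partner signal.

    Positive values mean the patient lags the virtual partner
    (assumed here to correspond to compensatory behaviour).
    """
    patient = np.asarray(patient, dtype=float) - np.mean(patient)
    partner = np.asarray(partner, dtype=float) - np.mean(partner)
    xc = np.correlate(patient, partner, mode="full")
    # Lags of the "full" cross-correlation, in samples.
    lags = np.arange(-(len(partner) - 1), len(patient))
    return 1000.0 * lags[np.argmax(xc)] / fs_hz

def median_split(delays_ms):
    """Boolean masks for trials below and above the median delay.

    Trials exactly at the median belong to neither group in this sketch.
    """
    delays_ms = np.asarray(delays_ms, dtype=float)
    median = np.median(delays_ms)
    anticipatory = delays_ms < median
    compensatory = delays_ms > median
    return anticipatory, compensatory
```

      The two masks would then select the trials entering each of the separate phase-amplitude coupling analyses.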

      We also added a paragraph on this finding in the Discussion:

      “Our results highlight the involvement of the inferior frontal gyrus (IFG) bilaterally, in particular the BA44 region, in speech coordination. First, trials with weak verbal coordination (VCI) are accompanied by more prominent high frequency activity (HFa, Fig.4; Fig.S4). Second, when considering the within-trial time-resolved dynamics, the phase-amplitude coupling (PAC) reveals a tight relation between the low frequency behavioural dynamics (phase) and the modulation of high-frequency neural activity (amplitude, Fig.5B; Fig.S5). This relation is strongest when considering the phase adjustments rather than the phase of speech of the VP per se: larger deviations in verbal coordination are accompanied by an increase in HFa. Additionally, we also tested for potential effects of different asynchronies (i.e., temporal delay) between the participant's speech and that of the virtual partner but found no significant differences (Fig.S6). While the lack of a delay effect does not permit conclusions about the sensitivity of BA44 to the absolute timing of the partner’s speech, its neural dynamics are linked to the ongoing process of resolving phase deviations and maintaining synchrony.”

      (2) In the results section, there's a general lack of quantification. While some of the statistics reported in the figures are helpful, there are also claims that are stated without any statistical test. For example, in the paragraph starting on line 342, it is claimed that there is an inverse relationship between rho-value and frequency band, "possibly due to the reversed desynchronization/synchronization process in low and high frequency bands". Based on Figure 3, the first part of this statement appears to be true qualitatively, but is not quantified, and is therefore impossible to assess in relation to the second part of the claim. Similarly, the next paragraph on line 348 describes optimal clustering, but statistics of the clustering algorithm and silhouette metric are not provided. More importantly, it's not entirely clear what is being clustered - is the point to identify activity patterns that are similar within/across brain regions? Or to interpret the meaning of the specific patterns? If the latter, this is not explained or explored in the paper.

      The reviewer is right. We have now added statistical analyses showing that:

      (1) the ratio between synchronization and desynchronization evolves across frequencies (as often reported in the literature).

      (2) the sign of rho values also evolves across frequencies.

      (3) the clustering does indeed differ when taking into account behaviour. We have also clarified the use of clustering and the reasoning behind it.

      We have updated the Materials and methods section as follows:

      “The statistical difference between spatial clustering in global effect and brain-behaviour correlation was estimated with a linear model using the R function lm (stats package); post-hoc comparisons were corrected for multiple comparisons using the Tukey test (lsmeans R package; Lenth, 2016). The statistical difference between clustering in global effect and behaviour correlation across the number of clusters was estimated using permutation tests (N=1000) by computing the silhouette score difference between the two conditions.”

      We have updated the Results section as follows:

      (1) “This modulation between synchronization and desynchronization across frequencies was significant (F(5) = 6.42, p < .001; estimated with a linear model using the R function lm).”

      (2) “The first observation is a gradual transition in the direction of correlations as we move up frequency bands, from positive correlations at low frequencies to negative ones at high frequencies (F(5) = 2.68, p = .02). This effect, present in both hemispheres, mimics the reversed desynchronization/synchronization process in low and high frequency bands reported above.”

      (3) “Importantly, compared to the global activity (task vs rest, Fig 3A), the neural spatial profile of the behaviour-related activity (Fig 3B) is more clustered, in the left hemisphere. Indeed, silhouette scores are systematically higher for behaviour-related activity compared to global activity, indicating greater clustering consistency across frequency bands (t(106) = 7.79, p < .001, see Figure S3). Moreover, silhouette scores are maximal, in particular for HFa, for five clusters (p < .001), located in the IFG BA44, the IPL BA 40 and the STG BA 41/42 and BA22 (see Figure S3).”
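
      The permutation test on silhouette scores mentioned in the updated Methods text above can be sketched as follows. The response does not specify how the permutations were constructed, so this version assumes a standard label-shuffling test on the difference in mean score between the two conditions; the function and argument names are hypothetical.

```python
import numpy as np

def permutation_pvalue(scores_a, scores_b, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    scores_a, scores_b : silhouette scores for the two conditions
    Returns (observed difference, p-value).
    Assumption: condition labels are exchangeable under the null.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return float(observed), (count + 1) / (n_perm + 1)
```

      With N = 1000 permutations and this add-one correction, the smallest attainable p-value is 1/1001, which is compatible with a result reported as p < .001.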

      (3) Given the design of the stimuli, it would be useful to know more about how coordination relates to specific speech units. The authors focus on the syllabic level, which is understandable. But as far as the results relate to speech planning (an explicit point in the paper), the claims could be strengthened by determining whether the coordination signal (whether error correction or otherwise) is specifically timed to e.g., the consonant vs the vowel. If the mechanism is a phase reset, does it tend to occur on one part of the syllable?

      Thank you for this thoughtful feedback. We agree that the relationship between speech coordination and specific speech units, such as consonants versus vowels, is an intriguing question. However, in our study, both interlocutors (the participant and the virtual partner) are adapting their speech production in real-time. This interactive coordination makes it difficult to isolate neural signatures corresponding to precise segments like consonants or vowels, as the adjustments occur in a continuous and dynamic context.

      The VP's ability to adapt depends on its sensitivity to spectral cues, such as the transition from one phonetic element to another. This is likely influenced by the type of articulation, with certain transitions being more salient (e.g., between a stop consonant like "p" and a vowel like "a") and others being less distinct (e.g., between nasal consonants like "m" and a vowel). Thus, the VP’s spectral adaptation tends to occur at these transitions, which are more prominent in some cases than in others.

      For the participants, previous studies have shown a greater sensitivity during the production of stressed vowels (Oschkinat & Hoole, 2022; Li & Lancia, 2024), which may reflect a heightened attentional or motor adjustment to stressed syllables.

      Here, we did not specifically address the question of coordination at the level of individual linguistic units. Moreover, even if we attempted to focus on this level, it would be challenging to relate neural dynamics directly to specific speech segments. The question of how synchronization at the level of individual linguistic units might relate to neural data is complex. The lack of clear, unit-specific predictions makes it difficult to parse out distinct neural signatures tied to individual segments, particularly when both interlocutors are continuously adjusting their speech in relation to one another.

      Therefore, while we recognize the potential importance of examining synchronization at the level of individual phonetic elements, the design of our task and the nature of the coordination in this interactive context (real-time bidirectional adaptation) led us to focus more broadly on the overall dynamics of speech synchronization at the syllabic level, rather than on specific linguistic units.

      We now state at the end of the Discussion section:

      “It is worth noting that the influence of specific speech units, such as consonants versus vowels, on speech coordination remains to be explored. In non-interactive contexts, participants show greater sensitivity during the production of stressed vowels, possibly reflecting heightened attentional or motor adjustments (Oschkinat & Hoole, 2022; Li & Lancia, 2024). In this study, the VP’s adaptation relies on sensitivity to spectral cues, particularly phonetic transitions, with some (e.g., formant transitions) being more salient than others. However, how these effects manifest in an interactive setting remains an open question, as both interlocutors continuously adjust their speech in real time. Future studies could investigate whether coordination signals, such as phase resets, preferentially align with specific parts of the syllable.”

      References cited:

      – Oschkinat, M., & Hoole, P. (2022). Reactive feedback control and adaptation to perturbed speech timing in stressed and unstressed syllables. Journal of Phonetics, 91, 101133.

      – Li, J., & Lancia, L. (2024). A multimodal approach to study the nature of coordinative patterns underlying speech rhythm. In Proc. Interspeech, 397-401.

      (4) In the discussion the results are related to a previously-described speech-induced suppression effect. However, it's not clear what the current results have to do with SIS, since the speaker's own voice is present and predictable from the forward model on every trial. Statements such as "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice" are highly speculative and apparently not supported by the data.

      We thank the reviewer for raising thoughtful concerns about our interpretation of the observed neural suppression as related to speaker-induced suppression (SIS). We agree that our study lacks a passive listening condition, which limits direct comparisons to the original SIS effect, traditionally defined as the suppression of neural responses to self-produced speech compared to externally-generated speech (Meekings & Scott, 2021).

      In response, we have reconsidered our terminology and interpretation. In the revised Discussion section, we refer to our findings as a "SIS-related phenomenon specific to the synchronous speech context". Unlike classic SIS paradigms, our interactive task involves simultaneous monitoring of self- and externally-generated speech, introducing additional attentional and coordinative demands.

      The revised Discussion also incorporates findings by Ozker et al. (2022, 2024), which link SIS and speech monitoring, suggesting that suppressing responses to self-generated speech facilitates error detection. We propose that the decrease in high-frequency activity (HFa) as verbal coordination increases reflects reduced error signals due to closer alignment between perceived and produced speech. Conversely, HFa increases with reduced coordination may signify greater prediction error.

      Additionally, we relate our findings to the "rubber voice" effect (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021), where temporally and phonetically congruent external speech can be perceived as self-generated. We speculate that this may occur in synchronous speech tasks when the participant's and VP's speech signals closely align. However, this interpretation remains speculative, as no subjective reports were collected to confirm this perception. Future studies could include participant questionnaires to validate this effect and relate subjective experience to neural measures of synchronization.

      Overall, our findings extend the study of SIS to dynamic, interactive contexts and contribute to understanding internal forward models of speech production in more naturalistic scenarios.

      We have now added these points to the discussion as follows:

      “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22 suggests a suppression of neural responses as the degree of behavioural synchrony increases. This result is reminiscent of findings on speaker-induced suppression (SIS), where neural activity in auditory cortex decreases during self-generated speech compared to externally-generated speech (Meekings & Scott, 2021; Niziolek et al., 2013). However, our paradigm differs from traditional SIS studies in two critical ways: (1) the speaker's own voice is always present and predictable from the forward model, and (2) no passive listening condition was included. Therefore, our findings cannot be directly equated with the original SIS effect.

      Instead, we propose that the suppression observed here reflects a SIS-related phenomenon specific to the synchronous speech context. Synchronous speech requires simultaneous monitoring of self- and externally-generated speech, a task that is both attentionally demanding and coordinative. This aligns with evidence from Ozker et al. (2024, 2022), showing that the same neural populations in STG exhibit SIS and heightened responses to feedback perturbations. These findings suggest that SIS and speech monitoring are related processes, where suppressing responses to self-generated speech facilitates error detection. In our study, suppression of HFa as coordination increases may reflect reduced prediction errors due to closer alignment between perceived and produced speech signals. Conversely, increased HFa during poor coordination may signify greater mismatch, consistent with prediction error theories (Houde & Nagarajan, 2011; Friston et al., 2020). Furthermore, when self- and externally-generated speech signals are temporally and phonetically congruent, participants may perceive external speech as their own. This echoes the "rubber voice" effect, where external speech resembling self-produced feedback is perceived as self-generated (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021). While this interpretation remains speculative, future studies could incorporate subjective reports to investigate this phenomenon in more detail.”

      References cited:

      – Franken, M. K., Hartsuiker, R. J., Johansson, P., Hall, L., & Lind, A. (2021). Speaking With an Alien Voice: Flexible Sense of Agency During Vocal Production. Journal of Experimental Psychology-Human perception and performance, 47(4), 479-494. https://doi.org/10.1037/xhp0000799

      – Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in human neuroscience, 5, 82.

      – Lind, A., Hall, L., Breidegard, B., Balkenius, C., & Johansson, P. (2014). Speakers' acceptance of real-time speech exchange indicates that we use auditory feedback to specify the meaning of what we say. Psychological Science, 25(6), 1198-1205. https://doi.org/10.1177/0956797614529797

      – Meekings, S., & Scott, S. K. (2021). Error in the Superior Temporal Gyrus? A Systematic Review and Activation Likelihood Estimation Meta-Analysis of Speech Production Studies. Journal of Cognitive Neuroscience, 33(3), 422-444. https://doi.org/10.1162/jocn_a_01661

      – Niziolek C. A., Nagarajan S. S., Houde J. F (2013) What does motor efference copy represent? Evidence from speech production Journal of Neuroscience 33:16110–16116

      – Ozker M., Doyle W., Devinsky O., Flinker A (2022) A cortical network processes auditory error signals during human speech production to maintain fluency PLoS Biology 20.

      – Ozker, M., Yu, L., Dugan, P., Doyle, W., Friedman, D., Devinsky, O., & Flinker, A. (2024). Speech-induced suppression and vocal feedback sensitivity in human cortex. eLife, 13, RP94198. https://doi.org/10.7554/eLife.94198

      – Zheng, Z. Z., MacDonald, E. N., Munhall, K. G., & Johnsrude, I. S. (2011). Perceiving a Stranger's Voice as Being One's Own: A 'Rubber Voice' Illusion? PLOS ONE, 6(4), e18655.

      (5) There are some seemingly arbitrary decisions made in the design and analysis that, while likely justified, need to be explained. For example, how were the cutoffs for moderate coupling vs phase-shifted coupling (k ~0.09) determined? This is noted as "rather weak" (line 212), but it's not clear where this comes from. Similarly, the ROI-based analyses are only done on regions "recorded in at least 7 patients" - how was this number chosen? How many electrodes total does this correspond to? Is there heterogeneity within each ROI?

      The reviewer is correct, and we apologize for this missing information. We now specify that the coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization, while keeping the phase-shifted coupling at a rather implicit level.

      Concerning the definition of coupling as weak, one should consider that, in the Kuramoto model, the strength of coupling (k) is relative to the spread of the natural frequencies (Δω) in the system. In our study, the natural frequencies of syllables range approximately from 2 Hz to 10 Hz, resulting in a frequency spread of Δω = 8 Hz. For coupling to strongly synchronize oscillators across such a wide range, k must be comparable to or exceed Δω. Since k = 0.1 is far smaller than Δω, it is classified as weak coupling.
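
      The relation between k and Δω can be illustrated with a minimal sketch. It assumes the simplest one-way case, in which a single adaptive oscillator follows a fixed one, so that the phase difference φ obeys dφ/dt = Δω − k·sin(φ) and phase locking is possible only when k ≥ |Δω|. The function names, the units (rad/s) and the numerical values are illustrative assumptions and are not taken from the VP implementation.

```python
import numpy as np


def simulate_phase_difference(delta_omega, k, duration=20.0, dt=0.001, phi0=0.5):
    """Euler-integrate d(phi)/dt = delta_omega - k * sin(phi).

    phi is the phase difference between a fixed oscillator and an adaptive
    one that is attracted to it with coupling strength k (hypothetical units:
    rad/s for both delta_omega and k).
    """
    n_steps = int(round(duration / dt))
    phi = np.empty(n_steps)
    phi[0] = phi0
    for i in range(1, n_steps):
        phi[i] = phi[i - 1] + dt * (delta_omega - k * np.sin(phi[i - 1]))
    return phi


def is_phase_locked(phi, dt=0.001, tail=2.0, tol=1e-3):
    """Return True if the phase difference has stopped drifting.

    The drift rate is estimated over the last `tail` seconds of the run.
    """
    tail_n = int(round(tail / dt))
    drift_rate = abs(phi[-1] - phi[-tail_n]) / tail
    return drift_rate < tol


# Coupling larger than the frequency mismatch: the phase difference settles.
strong = simulate_phase_difference(delta_omega=1.0, k=2.0)
# Coupling far smaller than the mismatch: the phase difference keeps drifting.
weak = simulate_phase_difference(delta_omega=1.0, k=0.1)

print("k = 2.0 locked:", is_phase_locked(strong))
print("k = 0.1 locked:", is_phase_locked(weak))
```

      In this toy setting, a coupling well below the frequency mismatch only nudges the phase difference without capturing it, which is the sense in which a small k relative to Δω counts as weak.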

      We have now modified the Materials and methods section as follows:

      “More precisely, for a third of the trials the VP had a neutral behaviour (close to zero coupling: k = +/- 0.01). For a third it had a moderate coupling, meaning that the VP synchronised more to the participant speech (k = -0.09). And for the last third of the trials the VP had a moderate coupling but with a phase shift of pi/2, meaning that it moderately aimed to speak in between the participant syllables (k = + 0.09). The coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization but keeping the phase-shifted coupling at a rather implicit level. In other terms, while participants knew that the VP would adapt, they did not necessarily know in which direction the coupling went.”

      Regarding the criterion of including regions recorded in at least 7 patients, our goal was to balance data completeness with statistical power. Given our total sample of 16 patients, this threshold ensures that each included region is represented in at least ~44% of the cohort, reducing the likelihood of spurious findings due to extremely small sample sizes. This choice also aligns with common neurophysiological analysis practices, where a minimum number of subjects (at least 2 in extreme cases) is required to achieve meaningful interindividual comparisons while avoiding excessive data exclusion. Additionally, this threshold maintains a reasonable tradeoff between maximizing patient inclusion and ensuring that statistical tests remain robust.

      We have now added more information in the Results section “Spectral profiles in the language network are nuanced by behaviour” on this point as follows:

      “To balance data completeness and statistical power, we included only brain regions recorded in at least 7 patients (~44% of the cohort) for the left hemisphere and at least 5 patients for the right hemisphere (~31% of the cohort), ensuring sufficient representation while minimizing biases due to sparse data.”

      Reviewer #2 (Public Review):

      Summary:

      This paper investigates the neural underpinnings of an interactive speech task requiring verbal coordination with another speaker. To achieve this, the authors recorded intracranial brain activity from the left hemisphere in a group of drug-resistant epilepsy patients while they synchronised their speech with a 'virtual partner'. Crucially, the authors were able to manipulate the degree of success of this synchronisation by programming the virtual partner to either actively synchronise or desynchronise their speech with the participant, or else to not vary its speech in response to the participant (making the synchronisation task purely one-way). Using such a paradigm, the authors identified different brain regions that were either more sensitive to the speech of the virtual partner (primary auditory cortex), or more sensitive to the degree of verbal coordination (i.e. synchronisation success) with the virtual partner (secondary auditory cortex and IFG). Such sensitivity was measured by (1) calculating the correlation between the index of verbal coordination and mean power within a range of frequency bands across trials, and (2) calculating the phase-amplitude coupling between the behavioural and brain signals within single trials (using the power of high-frequency neural activity only). Overall, the findings help to elucidate some of the left hemisphere brain areas involved in interactive speaking behaviours, particularly highlighting the high-frequency activity of the IFG as a potential candidate supporting verbal coordination.

      Strengths:

      This study provides the field with a convincing demonstration of how to investigate speaking behaviours in more complex situations that share many features with real-world speaking contexts e.g. simultaneous engagement of speech perception and production processes, the presence of an interlocutor, and the need for inter-speaker coordination. The findings thus go beyond previous work that has typically studied solo speech production in isolation, and represent a significant advance in our understanding of speech as a social and communicative behaviour. It is further an impressive feat to develop a paradigm in which the degree of cooperativity of the synchronisation partner can be so tightly controlled; in this way, this study combines the benefits of using prerecorded stimuli (namely, the high degree of experimental control) with the benefits of using a live synchronisation partner (allowing the task to be truly two-way interactive, an important criticism of other work using pre-recorded stimuli). A further key strength of the study lies in its employment of stereotactic EEG to measure brain responses with both high temporal and spatial resolution, an ideal method for studying the unfolding relationship between neural processing and this dynamic coordination behaviour.

      We sincerely appreciate the Reviewer's thoughtful and positive feedback on our manuscript.

      Weaknesses:

      One major limitation of the current study is the lack of coverage of the right hemisphere by the implanted electrodes. Of course, electrode location is solely clinically motivated, and so the authors did not have control over this. However, this means that the current study neglects the potentially important role of the right hemisphere in this task. The right hemisphere has previously been proposed to support feedback control for speech (likely a core process engaged by synchronous speech), as opposed to the left hemisphere which has been argued to underlie feedforward control (Tourville & Guenther, 2011). Indeed, a previous fMRI study of synchronous speech reported the engagement of a network of right hemisphere regions, including STG, IPL, IFG, and the temporal pole (Jasmin et al., 2016). Further, the release from speech-induced suppression during a synchronous speech reported by Jasmin et al. was found in the right temporal pole, which may explain the discrepancy with the current finding of reduced leftward high-frequency activity with increasing verbal coordination (suggesting instead increased speech-induced suppression for successful synchronisation). The findings should therefore be interpreted with the caveat that they are limited to the left hemisphere, and are thus likely missing an important aspect of the neural processing underpinning verbal coordination behaviour.

      We have now included, in the supplementary materials, data from the right hemisphere, although the coverage is a bit sparse (Figures S2, S4, S5, see our responses in the ‘Recommendation for the authors’ section, below). We have also revised the Discussion section to add the putative role of right temporal regions (see below as well).

      A further limitation of this study is that its findings are purely correlational in nature; that is, the results tell us how neural activity correlates with behaviour, but not whether it is instrumental in that behaviour. Elucidating the latter would require some form of intervention such as electrode stimulation, to disrupt activity in a brain area and measure the resulting effect on behaviour. Any claims therefore as to the specific role of brain areas in verbal coordination (e.g. the role of the IFG in supporting online coordinative adjustments to achieve synchronisation) are therefore speculative.

      We appreciate the reviewer’s observation regarding the correlational nature of our findings and agree that this is a common limitation of neuroimaging studies. While elucidating causal relationships would indeed require intervention techniques such as electrical stimulation, our study leverages the unique advantages of intracerebral recordings, offering the best available spatial and temporal resolution alongside a high signal-to-noise ratio. These attributes ensure that our data accurately reflect neural activity and its temporal dynamics, providing a robust foundation for understanding the relationship between neural processes and behaviour. Therefore, while causal claims are beyond the scope of this study, the precision of our methodology allows us to make well-supported observations about the neural correlates of synchronous speech tasks.

      Recommendations for the authors:

      Reviewing Editor Comment:

      After joint consultation, we are seeing the potential for the report to be strengthened and the evidence here to be deemed ultimately at least 'solid': to us (editors and reviewers) it seems that this would require both (1) clarifying/acknowledging the limitations of not having right hemisphere data, and (2) running some of the additional analyses the reviewers suggest, which should allow for richer examination of the data e.g. phase relationships in areas that correlate with synchronisation.

      We have now added data on the right hemisphere (RH) that we did not previously report due to a rather sparse sampling of the RH. These results are now reported in the Results section as well as in the Supplementary section, where we put all right hemisphere figures for all analyses (Figure S2, S4, S5). We have also run additional analyses digging into the phase relationship in areas that correlate with synchronisation (Figure S6). These additional analyses allowed us to improve the Discussion section as well.

      Reviewer #1 (Recommendations For The Authors):

      In some sections, the writing is a bit unclear, with both typos and vague statements that could be fixed with careful proofreading.

      We thank the reviewer for pointing out areas where the writing could be improved. We carefully proofread the manuscript to address typos and clarify any vague statements. Specific sections identified as unclear have been rephrased for better precision and readability.

      In Figure 1, the colors repeat, making it impossible to tell patients apart.

      We have now updated Figure 1 colormap to avoid redundancy and added the right hemisphere.

      Line 132: "16 unilateral implantations (9 left, 7 bilateral implantations)". Should this say 7 right hemisphere? If so, the following sentence stating that there was "insufficient cover [sic] of the right hemisphere" is unclear, since the number of patients between LH and RH is similar.

      The confusion was due to the fact that the lateralization refers to the presence/absence of electrodes in the Heschl’s gyrus (left : H’ ; right : H) exclusively.

      We have thus changed this section as follows:

      “16 patients (7 women, mean age 29.8 y, range 17 - 50 y) with pharmacoresistant epilepsy took part in the study. They were included if their implantation map covered at least partially the Heschl's gyrus and had sufficiently intact diction to support relatively sustained language production.”

      The relevant part (previously line 132) now states:

      “Sixteen patients with a total of 236 electrodes (145 in the left hemisphere) and 2395 contacts (1459 in the left hemisphere, see Figure 1). While this gives a rather sparse coverage of the right hemisphere, we decided, due to the rarity of this type of data, to report results for both hemispheres, with figures for the left hemisphere in the main text and figures for the right hemisphere in the supplementary section.”

      Reviewer #2 (Recommendations For The Authors):

      (1) To address the concern regarding the absence of data from the right hemisphere, I would advise the authors to directly acknowledge this limitation in their Discussion section, citing relevant work suggesting that the right hemisphere has an important role to play in this task (e.g. Jasmin et al., 2016). You should also make this clear in your abstract e.g. you could rewrite the sentence in line 40 to be: "Then, we recorded the intracranial brain activity of the left hemisphere in 16 patients with drug-resistant epilepsy...".

      We are grateful to the reviewer for this comment that incited us to look into the right hemisphere data. We have now included results in the right hemisphere, although the coverage is a bit sparse. We have also revised the Discussion section to add the putative role of right temporal regions. Interestingly, our results show, as suggested by the reviewer, a clear involvement of the RH in this task.

      First, the full brain analyses show a very similar implication of the RH as compared to the LH (see Figure below). We have now added in the Results section:

      “As expected, the whole language network is strongly involved, including both dorsal and ventral pathways (Fig 3A). More precisely, in the left temporal lobe the superior, middle and inferior temporal gyri, in the left parietal lobe the inferior parietal lobule (IPL) and in the left frontal lobe the inferior frontal gyrus (IFG) and the middle frontal gyrus (MFG). Similar results are observed in the right hemisphere, neural responses being present across all six frequency bands with medium to large modulation in activity compared to baseline (Figure S2A) in the same regions. Desynchronizations are present in the theta, alpha and beta bands while the low gamma and HFa bands show power increases.”

      Compared to the left hemisphere, assessing brain-behaviour correlations in the right hemisphere does not provide the same statistical power, because some anatomical regions have very few electrodes. Nonetheless, we observe a strong correlation in the right IFG, similar to the one we previously reported in the left hemisphere, and we now report in the Results section:

      “The decrease in HFa along the dorsal pathway is replicated in the right hemisphere (Figure S4). However, while both the right STG BA41/42 and STG BA22 present a power increase (compared to baseline), with a stronger increase for the STG BA41/42, neither shows a significant correlation with verbal coordination (t(45)=-1.65, p=.1 ; t(8)=-0.67, p=.5 ; Student’s T test, FDR correction). By contrast, results in the right IFG BA44 are similar to those observed in the left hemisphere, with a significant power increase associated with a negative brain-behaviour correlation (t(17) = -3.11, p = .01 ; Student’s T test, FDR correction).”

      Interestingly, the phase-amplitude coupling analysis yields very similar results in both hemispheres (exception made for BA22). We have thus updated the Results section as follows:

      “Notably, when comparing – within the regions of interest previously described – the PAC with the virtual partner speech and the PAC with the phase difference, the coupling relationship changes when moving along the dorsal pathway: a stronger coupling in the auditory regions with the speech input, no difference between speech and coordination dynamics in the IPL and a stronger coupling for the coordinative dynamics compared to speech signal in the IFG (Figure 5B). When looking at the right hemisphere, we observe the same changes in the coupling relationship when moving along the dorsal pathway, except that no difference between speech and coordination dynamics is present in the right secondary auditory regions (STG BA22; Figure S5).”
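
      For readers unfamiliar with phase-amplitude coupling, the following minimal sketch shows one common way to quantify how strongly an amplitude envelope (for instance HFa power) is tied to the phase of a slower signal (for instance the VP speech or the phase difference between speakers). It assumes a normalised mean-vector-length estimator, which is an assumption made for illustration and not necessarily the estimator used in the study; the function name and the synthetic signals are hypothetical.

```python
import numpy as np


def mean_vector_length(phase, amplitude):
    """Normalised mean vector length between a phase and an amplitude series.

    Each sample is treated as a vector of length `amplitude` pointing in the
    direction `phase` (radians). If amplitude is systematically larger at some
    phases, the vectors do not cancel and the result moves away from 0.
    """
    phase = np.asarray(phase, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    resultant = np.mean(amplitude * np.exp(1j * phase))
    return np.abs(resultant) / np.mean(amplitude)


# Synthetic illustration: phases sampled evenly over one full cycle.
phase = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)

# Amplitude that peaks at phase 0 (coupled) versus a flat amplitude (uncoupled).
coupled_amplitude = 1.0 + 0.5 * np.cos(phase)
flat_amplitude = np.ones_like(phase)

print("coupled:", mean_vector_length(phase, coupled_amplitude))
print("uncoupled:", mean_vector_length(phase, flat_amplitude))
```

      Comparing such an index computed against two different phase signals within the same region mirrors, in spirit, the contrast between coupling to the speech input and coupling to the coordination dynamics.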

      We also included the right hemisphere results in the Discussion section, with reference to the previous work of Guenther and of Jasmin. The section “Left secondary auditory regions are more sensitive to coordinative behaviour” now reads:

      “Furthermore, the absence of correlation in the right STG BA22 (Figure S4) seems at first glance to challenge influential speech production models (e.g. Guenther & Hickok, 2016) that propose that the right hemisphere is involved in feedback control. However, one needs to consider that the task at stake heavily relied upon temporal mismatches and adjustments. In this context, the left-lateralized sensitivity to verbal coordination is reminiscent of the work of Floegel and colleagues (2020, 2023) suggesting that both hemispheres are involved depending on the type of error: the right auditory association cortex monitoring preferentially spectral speech features and the left auditory association cortex monitoring preferentially temporal speech features. Nonetheless, the right temporal pole seems to be sensitive to speech coordinative behaviour, confirming previous findings using fMRI (Jasmin et al., 2016) and thus showing that the right hemisphere has an important role to play in this type of task.”

      References cited:

      – Floegel, M., Fuchs, S., & Kell, C. A. (2020). Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control. Nature Communications, 11(1), 2839.

      – Floegel, M., Kasper, J., Perrier, P., & Kell, C. A. (2023). How the conception of control influences our understanding of actions. Nature Reviews Neuroscience, 24(5), 313-329.

      – Guenther, F. H., & Hickok, G. (2016). Neural models of motor speech control. In Neurobiology of language (pp. 725-740). Academic Press.

      (2) When discussing previous work on alignment during synchronous speech, you may wish to include a recently published paper by Bradshaw et al (2024); this manipulated the acoustics of the accompanist's voice during a synchronous speech task to show interactions between speech motor adaptation and phonetic convergence/alignment.

      We thank the reviewer for pointing to this recent and interesting paper. We added the article as a reference as follows:

      “Furthermore, synchronous speech favors the emergence of alignment phenomena, for instance of the fundamental frequency or the syllable onset (Assaneo et al., 2019 ; Bradshaw & McGettigan, 2021 ; Bradshaw et al., 2023; Bradshaw et al., 2024).”

      (3) Line 80: "Synchronous speech resembles to a certain extent to delayed auditory feedback tasks"- I think you mean "altered auditory feedback tasks" here.

      In the case of synchronous speech it is more about timing than altered speech signals, that is why the comparison is done with delayed and not altered auditory feedback. Nonetheless, we understand the Reviewer’s point and we have now changed the sentence as follows:

      “Synchronous speech resembles to a certain extent to delayed/altered auditory feedback tasks”

      (4) When discussing superior temporal responses during such altered feedback tasks, you may also want to cite a review paper by Meekings and Scott (2021).

      We thank the reviewer for this suggestion, indeed this was a big oversight!

      The paper is now quoted in the introduction as follows:

      “Previous studies have revealed increased responses in the superior temporal regions compared to normal feedback conditions (Hirano et al., 1997 ; Hashimoto & Sakai, 2003 ; Takaso et al., 2010 ; Ozker et al., 2022 ; Floegel et al., 2020 ; see Meekings & Scott, 2021 for a review of error-monitoring and feedback control in the STG during speech production).”

      Furthermore, we updated the part of the discussion concerning the speaker-induced suppression phenomenon (see our response to point 10 below).

      (5) Line 125: "The parameters and sound adjustment were set using an external low-latency sound card (RME Babyface Pro Fs)". Can you please report the total feedback loop latency in your set-up? Or at the least cite the following paper which reports low latencies with this audio device.

      Kim, K. S., Wang, H., & Max, L. (2020). It's About Time: Minimizing Hardware and Software Latencies in Speech Research With Real-Time Auditory Feedback. Journal of Speech, Language, and Hearing Research, 63(8), 2522-2534. https://doi.org/10.1044/2020_JSLHR-19-00419

      We now report the total feedback loop latency (~5ms) and also cite the relevant paper (Kim et al., 2020).

      (6) Line 127 "A calibration was made to find a comfortable volume and an optimal balance for both the sound of the participant's own voice, which was fed back through the headphones, and the sound of the stimuli." What do you mean here by an 'optimal balance'? Was the participant's own voice always louder than the VP stimuli? Can you report roughly what you consider to be a comfortable volume in dB?

      This point was indeed unclear. We have now changed the text as follows:

      “A calibration was made to find a comfortable volume and an optimal balance for both the sound of the participant's own voice, which was fed back through the headphones, and the sound of the stimuli. The aim of this procedure was that the patient would subjectively perceive their voice and the VP-voice in equal measure. VP voice was delivered at approximately 70dB.”

      (7) Relatedly, did you use any noise masking to mask the air-conducted feedback from their own voice (which would have been slightly out of phase with the feedback through the headphones, depending on your latency)?

      Given the low latency afforded by the sound card (RME Babyface Pro Fs), we did not use noise masking to mask the air-conducted feedback from the patients' own voice.

      (8) Line 141: "four short sentences were pre-recorded by a woman and a man." Did all participants synchronise with both the man and woman or was the VP gender matched to that of the participant/patient?

      We thank the reviewer for pointing out this important missing detail. We have now changed the text as follows:

      “Four stimuli corresponding to four short sentences were pre-recorded by both a female and a male speaker. This allowed to adapt to the natural gender differences in fundamental frequency (i.e. so that the VP gender matched that of the patients). All stimuli were normalised in amplitude.”

      (9) Can you clarify what instructions participants were given regarding the VP? That is, were they told that this was a recording or a real live speaker? Were they naïve to the manipulation of the VP's coupling to the participant?

      We have now added this information to the task description as follows:

      “Participants, comfortably seated in a medical chair, were instructed that they would perform a real-time interactive synchronous speech task with an artificial agent (Virtual Partner, henceforth VP, see next section) that can modulate and adapt to the participant’s speech in real time.”

      “The third step was the actual experiment. This was identical to the training but consisted of 24 trials (14s long, speech rate ~3Hz, yielding ~1000 syllables). Importantly, the VP varied its coupling behaviour to the participant. More precisely, for a third of the sequences the VP had a neutral behaviour (close to zero coupling : k = +/- 0.01). For a third it had a moderate coupling, meaning that the VP synchronised more to the participant speech (k = - 0.09). And for the last third of the sequences the VP had a moderate coupling but with a phase shift of pi/2, meaning that it moderately aimed to speak in between the participant syllables (k = + 0.09). The coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization, but keeping the phase-shifted coupling at a rather implicit level. In other terms, while participants knew that the VP would adapt, they did not necessarily know in which direction the coupling went.”  

      (10) The paragraph from line 438 entitled "Secondary auditory regions are more sensitive to coordinative behaviour" includes an interesting discussion of the relation of the current findings to the phenomenon of speech-induced suppression (SIS). However, the authors appear to equate the observed decrease in high-frequency activity as speech coordination increases with the phenomenon of SIS (in lines 456-457), which is quite a speculative leap. I would encourage the authors to temper this discussion by referring to SIS as a potentially related phenomenon, with a need for more experimental work to determine if this is indeed the same phenomenon as the decreases in high-frequency power observed here. I believe that the authors are arguing here for an interpretation of SIS as reflecting internal modelling of sensory input regardless of whether this is self-generated or other-generated; if this is indeed the case, I would ask the authors to be more explicit here that these ideas are not a standard part of the traditional account of SIS, which only includes internal modelling of self-produced sensory feedback.

      As stated in the public review, we thank both reviewers for raising thoughtful concerns about our interpretation of the observed neural suppression as related to speaker-induced suppression (SIS). We agree that our study lacks a passive listening condition, which limits direct comparisons to the original SIS effect, traditionally defined as the suppression of neural responses to self-produced speech compared to externally-generated speech (Meekings & Scott, 2021).

      In response, we have reconsidered our terminology and interpretation. In the revised discussion, we refer to our findings as a "SIS-related phenomenon specific to the synchronous speech context." Unlike classic SIS paradigms, our interactive task involves simultaneous monitoring of self- and externally-generated speech, introducing additional attentional and coordinative demands.

      The revised discussion also incorporates findings by Ozker et al. (2024, 2022), which link SIS and speech monitoring, suggesting that suppressing responses to self-generated speech facilitates error detection. We propose that the decrease in high-frequency activity (HFa) as verbal coordination increases reflects reduced error signals due to closer alignment between perceived and produced speech. Conversely, HFa increases with reduced coordination may signify greater prediction error.

      Additionally, we relate our findings to the "rubber voice" effect (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021), where temporally and phonetically congruent external speech can be perceived as self-generated. We speculate that this may occur in synchronous speech tasks when the participant's and VP's speech signals closely align. However, this interpretation remains speculative, as no subjective reports were collected to confirm this perception. Future studies could include participant questionnaires to validate this effect and relate subjective experience to neural measures of synchronization.

      Overall, our findings extend the study of SIS to dynamic, interactive contexts and contribute to understanding internal forward models of speech production in more naturalistic scenarios.

      We have now added these points to the discussion as follows:

      “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22 suggests a suppression of neural responses as the degree of synchrony increases. This result aligns with findings on speaker-induced suppression (SIS), where neural activity in auditory cortex decreases during self-generated speech compared to externally-generated speech (Meekings & Scott, 2021; Niziolek et al., 2013). However, our paradigm differs from traditional SIS studies in two critical ways: (1) the speaker's own voice is always present and predictable from the forward model, and (2) no passive listening condition was included. Therefore, our findings cannot be directly equated with the original SIS effect.

      Instead, we propose that the suppression observed here reflects a SIS-related phenomenon specific to the synchronous speech context. Synchronous speech requires simultaneous monitoring of self- and externally generated speech, a task that is both attentionally demanding and coordinative. This aligns with evidence from Ozker et al. (2024, 2022), showing that the same neural populations in STG exhibit SIS and heightened responses to feedback perturbations. These findings suggest that SIS and speech monitoring are related processes, where suppressing responses to self-generated speech facilitates error detection.

      In our study, suppression of HFa as coordination increases may reflect reduced prediction errors due to closer alignment between perceived and produced speech signals. Conversely, increased HFa during poor coordination may signify greater mismatch, consistent with prediction error theories (Houde & Nagarajan, 2011; Friston et al., 2020).”

      (11) Within this section, you also speculate in line 460 that "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice." I would recommend citing studies on the 'rubber voice' effect to back up this claim (e.g. Franken et al., 2021; Lind et al., 2014; Zheng et al., 2011).

      We are grateful to the Reviewer for this interesting suggestion. Directly following the previous comment, the section now states:

      “Furthermore, when self- and externally-generated speech signals are temporally and phonetically congruent, participants may perceive external speech as their own. This echoes the "rubber voice" effect, where external speech resembling self-produced feedback is perceived as self-generated (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021). While this interpretation remains speculative, future studies could incorporate subjective reports to investigate this phenomenon in more detail.”

      (12) As noted in my public review, since your methods are correlational, you need to be careful about inferring the causal role of any brain areas in supporting a specific aspect of functioning e.g. line 501-504: "By contrast, in the inferior frontal gyrus, the coupling in the high-frequency activity is strongest with the input-output phase difference (input of the VP - output of the speaker), a metric that reflects the amount of error in the internal computation to reach optimal coordination, which indicates that this region optimises the predictive and coordinative behaviour required by the task." I would argue that the latter part of this sentence is a conclusion that, although consistent with, goes beyond the current data in this study, and thus needs tempering.

      We agree with the Reviewer and changed the sentence as follows:

      “By contrast, in the inferior frontal gyrus, the coupling in the high-frequency activity is strongest with the input-output phase difference (input of the VP - output of the speaker), a metric that could possibly reflect the amount of error in the internal computation to reach optimal coordination. This indicates that this region could have an implication in the optimisation of the predictive and coordinative behaviour required by the task.”

    1. In 2019 the company Facebook (now called Meta) presented an internal study that found that Instagram was bad for the mental health of teenage girls, and yet they still allowed teenage girls to use Instagram. So, what does social media do to the mental health of teenage girls, and to all its other users?

      I think it’s a great question whether we should still allow people to use Instagram if it’s bad for us. If Meta’s own study shows that Instagram can be bad for teenage girls, shouldn’t we find ways to make it beneficial instead of just banning it? I would consider that a way to stop Instagram from taking customers away from Facebook. The study also indicated that, in general, Instagram could be harmful, but there are people who benefit from the platform. It is unfair to close it just because it may be bad for teenage girls’ mental health. Finding ways to make it beneficial to mental health would be the right solution.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2025-02888

      Corresponding author(s): Christian Fankhauser

      General Statements

      We were pleased to see that the three reviewers found our work interesting and provided supportive and constructive comments.

      Our answers to their comments and/or how we propose to address them in a revised manuscript are included in bold.

      1. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Plant systems sense shading by neighbors via the phytochrome signaling system. In the shade, PHYTOCHROME-INTERACTING FACTORS (PIFs) accumulate and are responsible for transcriptional reprogramming that enables plants to mobilize the "shade-avoidance response". Here, the authors have sought to examine the role of chromatin in modulating this response, specifically by examining whether "open" or "closed" chromatin regions spanning PIF target genes might explain the transcriptional output of these genes. They used a combination of ATAC-seq/CoP-qPCR (to detect open regions of chromatin), ChIP (to assay PIF binding) and RNA-seq (to measure transcript abundance) to understand how these processes may be mechanistically linked in Arabidopsis wild-type and pif mutant lines. They found that some chromatin accessibility changes do occur after LRFR (shade) treatment (32 regions after 1h and 61 after 25 h). While some of these overlap with PIF-binding sites, the authors found no correlation between open chromatin states and high levels of transcription. Because auxin is an important component of the shade-avoidance response and has been shown to control chromatin accessibility in other contexts, they examined whether auxin might be required for opening these regions of chromatin. They find that in an auxin biosynthesis mutant, there is a small subset of PIF target genes whose chromatin accessibility seems altered relative to the wild-type. Likewise, they found that chromatin accessibility for certain PIF targets is altered in phyB and pif mutants and propose that PIFs are necessary for changing the accessibility of chromatin in these genes. The authors thus propose that PIF occupancy of already open regions, rather than increased accessibility, underlies the increase in transcript abundance of these target genes in response to shade.

      Major comments: • I find that the data generally support the hypothesis presented in the manuscript that chromatin accessibility alone does not predict transcription of PIF target genes in the shade. That said, I think that a paragraph from the discussion (lines 321-332) would benefit from some careful rephrasing. I think it is perfectly reasonable to propose that PIF occupancy is more predictive of shade-induced transcriptional output than chromatin accessibility, but I think that calling PIF occupancy "the key drivers" (line 323) or "the main driving force" (line 76) risks ignoring the observation that levels of PIF occupancy specifically do not predict expression levels of PIF target genes (Pfeiffer et al., 2014, Mol Plant). For PIL1 and HFR1, the authors have shown that PIF promoter occupancy and transcript levels are correlated, but the central finding of Pfeiffer et al. was that this pattern does not apply to the majority of PIF direct target genes. Finding factors (i.e. histone marks) that convert PIF-binding information into transcriptional output appears to have been the impetus for the experiments devised in Willige et al., 2021 and Calderon et al., 2022. It is great that the authors have outlined in the discussion that there are a number of factors that modulate PIF transcriptional activating activity but I think that the emphasis on PIF-binding explaining transcript abundance should be moderated in the text.

      We appreciate the reviewer’s comment and will address it by introducing appropriate changes to the discussion. One element that should be pointed out is that the study of Willige et al., 2021 allows us to look at sites where PIF7 is recruited in response to the shade stimulus (a low R/FR treatment) and relate this to higher transcript abundance of the nearby genes. The study of Pfeiffer et al., 2014, which analyses PIF ChIP studies from several labs, does not include this dynamic view of PIF recruitment in response to a stimulus. For example, this study re-analyses data from our lab (Hornitschek et al., 2012), in which we did PIF5 ChIP in low R/FR but did not compare that to high R/FR, which would have enabled an analysis of sites where PIF5 is recruited in response to a shade cue. In the revised manuscript we will also include a new figure comparing PIF7 recruitment and changes in gene expression at direct PIF target genes.

      • I think that the hypothesis could be further supported by incorporating the previously published ChIP-seq data on PIF1, PIF3 and PIF5 binding. Given these data are published/publicly available, I think it would be helpful to note which of the 72 DARs are bound by PIF1, PIF3 and/or PIF5. Especially so given that PIF5 (Lorrain et al., 2008, Plant J) and PIF1/PIF3 (Leivar et al., 2012, Plant Cell) contribute at least in some capacity to transcriptional regulation in response to shade. At the very least, it might help explain some of the observed increases in nucleosome accessibility observed for genes that don't have PIF4 or PIF7-binding.

      This is a thoughtful suggestion. Our choice to focus on PIF7 target genes is dictated by two reasons. First, amongst all tested PIFs, PIF7 is the major contributor to the control of low R/FR (neighbor proximity) induced responses in seedlings (e.g. Li et al., 2012; de Wit et al., 2016; Willige et al., 2021). Second, the PIF7 ChIP-seq and gene expression data from the Willige et al., 2021 paper were obtained using growth conditions very similar to the ones we used, hence allowing us to compare them to our data. As the reviewer suggests, other PIFs also contribute to the low R/FR response and hence looking at ChIP-seq for those PIFs in publicly available data is also informative. One limitation of these data is that ChIP-seq was not always done in seedlings grown in conditions directly comparable to the conditions we used (except for PIF5, see above). Nevertheless, we have performed this analysis with the available data suggested by the reviewer and intend to include the results in the revised version of the manuscript, presumably in an updated Figure 4B.

      • In the manuscript, there are several instances where separate col-0 (wild type) controls have been used for identical experiments. Specifically, qPCR (Fig 3C, Fig S7C/D and Fig S8C/D), CoP-qPCR (Fig 5B/5C and Fig S8E/F) and hypocotyl measurements (Fig S7A/B and Fig S8A/B). In the cases of the hypocotyl measurements, there appear to be hardly any differences between col-0 controls indicating the measurements can be confidently compared between genotypes.

      We appreciate this comment but to be comprehensive, we like to include a Col-0 control for each experiment (whenever possible) and hence also include the data when available.

      • In some cases of qPCR and CoP-qPCR experiments however, the differences in values obtained from col-0 samples that underwent identical experimental treatments appear to vary significantly. In Figure 3C for example, the overall trend for PIL1 expression in col-0 is the same (e.g. HRFR levels are low, LRFR1 levels are much higher and LRFR25 levels drop down to some intermediate level) but the expression levels themselves appear to differ almost two-fold for the LRFR 1h timepoint (~110 on the left panel vs ~60 for the right panel). Given the size of the error bars, it appears that these data represent the mean from only one biological replicate. PIL1 expression levels at LRFR 1h as reported in Fig S7C and D also show similar ~2-fold differences.

      This is a good comment. Looking at PIL1 gene induction by low R/FR in dozens of similar experiments has made us realize that, while the PIL1 induction is always massive, its extent is somewhat variable. Based on the data that we have (including from RNA-seq) we are convinced that this is due to the very low level of expression of PIL1 in high R/FR conditions. Given that induction by low R/FR is expressed as fold increase relative to baseline high R/FR expression, small changes in the lowly expressed PIL1 in high R/FR lead to seemingly significant differences in its induction by low R/FR across experiments.
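
      The baseline effect described in this response can be illustrated with simple arithmetic. The sketch below uses made-up relative expression values (they are not data from the manuscript, and `fold_induction` is a hypothetical helper) to show how the same induced level yields a two-fold different fold-induction when a very low baseline shifts slightly.

```python
def fold_induction(induced, baseline):
    """Fold change of an induced expression level relative to its baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return induced / baseline


# Hypothetical relative expression values, chosen only for illustration.
induced_level = 55.0   # after 1 h of low R/FR, identical in both experiments
baseline_exp1 = 0.5    # high R/FR baseline, experiment 1
baseline_exp2 = 1.0    # high R/FR baseline, experiment 2

fold_exp1 = fold_induction(induced_level, baseline_exp1)
fold_exp2 = fold_induction(induced_level, baseline_exp2)

# An absolute baseline difference of only 0.5 units doubles the reported fold change.
print(fold_exp1, fold_exp2)
```

      Under these assumed numbers, an absolute difference of half a unit in a lowly expressed baseline is enough to produce the kind of two-fold discrepancy the reviewer noticed.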

      All qPCR data is represented by three biological replicates, and the variation between them per experiment is low, which is reflected in the size of the SD error bars. Data on technical and biological replicates in each panel will be clearly indicated in the revised figure legends.

      • I would recommend that the authors explicitly describe the number of biological replicates used for each experiment in the methods section. If indeed these experiments were only performed once, I think the authors should be very careful in the language used in describing their conclusions and in assigning statistical significance. One possibility that could also be helpful would be normalizing LRFR 1h and LRFR 25h values to HRFR values and plotting these data somewhere in the supplemental data. If, for example, the relative levels of PIL1 are different between replicates but the fold-induction between HRFR and LRFR 1h are the same, this would at least allay any concerns that the experimental treatments were not the same. I understand that doing so precludes comparison between genotypes, but I do think it's important to show that at least the control data are comparable between experiments.

      All qPCR and CoP-qPCR experiments have been performed with three biological replicates, as described in the Materials and Methods section, and these are represented in the Figures. Relative gene expression in the qPCR experiments was normalized to two housekeeping genes, YLS8 and UBC21, and afterwards to one biological replicate of the Col-0 control in HRFR. As indicated for the previous comment, information about replicates will be included in the updated figure legends.

      • Similarly, for the CoP-qPCR experiments presented in Fig 5B and 5C, the col-0 values for region P2 show that while HRFR and LRFR 1h look comparable between Fig 5B and 5C, the values presented for LRFR 25h are quite different.

      This comment of the reviewer prompted us to propose a different, clearer way of representing the data (new Figure 5B and 5C). We believe that this facilitates the comparison between the genotypes. Enrichment over the input was calculated for the chromatin accessibility of each region. Chromatin accessibility was further normalized against two open control regions on the promoters of ACT2 (AT3G18780, region chr3:6474579:6474676) and RNA polymerase II transcription elongation factor (AT1G71080, region chr1:26811833:26811945). The difference from the previous representation is that the values are no longer additionally normalized by subtracting the Col-0 value in HRFR. We will update the Materials and Methods and figure legend sections with this information.
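
      The two-step normalization described in this response could be computed along the following lines. This is only a sketch under stated assumptions: the response does not give the formula, so the percent-of-input calculation (with 100% primer efficiency), the use of the arithmetic mean of the two control regions, the function names and all Ct values are hypothetical.

```python
from statistics import mean


def percent_input(ct_sample, ct_input, input_fraction):
    """Enrichment over input as a percentage.

    Assumes 100% primer efficiency (one doubling per cycle).
    input_fraction is the share of chromatin kept as input, e.g. 0.1 for 10%.
    """
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_sample)


def normalized_accessibility(region_enrichment, control_enrichments):
    """Scale a region's enrichment by the mean enrichment of the open control regions."""
    return region_enrichment / mean(control_enrichments)


# Hypothetical Ct values for one sample; not data from the manuscript.
region = percent_input(ct_sample=27.0, ct_input=24.0, input_fraction=0.1)
controls = [
    percent_input(ct_sample=26.0, ct_input=24.0, input_fraction=0.1),  # ACT2 promoter
    percent_input(ct_sample=26.0, ct_input=24.0, input_fraction=0.1),  # AT1G71080 promoter
]

accessibility = normalized_accessibility(region, controls)
print(accessibility)
```

      Because no value is subtracted from or divided by the Col-0 HRFR sample in this scheme, each genotype and treatment keeps its own scale relative to the control regions, which is what allows the direct comparison between genotypes mentioned above.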

      Minor comments: • Presentation of Supplemental Figure 7A/7B and Supplemental Figure 8A/8B could be changed to make the data more clear (i.e. side-by-side rather than superimposed).

      We propose changing the presentation of the hypocotyl length data to show the values for days side-by-side as the Reviewer suggests.

      • I think that the paragraph introducing auxin (lines 25-37) could be reduced to 1-2 sentences and merged into a separate introductory paragraph given that the SAV3 work makes up a relatively minor component of the manuscript.

      We agree with the reviewer and will reduce the paragraph about auxin and merge it with the previous paragraph about transcription.

      • For Figure 3A, I would strongly encourage the authors to show some of the raw western blot data for PIF4, PIF5 and PIF7 protein abundance (and loading control), not just the normalized values. This could be put into supplemental data, but I think it should accompany the manuscript.

      We agree that presenting the raw data that was used for quantification is important. We will include the western blots used for quantifying PIF4, PIF5 and PIF7 protein abundance (and the loading control DET3). This information will presumably be included in Supplementary Figure 3C (figure number to be confirmed once we decide on all new data to be presented).

      • Lines 145-147 "we observed a strong correlation between PIF4 protein levels (Figure 3A) and PIL1 promoter occupancy (Figure 3B), and a similar behavior was seen with PIF7 (Figure 3B)." According to Fig 3A, there is no statistically significant increase in PIF7 abundance after 1h shade. There is an apparent increase in PIF7 promoter occupancy, but the variation appears too large for it to be statistically significant. I agree that qualitatively there is a correlation for PIF4 but I think the description of the behavior of PIF7 should be rephrased.

      As suggested by the reviewer, we will rephrase this paragraph to more accurately account for our data and also for what was reported by others (e.g. Willige et al., 2021; Li et al., 2012) regarding the regulation of PIF7 levels and phosphorylation in response to a low R/FR treatment.

      • There appear to be issues in the coloring of the labels (light blue dots vs dark blue dots) for the PIF7 panels of Fig 3B and Supplemental Fig 3B.

      We thank the reviewer for pointing this out. This will be clarified by appropriate changes in the figure to avoid confusion in the revised version of Figure 3B.

      Reviewer #1 (Significance (Required)):

      The authors here have sought to examine the possibility that the transcriptional responses to shade mediated by the phy-PIF system might involve large-scale opening or closing of chromatin regions. This is an important and unanswered question in the field despite several studies that have looked at the role of histone variants (H2A.Z) and modifications (H3K4me3 and H3K9ac) in modulating PIF transcriptional activating activity. The authors have shown that, at least in the case of the transcriptional response to shade mediated by PIF7 (and to an extent PIF4), large-scale changes in chromatin accessibility are not occurring in response to shade treatment.

      The results presented in this study support the hypothesis that large-scale changes in chromatin accessibility may have already occurred before plants see shade. This opens the possibility that the initial perception of light by etiolated (dark-grown) seedlings might trigger changes in chromatin accessibility, opening up chromatin in regions encoding "shade-specific" genes and/or closing chromatin in regions encoding "dark-specific" genes.

      The audience for this particular manuscript encompasses a fairly broad group of biologists interested in understanding how environmental stimuli can trigger changes in chromatin reorganization and transcription. The results here are important in that they rule out chromatin accessibility changes as underlying the changes in transcription between the short-term and long-term shade responses. They also reveal that there are a few cases in which chromatin accessibility does change in a statistically-significant manner in response to shade. These regions, and the molecular players which regulate their accessibility, merit further exploration.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The study by Paulisic et al. explores the variations in chromatin accessibility landscape induced by plant exposure to light with low red/far-red ratios (LRFR), which mimics neighbor shade perception. The authors further compare these changes with the genome association of PIF4 and PIF7 transcription factors - two major actors of gene expression regulation in response to LRFR. While this is not highlighted in the main text, the analyses of chromatin accessibility are performed on INTACT-mediated nucleus sorting, presumably to ensure proper and clean isolation of nuclei.

      Major comments

      • Why is the experimental setup exposing plants to darkness overnight? Does this affect the response to LRFR, by a kind of reset of phytochrome signaling? I guess this choice was made to maintain a strong circadian rhythm. Yet, given that PIF genes are clock-regulated, I am afraid that this choice complicates data interpretation concerning the specific effects of LRFR exposure.

      There appears to be some confusion which prompts us to better explain our protocol both by changing Figure 1A (that outlines the experimental conditions) and in the text.

      Seedlings are grown in long day conditions because this is more physiologically relevant than growing them in constant light, which is a rather unnatural condition.

      The reviewer is correct that PIF transcription is under circadian control and the shade avoidance response is gated by the circadian clock (e.g. Salter et al., 2003). To prevent conflating circadian and light quality effects, all samples that are compared are harvested at the same ZT (Zeitgeber time, i.e. hours after dawn). This allows us to focus our analysis on light quality effects specifically. We are therefore convinced that our protocol does not complicate the interpretation of the LRFR effects reported here.

      • As a result of this setup, the 1h exposure to LRFR immediately follows HRFR while the 3h final LRFR exposure of the « 25h LRFR » samples immediately follows a long period of darkness. Can this explain why in several instances (e.g., at the ATHB2 gene) 1h LRFR seems to have stronger effects than 25h LRFR on chromatin accessibility?

      Please check the explanation above. Both samples are harvested at the same ZT (ZT3, meaning 3 hours after dawn). The 1h LRFR seedlings went through the night, had 2 hours of HRFR then 1h of LRFR. The 25h LRFR seedlings are harvested at the very same ZT, meaning 3h after dawn. Importantly, the HRFR control was also harvested at ZT3, meaning 3h after dawn. As indicated above, this protocol allows us to focus on the light quality effects by comparing samples that are all harvested at the same ZT.

      We expect that the changes in Fig. 1A and associated text changes will clarify this issue.

      • Line 42 cites the work by Calderon et al 2022 as « Transcript levels of these genes increase before the H3K4me3 levels, implying that H3K4me3 increases as a consequence of active transcription ». Despite this previous study being reviewed and published, such a strong conclusion should be taken cautiously, and I disagree with it. The study by Calderon et al compares RNA-seq with ChIP-seq data, two methodologies with very different sensitivity, especially when employing bulk cells/whole seedlings as starting materials. For example, a gene strongly induced in a few cells will give a good Log2FC in RNA-seq data analysis (as new transcripts are produced after a low level of transcripts before shade) but, even though its chromatin variations would follow the same temporality or would even precede gene induction, this would be invisible in bulk ChIP-seq data analysis (which averages the signal of all cells together). I understand the rationale for relying on the conclusions made in an excellent lab with strong expertise in light signaling, but I recommend being cautious when relying on these conclusions to interpret new data.

      We agree with this comment, and we will change the text to reflect this.

      • The problem is that the same issue holds true when comparing ATAC-seq and RNA-seq data. ATAC signals reflect average levels over all cells while RNA-seq data can be influenced by a few cells highly expressing a given gene. Even though authors carefully sorted nuclei using an INTACT approach, this should be discussed, in particular when gene clusters (such as cluster C-D) show no match between chromatin accessibility and transcript level variations. In this regard, is PIF7 expressed in many cells or a small niche of cells upon LRFR exposure? The conclusions on its role in chromatin accessibility, analyzed here as mean levels of many different seedling cells, could be affected by PIF7 activity pattern (e.g., at line 293).

      This is a good comment. PIF7 is expressed in the cotyledons and leaves in LD conditions (Kidokoro et al., 2009; Galvao et al., 2019), and the few available scRNA-seq datasets indicate an enrichment of PIF7 in the epidermis (Kim et al., 2021; Lopez-Anido et al., 2021). LRFR exposure only mildly represses PIF7 expression, as seen in Figure 3A and also in our bulk RNA-seq study (Table S4). We will discuss this potential limitation of our study in a revised version of the manuscript.

      • Line 89, the conclusion linking DNA methylation and DNA accessibility is unclear to me, this may be rephrased. Also, it should be noted that in gene-rich regions, most DNA methylation is located along the body of moderately to highly transcribing genes (gene-body methylation) while promoters of active and inactive genes are most frequently un-methylated.

      We will rephrase to better reflect the presence or absence of DNA methylation on promoter regions of shade regulated genes that contain accessible sites.

      • Figure 3B shows a few ChIP-qPCR results with important conclusions. Why not sequencing the ChIPped DNA to obtain a genome-wide view of the PIF4-PIF7 relationships at chromatin, and also consequently a more robust genome-wide normalization?

      Several studies have shown that in the conditions that we studied here, namely transfer of seedlings from high R/FR (simulated sun) to low R/FR (neighbor proximity), PIF7 plays the most dominant role amongst all PIFs (e.g. Li et al., 2012; de Wit et al., 2016; Willige et al., 2021). PIF4 and PIF5 also contribute but to a lesser extent. Given that Willige et al., 2021 did extensive ChIP-seq studies for PIF7 using similar conditions to the ones we used, we decided to rely on their data (that we re-analyzed), rather than performing our own PIF7 ChIP-seq analysis. While also performing a ChIP-seq analysis for PIF4 in similar conditions might be useful (this data is not available as far as we know), we are not convinced that doing that experiment would substantially modify the message. In the revised version we will also include analysis of the data from Pfeiffer et al., 2014, which comprises a ChIP-seq dataset for PIF5 (the closest paralog of PIF4) initially generated by Hornitschek et al., 2012 in low R/FR conditions (see comment to reviewer 1 above). For new ChIP-seq, we would have to perform this experiment from scratch with substantially more material than what we used for the targeted ChIP-qPCR analyses. We thus do not feel that such an investment (time and money) is warranted.

      • Given the known functional interaction between PIF7 and INO80, it would be relevant to test whether changes in chromatin accessibility at ATHB2 and other genes are affected in ino80 mutant seedlings.

      We agree with the reviewer that this is potentially an interesting experiment. This will allow us to determine whether the nucleosome histone composition has an influence on nucleosome positioning at selected shade-regulated genes (e.g. ATHB2). We note that according to available data, the effect of INO80 would be expected once PIF7 started transcribing shade-induced genes. We therefore propose comparing the WT with an ino80 mutant for their seedling growth phenotype, expression of selected shade marker genes (e.g. ATHB2) and chromatin accessibility before (high R/FR) and after low R/FR treatment at selected shade marker genes. This will allow us to determine whether INO80 influences chromatin accessibility prior to a low R/FR treatment and/or once the treatment started. Our plan is to include this data in a revised version of the manuscript.

      • On the same line, it would be interesting to test whether PIF7 target regions with pre-existing accessible chromatin would exist in ino80 mutant plants. In other words, testing a model in which chromatin remodeling by INO80 defines accessibility under HRFR to enable rapid PIF recruitment and DNA binding upon LRFR exposure.

      See our answer just above.

      Minor comments

      • In Figure 1C, it seems that PIF7 target genes do not match the set of LRFR-downregulated genes (even less than at random). Why not exclude these 4 genes from the analyses?

      This is correct. There are indeed only 4 downregulated PIF7 target genes as we define them. Removing these genes from the analyses does not change our interpretation of the data and hence for completeness we propose keeping them in a revised version of the manuscript.

      • Figure 3A shows the quantification of protein blots, but I did not find the corresponding images. These should be shown in the figure or as a supplementary figure with proper controls.

      We will include the raw Western blots used for quantification of PIF4, PIF5 and PIF7 in the revised version of the manuscript.

      • Line 102, it is unclear why PIF7 target genes were defined as the -3kb/TSS domains while Arabidopsis intergenic regions are on average much shorter. Gene regulatory regions, or promoters, are typically called within -1kb/TSS regions to avoid annotating a ChIP peak to the upstream gene or TE. A better proxy of PIF7 typical binding sites in gene regulatory regions could be determined by analysing the mean distance between PIF7 peak coordinates and the closest TSS. Typically, a gene meta-plot would give this information.

      We agree that the majority of PIF7 binding peaks are close to the 5’ of the TSS based on the PIF7 binding distribution meta-plot. But several known PIF binding sites are actually further upstream than 1kb 5’ of the TSS (e.g. ATHB2 and HFR1). However, we re-analyzed the data using your suggestion with -2kb/TSS and -1kb/TSS and, while the number of target genes is reduced, it does not change our conclusions about PIF7 binding sites being located on accessible chromatin regions. Importantly, some well characterized LRFR induced genes such as HFR1 would not be annotated correctly if only peaks closest to the gene TSS were taken into account, without flanking genes. In this case only the neighboring AT1G02350 would be annotated, hence missing some important PIF7 target genes. Taking this into consideration we will not modify this part of the analysis in a revised manuscript.

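
      The effect of the window size discussed here can be sketched with a small strand-aware check. This is an illustrative sketch only: the function name, the coordinates and the use of a single peak summit position are assumptions, not the annotation pipeline used in the manuscript.

```python
def in_upstream_window(peak_summit, tss, strand, window_bp):
    """Return True if the peak summit lies within window_bp upstream of the TSS."""
    if strand == "+":
        distance_upstream = tss - peak_summit
    elif strand == "-":
        distance_upstream = peak_summit - tss
    else:
        raise ValueError("strand must be '+' or '-'")
    return 0 <= distance_upstream <= window_bp


# Hypothetical gene and peak: the summit sits 1.8 kb upstream of the TSS.
gene = {"tss": 100_000, "strand": "+"}
peak_summit = 98_200

# Test the three window sizes mentioned in the exchange above.
hits = {
    window: in_upstream_window(peak_summit, gene["tss"], gene["strand"], window)
    for window in (1000, 2000, 3000)
}
print(hits)
```

      With these assumed coordinates, the gene is called a target with the -2kb and -3kb windows but missed with -1kb, which mirrors the concern that distal binding sites such as those reported for ATHB2 and HFR1 would be lost with a narrow window.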
      • Figure 4B, what's represented in the ATAC-seq heatmap: does a positive z-score represent high accessibility?

      On the ATAC-seq heatmap we have represented z-scores of the average CPM (counts per million) for accessible chromatin regions. For each accessible chromatin region, the z-score of a group is calculated by subtracting the median of the averaged CPMs from that group's average CPM and dividing by the standard deviation (SD) of the averaged CPMs across all groups (in our case, a group is the average of three biological replicates for HRFR, 1h or 25h of LRFR). In that sense, the z-score indicates a change in accessibility: a higher z-score indicates opening of the region and a lower z-score indicates a region becoming more closed when compared among the three light treatments (HRFR, 1h or 25h of LRFR). We will make sure that this is clear in the revised manuscript.

      Reviewer #2 (Significance (Required)):

      Contradicting the naive hypothesis that PIFs may target shade-inducible genes to « open » chromatin of shade-inducible genes with the help of chromatin remodelers, such as INO80, the study highlights that PIF7 typically associates with pre-existing accessible chromatin states. Thus, even though this is not stated, results from this study indicate that PIF7 is not a pioneer transcription factor. The data seem very robust, and while some conclusions need clarification, it should be of great interest to the community of scientists studying plant light signaling and shade responses.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, Paulisic et al. investigate whether the transcriptional response of Arabidopsis seedlings to shade depends on chromatin accessibility, with a specific focus on PIF7-regulated genes. To this end, they perform ATAC-seq and RNA-seq, along with other experiments, on seedlings exposed to short and long shade and correlate the results with previously reported PIF7 and PIF4 ChIP-seq data. Based on their findings, they propose that shade-mediated transcriptional regulation may not require extensive remodeling of DNA accessibility. Specifically, they suggest that the open chromatin conformation allows PIFs to easily access and recognize their binding motifs, rapidly initiating gene expression in response to shade. This transcriptional response primarily depends on a transient increase in PIF stability and gene occupancy, with changes in chromatin accessibility occurring in only a small number of genes.

      Major comments:

      • I have one issue that, in my opinion, requires more attention. To define the PIF7 target genes, which were later used to estimate whether PIF7 binds to open or closed chromatin and affects DNA accessibility after its binding, the authors compared the 4h LRFR data point from Willige et al. (2021) ChIP-seq with their 1h RNA-seq data point. This comparison might have missed early genes where PIF7 binds before the 1h time point but is no longer present on DNA at 4h. I understand the decision to choose the 4h Willige et al. ChIP-seq data point, performed under LD conditions, as it matches the data in this study, rather than the 5min-30min data points, which were conducted in constant light. However, if possible, it would be interesting to also compare the RNA-seq data with the early PIF7 binding genes to assess how many additional PIF7 target genes could be identified based on that comparison and whether this might alter the conclusions. If the authors do not agree with this point, it should at least be emphasized that the ChIP-seq data and the RNA-seq/ATAC-seq data were performed under different LRFR conditions (R/FR 0.6 vs. 0.1), which may lead to the misidentification of PIF7 target genes in the manuscript.

      1) This is an interesting suggestion. We therefore reanalyzed the 5, 10 and 30 min ChIP-seq timepoints from Willige et al., 2021 and compared them to 4h of LRFR (ZT4). We have crossed these lists of potential PIF7 targets with our 1h LRFR PIF457-dependent genes based on our RNA-seq. While some PIF7 targets appear only in the early time points (5-10 min of LRFR exposure), overall the number and composition of PIF7 target genes is rather constant across these timepoints. We propose to include these additional analyses in a revised version of the manuscript as a supplemental figure. However, these additional analyses do not influence our general conclusions.

      2) The comment regarding the R/FR ratio is important. We will point out that, although the conditions used by Willige et al., 2021 and the ones we used are similar, they are not exactly the same in terms of R/FR ratio. Importantly, in both studies the early transcriptional response largely depends on the same PIFs, many of the same response genes are induced (e.g. PIL1, AtHB2, HFR1, YUC8, YUC9 and many others) and the physiological response (hypocotyl elongation) is similar. This shows that the response to low R/FR is robust.
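      The list-crossing described in point 1 can be sketched as simple set operations. This is only an illustration under stated assumptions: the gene identifiers (`g1` to `g5`), the time-point labels and the helper names `shared_targets` and `early_only` are hypothetical placeholders, not the authors' pipeline or data.

```python
def shared_targets(chip_targets_by_time, rnaseq_genes):
    """For each ChIP-seq time point, list bound genes that are also in the RNA-seq set."""
    rnaseq = set(rnaseq_genes)
    return {t: sorted(set(bound) & rnaseq) for t, bound in chip_targets_by_time.items()}


def early_only(chip_targets_by_time, early, late):
    """Genes bound at any early time point but not at the late one."""
    early_bound = set().union(*(chip_targets_by_time[t] for t in early))
    return sorted(early_bound - set(chip_targets_by_time[late]))


# Placeholder inputs (not real data): genes bound at each ChIP-seq time point,
# and genes scored as PIF-dependent in a 1 h RNA-seq experiment.
chip = {
    "5min": ["g1", "g2", "g3"],
    "10min": ["g2", "g3"],
    "30min": ["g2", "g3", "g4"],
    "4h": ["g2", "g3", "g4"],
}
rnaseq_1h = ["g1", "g2", "g4", "g5"]

overlap = shared_targets(chip, rnaseq_1h)
transient = early_only(chip, early=["5min", "10min", "30min"], late="4h")
```

      Comparing `overlap` across time points shows how stable the target set is, and `transient` lists the genes that a 4 h-only comparison would miss.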

      Minor comments:

      • In Fig. 1D, please describe the meaning of the blue shaded areas and the blue lines under the ATAC-seq peaks, as they do not always correlate.

      The shaded areas and the bars define the extension of the ATAC-seq accessible chromatin peaks. We will add the meaning of the shaded areas and the blue bars in the Figure legend and correct the colors in a revised manuscript.

      • In Fig. 1E, it could be helpful to note that the 257 peaks in the right bar correspond to the peaks associated with the 177 genes in the left bar.

      We will update Figure 1E and Figure legends for better understanding as the Reviewer suggested.

      • In lines 116, 119, and 122, I believe it should read "Fig. 2" instead of "Fig. 2A."

      We thank the Reviewer for noticing the error that we will correct.

      • Lines 138-139: "PIF7 total protein levels were overall more stable, and only a mild and non-significant increase of PIF7 levels was seen at 1 h of LRFR." Since PIF7 usually appears as two bands in HRFR and only one band in LRFR, how was the protein level of PIF7 quantified in Fig. 3A? Additionally, I was wondering about the authors' thoughts on the discrepancy with Willige et al. (2021, Extended Data Fig. 1d), where PIF7 abundance seems to have increased after 30 min and 2 h of LRFR.

      PIF7 protein levels were quantified by considering both the upper and the lower band in HRFR (total PIF7) and normalizing its levels to the DET3 loading control. We still observe an increase in the total PIF7 protein levels at 1h of LRFR; however, this change was not statistically significant in these experiments. In our conditions, as in Willige et al., 2021, the increase in PIF7 protein levels in response to short-term shade seems consistent, as is the pronounced shift or disappearance of the upper band (phosphorylated form) on the Western blots (raw data will be available in the revised manuscript). We will introduce text changes referring to the phosphorylation status of PIF7 in our conditions.

      • Line 150: "... many early PIF target genes (Figure 3C)." Since only PIL1 is shown in Fig. 3C, I would recommend revising this sentence. Alternatively, the data could be presented, as in Fig. 2, for all the PIF7 target genes with transient expression patterns.

      We will introduce changes in the text to reflect that we only show PIL1 in the main Figure 3C.

      • Line 204: I'm not sure if Supplementary Fig. 7C-D is correct here. If it is, could the order of the figures be changed so that Supplementary Fig. 7C-D becomes Supplementary Fig. 7A-B?

      The order of the panels A-B in the Supplementary Figure 7 follows the order of the text in the manuscript and is mentioned before panels C-D. It refers to the sentence “Overexpression of phyB resulted in a strong repression of hypocotyl elongation in both HRFR and LRFR, while the absence of phyB promoted hypocotyl elongation (Supplementary Figure 7A-B).”

      • Line 208: "In all three cases...". Please clarify what the three cases refer to.

      We will change the text to more explicitly refer to the differentially accessible regions (DARs) of the genes ATHB2 and HFR1 shown in Figure 5A.

      • Line 231: Should Fig. 5C also be cited here in addition to Supplementary Fig. 7?

      We will add the reference to Figure 5C that was missing.

      • In Supplementary Table 3, more information is needed. For example, it could mention: "This data is presented in Fig. 3 and is based on datasets from ChIP-seq, RNA-seq, etc."

      The table will be updated with more information as suggested by the Reviewer.

      • In the figure legend of Fig. 4B, please check the use of "( )".

      We will correct the error and include the references inside the parentheses.

      Reviewer #3 (Significance (Required)):

      Paulisic et al. present novel discoveries in the field of light signaling and shade avoidance. Their findings extend our understanding of how DNA organization, prior to shade, affects PIF binding and how PIF binding remodels DNA accessibility. The data presented support the conclusions well and are backed by sufficient experimental evidence.

      2. Description of the revisions that have already been incorporated in the transferred manuscript

      The manuscript has not been modified yet.

      3. Description of analyses that authors prefer not to carry out

      Reviewer 2 asked for new ChIP-seq analyses for PIF7 and PIF4. For the reasons outlined above, we believe that such analyses are not required, and we currently do not intend to perform these experiments.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Plant systems sense shading by neighbors via the phytochrome signaling system. In the shade, PHYTOCHROME-INTERACTING FACTORS (PIFs) accumulate and are responsible for transcriptional reprogramming that enables plants to mobilize the "shade-avoidance response". Here, the authors have sought to examine the role of chromatin in modulating this response, specifically by examining whether "open" or "closed" chromatin regions spanning PIF target genes might explain the transcriptional output of these genes. They used a combination of ATAC-seq/CoP-qPCR (to detect open regions of chromatin), ChIP (to assay PIF binding) and RNA-seq (to measure transcript abundance) to understand how these processes may be mechanistically linked in Arabidopsis wild-type and pif mutant lines. They found that some chromatin accessibility changes do occur after LRFR (shade) treatment (32 regions after 1h and 61 after 25 h). While some of these overlap with PIF-binding sites, the authors found no correlation between open chromatin states and high levels of transcription. Because auxin is an important component of the shade-avoidance response and has been shown to control chromatin accessibility in other contexts, they examined whether auxin might be required for opening these regions of chromatin. They find that in an auxin biosynthesis mutant, there is a small subset of PIF target genes whose chromatin accessibility seems altered relative to the wild-type. Likewise, they found that chromatin accessibility for certain PIF targets is altered in phyB and pif mutants and propose that PIFs are necessary for changing the accessibility of chromatin in these genes. The authors thus propose that PIF occupancy of already open regions, rather than increased accessibility, underlies the increase in transcript abundance of these target genes in response to shade.

      Major comments:

      I find that the data generally support the hypothesis presented in the manuscript that chromatin accessibility alone does not predict transcription of PIF target genes in the shade. That said, I think that a paragraph from the discussion (lines 321-332) would benefit from some careful rephrasing. I think it is perfectly reasonable to propose that PIF occupancy is more predictive of shade-induced transcriptional output than chromatin accessibility, but I think that calling PIF occupancy "the key drivers" (line 323) or "the main driving force" (line 76) risks ignoring the observation that levels of PIF occupancy specifically do not predict expression levels of PIF target genes (Pfeiffer et al., 2014, Mol Plant). For PIL1 and HFR1, the authors have shown that PIF promoter occupancy and transcript levels are correlated, but the central finding of Pfeiffer et al. was that this pattern does not apply to the majority of PIF direct target genes. Finding factors (i.e. histone marks) that convert PIF-binding information into transcriptional output appears to have been the impetus for the experiments devised in Willige et al., 2021 and Calderon et al., 2022. It is great that the authors have outlined in the discussion that there are a number of factors that modulate PIF transcriptional activating activity but I think that the emphasis on PIF-binding explaining transcript abundance should be moderated in the text.

      I think that the hypothesis could be further supported by incorporating the previously published ChIP-seq data on PIF1, PIF3 and PIF5 binding. Given these data are published/publicly available, I think it would be helpful to note which of the 72 DARs are bound by PIF1, PIF3 and/or PIF5. Especially so given that PIF5 (Lorrain et al., 2008, Plant J) and PIF1/PIF3 (Leivar et al., 2012, Plant Cell) contribute at least in some capacity to transcriptional regulation in response to shade. At the very least, it might help explain some of the observed increases in nucleosome accessibility observed for genes that don't have PIF4 or PIF7-binding.

      In the manuscript, there are several instances where separate col-0 (wild type) controls have been used for identical experiments. Specifically, qPCR (Fig 3C, Fig S7C/D and Fig S8C/D), CoP-qPCR (Fig 5B/5C and Fig S8E/F) and hypocotyl measurements (Fig S7A/B and Fig S8A/B). In the cases of the hypocotyl measurements, there appear to be hardly any differences between col-0 controls indicating the measurements can be confidently compared between genotypes.

      In some cases of qPCR and CoP-qPCR experiments however, the differences in values obtained from col-0 samples that underwent identical experimental treatments appear to vary significantly. In Figure 3C for example, the overall trend for PIL1 expression in col-0 is the same (e.g. HRFR levels are low, LRFR1 levels are much higher and LRFR25 levels drop down to some intermediate level) but the expression levels themselves appear to differ almost two-fold for the LRFR 1h timepoint (~110 on the left panel vs ~60 for the right panel). Given the size of the error bars, it appears that these data represent the mean from only one biological replicate. PIL1 expression levels at LRFR 1h as reported in Fig S7C and D also show similar ~2-fold differences.

      I would recommend that the authors explicitly describe the number of biological replicates used for each experiment in the methods section. If indeed these experiments were only performed once, I think the authors should be very careful in the language used in describing their conclusions and in assigning statistical significance. One possibility that could also be helpful would be normalizing LRFR 1h and LRFR 25h values to HRFR values and plotting these data somewhere in the supplemental data. If, for example, the relative levels of PIL1 are different between replicates but the fold-induction between HRFR and LRFR 1h is the same, this would at least allay any concerns that the experimental treatments were not the same. I understand that doing so precludes comparison between genotypes, but I do think it's important to show that at least the control data are comparable between experiments.

      Similarly, for the CoP-qPCR experiments presented in Fig 5B and 5C, comparing the col-0 values for region P2 between Fig 5B and 5C shows that while HRFR and LRFR 1h look comparable, the values presented for LRFR 25h are quite different.
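      The normalization suggested above for the qPCR controls can be sketched as follows. The expression values are arbitrary placeholders chosen for illustration, not values read from the figures, and `fold_induction` is a hypothetical helper name.

```python
def fold_induction(levels, baseline="HRFR"):
    """Express each condition relative to the baseline condition of the same experiment."""
    base = levels[baseline]
    if base == 0:
        raise ValueError("baseline expression is zero; fold induction is undefined")
    return {cond: value / base for cond, value in levels.items()}


# Placeholder relative expression values for the same control genotype
# measured in two independent experiments (arbitrary units).
experiment_1 = {"HRFR": 2.0, "LRFR_1h": 100.0, "LRFR_25h": 30.0}
experiment_2 = {"HRFR": 1.0, "LRFR_1h": 50.0, "LRFR_25h": 15.0}

fold_1 = fold_induction(experiment_1)
fold_2 = fold_induction(experiment_2)
```

      Here the absolute levels differ two-fold between the experiments, yet the fold induction relative to HRFR is identical, which is the situation the comment describes as reassuring. The normalized values are no longer comparable between genotypes.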

      Minor comments:

      Presentation of Supplemental Figure 7A/7B and Supplemental Figure 8A/8B could be changed to make the data more clear (i.e. side-by-side rather than superimposed).

      I think that the paragraph introducing auxin (lines 25-37) could be reduced to 1-2 sentences and merged into a separate introductory paragraph given that the SAV3 work makes up a relatively minor component of the manuscript.

      For Figure 3A, I would strongly encourage the authors to show some of the raw western blot data for PIF4, PIF5 and PIF7 protein abundance (and loading control), not just the normalized values. This could be put into supplemental data, but I think it should accompany the manuscript.

      Lines 145-147 "we observed a strong correlation between PIF4 protein levels (Figure 3A) and PIL1 promoter occupancy (Figure 3B), and a similar behavior was seen with PIF7 (Figure 3B)." According to Fig 3A, there is no statistically significant increase in PIF7 abundance after 1h shade. There is an apparent increase in PIF7 promoter occupancy, but the variation appears too large for it to be statistically significant. I agree that qualitatively there is a correlation for PIF4 but I think the description of the behavior of PIF7 should be rephrased.

      There appear to be issues in the coloring of the labels (light blue dots vs dark blue dots) for the PIF7 panels of Fig 3B and Supplemental Fig 3B.

      Significance

      The authors have sought to examine the possibility that the transcriptional responses to shade mediated by the phy-PIF system might involve large-scale opening or closing of chromatin regions. This is an important and unanswered question in the field despite several studies that have looked at the role of histone variants (H2A.Z) and modifications (H3K4me3 and H3K9ac) in modulating PIF transcriptional activating activity. The authors have shown that, at least in the case of the transcriptional response to shade mediated by PIF7 (and to an extent PIF4), large-scale changes in chromatin accessibility are not occurring in response to shade treatment.

      The results presented in this study support the hypothesis that large-scale changes in chromatin accessibility may have already occurred before plants see shade. This opens the possibility that perhaps the initial perception of light by etiolated (dark-grown seedlings) might trigger changes in chromatin accessibility, opening up chromatin in regions encoding "shade-specific" genes and/or closing chromatin in regions encoding "dark-specific" genes.

      The audience for this particular manuscript encompasses a fairly broad group of biologists interested in understanding how environmental stimuli can trigger changes in chromatin reorganization and transcription. The results here are important in that they rule out chromatin accessibility changes as underlying the changes in transcription between the short-term and long-term shade responses. They also reveal that there are a few cases in which chromatin accessibility does change in a statistically-significant manner in response to shade. These regions, and the molecular players which regulate their accessibility, merit further exploration.

      My fields of expertise are photobiology, photosynthesis and early seedling development.

    1. Reviewer #1 (Public review):

      This is a well-designed and very interesting study examining the impact of imprecise feedback on outcomes in decision-making. I think this is an important addition to the literature, and the results here, which provide a computational account of several decision-making biases, are insightful and interesting.

      I do not believe I have substantive concerns related to the actual results presented; my concerns are more related to the framing of some of the work. My main concern is regarding the assertion that the results prove that non-normative and non-Bayesian learning is taking place. I agree with the authors that their results demonstrate that people will make decisions in ways that demonstrate deviations from what would be optimal for maximizing reward in their task under a strict application of Bayes' rule. I also agree that they have built reinforcement learning models that do a good job of accounting for the observed behavior. However, the Bayesian models included are rather simple, per the authors' descriptions, applications of Bayes' rule with either fixed or learned credibility for the feedback agents. In contrast, several versions of the RL models are used, each modified to account for different possible biases. However, more complex Bayes-based models exist, notably active inference, and even the hierarchical Gaussian filter. These formalisms are able to accommodate more complex behavior, such as affect and habits, which might make them more competitive with RL models. I think it is entirely fair to say that these results demonstrate deviations from an idealized and strict Bayesian context; however, the equivalence here of Bayesian and normative is, I think, misleading or at least requires better justification/explanation. This is because a great deal of work has been done to show that Bayes optimal models can generate behavior or other outcomes that are clearly not optimal to an observer within a given context (consider hallucinations for example) but which make sense in the context of how the model is constructed as well as the priors and desired states the model is given.

      As such, I would recommend that the language be adjusted to carefully define what is meant by normative and Bayesian and to recognize that work that is clearly Bayesian could potentially still be competitive with RL models if implemented to model this task. An even better approach would be to directly use one of these more complex modelling approaches, such as active inference, as the comparator to the RL models, though I would understand if the authors would want this to be a subject for future work.

      Abstract:

      The abstract is lacking in some detail about the experiments done, but this may be a limitation of the required word count. If word count is not an issue, I would recommend adding details of the experiments done and the results.

      One comment is that there is an appeal to normative learning patterns, but this suggests that learning patterns have a fixed optimal nature, which may not be true in cases where the purpose of the learning (e.g. to confirm the feeling of safety of being in an in-group) may not be about learning accurately to maximize reward. This can be accommodated in a Bayesian framework by modelling priors and desired outcomes. As such, the central premise that biased learning is inherently non-normative or non-Bayesian, I think, would require more justification. This is true in the introduction as well.

      Introduction:

      As noted above, the conceptualization of Bayesian learning as equivalent to normative learning, I think, requires further justification. Bayesian belief updating can be biased and non-optimal from an observer perspective, while being optimal within the agent doing the updating if the priors/desired outcomes are set up to advantage these "non-optimal" modes of decision making.

      Results:

      I wonder why the agent was presented before the choice, since the agent is only relevant to the feedback after the choice is made. I wonder if that might have induced any false association between the agent identity and the choice itself. This is by no means a critical point, but it would be interesting to get the authors' thoughts.

      The finding that positive feedback increases learning is one that has been shown before and depends on valence, as the authors note. They expanded their reinforcement learning model to include valence, but they did not modify the Bayesian model in a similar manner. This lack of a valence or recency effect might also explain the failure of the Bayesian models in the preceding section, where the contrast effect is discussed. It is not unreasonable to imagine that if humans do employ Bayesian reasoning that this reasoning system has had parameters tuned based on the real world, where recency of information does matter; affect has also been shown to be incorporable into Bayesian information processing (see the work by Hesp on affective charge and the large body of work by Ryan Smith). It may be that the Bayesian models chosen here require further complexity to capture the situation, just like some of the biases required updates to the RL models. This complexity, rather than being arbitrary, may be well justified by decision-making in the real world.

      The methods mention several symptom scales- it would be interesting to have the results of these and any interesting correlations noted. It is possible that some of the individual variability here could be related to these symptoms, which could introduce precision parameter changes in a Bayesian context and things like reward sensitivity changes in an RL context.

      Discussion:

      (For discussion, not a specific comment on this paper): One wonders also about participants' beliefs about the experiment or the intent of the experimenters. I have often had participants tell me they were trying to "figure out" a task or find patterns even when this was not part of the experiment. This is not specific to this paper, but it may be relevant in the future to try and model participant beliefs about the experiment especially in the context of disinformation, when they might be primed to try and "figure things out".

      As a general comment, in the active inference literature, there has been discussion of state-dependent actions, or "habits", which are learned in order to help agents more rapidly make decisions, based on previous learning. It is also possible that what is being observed is that these habits are at play, and that they represent the cognitive biases. This is likely especially true given, as the authors note, the high cognitive load of the task. It is true that this would mean that full-force Bayesian inference is not being used in each trial, or in each experience an agent might have in the world, but this is likely adaptive on the longer timescale of things, considering resource requirements. I think in this case you could argue that we have a departure from "normative" learning, but that is not necessarily a departure from any possible Bayesian framework, since these biases could potentially be modified by the agent or eschewed in favor of more expensive full-on Bayesian learning when warranted.

      Indeed, in their discussion on the strategy of amplifying credible news sources to drown out low-credibility sources, the authors hint at the possibility of longer-term strategies that may produce optimal outcomes in some contexts, but which were not necessarily appropriate to this task. As such, the performance on this task- and the consideration of true departure from Bayesian processing- should be considered in this wider context.

      Another thing to consider is that Bayesian inference is occurring, but that priors present going in produce the biases, or these biases arise from another source, for example, factoring in epistemic value over rewards when the actual reward is not large. This again would be covered under an active inference approach, depending on how the priors are tuned. Indeed, given the benefit of social cohesion in an evolutionary perspective, some of these "biases" may be the result of adaptation. For example, it might be better to amplify people's good qualities and minimize their bad qualities in order to make it easier to interact with them; this entails a cost (in this case, not adequately learning from feedback and potentially losing out sometimes), but may fulfill a greater imperative (improved cooperation on things that matter). Given the right priors/desired states, this could still be a Bayes-optimal inference at a social level and, as such, may be ingrained as a habit that requires effort to break at the individual level during a task such as this.

      The authors note that this task does not relate to "emotional engagement" or "deep, identity-related issues". While I agree that this is likely mostly true, it is also possible that just being told one is being lied to might elicit an emotional response that could bias responses, even if this is a weak response.

    2. Reviewer #3 (Public review):

      Summary

      This paper investigates how disinformation affects reward learning processes in the context of a two-armed bandit task, where feedback is provided by agents with varying reliability (with lying probability explicitly instructed). They find that people learn more from credible sources, but also deviate systematically from optimal Bayesian learning: They learned from uninformative random feedback, learned more from positive feedback, and updated too quickly from fully credible feedback (especially following low-credibility feedback). Overall, this study highlights how misinformation could distort basic reward learning processes, without appeal to higher-order social constructs like identity.

      Strengths

      (1) The experimental design is simple and well-controlled; in particular, it isolates basic learning processes by abstracting away from social context.

      (2) Modeling and statistics meet or exceed the standards of rigor.

      (3) Limitations are acknowledged where appropriate, especially those regarding external validity.

      (4) The comparison model, Bayes with biased credibility estimates, is strong; deviations are much more compelling than e.g., a purely optimal model.

      (5) The conclusions are interesting, in particular the finding that positivity bias is stronger when learning from less reliable feedback (although I am somewhat uncertain about the validity of this conclusion)

      Weaknesses

      (1) Absolute or relative positivity bias?

      In my view, the biggest weakness in the paper is that the conclusion of greater positivity bias for lower credible feedback (Figure 5) hinges on the specific way in which positivity bias is defined. Specifically, we only see the effect when normalizing the difference in sensitivity to positive vs. negative feedback by the sum. I appreciate that the authors present both and add the caveat whenever they mention the conclusion (with the crucial exception of the abstract). However, what we really need here is an argument that the relative definition is the *right* way to define asymmetry....

      Unfortunately, my intuition is that the absolute difference is a better measure. I understand that the relative version is common in the RL literature; however previous studies have used standard TD models, whereas the current model updates based on the raw reward. The role of the CA parameter is thus importantly different from a traditional learning rate - in particular, it's more like a logistic regression coefficient (as described below) because it scales the feedback but *not* the decay. Under this interpretation, a difference in positivity bias across credibility conditions corresponds to a three-way interaction between the exponentially weighted sum of previous feedback of a given type (e.g., positive from the 75% credible agent), feedback positivity, and condition (dummy coded). This interaction corresponds to the non-normalized, absolute difference.

      Importantly, I'm not terribly confident in this argument, but it does suggest that we need a compelling argument for the relative definition.
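      The distinction between the two definitions can be made concrete with a small numerical sketch. The CA values below are hypothetical and chosen only for illustration (they are not estimates from the paper); `absolute_bias` and `relative_bias` are illustrative names for the raw difference and the difference normalized by the sum.

```python
def absolute_bias(ca_pos, ca_neg):
    """Raw difference in sensitivity to positive vs. negative feedback."""
    return ca_pos - ca_neg


def relative_bias(ca_pos, ca_neg):
    """Difference normalized by the sum of the two sensitivities."""
    total = ca_pos + ca_neg
    if total == 0:
        raise ValueError("CA+ + CA- is zero; the relative index is undefined")
    return (ca_pos - ca_neg) / total


# Hypothetical (CA+, CA-) pairs for a high- and a low-credibility agent.
high_cred = (1.0, 0.5)
low_cred = (0.75, 0.25)

abs_high, abs_low = absolute_bias(*high_cred), absolute_bias(*low_cred)
rel_high, rel_low = relative_bias(*high_cred), relative_bias(*low_cred)
```

      With these numbers the absolute difference is the same for both agents (0.5), while the relative index is larger for the low-credibility agent (0.5 versus 1/3), so the two definitions can lead to different conclusions about how bias changes with credibility. The relative index also becomes unstable when CA- is negative and the sum approaches zero.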

      (2) Positivity bias or perseveration?

      A key challenge in interpreting many of the results is dissociating perseveration from other learning biases. In particular, a positivity bias (Figure 5) and perseveration will both predict a stronger correlation between positive feedback and future choice. Crucially, the authors do include a perseveration term, so one would hope that perseveration effects have been controlled for and that the CA parameters reflect true positivity biases. However, with finite data, we cannot be sure that the variance will be correctly allocated to each parameter (cf. collinearity in regressions). The fact that CA- is fit to be negative for many participants (a pattern shown more strongly in the discovery study) is suggestive that this might be happening. A priori, the idea that you would ever increase your value estimate after negative feedback is highly implausible, which suggests that the parameter might be capturing variance other than that which it is intended to capture.

      The best way to resolve this uncertainty would involve running a new study in which feedback was sometimes provided in the absence of a choice - this would isolate positivity bias. Short of that, perhaps one could fit a version of the Bayesian model that also includes perseveration. If the authors can show that this model cannot capture the pattern in Figure 5, that would be fairly convincing.

      (3) Veracity detection or positivity bias?

      The "True feedback elicits greater learning" effect (Figure 6) may be simply a re-description of the positivity bias shown in Figure 5. This figure shows that people have higher CA for trials where the feedback was in fact accurate. But, assuming that people tend to choose more rewarding options, true-feedback cases will tend to also be positive-feedback cases. Accordingly, a positivity bias would yield this effect, even if people are not at all sensitive to trial-level feedback veracity. Of course, the reverse logic also applies, such that the "positivity bias" could actually reflect discounting of feedback that is less likely to be true. This idea has been proposed before as an explanation for confirmation bias (see Pilgrim et al, 2024 https://doi.org/10.1016/j.cognition.2023.105693 and much previous work cited therein). The authors should discuss the ambiguity between the "positivity bias" and "true feedback" effects within the context of this literature....

      The authors get close to this in the discussion, but they characterize their results as differing from the predictions of rational models, the opposite of my intuition. They write:

      Alternative "informational" (motivation-independent) accounts of positivity and confirmation bias predict a contrasting trend (i.e., reduced bias in low- and medium credibility conditions) because in these contexts it is more ambiguous whether feedback confirms one's choice or outcome expectations, as compared to a full-credibility condition.

      I don't follow the reasoning here at all. It seems to me that the possibility for bias will increase with ambiguity (or perhaps will be maximal at intermediate levels). In the extreme case, when feedback is fully reliable, it is impossible to rationally discount it (illustrated in Figure 6A). The authors should clarify their argument or revise their conclusion here.

      (4) Disinformation or less information?

      Zooming out, from a computational/functional perspective, the reliability of feedback is very similar to reward stochasticity (the difference is that reward stochasticity decreases the importance/value of learning in addition to its difficulty). I imagine that many of the effects reported here would be reproduced in that setting. To my surprise, I couldn't quickly find a study asking that precise question, but if the authors know of such work, it would be very useful to draw comparisons. To put a finer point on it, this study does not isolate which (if any) of these effects are specific to *disinformation*, rather than simply *less information*. I don't think the authors need to rigorously address this in the current study, but it would be a helpful discussion point.

      (5) Over-reliance on analyzing model parameters

      Most of the results rely on interpreting model parameters, specifically, the "credit assignment" (CA) parameter. Exacerbating this, many key conclusions rest on a comparison of the CA parameters fit to human data vs. those fit to simulations from a Bayesian model. I've never seen anything like this, and the authors don't justify or even motivate this analysis choice. As a general rule, analyses of model parameters are less convincing than behavioral results because they inevitably depend on arbitrary modeling assumptions that cannot be fully supported. I imagine that most or even all of the results presented here would have behavioral analogues. The paper would benefit greatly from the inclusion of such results. It would also be helpful to provide a description of the model in the main text that makes it very clear what exactly the CA parameter is capturing (see next point).

      (6) RL or regression?

      I was initially very confused by the "RL" model because it doesn't update based on the TD error. Consequently, the "Q values" can go beyond the range of possible reward (SI Figure 5). These values are therefore *not* Q values, which are defined as expectations of future reward ("action values"). Instead, they reflect choice propensities, which are sometimes notated $h$ in the RL literature. This misuse of notation is unfortunately quite common in psychology, so I won't ask the authors to change the variable. However, they should clarify when introducing the model that the Q values are not action values in the technical sense. If there is precedent for this update rule, it should be cited.

      Although the change is subtle, it suggests a very different interpretation of the model.

      Specifically, I think the "RL model" is better understood as a sophisticated logistic regression, rather than a model of value learning. Ignoring the decay term, the CA term is simply the change in log odds of repeating the just-taken action in future trials (the change is negated for negative feedback). The PERS term is the same, but ignoring feedback. The decay captures that the effect of each trial on future choices diminishes with time. Importantly, however, we can re-parameterize the model such that the choice at each trial is a logistic regression where the independent variables are an exponentially decaying sum of feedback of each type (e.g., positive-cred50, positive-cred75, ... negative-cred100). The CA parameters are simply coefficients in this logistic regression.

      Critically, this is not meant to "deflate" the model. Instead, it clarifies that the CA parameter is actually not such an assumption-laden model estimate. It is really quite similar to a regression coefficient, something that is usually considered "model agnostic". It also recasts the non-standard "cross-fitting" approach as a very standard comparison of regression coefficients for model simulations vs. human data. Finally, using different CA parameters for true vs false feedback is no longer a strange and implausible model assumption; it's just another (perfectly valid) regression. This may be a personal thing, but after adopting this view, I found all the results much easier to understand.
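
      The re-parameterization described above can be checked numerically. The following is a minimal sketch, not the authors' model code. It assumes a single choice propensity `h` that is updated on every trial as `h <- (1 - decay) * h ± CA[type]`, with the sign set by feedback valence, and it uses hypothetical values for `decay` and for the `CA` coefficients. Under those assumptions, the recursively updated propensity equals a weighted sum of exponentially decaying feedback regressors, with the CA parameters as the weights.

```python
import random

def recursive_propensity(trials, ca, decay):
    """Update a single propensity trial by trial (assumed update rule)."""
    h = 0.0
    for valence, cred in trials:
        sign = 1.0 if valence == "pos" else -1.0
        h = (1.0 - decay) * h + sign * ca[(valence, cred)]
    return h

def decayed_regressors(trials, decay):
    """Exponentially decaying sum of signed feedback, one regressor per type."""
    n = len(trials)
    x = {}
    for t, (valence, cred) in enumerate(trials):
        sign = 1.0 if valence == "pos" else -1.0
        weight = (1.0 - decay) ** (n - 1 - t)  # most recent trial has weight 1
        x[(valence, cred)] = x.get((valence, cred), 0.0) + sign * weight
    return x

random.seed(0)
creds = [50, 75, 100]  # credibility levels, in percent
# Hypothetical CA coefficients, one per (valence, credibility) feedback type.
ca = {(v, c): random.uniform(0.0, 2.0) for v in ("pos", "neg") for c in creds}
trials = [(random.choice(("pos", "neg")), random.choice(creds)) for _ in range(40)]
decay = 0.2  # hypothetical decay rate

h_recursive = recursive_propensity(trials, ca, decay)
x = decayed_regressors(trials, decay)
# Linear predictor of a logistic regression with the CA values as coefficients.
h_regression = sum(ca[k] * x[k] for k in x)
print(h_recursive, h_regression)
```

      The sketch tracks only one action and omits the perseveration term; it is meant to illustrate the algebraic equivalence, not to reproduce the fitted model.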

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary: 

      The manuscript by Nicoletti et al. presents a minimal model of habituation, a basic form of non-associative learning, and addresses, from both dynamical and information-theoretic perspectives, how habituation can be realized. The authors identify that negative feedback combined with a slow storage mechanism is sufficient to explain habituation.

      Strengths: 

      The authors combine the identification of the dynamical mechanism with information-theoretic measures to determine the onset of habituation and provide a description of how the system can gain maximum information about the environment.

      We thank the reviewer for highlighting the strength of our work and for their comments, which we believe have been instrumental in significantly improving our work and its scope. Below, we address all their concerns.

      Weaknesses: 

      I have several main concerns/questions about the proposed model for habituation and its plausibility. In general, habituation does not only refer to a decrease in the responsiveness upon repeated stimulation but as Thompson and Spencer discussed in Psych. Rev. 73, 16-43 (1966), there are 10 main characteristics of habituation, including (i) spontaneous recovery when the stimulus is withheld after response decrement; dependence on the frequency of stimulation such that (ii) more frequent stimulation results in more rapid and/or more pronounced response decrement and more rapid spontaneous recovery; (iii) within a stimulus modality, the less intense the stimulus, the more rapid and/or more pronounced the behavioral response decrement; (iv) the effects of repeated stimulation may continue to accumulate even after the response has reached an asymptotic level (which may or may not be zero, or no response). This effect of stimulation beyond asymptotic levels can alter subsequent behavior, for example, by delaying the onset of spontaneous recovery. 

      These are only a subset of the conditions that have been experimentally observed and therefore a mechanistic model of habituation, in my understanding, should capture the majority of these features and/or discuss the absence of such features from the proposed model. 

      We are really grateful to the reviewer for pointing out these aspects of habituation that we overlooked in the previous version of our manuscript. Indeed, our model is able to capture most of these 10 observed behaviors, specifically: 1) habituation; 2) spontaneous recovery; 3) potentiation of habituation; 4) frequency sensitivity; 5) intensity sensitivity; 6) subliminal accumulation. Here, we are following the same terminology employed in Eckert et al., Current Biology 34, 5646–5658 (2024), the paper highlighted by the reviewer. We have dedicated a section of the revised version of the manuscript to these hallmarks, substantiating the validity of our framework as a minimal model of habituation. We remark that these are the sole hallmarks that can be discussed by considering one single external stimulus and that can be identified without ambiguity in a biochemical context. This observation is again in line with Eckert et al., Current Biology 34, 5646–5658 (2024).

      In the revised version, we employ the same strategy as the aforementioned work to determine when the system can be considered “habituated”. Indeed, we introduce a response threshold that is now discussed in the manuscript. We also included a note in the discussion stating that, since any biochemical model will eventually reach a steady state, subliminal accumulation, for example, can only be seen with the use of a threshold. The introduction of different storage mechanisms, ideally more detailed at a molecular level, can shed light on this conceptual gap. This is an interesting direction of research.

      Furthermore, the habituated response in steady-state is approximately 20% less than the initial response, which seems to be achieved already after 3-4 pulses; the subsequent change in response amplitude seems to be negligible, although the authors state "after a large number of inputs, the system reaches a time-periodic steady-state". How do the authors justify these minimal decreases in the response amplitude? Does this come from the model parametrization and is there a parameter range where more pronounced habituation responses can be observed? 

      The reviewer is correct, but this is solely a consequence of the specific set of parameters we selected. We made this choice solely for visualization purposes in the previous version. In the revised version, in the section discussing the hallmarks of habituation, we also show other parameter choices for which the response decrement is more pronounced. Moreover, we remark that the contour plot of \Delta⟨U⟩ clearly shows that the decrement can largely exceed the 20% threshold presented in the previous version.

      In the revised version, also in light of the works highlighted by the reviewer, we decided to move the focus of the manuscript to the information-theoretic advantage of habituation. As such, we modified several parts of the main text. Also, in the region of optimal information gain, habituation is at an intermediate level. For this reason, we decided to keep the same parameter choice as the previous version in Figure 2.

      We stated that the time-periodic steady-state is reached “after a large number of stimuli” from a mathematical perspective. However, by using a habituation threshold, as done in Eckert et al., Current Biology 34, 5646–5658 (2024), we can state that the system is habituated after a few stimuli for each set of parameters. This aspect is highlighted in the revised version of the manuscript (see also the point above).
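
      The threshold criterion mentioned above can be illustrated with a short sketch. The exact definition used in the manuscript is not given in this response, so the code assumes one plausible reading: the system counts as habituated at the first pulse whose peak response has dropped by at least a fraction `threshold` relative to the first pulse. The function name, the threshold value, and the geometric relaxation of the peaks are all hypothetical.

```python
def first_habituated_pulse(peaks, threshold):
    """Return the 1-based index of the first pulse whose peak response is
    reduced by at least `threshold` (a fraction of the first peak), or None
    if no pulse reaches that decrement."""
    baseline = peaks[0]
    for index, peak in enumerate(peaks, start=1):
        if (baseline - peak) / baseline >= threshold:
            return index
    return None

# Hypothetical peak responses that relax geometrically towards a plateau
# at 80% of the first response (roughly the 20% decrement discussed above).
plateau, rate = 0.8, 0.5
peaks = [plateau + (1.0 - plateau) * rate ** n for n in range(10)]

pulse = first_habituated_pulse(peaks, threshold=0.16)
never = first_habituated_pulse(peaks, threshold=0.5)
print(pulse, never)
```

      With a threshold the habituated state is reached after a finite number of pulses, even though the underlying sequence approaches its plateau only asymptotically.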

      The same is true for the information content (Figure 2f) - already at the first pulse, I_{U,H} ~ 0.7 and only negligibly increases afterwards. In my understanding, during learning, the mutual information between the input and the internal state increases over time and the system extracts from these predictions about its responses. In the model presented by the authors, it seems the system already carries information about the environment which hardly changes with repeated stimulus presentation. The complexity of the signal is also limited, and it is very hard to clarify from the presented results, whether the proposed model can actually explain basic features of habituation, as mentioned above. 

      As for the response decrement of the readout, we can certainly choose a set of parameters for which the information gain is higher. In the revised version, we also report the information at the first stimulation and when the system is habituated to give a better idea of the range of these quantities. At any rate, as the referee correctly points out, it is difficult to give an intuitive interpretation of the information in our minimal model.

      It is also important to remark that, since the readout population and the receptor both undergo fast dynamics (with appropriate timescales as discussed in the text), we are not observing the transient gain of information associated with the first stimulus. As such, the mutual information presents a discontinuous behavior that resembles the dynamics of the readout, thereby starting at a non-zero value already at the first stimulus.

      Additionally, there have been two recent models on habituation and I strongly suggest that the authors discuss their work in relation to recent works (bioRxiv 2024.08.04.606534; arXiv:2407.18204).

      We thank the reviewer for pointing out these relevant references. In the revised version, we highlighted that we discuss the information-theoretic aspects of habituation, while the aforementioned references focus on the dynamics of this phenomenon.

      Reviewer #1 (Recommendations for the authors):

      I would also like to note here the simplification of the proposed biological model - in particular, that the receptor can be in an active/passive state, as well as proposing the NF-κB signaling module as a possible molecular realization. Generally, a large number of cell surface receptors, including RTKs or GPCRs, have much more complex dynamics, including autocatalytic activation that generally leads to bistability, and NF-κB has been demonstrated to have oscillatory and even chaotic dynamics (works of Savas Tsay, Mogens Jensen and others). Considering this, the authors should at least discuss under which conditions TNF-alpha signaling could potentially serve as a molecular realisation for habituation. 

      We thank the reviewer for bringing this to our attention. In the previous version, we reported the TNF signaling network only to show a similar coarse-grained modular structure. However, following a suggestion of reviewer #2, we decided to change Figure 1 to include a simplified molecular scheme of chemotaxis rather than TNF signaling, to avoid any source of confusion about this issue.

      Also, a minor point: Figures 2d-e are cited before 2a-c. 

      We apologize for the oversight. The structure of the Figures and their order is now significantly different, and they are now cited in the correct order. 

      Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the phenomenon of increasing reduction in activity following repeated stimuli, in the context of its information-theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, the difference between the mutual information between the signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponds to an optimization strategy that maximizes the mutual information between signal and readout in the steady state, minimizes some form of dissipation, and also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation. 

      The authors' simplified model might serve as a solid starting point for understanding habituation in different biological contexts as the model is simple enough to allow for some analytic understanding but at the same time exhibits all basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear or are weakly explained: 

      We thank the reviewer for deeming our work interesting and for considering it a solid starting point for understanding habituation in biological systems.

      (1) It is unclear what the finding of maximal information gain for intermediate habituation strength means for biological systems. Why is information gain as defined in the paper a relevant quantity for an organism/cell? For instance, why is a system with low mutual information after the first stimulus and intermediate mutual information after habituation better than one with consistently intermediate mutual information? Or, in other words, couldn't the system try to maximize the mutual information acquired over the whole time series, e.g., the time series mutual information between the stimulus and readout?

      This is a delicate aspect to discuss and we thank the referee for the comment. In the revised version, we report information gain, initial and final information, highlighting that both gain and final information are higher in regions where habituation is present. They have qualitatively similar behavior and highlight a clear information-theoretic advantage of this dynamical phenomenon. An important point is that, to determine the optimal Pareto front, we consider a prolonged stimulus and its associated steady-state information. Therefore, from the optimization point of view, there is no notion of “information gain” or “final information”, which are intrinsically dynamical quantities. As a result, the fact that the optimal curve lies in the region of optimal information gain is not expected a priori and hints at the potentially crucial role of this feature. In the revised version, we elucidate this aspect with several additional analyses.

      We would like to add that, from a naive perspective, while the first stimulation will necessarily trigger a certain (non-zero) mutual information, multiple observations of the same stimulus have to reflect into accumulated information that consequently drives the onset of observed dynamical behaviors, such as habituation.
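
      To make the quantities in this exchange concrete, the sketch below computes mutual information from a discrete joint distribution and takes the information gain as the difference between its value after and before habituation, following the definition the reviewer summarizes. It is an illustration only: the two 2x2 joint tables are hypothetical, and the manuscript's model need not use a discrete two-state input and readout.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi

# Hypothetical joint distributions of (input, readout).
joint_first = [[0.30, 0.20],
               [0.20, 0.30]]   # weakly correlated at the first stimulus
joint_habituated = [[0.45, 0.05],
                    [0.05, 0.45]]  # more strongly correlated after habituation

info_first = mutual_information(joint_first)
info_final = mutual_information(joint_habituated)
info_gain = info_final - info_first
print(info_first, info_final, info_gain)
```

      In this toy case the mutual information is already non-zero at the first stimulus and the gain is positive, which mirrors the qualitative picture described in the response.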

      (2) The model is very similar to (or a simplification of) previous models for adaptation in living systems, e.g., for adaptation in chemotaxis via activity-dependent methylation and demethylation. This should be made clearer.

      We apologize for having missed this point. Our choice has been motivated by the fact that we wanted to avoid confusion between the usual definition of (perfect) adaptation and habituation. However, we now believe that this is not the case for the revised manuscript, and we now include chemotaxis as an example in Figure 1.

      (3) It remains unclear why this optimization principle is the most relevant one. While it makes sense to maximize the mutual information between stimulus and readout, there are various choices for what kind of dissipation is minimized. Why was \delta Q_R chosen and not, for instance, \dot{\Sigma}_int or the sum of both? How would the results change in that case? And how different are the results if the mutual information is not calculated for the strong stimulation input statistics but for the background one?

      We thank the reviewer for the suggestion. We agree that a priori, there is no reason to choose \delta Q_R or a function of the internal energy flux J_int (that, in the revised version, we are using in place of \dot\Sigma_int following the suggestion of reviewer #3). The rationale was to minimize \delta Q_R since this dissipation is unavoidable and stems from the presence of the storage inhibiting the receptor through the internal pathway. Indeed, considering the existence of two different pathways implementing sensing and feedback, the presence of any input will result in a dissipation produced by the receptor. This energy consumption is reflected in \delta Q_R.

      In the revised version, we now include in the optimization principle two energy contributions (see Eq. (14) of the revised manuscript): \delta Q_R and E_int, which is the energy consumption associated with the driven storage production per unit energy. All Figures have been updated accordingly. The results remain similar, as \delta Q_R still represents the main contribution, especially at high \beta.

      Furthermore, in the revised version, we include examples of the Pareto optimization for different values of input strength. As detailed both in the main text and the Supplementary Information, changing the value of ⟨H⟩ moves the Pareto frontier in the (\beta, \sigma) space, since the signal needs to be strong enough for the system to distinguish it from the intrinsic thermal noise (controlled by \beta). We also show that if the system is able to tune the inhibition strength \kappa, the Pareto frontiers at different ⟨H⟩ collapse into a single curve. This shows that, although the values of, e.g., the mutual information, depend on ⟨H⟩, the qualitative behavior of the system in this regime is effectively independent of it. We also added more details about this in the Supplementary Information.
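
      For readers unfamiliar with Pareto optimization, the idea of a front that trades information against dissipation can be sketched as a generic non-dominated filter. This is not the optimization procedure of the manuscript: the candidate points are hypothetical (information, dissipation) pairs standing in for values obtained at different (\beta, \sigma), and the function name is illustrative.

```python
def pareto_front(points):
    """Keep the (information, dissipation) pairs that are not dominated.

    A point q dominates p if q has at least as much information and at most
    as much dissipation as p, and is strictly better in at least one of them.
    """
    front = []
    for i, (info_p, diss_p) in enumerate(points):
        dominated = any(
            info_q >= info_p and diss_q <= diss_p
            and (info_q > info_p or diss_q < diss_p)
            for j, (info_q, diss_q) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((info_p, diss_p))
    return sorted(front)

# Hypothetical candidates, e.g. one per parameter combination.
candidates = [(0.2, 1.0), (0.5, 2.0), (0.4, 3.0),
              (0.8, 4.0), (0.8, 5.0), (0.1, 0.5)]
front = pareto_front(candidates)
print(front)
```

      Points on the front cannot gain information without dissipating more, which is the sense in which the optimal curve discussed above balances the two objectives.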

      (4) The comparison to the experimental data is not too strong of an argument in favor of the model. Is the agreement between the model and the experimental data surprising? What other behavior in the PCA space could one have expected in the data? Shouldn't the 1st PC mostly reflect the "features", by construction, and other variability should be due to progressively reduced activity levels? 

      The agreement between data and model is not surprising - we agree on this - since the data exhibit habituation. However, we believe that the fact that our minimal model is able to capture the features of a complex neural system just by looking at the PCs, without any explicit biological details, is non-trivial. We also stress that the 1st PC only reflects the feature that captures most of the variance of the data and, as such, it is difficult to have a priori expectations on what it should represent. In the case of the data generated from the model, most of the variance of the activity comes from the switching signal, and similar considerations can be made for the looming stimulations in the data. We updated the manuscript to clarify this point.

      Reviewer #2 (Recommendations for the authors):

      (1) The abstract makes it sound like a new finding is that habituation is due to a slow, negative feedback mechanism. But, as mentioned in the introduction, this is a well-known fact. 

      We agree with the reviewer. We have revised the abstract.

      (2) Figure 2c Why does the range of Delta Delta I_f include negative values if the corresponding region is shaded (right-tilted stripes)? 

      The negative values in the range are those attained in the shaded region with right-tilted stripes. We decided to include them in the colorbar for clarity, since Delta Delta I_f is also plotted in the region where it attains negative values.

      (3) What does the Pareto front look like if the optimization is done for input statistics given by ⟨H⟩_min? 

      In the revised version, we include examples of the Pareto optimization for different values of input strength. As detailed both in the main text and the Supplementary Information, changing the value of ⟨H⟩ moves the Pareto frontier in the (\beta, \sigma) space, since the strength of the signal is crucial for the system to discriminate input and thermal noise (see also the answers above).

      In particular, in Figure 4 we explicitly compare the results of the Pareto optimization (which is done with a static input of a given statistics) with the dynamics of the model for different values of ⟨H⟩ in two scenarios, i.e., adaptive and non-adaptive inhibition strength (see answers above for details).

      We also remark that ⟨H⟩_min represents the background signal that the system is not trying to capture, which is why we never used it for optimization.

      (4) From the main text, it is rather difficult to understand how the comparison to the experimental data was performed. How was the PCA done exactly? What are the "features" of the evoked neural response? 

      The PCA on data is performed starting from the single-neuron calcium dynamics. To perform a fair comparison, we reconstruct a similar but extremely simplified dynamics using our model, as explained in Methods, and perform the PCA on the analogous simulated data. We added a comment on this in the revised version. While these components capture most of the variance in the data, their specific interpretation is usually out of reach and we believe that it lies beyond the scope of this theoretical work. We also remark that the model does not contain all these biological details - a strong aspect in our opinion - and, as such, it cannot capture specific biological features.

      Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment. 

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be a favorable combination. 

      We thank the reviewer for this positive assessment of our work and its generality.

      While the generic approach of coarse-graining most biological detail is appealing and the results are of broad relevance, some aspects of the conducted studies, the problem setup, and the writing lack clarity and should be addressed: 

      (1) The abstract can be further sharpened. Specifically, the "functional role" mentioned at the end can be made more explicit, as it was done in the second-to-last paragraph of the Introduction section ("its functional advantages in terms of information gain and energy dissipation"). In addition, the abstract mentions the testing against experimental measurements of neural responses but does not specify the main takeaways. I suggest the authors briefly describe the main conclusions of their experimental study in the abstract.

      We thank the reviewer for raising this point. In the revised version, we have changed the abstract to reflect the reviewer’s points and the new structure and results of the manuscript.

      (2) Several clarifications are needed on the treatment of energy dissipation. 

      -   When substituting the rates in Eq. (1) into the definition of δQ_R above Eq. (10), "σ" does not appear on the right-hand side. Does this mean that one of the rates in the lower pathway must include σ in its definition? Please clarify.

      We apologize to the reviewer for this typo. Indeed, \sigma sets the energy scale of feedback and, as such, it appears in the energetic driving given by the feedback on the receptor, i.e., in Eq. (1) together with \kappa. This typo has been corrected in the revised manuscript, and all subsequent equations are consistent.

      -   I understand that the production of storage molecules has an associated cost σ and hence contributes to dissipation. The dependence of receptor dissipation on ⟨H⟩, however, is not fully clear. If the environment were static and the memory block was absent, the term with ⟨H⟩ would still contribute to dissipation. What would be the nature of this dissipation?

      In the spirit of building a paradigmatic minimal model with a thermodynamic meaning, we considered H to act as an external thermodynamic driving. Since this driving acts on a different pathway with respect to the one affected by the storage, the receptor is driven out of equilibrium by its presence.

      By eliminating the memory block, we would also be necessarily eliminating the presence of the pathway associated with the storage effect (“internal pathway” in the manuscript), since its presence is solely due to the existence of a storage population. Therefore, in this case, the receptor would be a 2-state, 1-pathway system and, as such, it would always satisfy an effective detailed balance. As a consequence, the definition of \delta Q_R reported in the manuscript would not hold anymore and the receptor would not exhibit any dissipation. Thus, in a static environment and without a memory block, no receptor dissipation would be present. We would also like to stress that our choice to model two different pathways has been motivated by the observation that the negative feedback acts along a different pathway in several biochemical and biological examples. We made some changes to the model description in the revised version and we hope that this aspect has been clarified.

      -   Similarly, in Eq. (9) the authors use the ratio of the rates Γ_{s → s+1} and Γ_{s+1 → s} in their expression for internal dissipation. The first rate corresponds to the synthesis reaction of memory molecules, while the second corresponds to a degradation reaction. Since the second reaction is not the microscopic reverse of the first, what would be the physical interpretation of the log of their ratio? Since the authors already use σ as the energy cost per storage unit, why not use σ times the rate of producing S as a metric for the dissipation rate? 

      We agree with the referee that the reverse reaction we considered is not the microscopic reverse of the storage production. In the case of a fast readout population, we employed a coarse-grained view to compute this entropy production. To be more precise, we gladly welcomed the referee’s suggestion in the revised version and modified the manuscript accordingly. As suggested, we now employ the energy flux associated with the storage production to estimate the internal dissipation (see new Fig. 3). 

      In the revised version, we also use this quantity in the optimization procedure in combination with \delta Q_R (see new Fig. 4) to have a complete characterization of the system’s energy consumption. The conclusions are qualitatively identical to before, but we believe that now they are more solid from a theoretical perspective. For this important advance in the robustness and quality of our work, we are profoundly grateful to the referee.

      (3) Impact of the pre-stimulus state. The plots in Figure 2 suggest that the environment was static before the application of repeated stimuli. Can the authors comment on the impact of the pre-stimulus state on the degree of habituation and its optimality properties? Specifically, would the conclusions stay the same if the prior environment had stochastic but aperiodic dynamics? 

      The initial stimulus is indeed stochastic with an average constant in time and mimics the background (small) signal. We apply the (strong) stimulation once the system has reached a stationary state with respect to the background. As can be seen in Fig. 2 of the revised version, the model response depends on the pre-stimulus level, since it sets the storage concentration before the stimulation arrives and, as such, the subsequent habituation dynamics. This dependence is important from a dynamical perspective. The information-theoretic picture has been developed, as noted above, by letting the system relax before the first stimulus. This eliminates this arbitrary dependence and provides a clearer idea of the functional advantages of habituation. Moreover, the optimization procedure is performed in a completely different setting, with no pre-stimulus at all, since we only have one prolonged stimulation. We hope that the revised version is clearer on all these points.

      (4) Clarification about the memory requirement for habituation. Figure 4 and the associated section argue for the essential role that the storage mechanism plays in habituation. Indeed, Figure 4a shows that the degree of habituation decreases with decreasing memory. The graph also shows that in the limit of vanishingly small Δ⟨S⟩, the system can still exhibit a finite degree of habituation. Can the authors explain this limiting behavior; specifically, why does habituation not vanish in the limit Δ⟨S⟩ -> 0?

      We apologize for the lack of clarity and we thank the reviewer for spotting this issue. In Figure 4 (now Figure 5 in the revised manuscript) Δ⟨S⟩ is not exactly zero, but equal to 0.15% at the final point. It appeared as 0% in the plot due to an unwanted rounding in the plotting function that we missed. This has been fixed in the revised version, thank you.
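
      A minimal sketch of how such a rounding artifact can arise when a percentage is formatted for a plot label. The helper `percent_label` and its `decimals` parameter are hypothetical and only illustrate the effect; they are not the authors' plotting code.

```python
def percent_label(value_percent: float, decimals: int) -> str:
    """Format a percentage for an axis or data label.

    Hypothetical helper: with decimals=0, small non-zero values
    are displayed as "0%", hiding that they are not exactly zero.
    """
    return f"{value_percent:.{decimals}f}%"


# The value quoted in the response (0.15%), formatted two ways.
coarse = percent_label(0.15, 0)  # rounds to the nearest integer
fine = percent_label(0.15, 2)    # keeps two decimal places
print(coarse, fine)
```

      Keeping at least as many decimals as the smallest value of interest avoids a small but finite quantity being read as zero.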

      Reviewer #3 (Recommendations for the authors):

      (1) Page 2 | "Figure 1b-e" should be "Figure 1b-d" since there is no panel (e) in Figure 1. 

      (2) Figure 1a | In the top schematic, the symbol "k" is used, while in the rest of the text, the proportionality constant is denoted by κ. 

      We thank the reviewer for pointing this out. Figure 1 has been revised and the panels are now consistent. The proportionality constant (the inhibition strength) has also been fixed.

      (3) Figure 1a | I find the upper part of the schematic for Storage hard to perceive. I understand the lower part stands for the degradation reaction for storage molecules. The upper part stands for the synthesis reaction catalyzed by the readout population. I think the bolded upper arrow would explain it sufficiently well; the left/right arrows, together with the crossed green circle make that part of the figure confusing. Consider simplifying. 

      We decided to remove the left/right arrows, as suggested by the reviewer, as we agree that they were unnecessarily complicating the schematic. We hope that the revised version will be easier to understand.

      (4) Page 3 | It would be helpful to tell what the temporal statistics of the input signal $p_H(h,t)$ is, i.e. <h(t) h(t')>. Looking at the example trajectory in Figure 1a, consecutive signal values do not seem correlated.

      We agree with the reviewer that this is an important detail and worth mentioning. We now explicitly state that consecutive values are not correlated, for simplicity. 

      (5) Figure 2 | I believe the label "EXTERNAL INPUT" refers to the *average* external input, not one specific realization (similar to panels (d) and (e) that report on average metrics). I suggest you indicate this in the label, or, what may be even better, add one particular realization of the stochastic input to the same graph.

      We thank the reviewer for spotting this. We now write that what we show is the average external signal. We prefer this solution rather than showing a realization of the stochastic input, since it is more consistent with the rest of the plots, where we always show average quantities. We also note that Figure 2 is now Figure 3 in the revised manuscript.

      (6) Figure 2d | The expression of Δ⟨U⟩ is the negative of the definition in Eq. (5). It should be corrected.

      In the revised version, both the definitions in Figure 2 (now Figure 3) and in the text (now Eq. (11)) are consistent.

      (7) Figure 3(d-e) caption | "where ⟨U⟩ starts to be significantly smaller than zero." There, it should be Δ⟨U⟩ instead of ⟨U⟩. 

      Thanks again, we corrected this typo.

    1. Now, there are a million implications to outsourcing our first drafts to AI. We know people anchor on the first idea they see, influencing their future work, so even drafts that are completely rewritten will be AI-tinged. People may not be as thoughtful about what they write, or the lack of effort may mean they don’t think through problems as deeply.

      The starting point can no longer be a draft, must be a conversation?

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      Wang et al. identify Hamlet, a PR-containing transcription factor, as a master regulator of reproductive development in Drosophila. Specifically, the fusion between the gonad and genital disc is necessary for the development of continuous testes and seminal vesicle tissue essential for fertility. To do this, the authors generate novel Hamlet null mutants by CRISPR/Cas9 gene editing and characterize the morphological, physiological, and gene expression changes of the mutants using immunofluorescence, RNA-seq, cut-tag, and in-situ analysis. Thus, Hamlet is discovered to regulate a unique expression program, which includes Wnt2 and Tl, that is necessary for testis development and fertility. 

      Strengths: 

      This is a rigorous and comprehensive study that identifies the Hamlet-dependent gene expression program mediating reproductive development in Drosophila. The Hamlet transcription targets are further characterized by Gal4/UAS-RNAi confirming their role in reproductive development. Finally, the study points to a role for Wnt2 and Tl as well as other Hamlet transcriptionally regulated genes in epithelial tissue fusion. 

      We appreciate that the reviewer thinks our study is rigorous.

      Weaknesses: 

      The image resolution and presentation of figures is a major issue in this study. As a non-expert, it is nearly impossible to see the morphological changes as described in the results. Quantification of all cell biological phenotypes is also lacking, therefore reducing the impact of this study to those familiar with tissue fusion events in Drosophila development.

      In the revised version, we have improved the image presentation and resolution. For all the images with more than two channels, we included single-channel images, changed the green color to lime and the red to magenta, and highlighted the testis (TE) and seminal vesicles to make morphological changes more visible.

      We had quantification for marker gene expression in the original version, and have now also included quantification for cell biological phenotypes, which generally show 100% penetrance.

      Reviewer #2 (Public review): 

      Strengths: 

      Wang and colleagues successfully uncovered an important function of the Drosophila PRDM16/PRDM3 homolog Hamlet (Ham) - a PR domain-containing transcription factor with known roles in the nervous system in Drosophila. To do so, they generated and analyzed new mutants lacking the PR domain, and also employed diverse preexisting tools. In doing so, they made a fascinating discovery: They found that PR-domain containing isoforms of ham are crucial in the intriguing development of the fly genital tract. Wang and colleagues found three distinct roles of Ham: (1) specifying the position of the testis terminal epithelium within the testis, (2) allowing normal shaping and growth of the anlagen of the seminal vesicles and paragonia and (3) enabling the crucial epithelial fusion between the seminal vesicle and the testis terminal epithelium. The mutant blocks fusion even if the parts are positioned correctly. The last finding is especially important, as there are few models allowing one to dissect the molecular underpinnings of heterotypic epithelial fusion in development. Their data suggest that they found a master regulator of this collective cell behavior. Further, they identified some of the cell biological players downstream of Ham, like for example E-Cadherin and Crumbs. In a holistic approach, they performed RNAseq and intersected them with the CUT&TAG-method, to find a comprehensive list of downstream factors directly regulated by Ham. Their function in the fusion process was validated by a tissue-specific RNAi screen. Meticulously, Wang and colleagues performed multiplexed in situ hybridization and analyzed different mutants, to gain a first understanding of the most important downstream pathways they characterized, which are Wnt2 and Toll. 

      This study pioneers a completely new system. It is a model for exploring a process crucial in morphogenesis across animal species, yet not well understood. Wang and colleagues not only identified a crucial regulator of heterotypic epithelial fusion but took on the considerable effort of meticulously pinning down functionally important downstream effectors by using many state-of-the-art methods. This is especially impressive, as the dissection of pupal genital discs before epithelial fusion is a time-consuming and difficult task. This promising work will be the foundation future studies build on, to further elucidate how this epithelial fusion works, for example on a cell biological and biomechanical level. 

      We appreciate that the reviewer thinks our study is original and important.

      Weaknesses: 

      The developing testis-genital disc system has many moving parts. Myotube migration was previously shown to be crucial for testis shape. This means, that there is the potential of non-tissue autonomous defects upon knockdown of genes in the genital disc or the terminal epithelium, affecting myotube behavior which in turn affects fusion, as myotubes might create the first "bridge" bringing the epithelia together. The authors clearly showed that their driver tools do not cause expression in myoblasts/myotubes, but this does not exclude non-tissue autonomous defects in their RNAi screen. Nevertheless, this is outside the scope of this work. 

      We thank the reviewer for considering non-tissue autonomous defects upon gene knockdown. The driver, hamRSGal4, drives reporter gene expression mainly in the RS epithelia, but we did observe weak expression of the reporter in the myoblasts before they differentiate into myotubes. Thus, we could not rule out a non-tissue autonomous effect in the RNAi screen. We have therefore added a statement to the Results: “Given that the hamRSGal4 driver is highly expressed in the TE and SV epithelia, we expect highly effective knockdown occurs only in these epithelial cells. However, hamRSGal4 also drives weak expression in the myoblasts before they differentiated into myotubes (Supplementary Fig. 5B), which may result in a non-tissue autonomous effect when knocking down the candidate genes expressed in myoblasts.”

      However, one point that could be addressed in this study: the RNAseq and CUT&TAG experiments would profit from adding principal component analyses, elucidating similarities and differences of the diverse biological and technical replicates. 

      Thanks for the suggestion. We have now included the PCA in Supplementary Figure 6A-B and the corresponding description in the text. The PCA graphs validated the consistency between biological replicates of the RNA-seq samples. The Cut&Tag graphs confirm the consistency between the two biological replicates from the GFP samples, but show a higher variability between the w1118 replicates. Importantly, we only considered the overlapping peaks pulled down by the GFP antibody from the ham_GFP genotype and the Ham antibody from the wildtype (w1118) sample as true Ham binding sites.
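
      The overlap criterion in the last sentence can be sketched as a simple interval intersection. This is an illustration only: the function name, the half-open `(chromosome, start, end)` peak representation, and the example coordinates are assumptions, and the response does not state which tool or overlap threshold was actually used.

```python
from collections import defaultdict

Peak = tuple[str, int, int]  # (chromosome, start, end), half-open interval


def overlapping_peaks(peaks_a: list[Peak], peaks_b: list[Peak]) -> list[Peak]:
    """Return the peaks in peaks_a that overlap at least one peak in peaks_b."""
    # Group the second peak set by chromosome for a per-chromosome comparison.
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap if each starts before the other ends.
        if any(start < b_end and b_start < end for b_start, b_end in by_chrom[chrom]):
            kept.append((chrom, start, end))
    return kept


# Hypothetical peak calls from the two antibodies (made-up coordinates).
gfp_peaks = [("2L", 100, 200), ("2L", 500, 600), ("3R", 50, 80)]
ham_peaks = [("2L", 150, 250), ("3R", 80, 120)]
shared = overlapping_peaks(gfp_peaks, ham_peaks)
print(shared)
```

      Requiring support from two independent antibodies in this way trades sensitivity for specificity: peaks seen with only one antibody are discarded.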

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      Major Concern: 

      (1) The image resolution and presentation of figures (Figures 2, 5, 6, and 7) is a major issue in this study. As a non-expert, it is nearly impossible to see the morphological changes as described in the results. Images need to be captured at higher resolution and zoomed in with arrows denoting changes as described. Individual channels, particularly for intensity measurement need to be shown in black and white in addition to merged images. Images also need pseudo-colored for color-blind individuals (i.e. no red-green staining). 

      The images were captured at a high resolution, but somehow the resolution was dramatically reduced in the bioRxiv PDF. We have tried to overcome this by submitting the PDF directly through the eLife submission system. In the revised version, we have included single-channel images and changed the green and red colors to lime and magenta for color blindness. We also highlighted the testis (TE) and seminal vesicle structures in the images to make morphological changes more visible.

      (2) The penetrance of morphological changes observed in RT development is also unclear and needs to be rigorously quantified for data in Figures 2, 5, and 7. 

      We have now included quantification for cell biological phenotypes, which generally show 100% penetrance. The penetrance percentage and the number of animals used are indicated in each corresponding image.

      Reviewer #2 (Recommendations for the authors): 

      Major Points 

      (1) Lines 193- 220 I would strongly suggest pointing out the obvious shape defects of the testes visible in Figure 2A ("Spheres" instead of "Spirals"). These are probably a direct consequence of a lack in the epithelial connection that myotubes require to migrate onto the testis (in a normal way) as depicted in the cartoons, allowing the testis to adopt a spiral shape through myotube-sculpting (Bischoff et al., 2021), further confirming the authors' findings! 

      Good point. In the revised text, we have added more description of the testis shape defects and pointed out a potential contribution from compromised myotube migration.   

      (2) Line 216: "Often separated from each other". Here it would be important to mention how often. If the authors cannot quantify that from existing data, I suggest carrying it out in adult/pharate adult genital tracts (if there is no strong survivor bias due to the lethality of stronger affected animals), as this is much easier than timing prepupae. This should be a quick and easy experiment. 

      Because it is hard to tell whether the separation of the SV and TE was caused by developmental defects or could sometimes be due to technical issues (bad dissection), we have now changed the description to, “control animals always showed connected TE and SV, whereas ham mutant TE and SV tissues were either separated from each other, or appeared contacted but with the epithelial tubes being discontinuous (Fig. 2B).” Additionally, we quantified the disconnection phenotype, which shows 100% penetrance in 18 mutant animals. This quantification is now included in the figure.

      (3) Lines 289-305, Figure 3. I could only find how many replicates were analyzed in the RNAseq/CUT&Tag experiments in the Material & Methods section. I would add that at least in the figure legends, and perhaps even in the main text. Most importantly, I would add a Principal Component Analysis (one for RNAseq and one for the CUT&TAG experiment), to demonstrate the similarity of biological replicates (3x RNaseq, 4x Cut&Tag) but also of the technical replicates (RNAseq: wt & wt/dg, ham/ham & ham/df, GD & TE; CUT&TAG: Antibody & GFP-Antibody, TG&TE...). This should be very easy with the existing data, and clearly demonstrate similarities & differences in the different types of replicates and conditions. 

      Principal component analysis and its description are now added to Supplementary Fig. 6 and the main text, respectively.

      (4) Line 321; Supplementary Table 1: In the table, I cannot find which genes are down- or upregulated - something that I think is very important. I would add that, and remove the "color" column, which does not add any useful information. 

      In Supplementary table 1, the first sheet includes upregulated genes while the second sheet includes downregulated genes. We removed the column “color” as suggested.  

      (5) Line 409: SCRINSHOT was carried out with candidate genes from the screen. One gene I could not find in that list was the potential microtubule-actin crosslinker shot. If shot knockdown caused a phenotype, then I would clearly mention and show it. If not, I would mention why a shot is important, nonetheless. 

      shot is one of the candidate target genes selected from our RNA-seq and Cut&Tag data. However, in the RNAi screen, knocking down shot with the available RNAi lines did not cause any obvious phenotype. This could be due to inefficient RNAi knockdown or redundancy with other factors. We nevertheless wanted to examine the shot expression pattern in the developing RS, given the important role of shot in epithelial fusion (Lee S., 2002). Using SCRINSHOT, we could detect epithelial-specific expression of shot, implying its potential function in this context. We have now revised the text to clarify this point.

      Minor points 

      (1) Cartoons in Figure 1: The cartoons look like they were inspired by the cartoon from Kozopas et al., 1998 Fig. 10 or Rothenbusch-Fender et al., 2016 Fig 1. I think the manuscript would greatly profit from better cartoons, that are closer to what the tissue really looks like (see Figure 1H, 2G), to allow people to understand the somewhat complicated architecture. The anlagen of the seminal vesicles/paragonia looks like a butterfly with a high columnar epithelium with a visible separation between paragonia/seminal vesicles (upper/lower "wing" of the "butterfly"). Descriptions like "unseparated" paragonia/seminal vesicle anlagen, would be much easier to understand if the cartoons would for example reflect this separation. It would even be better to add cartoons of the phenotypic classes too, and to put them right next to the micrographs. (Another nitpick with the cartoons: pigment cells are drastically larger and fewer in number (See: Bischoff et al., 2021 Figure 1E & MovieM1).) 

      Thanks for the suggestion. We have updated Figure 1 by adding additional illustrations showing the accessory gland and seminal vesicle structures in the pupal stage and changing the size of pigment cells.

      (2) Line 95-121 I would also briefly introduce PR domains, here. 

      We have added a brief description of the PR domains.

      (3) Line 152, 158, 160, 162. When first reading it, I was a bit confused by the usage of the word sensory organ. I would at least introduce that bristles are also known as external mechanosensory organs. 

      We have now revised the description to “mechano-sensory organ”.

      eg. Line 184, 194, and many more. Most times, the authors call testis muscle precursors "myoblasts". This is correct sometimes, but only when referring to the stage before myoblast-fusion, which takes place directly before epithelial fusion (28 h APF). Post-myoblast-fusion (eg. during migration onto the testis), these cells should be called myotubes or nascent myotubes, as the fly muscle community defined the term myoblast as the single-nuclei precursors to myotubes.

      We have now revised the description accordingly.  

      (4) Line 217/Figure 2B. It looks like there is a myotube bridge between the testis and the genital disc. I would point that out if it's true. If the authors have a larger z-stack of this connection, I suggest creating an MIP, and checking if there are little clusters of two/three/four nuclei packed together. This would clearly show that the cells in between are indeed myotubes (granted that loss of ham does not introduce myoblast-fusion-defects). 

      We do not have a Z-stack of this connection, and thus cannot confirm whether the cells in this image are myotubes. However, we found that myotubes can migrate onto the testis and form the muscular sheet in the ham mutant despite reduced myotube density. At the junction there are myotubes, suggesting that loss of ham does not introduce myoblast-fusion defects. These results are now included in the revised manuscript, Supplementary Fig. 5C-D.

      (5) Line 231/Supplementary Fig. 3C-G: I would add to the cartoons, where the different markers are expressed. 

      We have added marker gene expression in the cartoons.

      (6) Line 239. I don't see what Figure 1A/1H refers to, here. I would perhaps just remove it. 

      Yes, we have removed it.

      (7) Line 232. I would rephrase the beginning of the sentence to: Our data suggest Ham to be... 

      Yes, we have revised it.

      (8) Line 248-250/Figure 2F. Clonal analyses are great, but I think single channels should be shown in black and white. Also, a version without the white dashed line should be shown, to clearly see the differences between wt and ham-mutant cells. 

      Now single channel images from the green and red images are presented in Supplementary Figures. This particular one is in Supplementary Figure 3B. 

      (9) Line 490. The Toll-9 phenotype was identified on the sterility effect/lack-of-sperm phenotype alone, and it was deduced, that this suggests connection defects. By showing the right focus plane in Fig S8B (lower right), it should be easy to directly show whether there is a connection defect or not. Also, one would expect clearer testis-shaping defects, like in ham-mutants, as a loss of connection should also affect myotube migration to shape the testis. This is just a minor point, as it only affects supplementary data with no larger impact on the overall findings, even if Toll-9 is shown not to have a defect, after all.

      We find that scoring defects at the junction site at the adult stage is difficult and may not always be accurate. Instead, we score the presence of sperm in the SV, which indirectly but firmly suggests successful connection between the TE and SV. We have now included a quantification graph showing the penetrance of the phenotype in the new Supplementary Fig. 14C. There were indeed morphological defects of the TE in Toll-9 RNAi animals. We have now included the image and quantification in the new Supplementary Fig. 14B.

    1. Author response:

      The following is the authors’ response to the original reviews

      Response to the public reviews:

      We are very pleased to see these positive reviews of our preprint.

      Reviewers 1 and 3 raise issues around PIP-PP1 interactions.

      (1) Role of the “RVxF-ΦΦ-R-W string”

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs) and Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed the trajectory of the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs across the PP1 surface encompasses not only the RVxF-ΦΦ-R trio, but also additional sequences C-terminal to it (Chen et al, eLife, 2015). This extended trajectory is maintained in the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134.

      The extended “RVxF-ΦΦ-R-W” interaction brings sequences C-terminal to the “W” SLiM into the vicinity of the hydrophobic groove that adjoins the PP1 catalytic centre. In the Phactr1/PP1 complex, these sequences remodel the groove, generating a novel pocket that facilitates sequence-specific substrate recognition.

      This raises the possibility that sequences C-terminal to the extended “RVxF-ΦΦ-R-W string” in the other complexes also confer sequence-specific substrate recognition, and our study aims to test this hypothesis. Indeed, the hydrophobic groove structures of the Neurabin/Spinophilin/PP1 and Phactr1/PP1 complexes differ significantly (Ragusa et al, 2010; see Fedoryshchak et al 2020, Fig2 FigSupp1).

      (2) Orientation of the W side chain

      Reviewer 1 points out that in the substrate-bound PP1/PPP1R15A/Actin/eIF2 pre-dephosphorylation complex the W sidechain is inverted with respect to its orientation in the PP1-PPP1R15B complex (Yan et al, NSMB 2021). The authors proposed that this may reflect the role of actin in assembly of the quaternary complex. This does not necessarily invalidate the notion that sequences C-terminal to the “W” motif might play a role in actin-independent substrate recognition, and we therefore consider our inclusion of the R15A/B fusions in our analysis to be reasonable.

      (3) Conservation of W

      The motif ‘W’ does not mandate tryptophan - Phactrs and PPP1R15A/B indeed have W at this position but Neurabin/spinophilin contain VDP, which makes similar interactions. Similarly the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In our revision, we will present comparisons of the differentially remodelled/modified PP1 hydrophobic groove in the various complexes, and discuss the different orientations of the tryptophan in the previously published PPP1R15A/PP1 and PPP1R15B/PP1 structures. We will also address the other issues raised by the referees.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comments and suggestions for revisions

      (1) The authors do not provide strong evidence that the interactions of the 'W' of the RVxF-øø-R-W string with the hydrophobic groove of PP1 are conserved in PIPs. Whereas the RVxF motif is well conserved and validated since its discovery in 1997, as are the øø (an extension of the RVxF motif) and the 'R', the Trp residue in the RVxF-øø-R-W string is not conserved.

      We did not mean to imply that the W motif is conserved amongst all PIPs.

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs). Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs share a trajectory across the PP1 surface that encompasses not only the RVxF-ΦΦ-R SLIMs, but also additional sequences C-terminal to the R SLIM (Chen et al, eLife, 2015). This trajectory is also shared by the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on this structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134 (See Fedoryshchak et al, 2020, Figure 1 figure supplement 2).

      Introduction, paragraph 2 is rewritten to make this clearer.

      The sequence and positions of W differ in amino acid type and position relative to the RVxF-øø-R string.

      The motif ‘W’ does not mandate tryptophan; it is our name for a common structurally aligned motif: although the Phactrs and PPP1R15A/B indeed have W at this position, Neurabin and spinophilin contain VDP, which nevertheless makes similar interactions. Similarly the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In the Discussion the authors state that the hydrophobic groove of PP1 is remodelled by Neurabin. However, details of this are not described or shown in the manuscript.

      The shared trajectory determined by the RVxF-øø-R-W string brings the sequences C-terminal to the W SLIM into the vicinity of the PP1 hydrophobic groove. In the Phactr1/PP1 holoenzyme this generates a novel pocket required for substrate recognition (Fedoryshchak et al, 2020). These observations raised the possibility that sequences C-terminal to the “W” motif in the other RVxF-øø-R-W PIPs also play a role in substrate recognition.

      Introduction paragraph 3 now cites a new Figure 1-S2, which shows how the hydrophobic groove is remodelled in the various different PIP/PP1 complexes. A revised Figure 1A now indicates the hydrophobic residues defining the hydrophobic groove by grey shading.

      (2) To add to the confidence of the structure, the authors should include a 2Fo-Fc simulated annealing omit map, perhaps showing the R and W interactions of the RVxF-øø-R-W string.

      This is now included as new Figure 6 Figure supplement 1. Note that in Neurabin, the W motif is VDP, where the valine and proline sidechains interact similarly to the tryptophan (see also new Figure 1-S2G,H).

      We also add a new supplementary Figure 6-S1 comparing our PBM-liganded Neurabin PDZ domain with the previously published unliganded structure (Ragusa et al 2010).

      (3) Page 16. The authors state that spinophilin remodels the PP1 hydrophobic groove differently from Phactrs. Arguably spinophilin does not remodel the PP1 hydrophobic groove at all. There are no contacts between spinophilin and the PP1 hydrophobic groove in the spinophilin-PP1 structure, correlating with the absence of 'W' in the RVxF-øø-R-W string in spinophilin.

      The VDP sequence corresponding to the W motif in spinophilin and neurabin makes analogous contacts to those made by the W in Phactr1 (see Fedoryshchak et al 2020).

      Remodelling is meant in the sense of altering the structure of the major groove by bringing new sequences into its vicinity rather than necessarily directly interacting with it. The spinophilin/PP1 and Phactr/PP1 hydrophobic grooves are compared in new Figure 1-S2 (see also Fedoryshchak et al 2020, Figure 2 figure supplement 1)

      (4) Page 8. For the cell-based/proteomics-dephosphorylation assay in Figure 2, it isn't clear why there were no dephosphorylation sites detected for the PPP1R15A/B-PP1 fusion (except PPP6R1 S531 for PPP1R15B). One might have expected a correlation with PP1 alone. Does this imply that PPP1R15A/B are inhibiting PP1 catalytic activity? Was the activity tested in vitro?

      The R15A/B data are compared to average abundance of all the phosphosites in the dataset, including those of PP1.

      We have not tested for a general inhibitory effect of R15A/B on PP1 activity. Many PIPs, including R15A/B, do occlude one or more of the PP1 substrate grooves and therefore generally act as inhibitors of PP1 activity against some potential substrates, while enhancing activity against others.

      Other points 

      (5) Figure S1: Colour sequence similarities/identities.

      Done

      (6) Figures: Structure figures lacked labels:

      Figure 1A, label PP1, Phactrs etc.

      Done

      Figure 6, label PP1, Neurabin, previous Neurabin structure (Fig. 6C), hydrophobic groove, PDZ domain, etc.

      Done

      (7) Statistical analysis. p values should be shown for data in:

      Figure 5.

      To avoid cluttering the Figure, a new sheet, “statistical significance” has been added to Supplementary Table 3, summarizing the analysis.

      Figure 1.

      Figure amended (now figure 1-S1).

      (8) Some inconsistency with labels, eg '34-WT' used in Fig. 5C, whereas '34A-WT' (better) in Methods.

      Now changed to 34A etc where used.

      (9) Page 6. PPP1R9A/B is not shown in Figure 1A and Figure S1A.

      PPP1R9A/B are Neurabin and spinophilin - now clarified in Introduction paragraph 2, Results paragraph 1, Discussion paragraph 1.

      (10) Page 7: lines 4, 'site' not 'side'.

      Done

      (11) Page 9: DTL and CAMSAP3 were found to be dephosphorylated in the PP1-Neurabin/spinophilin screen. Are these PDZ-binding proteins?

      Neither DTL nor CAMSAP3 contains the C-terminal hydrophobic residues characteristic of classical PBMs. Sentence added in Discussion, paragraph 5.

      (12) Page 12 and Figure 5 and S5: The synthetic p4E-BP1 and IRSp53WT peptides with PBM should be given more specific names to indicate the presence of the PBM.

      We have renamed 4E-BP1<sup>WT</sup> and IRSp53<sup>WT</sup> to 4E-BP1<sup>PBM</sup> and IRSp53<sup>PBM</sup> respectively, emphasising the inclusion of the wildtype or mutated PBM from 4E-BP1 on these peptides.

      Text, Figure 5, and Figure S5 all revised accordingly.

      (13) Give PDB code for spinophilin-PP1 complex coordinates shown in Figure 6C.

      PDB codes for the various PIP/PP1 complexes now given in new Figure 1-S2 and revised Figure 6C.

      Reviewer #2 (Recommendations for the authors):

      The work undertaken by the authors is extensive and robust, however, I believe that some improvement in the writing and some detailed explanation of certain results sections would help with the presentation of the work and clarity for the readers.

      (1) The introduction should contain more information about the interaction between PP1 and Neurabin, given that this is the focus of the paper. This would give the reader the necessary background required to follow the paper.

      Introduction paragraph 2 revised to describe the different SLIMs in more detail. New Figure 1-S2 shows detail of the different remodelled hydrophobic grooves in the various PIP/PP1 complexes.

      (2) More information on PP1-IRSp53L460A has to be added before discussing results in S1B.

      Sentence explaining that IRSp53 L460 docks with the remodelled PP1 hydrophobic groove in the Phactr1/PP1 holoenzyme added in Results paragraph 2.

      (3) Page 6: "as expected, the +5 residue L460A mutation, which impairs dephosphorylation by the intact Phactr1/PP1 holoenzyme, impaired sensitivity to all the fusions, indicating that they recognise phosphorylated IRSp53 in a similar way (Figure S1B)". Statistics between IRSp53 and IRSp53L460A across PP1-PIPs need to be conducted before concluding the above. From the graph and the images, the impairment to dephosphorylation is not convincing.

      For each of the four PP1-Phactr fusions, the IRSp53 L460A peptide shows significantly less reactivity than the IRSp53WT peptide (p<0.05 for each fusion).

      Since the proteomics studies in Figure 2 show that the substrate specificity of the four PP1-Phactr1 fusions is virtually identical, we combined the data for the four different fusions. The IRSp53 L460A peptide shows significantly less reactivity than the IRSp53WT peptide in this analysis (p<0.0001). This result is shown in revised Figure S1B and legend.

      (4) mCherry-4E-BP1(118+A), in which an additional C-terminal alanine should still allow TOS-mediated phosphorylation, but prevent PDZ interaction. Does 4EBP1 (118+A) actually prevent interaction between PP1-Neurabin? This interaction needs to be validated, especially since spinophilin was shown to bind to multiple regions of PP1.

      It is not clear what the referee is asking for here. The biochemical analysis in Figure 4C shows that the C-terminus of 4E-BP1 constitutes a classical PBM. The X-ray crystallography in Figure 6 confirms this, demonstrating H-bond interactions between the 4E-BP1 C-terminal carboxylate and main chain amides of L514, G515 and I516.

      We consider the possibility that the 4E-BP1(118+A) mutant inhibits the activity of PP1-Neurabin via a mechanism other than directly blocking the 4E-BP1/PDZ interaction to be unlikely for the following reasons:

      (1) Addition of a C-terminal alanine will disrupt the PBM interaction because the extra residue sterically blocks access to the PBM-binding groove. This is the most parsimonious explanation, and is based on our solid structural and biochemical evidence that the 4E-BP1 C-terminus is a classical PBM.

      (2) Alphafold3 modelling predicts Neurabin PDZ / 4E-BP1 PBM interaction with high confidence (shown in Figure 6-S2E), but it does not predict any PDZ interaction with 4E-BP1(118+A). Note added in Figure 6-S2 legend.

      (3) Recognition of the 4E-BP1(118+A) mutant without loss of binding affinity would require that the mutant be capable of binding in a manner formally equivalent to recognition of an “internal” PDZ-binding peptide. Recognition of such “internal peptides” is dependent on their adopting a specifically constrained conformation, which typically requires reorganisation of the PDZ carboxylate-binding GLGF loop. Such “internal site” recognition typically involves more than one residue C-terminal to the conventional PDZ “0” position (see Penkert et al NSMB 2004, doi:10.1038/nsmb839; Gee et al JBC 1998, DOI: 10.1074/jbc.273.34.21980; Hillier et al 1999, Science PMID: 10221915).

      (5) It is nice to see that the various PP1-Phactr fusions have around 60% substrate overlap between them. Would it be possible to compare these results with previously published mass spec data of Phactr1XXX from the group? There is mention of some substrates being picked up, but a comparison much like in Figure 2E would be more informative about the extent to which the described method captures relevant information.

      This is difficult to do directly as the PP1-Phactr fusion data are from human cells while that in Fedoryshchak et al 2020 is from mouse.

      However, manual curation shows that of the 28 top hits seen in our previous analysis of Phactr1XXX in NIH3T3 cells, 18 were also detectable in the HEK293 system; of these, 13 were also detected as PP1-Phactr fusion hits. Data summarised in new Figure 2-S1C. Text amended in Results, “Proteomic analysis...”, paragraph 2.
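
      The overlap quoted above can be restated as fractions. The snippet below is only a back-of-the-envelope restatement of the three counts given in this response; the variable names are illustrative and are not taken from the manuscript.

```python
# Counts quoted in the response above; variable names are illustrative only.
top_hits_nih3t3 = 28     # top Phactr1XXX hits in the earlier NIH3T3 analysis
detectable_hek293 = 18   # of those, detectable in the HEK293 system
fusion_hits = 13         # of those, also detected as PP1-Phactr fusion hits

frac_detectable = detectable_hek293 / top_hits_nih3t3
frac_recovered = fusion_hits / detectable_hek293
frac_overall = fusion_hits / top_hits_nih3t3

print(f"detectable in HEK293:        {frac_detectable:.0%}")
print(f"recovered among detectable:  {frac_recovered:.0%}")
print(f"recovered among all 28 hits: {frac_overall:.0%}")
```

      Reporting the recovery rate relative to the detectable subset, rather than to all 28 hits, separates the cross-species detectability limit from the performance of the fusion approach itself.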

      (6) Figure 3D Why are the levels of pT70, pT37/46 and total protein in vector controls much lower as compared to 0nM Tet in PP1-Neurabin conditions? It is also weird that given total protein is so low, why are the pS65/101 levels high compared to the rest?

      We think it likely that these phenomena reflect low-level expression of PP1-Neurabin in uninduced cells. This is now noted in the Figure 3D legend, and basal PP1-Neurabin expression is shown in new Figure 3-S1C. This basal expression alters the relative levels of the different species detected by the total 4E-BP1 antibody in favour of the faster migrating forms, which are less phosphorylated than the slower ones, and the total amount increases about 2-fold (Figure 3D, compare 0nM Tet lanes).

      The altered pS65/101-pT70 ratio is also likely to reflect the leaky PP1-Neurabin expression, since the relative intensities of the various phosphorylated species are dependent on both the relative rates of phosphorylation and dephosphorylation. Expression of a phosphatase would therefore be expected to differentially affect the phosphorylation levels of different sites according to their reactivity.

      (7) Figure 3E: Does inhibiting mTORC further reduce translation when PP1-Neurabin is expressed? If this is the case, this might suggest that they might not necessarily be mTORC inhibitors?

      We have not done this experiment. Since Rapamycin cannot be guaranteed to completely block 4E-BP1 phosphorylation, and PP1-Neurabin cannot be guaranteed to completely dephosphorylate 4E-BP1, any further reduction upon their combination would be hard to interpret.

      (8) Substrate interactions with the remodelled PP1 hydrophobic groove do not affect PP1-Neurabin specificity. Is there evidence that PP1-Neurabin remodels the hydrophobic groove? Is it not possible that Neurabin does not remodel the PP1 groove to begin with and hence there is no effect observed with the various mutants? If this is not the case, it should be explained in a bit more detail.

      Comparison of the Neurabin/PP1 and Phactr1/PP1 structures shows that the hydrophobic groove is remodelled differently in the two complexes. Now shown in new Figure 1-S2B,C,G.

      (9) Figure 5B has a lot of interesting information, which I believe has not been discussed at all in the results section.

      To help interpretation of the enzymology in Figure 5 we have renamed 4E-BP1WT and IRSp53WT to 4E-BP1PBM and IRSp53PBM respectively, emphasising the inclusion of the wildtype or mutated PBM from 4E-BP1 on these peptides. Text in Results, “PDZ domain interaction…”, paragraph 1, and Figures 5 and S5 revised accordingly.

      Why does the 4E-BP1Mut affect catalytic efficiency of PP1 alone when compared with WT, while no difference is observed with IRSp53WT and mutant?

      We do not understand the basis for the differential reactivity of 4E-BP1PBM and 4E-BP1MUT with PP1 alone; we suspect that it reflects the hydrophobicity change resulting from the MDI -> SGS substitution. However, this is unlikely to be biologically significant as PP1 is sequestered in PIP-PP1 complexes.

      Importantly, the two PP1 fusion proteins behave consistently in this assay – the presence of the intact PBM increases reactivity with PP1-Neurabin, but has no effect on dephosphorylation by PP1-Phactr1.

      Why does PP1 alone not have a difference between IRSp53WT and mutant, while PP1-Neurabin does have a difference?

      This is due to the presence of the PBM in IRSp53WT (now renamed IRSp53PBM), which increases affinity for PP1-Neurabin, but not PP1 alone. Likewise, PP1-Phactr1, which does not possess a PDZ domain, is also unaffected by the integrity of the PBM.

      (7) “Strikingly, alanine substitutions at +1 and +2 in 4E-BP1WT increased catalytic efficiency by both fusions, perhaps reflecting changes at the catalytic site itself (Figure 5E, Figure S5E)”. This could be expanded upon, because this suggests a mechanism that makes the substrate refractory to PDZ/hydrophobic groove remodelling?

      We favour the idea that this reflects a requirement to balance dephosphorylation rates between the multiple 4E-BP1 phosphorylation sites, especially if multiple rounds of dephosphorylation occur for each PBM—PDZ interaction. Additional sentences added in Discussion paragraph 7.

      (8) Typographical errors and minor comments:

      a) PIPs can target PP1 to specific subcellular locations, and control substrate specificity through autonomous substrate-binding domains, occupation or extension of the substrate grooves, or modification of PP1 surface electrostatics.

      b) Phosphophorylation side site abundances within triplicate samples from the same cell line were comparable between replicates (Figure 2B).

      c) While the alanine substitutions had little effect, conversion of +4 to +6 to the IRSp534E-BP1 sequence LLD increased catalytic efficiency some 20-fold (Figure 5C, Figure S5C). 

      d) Figure 3E labels are not clear. The graph can be widened to make the labels of the conditions clearer.

      All corrected

      Reviewer #3 (Recommendations for the authors):

      This was a very well-written manuscript.

      However, I was looking for a summary mechanistic figure or cartoon to help me navigate the results.

      I noted a few typos in the text.

      New summary Figure 5-S2 added, cited in results, and discussed in Discussion paragraph 6,7.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This article presents a meta-analysis that challenges established abundance-occupancy relationships (AORs) by utilizing the largest known bird observation database. The analysis yields contentious outcomes, raising the question of whether these findings could potentially refute AORs.

      We thank the Reviewer for their positive comments.

      Strengths:

      The study employed an extensive aggregation of datasets to date to scrutinize the abundance-occupancy relationships (AORs).

      We thank the Reviewer for their positive comments.

      Weaknesses:

      While the dataset employed in this research holds promise, a rigorous justification of the core assumptions underpinning the analytical framework is inadequate. The authors should thoroughly address the correlation between checklist data and global range data, ensuring that the foundational assumptions and potential confounding factors are explicitly examined and articulated within the study's context.

      We thank the Reviewer for these comments. We agree that more justification and transparency is needed of the core assumptions that form the foundation of our methods. In our revised version, we have taken the following steps to achieve this:

      - Altered the title to be more explicit about the core assumptions, which now reads: “Local-scale relative abundance is decoupled from global range size”

      - We have added more details on why and how we treat global range size as a measure of ‘occupancy.’

      - We have added a section that discusses the limitations of using eBird relative abundance

      Reviewer #2 (Public Review):

      Summary:

      The goal is to ask if common species when studied across their range tend to have larger ranges in total. To do this the authors examined a very large citizen science database which gives estimates of numbers, and correlated that with the total range size, available from Birdlife. The average correlation is positive but close to zero, and the distribution around zero is also narrow, leading to the conclusion that, even if applicable in some cases, there is no evidence for consistent trends in one or other direction.

      We thank the Reviewer for these comments.

      Strengths:

      The study raises a dormant question, with a large dataset.

      We thank the Reviewer for these comments. We intended to take a longstanding question and attempt to apply novel datasets that were not available mere decades ago. While we do not imply that we have ‘solved’ the question, we hope this work highlights the potential for further interrogation using these large datasets.

      Weaknesses:

      This study combines information from across the whole world, with many different habitats, taxa, and observations, which surely leads to a quite heterogeneous collection.

      We agree that there is a heterogeneous collection of data across many habitats, taxa, and observations. However, rather than a weakness, we see this as a significant strength. Our work assumes we are averaging over this variability to assess for a large-scale pattern in the relationship. This was potentially a limitation of previous work, as large datasets were often focused on particular contexts (e.g., much work focused solely on the UK), which we believe could limit the generalizability of that work. However, the reviewer makes a fair point in regard to the heterogeneity of data collection. We have now added some text in the discussion which is explicit about this; see the new section named “Potential limitations of current work and future work”: “although our findings challenge some long-held assumptions about the consistency of the abundance-occupancy relationship, our work only deals with interspecific AORs among birds, synthesizing observations of potentially heterogeneous locations, context and quality”.

      First, scale. Many of the earlier analyses were within smaller areas, and for example, ranges are not obviously bounded by a physical barrier. I assume this study is only looking at breeding ranges; that should be stated, as 40% of all bird species migrate, and winter limitation of populations is important. Also are abundances only breeding abundances or are they measured through the year? Are alien distributions removed?

      Second, consider various reasons why abundance and range size may be correlated (sometimes positively and sometimes negatively) at large scales. Combining studies across such a large diversity of ecological situations seems to create many possibilities to miss interesting patterns. For example:

      (1) Islands are small and often show density release.

      See comment below.

      (2) North temperate regions have large ranges (Rapoport's rule) and higher population sizes than the tropics.

      See comment below.

      (3) Body size correlates with global range size (I am unsure if this has recently been tested but is present in older papers) and with density. For example, cosmopolitan species (barn owl, osprey, peregrine) are relatively large and relatively rare.

      See comment below.

      (4) In the consideration of alien species, it certainly looks to me as if the law is followed, with pigeon, starling, and sparrow both common and widely distributed. I guess one needs to make some sort of statement about anthropogenic influences, given the dramatic changes in both populations and environments over the past 50 years.

      See comment below. We also added a sentence in the methods stating that we did not remove alien ranges and giving the reasons why. Still, we do acknowledge the dramatic changes in populations and environments over the past 50 years (see the new section “Potential limitations of current work and future work”).

      (5) Wing shape correlates with ecological niche and range size (e.g. White, American Naturalist). Aerial foraging species with pointed wings are likely to be easily detected, and several have large ranges reflecting dispersal (e.g. barn swallow).

      We agree that all of the points above are interesting data explorations. As said above, our main purpose was to highlight the potential for further interrogation using these large datasets. However, we have added some additional text in the discussion that explicitly mentions/encourages these additional data explorations. We hope people will pick up on the potential for these data and explore them further.

      Third, biases. I am not conversant with ebird methodology, but the number appearing on checklists seems a very poor estimate of local abundance. As noted in the paper, common species may be underestimated in their abundance. Flocking species must generate large numbers, skulking species few. The survey is often likely to be in areas favorable to some species and not others. The alternative approach in the paper comes from an earlier study, based on ebird but then creating densities within grids and surely comes with similar issues.

      We agree that if we were interested in the absolute abundance of a given species, the local number on an eBird checklist would be a poor representation. However, our study aims not to estimate absolute abundance but to examine relative abundance among species on each checklist. By focusing on relative abundance, we leverage eBird data's strengths in detecting the presence and frequency of species across diverse locations and times, thereby capturing community composition trends that can provide meaningful insights despite individual checklist biases. This approach allows us to assess the comparative prominence of species in the community as reported by the observer, providing a consistent metric of relative abundance. Despite detectability biases, the structure of eBird checklists reflects the observer’s encounter rates with each species under similar conditions, offering a valuable snapshot of relative species composition across sites and times. The key to our assumption is that these biases discussed are not directional and, therefore, random throughout the sampling process, which would translate to no ‘real’ bias in our effect size of interest.
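
      As a minimal sketch of what a per-checklist effect size of this kind could look like, the snippet below converts the counts on one checklist to relative abundances and correlates them with global range size. The species labels, counts, and range sizes are made-up illustrative values, and the log-transform and the use of Pearson's r are assumptions for this sketch rather than a description of the exact pipeline used in the manuscript.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def checklist_correlation(counts, range_km2):
    """Correlate log relative abundance with log range size for one checklist."""
    species = sorted(counts)
    total = sum(counts.values())
    rel_abundance = [math.log10(counts[s] / total) for s in species]
    range_size = [math.log10(range_km2[s]) for s in species]
    return pearson(rel_abundance, range_size)


# Hypothetical checklist and range sizes (illustrative numbers only).
counts = {"sp_a": 12, "sp_b": 3, "sp_c": 1, "sp_d": 7}
range_km2 = {"sp_a": 2.0e6, "sp_b": 4.5e5, "sp_c": 9.0e6, "sp_d": 1.2e6}

r_single = checklist_correlation(counts, range_km2)
# Doubling every count (e.g. a longer visit) leaves relative abundance unchanged.
r_doubled = checklist_correlation({s: 2 * c for s, c in counts.items()}, range_km2)
print(r_single, r_doubled)
```

      Because only the within-checklist proportions enter the correlation, any factor that scales all counts on a checklist equally drops out; species-specific detectability differences do not, which is the assumption discussed in the response above.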

      Range biases are also present. Notably, tropical mountain-occupying species have range sizes overestimated because holes in the range are not generally accounted for (Ocampo-Peñuela et al., Nature Communications). These species are often quite rare, too.

      We thank the reviewer for pointing out this issue and the reference. We have included a discussion of these biases in our limitations section and cite Ocampo-Peñuela et al. to emphasize the need for improved spatial resolution in range data for more accurate AOR assessments: “More precise range-size estimates would also improve the accuracy of AOR assessments, since species range data are often overestimated due to the failure to capture gaps in actual distributions.”

      Fourth, random error. Random error in ebird assessments is likely to be large, with differences among observers, seasons, days, and weather (e.g. Callaghan et al. 2021, PNAS). Range sizes also come with many errors, which is why occupancy is usually seen as the more appropriate measure.

      If we consider both range and abundance measurements to be subject to random error in any one species list, then the removal of all these errors will surely increase the correlation for that list (the covariance shouldn't change but the variances will decrease). I think (but am not sure) that this will affect the mean correlation because more of the positive correlations appear 'real' given the overall mean is positive. It will definitely affect the variance of the correlations; the low variance is one of the main points in the paper. A high variance would point to the operation of multiple mechanisms, some perhaps producing negative correlations (Blackburn et al. 2006).
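
      The covariance argument in this comment can be checked with a small simulation. The sketch below is an illustration under stated assumptions (standard-normal true values, independent Gaussian measurement error of equal size on both variables, an arbitrary true correlation of 0.5); none of these numbers come from the manuscript.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, rho=0.5, noise_sd=1.0, seed=1):
    """Return (correlation of true values, correlation of noisy measurements)."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0, 1) for _ in range(n)]
    true_y = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in true_x]
    # Independent error leaves the covariance unchanged but inflates both variances.
    obs_x = [x + rng.gauss(0, noise_sd) for x in true_x]
    obs_y = [y + rng.gauss(0, noise_sd) for y in true_y]
    return pearson(true_x, true_y), pearson(obs_x, obs_y)


r_true, r_obs = simulate()
# With unit true variances the expected observed value is rho / (1 + noise_sd**2).
expected_obs = 0.5 / (1 + 1.0 ** 2)
print(r_true, r_obs, expected_obs)
```

      Under these assumptions the observed correlation shrinks toward zero by the product of the square roots of the two reliabilities, so how much this matters in practice depends on how large the measurement error is relative to the true between-species variance.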

      We agree that random errors can affect estimates, but as we wrote above, random errors, regardless of their magnitude, would not bias estimates. After accounting for sampling error (a part of random error), little variance is left to be explained, as we have shown in the MS. This suggests that many of the random errors were part of the sampling errors. And this is where meta-analysis really shines.
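
      To make "little variance is left after accounting for sampling error" concrete, the sketch below computes Cochran's Q and I² for a set of correlations using Fisher's z and its sampling variance 1/(n - 3). This is a deliberately simplified fixed-effect calculation with made-up inputs, not the multilevel model fitted in the manuscript.

```python
import math


def heterogeneity(rs, ns):
    """Return (pooled r, Cochran's Q, I^2) for correlations rs with sample sizes ns."""
    zs = [math.atanh(r) for r in rs]      # Fisher z-transform
    ws = [n - 3 for n in ns]              # inverse of the sampling variance 1/(n - 3)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    df = len(rs) - 1
    # I^2: share of total variation beyond what sampling error alone would produce.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return math.tanh(z_bar), q, i2


# Hypothetical inputs: identical effects versus strongly conflicting effects.
pooled_same, q_same, i2_same = heterogeneity([0.1, 0.1, 0.1], [10, 20, 30])
pooled_diff, q_diff, i2_diff = heterogeneity([-0.6, 0.6], [103, 103])
print(i2_same, i2_diff)
```

      An I² near zero means the spread of the observed correlations is about what sampling error predicts; it does not by itself distinguish between an absence of real heterogeneity and heterogeneity that is masked by other sources of noise.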

      On P.80 it is stated: "Specifically, we can quantify how AOR will change in relation to increases in species richness and sampling duration, both of which are predicted to reduce the magnitude of AORs" I haven't checked the references that make this statement, but intuitively the opposite is expected? More species and longer durations should both increase the accuracy of the estimate, so removing them introduces more error? Perhaps dividing by an uncertain estimate introduces more error anyway. At any rate, the authors should explain the quoted statement in this paper.

      It would be of considerable interest to look at the extreme negative and extreme positive correlations: do they make any biological sense?

      Extremely high correlations would not make any biological sense if these observations were based on large sample sizes. However, as shown in Figure 2, all extreme correlations come from small sample sizes (i.e., low precision), as sampling theory expects (in fact, our Fig 2 is a textbook example of the funnel shape). Therefore, we do not need to invoke any biological explanations here.
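
      The funnel shape follows directly from the sampling variance of a correlation. As a sketch, assuming a true correlation of zero and the usual Fisher-z approximation, the range within which about 95% of observed correlations should fall for a checklist of n species is:

```python
import math


def funnel_halfwidth(n, z_crit=1.96):
    """Approximate 95% bound on |r| when the true correlation is zero.

    Uses the Fisher-z standard error 1 / sqrt(n - 3), back-transformed with tanh.
    """
    if n <= 3:
        raise ValueError("the Fisher-z approximation needs n > 3")
    return math.tanh(z_crit / math.sqrt(n - 3))


# Illustrative checklist lengths (number of species on a checklist).
sizes = [5, 10, 50, 200]
bounds = [funnel_halfwidth(n) for n in sizes]
for n, b in zip(sizes, bounds):
    print(f"n = {n:>3}: |r| expected below {b:.2f} by chance alone")
```

      Very short checklists can therefore produce correlations close to ±1 purely by chance, which is why extreme values at small n need no biological interpretation under this null model.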

      Discussion:

      I can see how publication bias can affect meta-analyses (addressed in the Gaston et al. 2006 paper) but less easily see how confirmation bias can. It seems to me that some of the points made above must explain the difference between this study and Blackburn et al. 2006's strong result.

      We agree. We have now extended our explanation of why confirmation bias could result in positive AORs. We also point out that confirmation bias is a very common phenomenon, for which we cited relevant references in the original MS. The only way to avoid confirmation bias is to conduct a study blind, but this is often not possible in ecological work.

      “Meta-research on behavioural ecology identified 79 studies on nestmate recognition, 23 of which were conducted blind. Non-blind studies confirmed a hypothesis of no aggression towards nestmates nearly three times more often. It is possible that confirmation bias was at play in earlier AOR studies.”

      Certainly, AOR really does seem to be present in at least some cases (e.g. British breeding birds) and a discussion of individual cases would be valuable. Previous studies have also noted that there are at least some negative and some non-significant associations, and understanding the underlying causes is of great interest (e.g. Kotiaho et al. Biology Letters).

      We agree. And yes, we pointed these out in our introduction.

      Reviewer #3 (Public Review):

      Summary:

      This paper claims to overturn the longstanding abundance occupancy relationship.

      Strengths:

      (1) The above would be important if true.

      (2) The dataset is large.

      We have clarified this point by changing the title to emphasize that we do not suggest overturning AORs entirely but instead provide a refined view of the relationship at a global scale. Our results suggest a weaker and more context-dependent AOR than previously documented. We hope our revised title and additional clarifications in the text convey our intent to contribute to a more nuanced understanding rather than a whole overturning of the AOR framework.

      Weaknesses:

      (1) The authors are not really measuring the abundance-occupancy relationship (AOR). They are measuring abundance-range size. The AOR typically measures patches in a metapopulation, i.e. at a local scale. Range size is not an interchangeable notion with local occupancy.

      We have refined this in our revision to be more explicitly focused on global range size. However, we note that the classic paper by Bock and Ricklefs (1983, Am Nat) also refers to global (species' entire) range size in the context of the AOR. Importantly, Bock and Ricklefs pointed out the importance of using species’ entire ranges; without them, the arbitrary geographical ranges used in some studies of AORs can create positive AORs as sampling artifacts. So we highlight that our work is well in line with previous work, allowing us to question this longstanding macroecological pattern. One of the issues with AORs has been how to define occupancy; global range size provides a relatively unambiguous measure, which is why we used it.

      (2) Ebird is a poor dataset for this. The sampling unit is non-standard. So abundance can at best be estimated by controlling for sampling effort. Comparisons across space are also likely to be highly heterogenous. They also threw out checklists in which abundances were too high to be estimated (reported as "X"). As evidence of the biases in using eBird for this pattern, the North American Breeding Bird Survey, a very similar taxonomic and geographic scope but with a consistent sampling protocol across space does show clear support for the AOR.

      Yes, we agree the sampling unit is non-standard. However, this is a significant strength in that it samples across much heterogeneity (as discussed in response to Reviewer 2, above). We were interested in relative abundance rather than absolute abundance per se, and we consider relative abundance to be reliably estimated, especially since we controlled for sampling effort.

      We appreciate the reviewer’s attention to our data selection criteria. We excluded checklists containing ‘X’ entries to minimize biases in our abundance estimates. The 'X' notation is often used for the most common species, reflecting the observer's identification of presence without specifying a count. This approach was chosen to avoid disproportionately inflating presence data for these abundant species, which could distort the relative abundance calculations in our analysis. By excluding such checklists, we aimed to retain consistency and ensure that local abundance estimates were representative across all species on each checklist. We have revised our manuscript to clarify this methodological choice and hope this explanation addresses the reviewer’s concern. We modified our text in the methods to make the entries ‘X’ clearer (see the Method section).
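
      The exclusion rule described above amounts to a whole-checklist filter. The sketch below assumes a hypothetical in-memory representation in which each checklist is a mapping from species label to the reported count as a string; the function names and example data are illustrative and are not taken from the manuscript's code.

```python
def has_presence_only_entry(checklist):
    """True if any species on the checklist was reported as 'X' rather than a count."""
    return any(str(value).strip().upper() == "X" for value in checklist.values())


def keep_countable(checklists):
    """Drop every checklist that contains at least one 'X' entry."""
    return [c for c in checklists if not has_presence_only_entry(c)]


# Hypothetical checklists (species label -> reported count).
checklists = [
    {"sp_a": "12", "sp_b": "3"},
    {"sp_a": "X", "sp_c": "5"},
    {"sp_b": "1", "sp_c": "x"},
]
kept = keep_countable(checklists)
print(f"kept {len(kept)} of {len(checklists)} checklists")
```

      Dropping the whole checklist, rather than only the 'X' rows, keeps the denominators of relative abundance comparable across species within each retained checklist, at the cost of discarding checklists where very common species were not counted.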

      (3) In general, I wonder if a pattern demonstrated in thousands of data sets can be overturned by findings in one data set. It may be a big dataset but any biases in the dataset are repeated across all of those observations.

      Overturning a major conclusion requires careful work. This paper did not rise to this level.

      We appreciate the reviewer’s caution regarding broad conclusions based on a single dataset, even one as large as eBird. Our intention was not to definitively overturn the abundance-occupancy relationship (AOR) but to re-evaluate it with the most extensive and globally representative dataset currently available. We recognise that potential biases in citizen science data, such as observer variation, may influence our findings, and we have taken steps to address these in our methodology and limitations sections. We see this work as a contribution to an ongoing discourse, suggesting that AOR may be less universally consistent than previously believed, particularly when tested with large-scale citizen science data. We hope this study will encourage additional research that tests AORs using other expansive datasets and approaches, further refining our understanding of this classic macroecological relationship. However, we have retained our broad message about instigating a credible revolution and re-examining ecological laws.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The investigation focuses solely on interspecific relationships among birds; thus, the extrapolation of these conclusions to broader ecological contexts requires further validation.

      We have now added this point to our new section: “Although our findings challenge some long-held assumptions about the consistency of the abundance-occupancy relationship, our work only deals with interspecific AORs among birds, so we hope this work serves as a foundation for further investigations that utilize such comprehensive datasets.”

      (2) The rationale for combining data from eBird - a platform predominantly representing individual observations from urban North America - with the more globally comprehensive BirdLife International database needs to be substantiated. The potential underrepresentation of global abundance in the eBird checklist data could introduce a sampling bias, undermining the foundational premises of AORs.

      We agree that eBird sampling coverage is limited, but this should not bias our results. In statistical terms, bias is directional; error that is not directional becomes statistical noise, which makes the signal harder to detect rather than shifting it. In fact, our meta-analyses adjust for what statisticians call sampling bias, and this is the strength of meta-analysis.

      (3) In the full mixed-effect model, checklist duration and sampling variance (inversely proportional to sample size N) are treated as fixed effects. However, these variables are likely to be negatively correlated, which could introduce multicollinearity, inflating standard errors and diminishing the statistical significance of other factors, such as the intercept. This calls into question the interpretation of insignificance in the results.

      Multicollinearity is mainly an issue with small sample sizes. For example, with small datasets, correlations of 0.5 could be a problem, and such a problem would usually show up as a large standard error (SE). We do not have such an issue with ~17 million data points. Please refer to the following paper.

      Freckleton, Robert P. "Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error." Behavioral Ecology and Sociobiology 65 (2011): 91-101.
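
      The sample-size argument above can be illustrated with a minimal sketch. It assumes a plain two-predictor ordinary least-squares model with a known residual standard deviation, which is a deliberate simplification and not the multilevel meta-analytic model used in the study; the function name and default values are hypothetical.

```python
import math

def slope_standard_error(n, r, sigma=1.0, predictor_sd=1.0):
    """Standard error of one OLS slope in a two-predictor model.

    n: number of observations
    r: correlation between the two predictors
    sigma: residual standard deviation (assumed known here)
    predictor_sd: sample standard deviation of the predictor of interest
    """
    vif = 1.0 / (1.0 - r ** 2)  # variance inflation factor for two predictors
    return sigma * math.sqrt(vif / ((n - 1) * predictor_sd ** 2))

# Same predictor correlation (0.5), very different sample sizes.
for n in (50, 5_000, 17_000_000):
    print(n, slope_standard_error(n, r=0.5))
```

      Under these assumptions, a correlation of 0.5 inflates the standard error by the same factor (about 1.15) at every sample size, but the absolute standard error shrinks with the square root of n, so the inflation matters far less for very large datasets.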

      (4) The observed low heterogeneity may stem from discrepancies in sampling for abundance versus occupancy, compounded by uncertainties in reporting behavior.

      If we assume that everybody underreports common species or overreports rare species, this could happen. However, such an assumption is unlikely to hold. If some people reported accurately (but not others), we should see high heterogeneity, which we do not observe. We touched upon this point in our original MS.

      (5) The contribution and implementation of phylogenetic comparative analysis remain ambiguous and were not sufficiently clarified within the study.

      We have now added more explanation for the global abundance analysis:

      “To statistically test whether there was an effect of abundance and occupancy at the macro-scale, we used phylogenetic comparative analysis. This analysis also addresses the issue of positive interspecific AORs potentially arising from not accounting for phylogenetic relatedness among species examined.”

      (6) The use of large N checklists could skew the perceived rarity or commonality of species, potentially diminishing the positive correlation observed in AORs. A consistent observer effect could lead to a near-zero effect with high precision.

      Regardless of the number of species (N) in a checklist (as seen in Fig. 2), correlations are distributed around zero. This means there is nothing special about large-N checklists.

      (7) The study should acknowledge and discuss any discrepancies or deviations from previous literature or expected outcomes.

      We felt we had already done this, as we discussed the previous meta-analysis and what we expected from this meta-analysis. Nevertheless, we have added some relevant sentences in the new version of the MS.

      In addition to these major points, there are several minor concerns:

      (1) Figure 2B lacks discussion, and the metric for the number of observations is not clarified. Furthermore, the labeling of the y-axis appears to be incorrect.

      Thank you very much for pointing out this shortcoming. Now, the y-axis label has been fixed and we mention 2B in the main text.

      (2) The study should provide a clear, mathematical expression of the multilevel random effect models for greater transparency.

      Many thanks for this point, and now we have added relevant mathematical expressions in Table S6.

      (3) On Line 260, the term "number of species" should be refined to "number of species in a checklist," ideally represented by a formula for precision.

      This ambiguity has been mended as suggested.

      Please provide the data and R code linked to the outputs.

      The referee must have missed the link (https://github.com/itchyshin/AORs) in our original MS. In addition to our GitHub repository link, we now have added a link to our Zenodo repository (https://doi.org/10.5281/zenodo.14019900).

      Reviewer #3 (Recommendations For The Authors):

      The authors cite Rabinowitz's 7 forms of rarity paper as a suggestion that previous findings also break the AOR. In fact empirical studies of the 7 forms of rarity typically find that all three forms of rareness vs commonness are heavily correlated (e.g. Yu & Dobson 2000).

      We thank the reviewer for drawing attention to Yu & Dobson (2000) and similar studies that find positive correlations among the axes of rarity. Reviewer 3 is correct that Rabinowitz’s (1981) framework does not require that local abundance and geographic range size be uncorrelated for every species; instead, it highlights conceptual scenarios where a species may be common locally yet have a restricted distribution (or vice versa).

      Empirical analyses such as Yu & Dobson (2000) show that, on average, these axes can be correlated, which may align with conventional AOR findings in some taxonomic groups. However, Rabinowitz’s key insight was that exceptions do occur, and these exceptions demonstrate that strong positive AORs may not be universally applicable. Our results do not claim that Rabinowitz’s framework “breaks” the AOR outright; instead, we use it to underscore that local abundance can, in principle, be “decoupled” from global occupancy. Whether the correlation found by Yu & Dobson (2000) implies a positive AOR requires a detailed simulation study, which is an interesting avenue for future research.

      Thus, citing Rabinowitz serves to highlight the potential heterogeneity and complexity of abundance–occupancy relationships rather than to refute every positive correlation reported in the literature. Our findings suggest that when examined at large spatiotemporal scales (with unbiased sampling), the overall AOR signal may be less robust than traditionally believed. This is consistent with Rabinowitz’s view that local abundance and global range can vary along independent axes. We have now added:

      “Although studies using her framework found positive correlations between species range and local abundance.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examined the changes in ATL GABA levels induced by cTBS and its relationship with BOLD signal changes and performance in a semantic task. The findings suggest that the increase in ATL GABA levels induced by cTBS is associated with a decrease in BOLD signal. The relationship between ATL GABA levels and semantic task performance is nonlinear, and more specifically, the authors propose that the relationship is an inverted U-shaped relationship.

      Strengths:

      The findings of the research regarding the increase of GABA and decrease of BOLD caused by cTBS, as well as the correlation between the two, appear to be reliable. This should be valuable for understanding the biological effects of cTBS.

      Weakness:

      I am pleased to see the authors' feedback on my previous questions and suggestions, and I believe the additional data analysis they have added is helpful. Here are my reserved concerns and newly discovered issues.

      (1) Regarding the Inverted U-Shaped Curve: In the revised manuscript, the authors have accepted some of my suggestions and conducted further analysis, which is now presented in Figure 3B. These results provide partial support for the authors' hypothesis. However, I still believe that the data from this study hardly convincingly support an inverted U-shaped distribution relationship.

      The authors stated in their response, "it is challenging to determine the optimal level of ATL GABA," but I think this is achievable. From Figures 4C and 4D, the ATL GABA levels corresponding to the peak of the inverted U-shaped curve fall between 85 and 90. In my understanding, this can be considered as the optimal level of ATL GABA estimated based on the existing data and the inverted U-shaped curve relationship. However, in the latter half of the inverted U-shaped curve, there are quite few data points, and such a small number of data points hardly provides reliable support for the quantitative relationship in the latter half of the curve. I suggest that the authors should at least explicitly acknowledge this and be cautious in drawing conclusions. I also suggest that the authors consider fitting the data with more types of non-linear relationships, such as a ceiling effect (a combination of a slope and a horizontal line), or a logarithmic curve.

      We appreciate R1’s comments. Inverted U-shaped relationships are well-established in neuroscience, particularly in the context of neurotransmitter concentrations (e.g., dopamine, acetylcholine, noradrenaline) and their influence on cognitive functions such as working memory and cognitive control (Aston-Jones & Cohen, 2005; Cools & D'Esposito, 2011; Vijayraghavan et al., 2007; He & Zempel, 2013). Recently, Ferri et al. (2017) demonstrated an inverted U-shaped relationship between excitation-inhibition balance (EIB: the ratio of Glx and GABA) and multisensory integration, showing that both excessive and insufficient inhibition negatively impact functionality. Given that GABA is the brain’s primary inhibitory neurotransmitter, our findings suggest that ATL GABA may play a similar regulatory role in semantic memory function.

      While our statistical modelling approach demonstrated that the inverted U-shaped function was the best-fitting model for our current data in explaining the relationship between ATL GABA and semantic memory, we acknowledge the limitation of having fewer data points in the latter half (right side) of the curve, where excessive ATL GABA levels are associated with poorer semantic performance. Following R1’s suggestion, we have explicitly acknowledged this limitation in the revised manuscript and exercised caution in our discussion.

      Discussion, p.17, line 408

      "However, our findings should be interpreted with caution due to the limitation of having fewer data points in the latter half (right side) of the inverted U-shaped curve. Future studies incorporating GABA agonists could help further validate and refine these findings."

      Following R1’s latter suggestion, we tested a logarithmic curve model. The results showed significant relationships between ATL GABA and semantic performance (R² = 0.544, p < 0.001) and between cTBS-induced changes in ATL GABA and semantic performance (R² = 0.202, p < 0.001). However, the quadratic (inverted U-shaped) model explained more variance than the logarithmic model, as indicated by a higher R² and lower BIC. Model comparisons further confirmed that the inverted U-shaped model provided the best fit for both ATL GABA in relation to semantic performance (Fig. 4C) and cTBS-induced ATL GABA changes in relation to semantic function (Fig. 4D).

      Author response table 1.
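
      As an illustration of this kind of model comparison, the following minimal sketch fits a quadratic and a logarithmic model by least squares and compares them by R² and BIC. The data are simulated and hypothetical (they are not the study's GABA or accuracy values), and the BIC is computed under an assumed Gaussian error model, which may differ from the software and settings actually used.

```python
import numpy as np

def fit_and_score(design, y):
    """Least-squares fit of y on the design matrix; returns (R^2, BIC)."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    n, k = design.shape
    r2 = 1.0 - rss / tss
    # Gaussian-likelihood BIC up to an additive constant shared by both models
    bic = n * np.log(rss / n) + k * np.log(n)
    return r2, bic

# Hypothetical data: performance peaks at an intermediate GABA level.
rng = np.random.default_rng(0)
gaba = rng.uniform(60, 110, size=40)
accuracy = 95 - 0.02 * (gaba - 88) ** 2 + rng.normal(0, 1.0, size=40)

ones = np.ones_like(gaba)
quadratic = np.column_stack([ones, gaba, gaba ** 2])   # inverted-U candidate
logarithmic = np.column_stack([ones, np.log(gaba)])    # monotonic candidate

r2_quad, bic_quad = fit_and_score(quadratic, accuracy)
r2_log, bic_log = fit_and_score(logarithmic, accuracy)
print(f"quadratic:   R2={r2_quad:.3f}, BIC={bic_quad:.1f}")
print(f"logarithmic: R2={r2_log:.3f}, BIC={bic_log:.1f}")
```

      Because BIC penalises the extra parameter of the quadratic model, a lower BIC for that model indicates that its better fit is not simply a consequence of added flexibility.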

      (2) In Figure 2F, the authors demonstrated a strong practice effect in this study, which to some extent offsets the decrease in behavioral performance caused by cTBS. Therefore, I recommend that the authors give sufficient consideration to the practice effect in the data analysis.

      One issue is the impact of the practice effect on the classification of responders and non-responders. Currently, most participants are classified as non-responders, suggesting that the majority of the population may not respond to the cTBS used in this study. This greatly challenges the generalizability of the experimental conclusions. However, the emergence of so many non-responders is likely due to the prominent practice effect, which offsets part of the experimental effect. If the practice effect is excluded, the number of responders may increase. The authors might estimate the practice effect based on the vertex stimulation condition and reclassify participants after excluding the influence of the practice effect.

      Another issue is that considering the significant practice effect, the analysis in Figure 4D, which mixes pre- and post-test data, may not be reliable.

      We appreciate Reviewer 1’s thoughtful comments regarding the practice effect and its potential impact on our findings. Our previous analysis revealed a strong practice effect on reaction time (RT), with participants performing tasks faster in the POST session, regardless of task condition (Fig. S3). Given our hypothesis that inhibitory ATL cTBS would disrupt semantic task performance, we accounted for this by using inverse efficiency (IE), which combines accuracy and RT. This analysis demonstrated that ATL cTBS disrupted semantic task performance compared to both control stimulation (vertex) and control tasks, despite the practice effect (i.e., faster RT in the POST session), thereby supporting our hypothesis. These findings may suggest that the effects of ATL cTBS were more subtly reflected in semantic task accuracy rather than RT.

      Regarding inter-individual variability in response to rTMS/TBS, prior studies have shown that 50–70% of participants are non-responders, who either do not respond or respond in an unexpected manner (Goldsworthy et al., 2014; Hamada et al., 2013; Hinder et al., 2014; Lopez-Alonso et al., 2014; Maeda et al., 2000a; Müller-Dahlhaus et al., 2008). Our previous study (Jung et al., 2022) using the same semantic task and cTBS protocol was the first to explore TBS-responsiveness variability in semantic memory, where 12 out of 20 participants (60%) were classified as responders. The proportion of responders and non-responders in the current study aligns with previous findings, suggesting that this variability is expected in TBS research.

      However, we acknowledge R1’s concern that the strong practice effect may have influenced responder classification. To address this, we estimated the practice effect using the vertex stimulation condition and reclassified participants accordingly by adjusting ATL stimulation performance (IE) relative to vertex stimulation performance (IE). This reclassification identified nine responders (an increase of two), aligning with the typical responder proportion (52%) reported in the TBS literature. Overall, we replicated the previous findings with improved statistical robustness.
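
      One possible reading of this adjustment is sketched below. Inverse efficiency is computed in its conventional form (mean reaction time divided by proportion correct); the subtraction of the vertex pre-to-post change and the "greater than zero" responder rule are assumptions made for illustration, since the exact procedure is not spelled out here, and all function names and values are hypothetical.

```python
def inverse_efficiency(mean_rt_ms, proportion_correct):
    """Inverse efficiency (IE): mean RT divided by proportion correct.

    Higher IE means poorer performance.
    """
    if not 0 < proportion_correct <= 1:
        raise ValueError("proportion_correct must be in (0, 1]")
    return mean_rt_ms / proportion_correct

def practice_adjusted_change(atl_pre, atl_post, vertex_pre, vertex_post):
    """ATL pre-to-post IE change minus the vertex (practice-only) change."""
    return (atl_post - atl_pre) - (vertex_post - vertex_pre)

def is_responder(adjusted_change):
    # Assumed rule: performance after ATL cTBS is worse than practice
    # alone would predict (positive adjusted IE change).
    return adjusted_change > 0

# Hypothetical participant: slight raw improvement after ATL cTBS,
# but a much larger improvement after vertex stimulation.
atl_pre = inverse_efficiency(1500, 0.90)
atl_post = inverse_efficiency(1450, 0.88)
vertex_pre = inverse_efficiency(1500, 0.90)
vertex_post = inverse_efficiency(1350, 0.92)

raw_change = atl_post - atl_pre
adjusted = practice_adjusted_change(atl_pre, atl_post, vertex_pre, vertex_post)
print(raw_change, adjusted, is_responder(adjusted))
```

      In this hypothetical case the raw IE change is negative, so the participant would look like a non-responder, whereas the practice-adjusted change is positive, which shows how correcting for the practice effect can move participants into the responder group.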

      A 2×2×2 ANOVA was conducted with task (semantic vs. control) and session (PRE vs. POST) as within-subject factors, and group (responders vs. non-responders) as a between-subject factor. The analysis revealed a significant interaction between the session and group (F(1, 15) = 10.367, p = 0.006), a marginally significant interaction between the session and task (F(1, 15) = 4.370, p = 0.054), and a significant 3-way interaction between the session, task, and group (F(1, 15) = 7.580, p = 0.015). Post hoc t-tests showed a significant group difference in semantic task performance following ATL stimulation (t = 2.349, p = 0.033). Post hoc paired t-tests demonstrated that responders exhibited poorer semantic task performance following the ATL cTBS (t = -5.281, p < 0.001), whereas non-responders showed a significant improvement (t = 3.206, p = 0.007) (see Figure 3A).

      Notably, no differences were observed between responders and non-responders in the control task performance across pre- and post-stimulation sessions, confirming that the practice effect was successfully controlled (Figure 3B).

      We performed a 2×2 ANOVA with session (pre vs. post) as a within-subject factor and group (responders vs. non-responders) as a between-subject factor to examine the effects of group on ATL GABA levels. The results revealed a significant main effect of session (F(1, 14) = 39.906, p < 0.001) and group (F(1, 14) = 9.677, p = 0.008). Post hoc paired t-tests on ATL GABA levels showed a significant increase in regional ATL GABA levels following ATL stimulation for both responders (t = -3.885, p = 0.002) and non-responders (t = -4.831, p = 0.001). Furthermore, we replicated our previous finding that baseline GABA levels were significantly higher in responders compared to non-responders (t = 2.816, p = 0.007) (Figure 3C). This pattern persisted in the post-stimulation session (t = 2.555, p = 0.011) (Figure 3C).

      Accordingly, we have revised the Methods and Materials (p. 26, line 619), Results (p. 11, lines 233-261), and Figure 3.

      (3) The analysis in Figure 3A has a double dipping issue. Suppose we generate 100 pairs of random numbers as pre- and post-test scores, and then group the data based on whether the scores decrease or increase; the pre-test scores of the group with decreased scores will have a very high probability of being higher than those of the group with increased scores. Therefore, the findings in Figure 3A seem to be meaningless.

      We agree with R1’s comment. However, Figure 3A illustrates interindividual responsiveness patterns, while Figure 3B, which incorporates the new analyses, demonstrates that these results hold after accounting for practice effects.
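
      The thought experiment described in point (3) can be checked with a short simulation. This is a minimal sketch assuming independent standard-normal pre- and post-test scores; the function name, seed, and number of repetitions are arbitrary choices made for illustration.

```python
import numpy as np

def fraction_pre_higher(n_sims=2000, n_pairs=100, seed=1):
    """Fraction of simulated datasets in which the 'decreased' group has a
    higher mean pre-test score than the 'increased' group, even though pre
    and post scores are independent random numbers."""
    rng = np.random.default_rng(seed)
    higher = 0
    valid = 0
    for _ in range(n_sims):
        pre = rng.normal(size=n_pairs)
        post = rng.normal(size=n_pairs)  # generated independently of pre
        decreased = post < pre           # grouping uses the outcome itself
        if decreased.all() or not decreased.any():
            continue                     # skip the (very unlikely) one-group case
        valid += 1
        if pre[decreased].mean() > pre[~decreased].mean():
            higher += 1
    return higher / valid

print(fraction_pre_higher())
```

      Because the grouping is defined by the pre-to-post change, the groups differ in their pre-test scores by construction, which is the circularity the reviewer describes; a baseline difference between groups formed this way therefore needs support from an independent measure.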

      (4) The authors use IE as a behavioral measure in some analyses and use accuracy in others. I recommend that the authors adopt a consistent behavioral measure.

      We appreciate Reviewer 1’s suggestion. In examining the relationship between ATL GABA and semantic task performance, we have found that only semantic accuracy, and not reaction time (RT) or inverse efficiency (IE), shows a significant positive correlation and regression with ATL GABA levels and semantic task-induced ATL activation, both in our previous study (Jung et al., 2017) and in the current study. ATL GABA levels were not correlated with semantic RT (Jung et al., 2017: r = 0.34, p = 0.14; current study: r = 0.26, p = 0.14). It should be noted that there were no significant correlations between ATL GABA levels and semantic IE in either study (Jung et al., 2017: r = 0.13, p = 0.62; current study: r = 0.22, p = 0.44). Likewise, we found no significant linear or non-linear relationship between ATL GABA levels and RT (linear function: R² = 0.21, p = 0.45; quadratic function: R² = 0.17, p = 0.21) or between ATL GABA levels and IE (linear function: R² = 0.24, p = 0.07; quadratic function: R² = 2.24, p = 0.12).

      The absence of a meaningful relationship between ATL GABA and semantic RT or IE may be due to the following reasons: 1) RT is primarily associated with premotor and motor activation during semantic processing rather than ATL activation; 2) ATL GABA is likely to play a key role in refining distributed semantic representations through lateral inhibition, which sharpens the activated representation (Jung et al., 2017; Liu et al., 2011; Isaacson & Scanziani, 2011). This sharpening process may contribute to more accurate semantic performance (Jung et al., 2017). In our semantic task, for example, when encountering a camel (Fig. 1B), multiple semantic features (e.g., animal, brown, desert, sand, etc.) are activated. To correctly identify the most relevant concept (cactus), irrelevant associations (tree) must be suppressed, a process that likely relies on inhibitory mechanisms. Given this theoretical framework, we have used accuracy as the primary measure of semantic performance to elucidate the ATL GABA function.

      Reviewer #2 (Public review):

      Summary:

      The authors combined inhibitory neurostimulation (continuous theta-burst stimulation, cTBS) with subsequent MRI measurements to investigate the impact of inhibition of the left anterior temporal lobe (ATL) on task-related activity and performance during a semantic task and link stimulation-induced changes to the neurochemical level by including MR spectroscopy (MRS). cTBS effects in the ATL were compared with a control site in the vertex. The authors found that relative to stimulation of the vertex, cTBS significantly increased the local GABA concentration in the ATL. cTBS also decreased task-related semantic activity in the ATL and potentially delayed semantic task performance by hindering a practice effect from pre to post. Finally, pooled data with their previous MRS study suggest an inverted u-shape between GABA concentration and behavioral performance. These results help to better understand the neuromodulatory effects of non-invasive brain stimulation on task performance.

      Strengths:

      Multimodal assessment of neurostimulation effects on the behavioral, neurochemical, and neural levels. In particular, the link between GABA modulation and behavior is timely and potentially interesting.

      Weaknesses:

      The analyses are not sound. Some of the effects are very weak and not all conclusions are supported by the data since some of the comparisons are not justified. There is some redundancy with a previous paper by the same authors, so the novelty and contribution to the field are overall limited. A network approach might help here.

      Reviewer #3 (Public review):

      Summary:

      The authors used cTBS TMS, magnetic resonance spectroscopy (MRS), and functional magnetic resonance imaging (fMRI) as the main methods of investigation. Their data show that cTBS modulates GABA concentration and task-dependent BOLD in the ATL, whereby greater GABA increase following ATL cTBS showed greater reductions in BOLD changes in ATL. This effect was also reflected in the performance of the behavioural task response times, which did not subsume to practice effects after ATL cTBS as opposed to the associated control site and control task. This is in line with their first hypothesis. The data further indicates that regional GABA concentrations in the ATL play a crucial role in semantic memory because individuals with higher (but not excessive) GABA concentrations in the ATLs performed better on the semantic task. This is in line with their second prediction. Finally, the authors conducted additional analyses to explore the mechanistic link between ATL inhibitory GABAergic action and semantic task performance. They show that this link is best captured by an inverted U-shaped function as a result of a quadratic linear regression model. Fitting this model to their data indicates that increasing GABA levels led to better task performance as long as they were not excessively low or excessively high. This was first tested as a relationship between GABA levels in the ATL and semantic task performance; then the same analyses were performed on the pre and post-cTBS TMS stimulation data, showing the same pattern. These results are in line with the conclusions of the authors.

      Comments on revisions:

      The authors have comprehensively addressed my comments from the first round of review, and I consider most of their answers and the steps they have taken satisfactory. Their insights prompted me to reflect further on my own knowledge and thinking regarding the ATL function.

      I do, however, have an additional and hopefully constructive comment regarding the point made about the study focusing on the left instead of bilateral ATL. I appreciate the methodological complexities and the pragmatic reasons underlying this decision. Nevertheless, briefly incorporating the justification for this decision into the manuscript would have been beneficial for clarity and completeness. The presented argument follows an interesting logic; however, despite strong previous evidence supporting it, the approach remains based on an assumption. Given that the authors now provide the group-level fMRI results captured more comprehensively in Supplementary Figure 2, where the bilateral pattern of fMRI activation can be observed in the current data, the authors could have strengthened their argument by asserting that the activation related to the given semantic association task in this data was bilateral. This would imply that the TMS effects and associated changes in GABA should be similar for both sites. Furthermore, it is worth noting the approach taken by Pobric et al. (2007, PNAS), who stimulated a site located 10 mm posterior to the tip of the left temporal pole along the middle temporal gyrus (MTG) and not the bilateral ATL.

      We appreciate the reviewer’s constructive comment regarding the focus on the left ATL rather than bilateral ATL in our study. Accordingly, we have added the following paragraph in the Supplementary Information.

      “Justification of target site selection and cTBS effects

      Evidence suggests that bilateral ATL systems contribute to semantic representation (for a review, see Lambon Ralph, 2017). Consistent with this, our semantic task induced bilateral ATL activation (Fig. S2). Thus, stimulating both left and right ATL could provide a more comprehensive understanding of cTBS effects and its GABAergic function.

      Previous rTMS studies have applied inhibitory stimulation to the left vs. right ATL, demonstrating that stimulation at either site significantly disrupted semantic task performance (Pobric et al., 2007, PNAS; Pobric et al., 2010, Neuropsychologia; Lambon Ralph et al., 2009, Cerebral Cortex). Importantly, these studies reported no significant difference in rTMS effects between left and right ATL stimulation, suggesting that stimulating either hemisphere produces comparable effects on semantic processing. In the current study, we combined cTBS with multimodal imaging to investigate its effects on the ATL. Given our study design constraints (including the need for a control site, control task, and control stimulation) and limitations in scanning time, we selected the left ATL as the target region. This choice also aligned with the MRS voxel placement used in our previous study (Jung et al., 2017), allowing us to combine datasets and further investigate GABAergic function in the ATL. Accordingly, cTBS was applied to the peak coordinate of the left ventromedial ATL (MNI -36, -15, -30) as identified by previous fMRI studies (Binney et al., 2010; Visser et al., 2012).

      Given that TMS pulses typically penetrate 2–4 cm, we acknowledge the challenge of reaching deeper ventromedial ATL regions. However, our findings indicate that cTBS effectively modulated ATL function, as evidenced by reduced task-induced regional activity, increased ATL GABA concentrations, and poorer semantic performance, confirming that TMS pulses successfully influenced the target region. To further validate these effects, we conducted an ROI analysis centred on the ventromedial ATL (MNI -36, -15, -30), which revealed a significant reduction in ATL activity during semantic processing following ATL stimulation (t = -2.43, p = 0.014) (Fig. S7). This confirms that cTBS successfully modulated ATL activity at the intended target coordinate.”

      We appreciate R3's comment regarding the approach taken by Pobric et al. (2007, PNAS), who stimulated a site 10 mm posterior to the tip of the left temporal pole along the middle temporal gyrus (MTG). This approach has been explicitly discussed in our previous papers and reviews (e.g., Lambon Ralph, 2014, Proc. Royal Society B). Our earlier use of lateral ATL stimulation at this location (Pobric et al. 2007; Lambon Ralph et al. 2009; Pobric et al. 2010) was based on its alignment with the broader ATL region commonly atrophied in semantic dementia (cf. Binney et al., 2010 for a direct comparison of SD atrophy, fMRI data and the TMS region). Since these original ATL TMS investigations, a series of distortion-corrected or distortion-avoiding fMRI studies (e.g., Binney et al 2010; Visser et al, various, Hoffman et al., various; Jackson et al., 2015) have demonstrated graded activation differences across the ATL. While weaker activation is present at the original lateral ATL (MTG) stimulation site, the peak activation is maximal in the ventromedial ATL—a finding that was also observed in the current study. Accordingly, we selected the ventromedial ATL as our target site for stimulation.

      Following these points, we have revised the manuscript in the Methods and Materials.

      Transcranial magnetic stimulation p23, line 525-532,

      “Previous rTMS studies targeted a lateral ATL site 10 mm posterior to the temporal pole on the middle temporal gyrus (MTG) (Pobric et al. 2007; Lambon Ralph et al. 2009; Pobric et al. 2010), aligning with the broader ATL region typically atrophied in semantic dementia (Binney et al. 2010). However, distortion-corrected fMRI studies (Binney et al. 2010; Visser et al. 2012) have revealed graded activation differences across the ATL, with peak activation in the ventromedial ATL. Based on these findings, we selected the target site in the left ATL (MNI -36, -15, -30) from a prior distortion-corrected fMRI study (Binney et al. 2010; Visser et al. 2012) that employed the same tasks as our study (for further details, see the Supplementary Information).”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors have responded to all my comments and I found most of the responses reasonable and sufficient. However, I have one remaining point: I pointed out before that the scope of this paper is somewhat narrow and asked for a network analysis. I found the response to my question somewhat puzzling since the authors write:

      "However, it is important to note that we did not find any significant correlations between ATL GABA changes and cTBS-induced changes in the functional connectivity. Consequently, we are currently preparing another paper that specifically addresses the network-level changes induced by ATL cTBS."

      I don't understand the logic here. Even in the absence of significant correlations between ATL GABA changes and cTBS-induced changes in connectivity, it would be interesting to know how baseline connectivity is correlated with the induced changes. I am not sure if it is adequate to squeeze another paper out of the dataset instead of reporting it here as suggested.

      We apologise that our previous response was not clear. To examine cTBS-induced network-level changes, we conducted ROI analyses targeting key semantic regions, including the bilateral ATL, inferior frontal gyrus (IFG), and posterior middle temporal gyrus (pMTG), as well as Psychophysiological Interactions (PPI) using the left ATL as a seed region. The ROI analysis revealed that ATL stimulation significantly decreased task-induced activity in the left ATL (target region) while increasing activity in the right ATL and left IFG. PPI analyses showed that ATL stimulation enhanced connectivity between the left ATL and the right ATL (both ventromedial and lateral ATL), bilateral IFG, and bilateral pMTG, suggesting that ATL stimulation modulates a bilateral semantic network.

      Building on these findings, we conducted Dynamic Causal Modeling (DCM) to estimate and infer interactions among predefined brain regions across different experimental conditions (Friston et al., 2003). The bilateral ventromedial ATL, lateral ATL, IFG, and pMTG were defined as network nodes with mutual connections. Our model examined cTBS effects at the left ATL under both baseline (intrinsic) and semantic task (modulatory) conditions, estimating 56 intrinsic parameters for baseline connectivity and testing 16 different modulatory models to assess cTBS-induced connectivity changes during semantic processing. Here, we briefly summarize the key DCM analysis results: 1) ATL cTBS significantly altered effective connectivity between the left and right lateral and ventromedial ATL in both intrinsic and modulatory conditions; 2) cTBS increased modulatory connectivity from the right to the left ATL compared to vertex stimulation.

      Given the complexity and depth of these findings, we believe that a dedicated paper focusing on the network-level effects of ATL cTBS is necessary to provide a more comprehensive and detailed analysis, which extends beyond the scope of the current study. It should be noted that no significant relationship was found between ATL GABA levels and ATL connectivity in either the PPI or the DCM analyses.

      Reviewer #3 (Recommendations for the authors):

      In response to my comment about the ATL activation being rather medial in the fMRI data and my concern about the TMS pulse perhaps not reaching this site, the authors offer an excellent solution to demonstrate TMS effects to such a medial ATL coordinate. I think that the analyses and figures they provide as a response to this comment and a brief explanation of this result should be incorporated into supplementary materials for methodologically oriented readers. Also, perhaps it would be beneficial to discuss that the effect of TMS on vATL remains a matter of further research to see not just if but also how TMS pulse reaches target coordinates, given the problematic anatomical location of the region.

      We appreciate R3’s suggestion. Please, see our reply above.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript assesses the utility of spatial image correlation spectroscopy (ICS) for measuring physiological responses to DNA damage. ICS is a long-established (~1993) method similar to fluorescence correlation spectroscopy, for deriving information about the fluorophore density that underlies the intensity distributions of images. The authors first provide a technical but fairly accessible background to the theory of ICS, then compare it with traditional spot-counting methods for its ability to analyze the characteristics of γH2AX staining. Based on the degree of aggregation (DA) value, the authors then survey other markers of DNA damage and uncover some novel findings, such as that RPA aggregation inversely tracks the sensitivity to PARP inhibitors of different cell lines.

      The need for a more objective and standardized tool for analyzing DNA damage has long been felt in the field and the authors argue convincingly for this. The data in the manuscript are in general well-supported and of high quality, and show promise of being a robust alternative to traditional focus counting. However, there are a number of areas where I would suggest further controls and explanations to strengthen the authors' case for the robustness of their ICS method.

      Strengths:

      The spatial ICS method the authors describe and demonstrate is easy to perform and applicable to a wide variety of images. The DDR was well-chosen as an arena to showcase its utility due to its well-characterized dose-responsiveness and known variability between cell types. Their method should be readily useable by any cell biologist wanting to assess the degree of aggregation of fluorescent tags of interest.

      Weaknesses:

      The spatial ICS method, though of longstanding history, is not as intuitive or well-known as spot-based quantitation. While the Theory section gives a standard mathematical introduction, it is not as accessible as it could be. Additionally, the values of TNoP and DA shown in the Results are not discussed sufficiently with regard to their physical and physiological interpretation.

      We agree that a major barrier to adoption of this approach is the need for a deeper understanding of the theory and results. We have updated the Theory section to include further discussion (page 4, line 132).

      The correlation of TNoP with γH2AX foci is high (Figure 2) and suggestive that the ICS method is suitable for measuring the strength of the DDR. The authors correctly mention that the number of spots found using traditional means can vary based on the parameters used for spot detection. They contrast this with their ICS detection method; however, the actual robustness of spatial ICS is not given equal consideration.

      We found it difficult to give equal consideration of robustness to ICS. The major limitation of traditional approaches is proper selection of an intensity threshold that is necessary to define and separate foci from background intensity. However, ICS does not employ a threshold, therefore we could not test different thresholding applications in ICS as we did with traditional methods. In our view the absence of the need for a threshold is profoundly advantageous. The only inputs we employ in the ICS analysis are used to segment cell nuclei, yet these have no impact on the ICS calculation and are necessary for any analysis of the DDR.

      Reviewer #2 (Public review):

      Summary:

      Immunostaining of chromatin-associated proteins and visualization of these factors through fluorescence microscopy is a powerful technique to study molecular processes such as DNA damage and repair, their timing, and their genetic dependencies. Nonetheless, it is well-established that this methodology (sometimes called "foci-ology") is subject to biases introduced during sample preparation, immunostaining, foci visualization, and scoring. This manuscript addresses several of the shortcomings associated with immunostaining by using image correlation spectroscopy (ICS) to quantify the recruitment of several DNA damage response-associated proteins following various types of DNA damage.

      The study compares automated foci counting and fluorescence intensity to image correlation spectroscopy degree of aggregation to study the recruitment of DNA repair proteins to chromatin following DNA damage. After validating image correlation spectroscopy as a reliable method to visualize the recruitment of γH2AX to chromatin following DNA damage in two separate cell lines, the study demonstrates that this new method can also be used to quantify RPA1 and Rad51 recruitment to chromatin following DNA damage. The study further shows that RPA1 signal as measured by this method correlates with cell sensitivity to Olaparib, a widely-used PARP inhibitor.

      Strengths:

      Multiple proof-of-concept experiments demonstrate that using image correlation spectroscopy degree of aggregation is typically more sensitive than foci counting or foci intensity as a measure of recruitment of a protein of interest to a site of DNA damage. The sensitivity of the SKOV3 and OVCA429 cell lines to MMS and the PARP inhibitors Olaparib and Veliparib as measured by cell viability in response to increasing amounts of each compound is a valuable correlate to the image correlation spectroscopy degree of aggregation measurements.

      Weaknesses:

      The subjectivity of foci counting has been well-recognized in the DNA repair field, and thus foci counts are usually interpreted relative to a set of technical and biological controls and across a meaningful time period. As such:

      (1) A more detailed description of the numerous prior studies examining the immunostaining of proteins such as γH2AX, RAD51, and RPA is needed to give context to the findings presented herein.

      We apologize for not providing enough detail. We have added further references and discussion (page 18, lines 513 and 517). γH2AX foci counting, in particular, has been used in thousands of previous studies.

      (2) The benefits of adopting image correlation spectroscopy should be discussed in comparison to other methods, such as super-resolution microscopy, which may also offer enhanced sensitivity over traditional microscopy.

      Thank you for raising this point. We have added this discussion (page 19, line 553). The limiting factor that ICS addresses is the partition coefficient of signal inside a focus or cluster versus outside it. Super-resolution will not necessarily improve this unless it resolves down to single-molecule counting. Even then, one would still need to decide how to define a cluster or focus against a background of non-clustered signal.

      (3) Additional controls demonstrating the specificity of their antibodies to detection of the proteins of interest should be added, or the appropriate citations validating these antibodies included.

      We have added text stating that we only use validated antibodies (page 6, line 193). One thing to note is that we are measuring differences between treatment conditions; thus, if an antibody non-specifically labels proteins or cellular structures that do not change upon treatment, our approach would overcome this limitation.

      Reviewer #3 (Public review):

      Summary:

      This paper described a new tool called "Image Correlation Spectroscopy" (ICS) to detect clustering fluorescence signals such as foci in the nucleus (or any other cellular structures). The authors compared ICS DA (degree of aggregation) data with Imaris Spots data (and ImageJ Find Maxima data) and found a comparable result between the two analyses and that the ICS sometimes produced a better quantification than the Imaris. Moreover, the authors extended the application of ICS to detect cell-cycle stages by analyzing the DAPI image of cells. This is a useful tool without the subjective bias of researchers and provides novel quantitative values in cell biology.

      Strengths:

      The authors developed a new tool to detect and quantify the aggregates of immunofluorescent signals, which is a center of modern cell biology, such as the fields of DNA damage responses (DDR), including DNA repair. This new method could detect the "invisible" signal in cells without pre-extraction, which could prevent the effect of extracted materials on the pre-assembled ensembles, a target for the detection. This would be an alternative method for the quantification of fluorescent signals relative to conventional methods.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) The ICS theory section is essential and based on an excellent review from one of the authors. It would benefit greatly from a diagram showing where the quantities 𝒈(𝟎, 𝟎), 𝝎𝟎, and 𝒈inf come from in the 2D Gaussian fit, ideally for two cases where these quantities differ (i.e., how they correspond to different DA or TNoP values). In my opinion, this addition would greatly increase the manuscript's accessibility for DDR researchers. The citation of the review at the beginning would also be a plus.

      We have added the review citation at the front of the theory section (page 3, line 87). We have highlighted in Figure 2D where g(0,0), the most critical measurement for determining TNoP and DA, derives from. However, it is difficult to illustrate all the curve-fit parameters in one image because they are interdependent; labeling one in a single image would not independently capture how it might appear in a different curve fit.
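
      For orientation, the three fit quantities named in the comment appear in the standard spatial ICS formulation roughly as sketched below. This is the general textbook form from the ICS literature, given here as an assumption; the manuscript's own notation and normalization may differ.

```latex
% Normalized spatial autocorrelation of the intensity fluctuations
% (i(x,y) is the pixel intensity, angle brackets denote the spatial average)
r(\xi,\eta) = \frac{\langle \delta i(x,y)\,\delta i(x+\xi,\,y+\eta) \rangle}{\langle i \rangle^{2}},
\qquad
\delta i(x,y) = i(x,y) - \langle i \rangle

% Two-dimensional Gaussian fit to the autocorrelation
r(\xi,\eta) \approx g(0,0)\,\exp\!\left[-\frac{\xi^{2}+\eta^{2}}{\omega_{0}^{2}}\right] + g_{\infty}
```

      In this form, g(0,0) is the fitted amplitude at zero spatial lag, ω0 is the fitted width (set by the point spread function), and g∞ is the offset the function approaches at large lags.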

      (2) The TNoP measured in Figure 2 is a quantity about 2000-3000 times greater than the number of "traditionally detected" foci by both methods and the linear relations have very low Y intercepts. Can the authors comment explicitly on the physical interpretation of this number - are 2 to 3 thousand independent particles present within each "focus" detected by traditional means? If so, then what might one "particle" correspond to? (a single secondary antibody or fluorophore? a nucleosome?). In a similar vein, the X intercepts lie at around 25 foci, meaning that in images with fewer than that number of foci detected by ImageJ or Imaris, the ICS method should detect zero TNoP - is this in line with the authors' predictions? Is it possible that a first-order line fit is not the most appropriate relation between the two methods?

      We apologize for our brevity here. Since DA proved to be a more useful metric we did not spend much effort discussing TNoP. TNoP correlates to the number of clustered particles, or non-diffuse fluorophores. TNoP is the inverse of the number of individual particles per nucleus, but the value is not a direct measure of foci. If a sample had no clustering at all, the number of individual particles would be at a maximum and the TNoP would be at a minimum. However, as fluorophores cluster, the number of individual particles (i.e. non-clustered fluorophores) decreases, which increases the TNoP value. Therefore, TNoP has a correlation to the number of foci detected through traditional measurements, as we found here. Yet, TNoP is a relative measurement and cannot be compared across different conditions. Similar to foci counting, TNoP is unable to factor the size or intensity of each cluster, thus DA is a more appropriate quantification of the DNA damage response.

      The value of TNoP is dependent on the fitted point spread function and the area of the nucleus. The y=0 intercept of TNoP is defined by the optical setup and is not expected to necessarily go through x=0. Intriguingly, other groups have found that some foci identified through traditional measurements are actually clusters of multiple smaller foci, thus the concept of what a focus represents is difficult to interpret. Here we therefore aimed to show a general correlation of TNoP with the foci count obtained through traditional methods, to reflect how ICS is similar to foci counting, and then employed DA to overcome the limitations of defining a focus.

      We have tried to clarify this in the text (page 8, line 266).

      (3) Some suggestions to address the robustness of ICS:

      For a given sample (i.e. one segmented nucleus), the calculation of DA and TNoP should be similar between different images of that same nucleus taken at different times, similar to how the number of traditionally detected foci would be fairly invariant. In particular, it should be shown that these values are not just scaling with the higher normalized intensity seen in stronger DDR responses. In the same vein, the linear relationship between TNoP and "foci" should not change even if the confocal settings are slightly different (i.e., higher/lower illumination intensity) as long as the condition stipulated by the authors in the Discussion holds ("ICS can be implemented on any fluorescence image as long as the square relative fluorescence intensity fluctuations are detectable above noise fluctuations."). To show, as the title states, that spatial ICS is a robust tool, it would be desirable to demonstrate this with a series of images of the same cell at the same or varying excitation intensities.

      Thank you for your suggestions. Indeed, the calculation will be the same over sequential images of the same cell. The dose-dependent DA observed for RPA1 and RAD51, which does not correlate with intensity (Fig. S5), directly demonstrates that DA does not simply scale with intensity.

      We would not expect the TNoP to change with confocal settings; however, we show in Figure 1 that the number of foci does indeed change with intensity settings, as captured by thresholds. Therefore, any interpretation of TNoP vs. foci count would be very difficult to make at different microscope settings. To ensure we are fairly comparing ICS to existing analyses, we keep the settings the same and measure changes between conditions.

      (4) More information is needed on how intensity normalization was performed. The Methods states "Measurements across experiments were normalized by the control in each dataset." The DMSO (0mM drug) plots all appear to have a mean of 1.0, so it appears the values for each set of control nuclei were divided by their own mean, and then the values for each set of experimental nuclei were divided by the mean value of all 3 controls as an aggregate; is this correct?

      We apologize for not being clearer, and thank you for raising this point. We normalized data to a control from each experimental group. In Figures 3, 4, and 5, data were collected over multiple experiments, with one control per experiment and each treatment condition included in each experiment. Therefore, we normalized each result to the corresponding control from that imaging session. In Figure 8, however, we ran experiments at much higher throughput with multiple controls per experiment; the data were therefore normalized to the overall average of the controls, which is why the control averages are not all at a value of 1. We have clarified this in the text (page 7, line 218).
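
      The two normalization schemes described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layout (`experiment`, `condition`, `value`) and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalize_per_experiment(records):
    """Divide each value by the mean control value of its own experiment.

    `records` is a list of dicts with the (hypothetical) keys
    'experiment', 'condition', and 'value'; controls are marked with
    condition == 'control'.
    """
    control_values = defaultdict(list)
    for rec in records:
        if rec["condition"] == "control":
            control_values[rec["experiment"]].append(rec["value"])
    control_mean = {exp: mean(vals) for exp, vals in control_values.items()}
    return [
        {**rec, "normalized": rec["value"] / control_mean[rec["experiment"]]}
        for rec in records
    ]


def normalize_to_pooled_controls(records):
    """Divide each value by the mean of all control values pooled together.

    With this scheme the individual control groups need not average to 1.
    """
    pooled = mean(rec["value"] for rec in records if rec["condition"] == "control")
    return [{**rec, "normalized": rec["value"] / pooled} for rec in records]
```

      With one control per experiment, the first function returns exactly 1 for every control; with pooled controls, each control group averages to 1 only if it happens to match the pooled mean.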

      (5) Some more information about the ICS analysis should be given if the full code is not provided - in particular, how the nucleus mask was implemented on the "signal" channel (were the edges abruptly set to zero or was a window function introduced to avoid edge effects in the discrete FFT?).

      Thank you for raising this point. We have added the code to GitHub (github.com/dubachLab/ics). The signal region was established by simply applying the nuclear mask from the DAPI channel to the IF channel. Each region is padded at the edges with the average intensity value, for 2x the dimensions of the ROI, to remove edge effects in the FFT.
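
      The masking and padding steps described in this reply can be sketched as follows. This is a minimal Python/NumPy illustration, not the authors' code: the function name is hypothetical, "2x the dimensions" is interpreted here as padding the ROI to twice its height and width, and the simple normalization by the in-mask pixel count is an assumption.

```python
import numpy as np


def spatial_autocorrelation(image, mask):
    """Normalized spatial autocorrelation of the intensity fluctuations in a mask.

    image : 2D array of fluorescence intensities (e.g. the IF channel)
    mask  : 2D boolean array, True inside the nucleus (e.g. from the DAPI channel)
    Returns an array of shape (2h, 2w) with zero lag at index [h, w],
    where (h, w) is the bounding box of the mask.
    """
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    mean_intensity = image[mask].mean()

    # Crop to the bounding box of the nuclear mask.
    rows, cols = np.nonzero(mask)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    roi = image[r0:r1, c0:c1].copy()
    roi_mask = mask[r0:r1, c0:c1]

    # Pixels outside the nucleus take the mean value, so they carry no fluctuation.
    roi[~roi_mask] = mean_intensity

    # Pad to twice the ROI size with the mean intensity to avoid wrap-around
    # (edge) effects in the FFT-based correlation.
    h, w = roi.shape
    padded = np.full((2 * h, 2 * w), mean_intensity)
    padded[:h, :w] = roi

    # Autocorrelation of the fluctuations via the Wiener-Khinchin theorem.
    fluctuations = padded - mean_intensity
    spectrum = np.fft.fft2(fluctuations)
    raw = np.fft.ifft2(spectrum * np.conj(spectrum)).real

    # Simplified normalization: in-mask pixel count and squared mean intensity.
    g = raw / (roi_mask.sum() * mean_intensity ** 2)
    return np.fft.fftshift(g)
```

      Under these assumptions, the zero-lag value equals the variance of the in-mask pixels divided by the squared mean. In ICS practice the amplitude is usually taken from a Gaussian fit around zero lag instead of from the raw zero-lag pixel, which also contains uncorrelated noise.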

      Minor comments:

      (1) Figure 3, 4, 5: I think it would aid figure readability if channels were labeled in the images themselves, not just in the legend.

      Thank you for the suggestion. We tried doing this but struggled to fit labels within the layout of the images. We were also concerned about interpretation of data in each column and the potential to assign data to each figure if they were so prominently labeled.

      (2) Supplemental Figures are mislabeled; the order given in the legends is S1, S2, S3, S2, S3. S4 is called out in the main text where it should be S5.

      Thank you for catching this error. We have made the necessary corrections. S4 contains data on cellular response to the drugs, while S5 contains intensity data in response to MMS.

      (3) It should be stated for each Figure what kind of microscopy was performed - I assume that it is confocal for everything except when widefield is explicitly stated, but for clarity please add this information.

      Indeed, this is correct. We have indicated which microscopy was used for each figure.

      (4) The MATLAB code and full (uncropped) Western blots should be provided as supplemental data if possible.

      We have included a GitHub link for the code and un-cropped western blots.

      (5) The p values from significance tests should indicate whether multiple comparisons correction was necessary (if suggested by Prism) and performed.

      We apologize for the lack of clarity. This was not necessary: significance was calculated vs. the next lower dose (e.g., 10 micromolar vs. 1 micromolar). We have clarified this in the Methods (page 7, line 221).

      Reviewer #2 (Recommendations for the authors):

      Major points:

      In addition to the weaknesses noted above, to encourage widespread adoption of this method, the authors should make the tools that they used for their analysis publicly available. In a few instances (e.g., compare Figures 3J and 3L), other methods outperform DA. It would be meaningful to discuss when especially DA may be a better measure than others (such as intensity or number of foci).

      We have made the code available on GitHub. We expect that results such as those in Figures 3J and 3L, where intensity is significantly higher at the highest concentration but DA is not, reflect the underlying biology, and this may be interpreted differently under different experimental conditions. Imaris Spots (Fig. 3K) also does not capture a significant increase at the highest dose of olaparib, suggesting that intensity may rise without generating more foci. These results are likely highly dependent on the mechanism of olaparib at such a high concentration and on the DDR. We are hesitant to draw biological conclusions from these results and would instead like to highlight the capacity of ICS to evaluate the DDR; therefore, we do not want to make broad comments about different applications.

      Minor points:

      (1) Pg. 12: "We used MMS to induce DNA damage in SKOV3 and OVCA429 cells. As expected, normalized intensity for RPA1 and RAD51 values (Figure S5) did not display a dose dependence on MMS concentration."

      Please provide a citation for the claim that RPA1 and RAD51 normalized intensities do not display a dose dependence on MMS concentration.

      These were data that we generated. We were not expecting an intensity change as that would presumably require increased protein generation in response to MMS, compared to gH2AX where the phospho-specific H2AX is generated in the DDR.

      (2) Pg. 12: "Similar to RPA1, RAD51 does not form distinguishable foci in the nuclei in cells without preextraction (Fig. 5)." Please provide a citation for this claim.

      We did not perform pre-extraction, and our results do not show changes in distinguishable foci. We provided citations discussing how, without pre-extraction, foci formation for these proteins is not obvious (refs. 38 and 39).

      (3) I noted that the authors cite one paper [38] apparently showing that RPA and Rad51 do not always form foci, however, this is in the C. elegans germline in response to micro irradiation, therefore I am not sure that it is applicable to human cells.

      We apologize for referencing a paper on C. elegans. Most papers looking at RPA and RAD51 in the DDR use pre-extraction, as it seems necessary to observe foci. Consequently, we could find few papers that do not use pre-extraction. Reference 39 is in HeLa cells.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) Page 8, the second paragraph: In the Result section, it is better to describe how the authors carried out immuno-staining (without pre-extract subtraction) and ICS briefly, although the method is described in detail in the Method section.

      Thank you for the suggestion; we have added this description (page 8, line 259).

      (2) In Figure 5K-P: The authors analyzed "invisible" RAD51 foci on the image (Fig. 5L, M, O, and P) without pre-extraction. As a control experiment, it is useful to check whether pre-extraction would provide "visible" RAD51 foci and to examine the similar MMS concentration dependency shown in Figure 5R (or 5T). This would strengthen the power of the ICS analysis.

      Thank you for the suggestion. In our hands, pre-extraction is extremely subjective. We have tried performing pre-extraction but find highly variable results depending on conditions. Therefore, we did not include any pre-extraction here. We expect that performing these experiments may or may not agree with results in Figure 5 largely because we are unable to achieve repeatable pre-extraction foci counting.

      (3) Figure 6D (and 6C) looks very interesting. It would be important to show the interpretation of this correlation shown in the graph. Although the authors argued that ICS analysis results shown in the graph could provide new insight into the DDR (page 14, last line 5), as shown in another part, it is important to carry out the same analysis by using Imaris Spots. Moreover, it is interesting to apply the analysis to RAD51 foci (shown in Figure 5), given that the PARPi effect is enhanced in the absence of RAD51-mediated recombination.

      We completely agree that this analysis may generate interesting results to help interpret the DDR to PARP inhibition. These experiments are part of an ongoing follow-up study in which we extend the use of ICS to other parts of the DDR and investigate clustering across several proteins that affect PARPi response. Since the focus of this manuscript is introducing ICS as a tool to study the DDR, we believe that omitting those data here does not detract from the central points of the manuscript. We included the results in Figure 6 because we wanted to show how ICS could impact DDR research. Furthermore, building on the advances shown in Figures 7 and 8, we are currently working on adapting ICS to be high-throughput and much simpler than Imaris Spots for handling the large datasets needed to generate results like those in Figure 6.

      Minor points:

      (1) Figure 1I, blue arrows: These showed an area with a higher background. Because of a low magnification, it is very hard to see the difference from the other areas of the background. It is better to show a magnified image of the representative region with a higher background.

      We hope that readers can see the higher intensity in the diffuse area. We attempted to add a zoomed-in area, but it either blocked a significant portion of the non-zoomed image or added complexity to the figure. We have noted that the images in Figure S1 are larger and more clearly capture an increase in background intensity.

      (2) Figure 2 legend, line 5, the same as "A)": This should be "B".

      Here, the number of independent particle clusters is intended to be the same as A, the difference is that the independent particles are clusters in C and individual fluorophores in A.

      (3) Page 9, the first paragraph, last line, foci formation, and foci composition: These should be "focus formation and focus composition".

      We have changed this.

      (4) Page 15, the first paragraph, line 5, palbociclib, camptothecin, or etoposide: please explain what kinds of the drugs are.

      We have added that these drugs cause cells to stall at different cell cycle stages. Explaining the drugs would take considerable room in the text.

      (5) Page 16, the first paragraph, line 1, bleomycin: Please explain what this drug is.

      Similar to above, we have stated that this drug causes DNA damage, going into detail would take several sentences.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      (1) This manuscript introduces a useful curation pipeline of antibody-antigen structures downloaded from the PDB database. The antibody-antigen structures are presented in a new database called AACDB, alongside annotations that were either corrected from those present in the PDB database or added de-novo with a solid methodology. Sequences, structures, and annotations can be very easily downloaded from the AACDB website, speeding up the development of structure-based algorithms and analysis pipelines to characterize antibody-antigen interactions. However, AACDB is missing some key annotations that would greatly enhance its usefulness.

      Here are detailed comments regarding the three strengths above:

      I think potentially the most significant contribution of this database is the manual data curation to fix errors present in the PDB entries, by cross-referencing with the literature. However, as a reviewer, validating the extent and the impact of these corrections is hard, since the authors only provided a few anecdotal examples in their manuscript.

      I have personally verified some of the examples presented by the authors and found that SAbDab appears to fix the mistakes related to the misidentification of antibody chains, but not other annotations.

      (a) "the species of the antibody in 7WRL was incorrectly labeled as "SARS coronavirus B012" in both PDB and SabDab" → I have verified the mistake and fix, and that SAbDab does not fix it; it just uses the PDB annotation.

      (b) "1NSN, the resolution should be 2.9 Å, but it was incorrectly labeled as 2.8 Å" → I have verified the mistake and fix, and that SAbDab does not fix it; it just uses the PDB annotation.

      (c) "mislabeling of antibody chains as other proteins (e.g. in 3KS0, the light chain of B2B4 antibody was misnamed as heme domain of flavocytochrome b2)" → SAbDab fixes this as well in this case.

      (d) "misidentification of heavy chains as light chains (e.g. both two chains of antibody were labeled as light chain in 5EBW)" → SAbDab fixes this as well in this case.

      I personally believe the authors should make public the corrections made, and describe the procedures - if systematic - to identify and correct the mistakes. For example, what was the exact procedure (e.g. where were sequences found, how were the sequences aligned, etc.) to find mutations? Was the procedure run on every entry?

      We appreciate the reviewer’s valuable feedback. Our correction procedures combined manual curation with systematic sequence analysis. While most metadata discrepancies were resolved by cross-referencing the original literature, we implemented a structured approach for identifying mutations in specific cases. For PDB entries labeled as variants (e.g., "Bevacizumab mutant" or "Ipilimumab variant Ipi.106") where the "Mutation(s)" field was annotated as "NO," we retrieved the canonical therapeutic antibody sequence from Thera-SAbDab and then performed a pairwise sequence alignment against the PDB entry using BLAST to identify mutated residues.
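
      The last step of this procedure, reading mutations off a pairwise alignment, can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, the input is assumed to be a pair of already aligned sequences with '-' for gaps (as any pairwise aligner can produce), and the sequences in the usage note are made up.

```python
def list_mutations(aligned_ref, aligned_query):
    """List differences between two aligned sequences in reference numbering.

    aligned_ref   : reference sequence with '-' for gaps
    aligned_query : query (PDB entry) sequence with '-' for gaps
    Substitutions are reported as e.g. 'Q3K', deletions as 'Q3del', and
    insertions as 'ins3A' (residue A inserted after reference position 3).
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")

    mutations = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for ref_res, query_res in zip(aligned_ref, aligned_query):
        if ref_res != "-":
            ref_pos += 1
        if ref_res == query_res:
            continue
        if ref_res == "-":
            mutations.append(f"ins{ref_pos}{query_res}")
        elif query_res == "-":
            mutations.append(f"{ref_res}{ref_pos}del")
        else:
            mutations.append(f"{ref_res}{ref_pos}{query_res}")
    return mutations
```

      For example, `list_mutations("EVQLVES", "EVKLVES")` reports a single substitution at reference position 3.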

      This procedure was not applied to all entries, as mutations are context-dependent. Therapeutic antibodies have well-defined reference sequences, enabling systematic alignment. For antibodies lacking unambiguous wild-type references (e.g., research-grade or non-therapeutic antibodies), mutation annotations were directly inherited from the PDB or literature.

      All corrections have been publicly archived in AACDB. We have added a detailed discussion of this issue in the section “2.3 Metadata” of revised manuscript.

      (2) I believe the splitting of the pdb files is a valuable contribution as it standardizes the distribution of antibody-antigen complexes. Indeed, there is great heterogeneity in how many copies of the same structure are present in the structure uploaded to the PDB, generating potential artifacts for machine learning applications to pick up on. That being said, I have two thoughts both for the authors and the broader community. First, in the case of multiple antibodies binding to different epitopes on the same antigen, one should not ignore the potentially stabilizing effect that the binding of one antibody has on the complex, thereby enabling the binding of the second antibody. In general, I urge the community to think about what is the most appropriate spatial context to consider when modeling the stability of interactions from crystal structure data. Second, and in a similar vein, some antigens occur naturally as homomultimers - e.g. influenza hemagglutinin is a homotrimer. Therefore, to analyze the stability of a full-antigen-antibody structure, I believe it would be necessary to consider the full homo-trimer, whereas, in the current curation of AACDB with the proposed data splitting, only the monomers are present.

      We sincerely appreciate the reviewer’s insightful comments regarding the splitting of PDB files and we appreciate the opportunity to address the reviewer’s thoughtful concerns.

      Firstly, regarding two antibodies binding to distinct epitopes on the same antigen, we would like to clarify that this scenario can be divided into two cases based on the experimental context. Case 1: the two antibody-antigen complexes are determined in separate structures. For example, SAR650984 (PDB: 4CMH) and daratumumab (PDB: 7DHA) target CD38 at non-overlapping epitopes. These two antibody-antigen complexes were determined independently, and their structures do not influence each other. Case 2: the crystal structure contains a ternary complex with two antibodies and an antigen, as in the example of 6OGE discussed in Section 2.2 of our manuscript. According to the original literature, the experiments confirmed that the order of Fab binding does not affect the formation of the ternary complex, and the binding of one antibody does not enhance the binding of the other. This supports the rationale for splitting 6OGE into two separate structures. However, we acknowledge that not all ternary complexes in the PDB come with such detailed experimental descriptions in their original literature. We agree with the reviewer that in some cases, one antibody may stabilize the structure to facilitate the binding of a second antibody. For instance, in 3QUM, the 5D5A5 antibody stabilizes the structure, enabling the binding of the 5D3D11 antibody to human prostate-specific antigen. Such sandwich complexes are indeed valuable for identifying true epitopes and paratopes. Importantly, splitting the structure does not alter the interaction sites.

      Secondly, we fully agree with the reviewer that for antigens that naturally exist as homomultimers (e.g., influenza hemagglutinin as a homotrimer), the full multimeric structure should be considered when analyzing stability. In such cases, users can directly utilize the original PDB structures provided in their multimeric form. Our splitting approach is intended to provide an additional option for cases where monomeric analysis is sufficient or preferred, but it does not preclude the use of the original multimeric structures when necessary.

      (3) I think the manuscript is lacking in justification about the numbers used as cutoffs (1A^2 for change in SASA and 5A for maximum distance for contact) The authors just cite other papers applying these two types of cutoffs, but the underlying physico-chemical reasons are not explicit even in these papers. I think that, if the authors want AACDB to be used globally for benchmarks, they should provide direct sources of explanations of the cutoffs used, or provide multiple cutoffs. Indeed, different cutoffs are often used (e.g. ATOM3D uses 6A instead of 5A to determine contact between a protein and a small molecule https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/c45147dee729311ef5b5c3003946c48f-Abstract-round1.html). I think the authors should provide a figure with statistics pertaining to the interface atoms. I think showing any distribution differences between interface atoms determined according to either strategy (number of atoms, correlation between change in SASA and distance...) would be fundamental to understanding the two strategies. I think other statistics would constitute an enhancement as well (e.g. proportion of heavy vs. light chain residues).

      Some obvious limitations of AACDB in its current form include:

      AACDB only contains entries with protein-based antigens of at least 50 amino acids in length. This excludes non-protein-based antigens, such as carbohydrate- and nucleotide-based, as well as short peptide antigens.

      AACDB does not include annotations of binding affinity, which are present in SAbDab and have been proven useful both for characterizing drivers of antibody-antigen interactions (cite https://www.sciencedirect.com/science/article/pii/S0969212624004362?via%3Dihub) and for benchmarking antigen-specific antibody-design algorithms (cite https://www.biorxiv.org/content/10.1101/2023.12.10.570461v1).

      We thank the reviewer for raising this critical point about the cutoff values used in AACDB. The threshold selection in the current study is not arbitrary: the thresholds follow those used in the existing literature, and we have added further supporting references to the manuscript. In established tools, the criteria for defining interacting amino acids typically use a ΔSASA threshold of no more than 1 Å² and a distance cutoff of no more than 6 Å. While our manuscript emphasizes widely accepted thresholds for consistency with prior benchmarks, AACDB explicitly provides raw ΔSASA and distance values for all annotated residues. Users can therefore filter the downloaded files by excluding entries that exceed their preferred thresholds (e.g., selecting 5 Å instead of 6 Å), which ensures adaptability to diverse research needs. In the revised version, we reset the distance threshold to 6 Å and recalculated the interacting amino acids to give users a wider range of choices. In Section “3.2 Database browse and search” of the revised manuscript, we describe this flexible choice of thresholds for practical use.
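The threshold filtering described in this response could look roughly like the following Python sketch. The record layout (chain, residue number, residue name, minimum distance in Å, ΔSASA in Å²) and all values are hypothetical and do not reflect the actual AACDB download format; only the idea of re-filtering stored raw values with a user-chosen cutoff comes from the response.

```python
from typing import NamedTuple


class InterfaceResidue(NamedTuple):
    chain: str
    resnum: int
    resname: str
    min_distance: float  # minimum interatomic distance to the partner, in Å
    delta_sasa: float    # change in solvent-accessible surface area, in Å^2


# Hypothetical records; real AACDB files may use a different layout.
RECORDS = [
    InterfaceResidue("H", 31, "SER", 3.4, 12.5),
    InterfaceResidue("H", 52, "TYR", 4.8, 30.1),
    InterfaceResidue("H", 100, "ASP", 5.6, 0.4),
    InterfaceResidue("L", 32, "TYR", 5.9, 2.2),
    InterfaceResidue("L", 91, "GLY", 6.0, 0.0),
]


def filter_by_distance(records, cutoff):
    """Keep residues whose minimum distance is strictly below the cutoff."""
    return [r for r in records if r.min_distance < cutoff]


def filter_by_delta_sasa(records, cutoff=1.0):
    """Keep residues whose ΔSASA is strictly above the cutoff."""
    return [r for r in records if r.delta_sasa > cutoff]


if __name__ == "__main__":
    for cutoff in (6.0, 5.0):
        kept = filter_by_distance(RECORDS, cutoff)
        print(cutoff, [(r.chain, r.resnum) for r in kept])
```

Because the raw values are kept, a stricter cutoff (5 Å) is simply a subset of the looser one (6 Å), so no recomputation from coordinates is needed.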

      Furthermore, distance and ΔSASA are two distinct metrics for evaluating interactions. Distance directly quantifies spatial proximity between atoms, reflecting physical contacts such as van der Waals interactions or hydrogen bonds, and is ideal for identifying direct spatial adjacency. ΔSASA, on the other hand, measures changes in solvent accessibility of residues during binding, capturing the contribution of buried surfaces to binding free energy. Even for residues not in direct contact, reduced SASA due to conformational changes may indicate indirect functional roles.
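As a rough illustration of how the two criteria differ, the sketch below computes both quantities for a single residue. The coordinates and SASA values are invented, and the function names are hypothetical; in practice the per-residue SASA values would come from a dedicated surface-area program rather than being typed in by hand.

```python
import math


def min_interatomic_distance(atoms_a, atoms_b):
    """Smallest distance between any atom in atoms_a and any atom in atoms_b."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def delta_sasa(sasa_free, sasa_bound):
    """Surface area buried on binding: SASA in the free state minus SASA in the complex."""
    return sasa_free - sasa_bound


def is_contact(atoms_a, atoms_b, cutoff=6.0):
    """Distance criterion: any atom pair closer than the cutoff (Å)."""
    return min_interatomic_distance(atoms_a, atoms_b) < cutoff


def is_buried(sasa_free, sasa_bound, cutoff=1.0):
    """ΔSASA criterion: more than `cutoff` Å^2 of surface buried on binding."""
    return delta_sasa(sasa_free, sasa_bound) > cutoff


if __name__ == "__main__":
    # Invented coordinates (Å) for one antibody residue and nearby antigen atoms.
    residue_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    antigen_atoms = [(5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    print(min_interatomic_distance(residue_atoms, antigen_atoms))
    print(is_contact(residue_atoms, antigen_atoms), is_buried(40.0, 39.8))
```

In this toy case the residue passes the distance criterion but not the ΔSASA criterion, which is the kind of small disagreement between the two definitions that the response refers to.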

      As demonstrated through comparisons on the detailed information pages, the sets of interacting amino acids defined by these two methods differ by only a few residues, with no significant variation in their overall distributions. However, since interaction patterns vary significantly across different complexes, analyzing residue distributions across all structures using both criteria is not feasible.

      We thank the reviewer for highlighting these limitations. AACDB currently focuses on protein-based antigens of ≥50 amino acids to prioritize structural consistency, which excludes non-protein antigens and shorter peptides. While affinity annotations are critical for benchmarking antibody design tools, these data were not integrated in this release because team constraints prevented sufficient data verification. We acknowledge these gaps and plan to expand antigen diversity and incorporate affinity metrics in future updates.

      Reviewer #2:

      Summary:

      Antibodies, thanks to their high binding affinity and specificity to cognate protein targets, are increasingly used as research and therapeutic tools. In this work, Zhou et al. have created, curated, and made publicly available a new database of antibody-antigen complexes to support research in the field of antibody modelling, development, and engineering.

      Strengths:

      The authors have performed a manual curation of antibody-antigen complexes from the Protein Data Bank, rectifying annotation errors; they have added two methods to estimate paratope-epitope interfaces; they have produced a web interface that is capable of both effective visualisation and of summarising the key useful information in one page. The database is also cross-linked to other databases that contain information relevant to antibody developability and therapeutic applications.

      Weaknesses:

      The database does not import all the experimental information from PDB and contains only complexes with large protein targets.

      Thank you for the valuable feedback. As noted in our response to Reviewer #1, due to limitations within our team, comprehensive data integration from the PDB has not been achieved in the current version. We acknowledge the significance of expanding the database to encompass a broader range of experimental information and complexes with diverse target sizes. Regrettably, immediate updates to address these limitations are not feasible at this time. Nevertheless, we are committed to enhancing the database in upcoming upgrades to provide users with a more comprehensive and inclusive resource.

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 194: "produce" → "produced"

      We thank the reviewer for the feedback. We have checked the grammar and spelling carefully in the revised manuscript.

      (2) As mentioned in the public review, I think adding binding affinity annotations would greatly enhance the use cases for the database.

      We thank the reviewer for the suggestion. As noted in our response to the public review, due to team constraints these data are not integrated into this release but are being collated. We recognize these gaps and plan to expand antigen diversity and incorporate affinity metrics in future updates.

      (3) I think adding a visualization of interface atoms and contacts on an entry's webpage would be useful for someone exploring specific entries. It also would be useful if the authors provided a pymol command to select interface residues since that's a procedure any structural biologist is likely to do.

      We sincerely appreciate the reviewer’s constructive suggestions. In response to the request for enhanced visualization and accessibility of interface residue information, we have implemented the following improvements: (1) Web Interface Visualization. On the entry-specific webpage, we have added an interactive visualization window that highlights the antigen-antibody interaction interface using distinct colors. The interaction interface visualization has been incorporated into Figure 5 of the revised manuscript, with a detailed description. (2) PyMOL Command Accessibility. The “Help” page now provides step-by-step PyMOL commands to select and visualize interface residues.
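For readers who want to script this step, the following Python sketch builds a PyMOL `select` command from a list of interface residues. It is not the set of commands given on the AACDB “Help” page; the function name, the selection name, and the residue numbers are hypothetical. It relies only on standard PyMOL selection syntax (`chain`, `resi`, and `+`-separated residue lists) and does not handle insertion codes.

```python
from collections import defaultdict


def pymol_selection(name, residues):
    """Build a PyMOL 'select' command from (chain, residue number) pairs."""
    by_chain = defaultdict(list)
    for chain, resnum in residues:
        by_chain[chain].append(resnum)
    parts = []
    for chain in sorted(by_chain):
        # PyMOL accepts '+'-separated residue numbers after 'resi'.
        resi = "+".join(str(n) for n in sorted(set(by_chain[chain])))
        parts.append(f"(chain {chain} and resi {resi})")
    return f"select {name}, " + " or ".join(parts)


if __name__ == "__main__":
    # Hypothetical paratope residues on heavy (H) and light (L) chains.
    print(pymol_selection("paratope", [("H", 52), ("H", 31), ("L", 32)]))
```

The printed line can be pasted into the PyMOL command prompt after loading the corresponding structure.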

      (4) I think the authors should provide headers to the files containing interface residues according to the change-in-SASA criterion, as they do for those computed according to contact. This would avoid unnecessary confusion - however slight - and make parsing easier. I was initially confused by the meaning of the last column, though after a minute I understood it to be the change in SASA.

      We thank the reviewer for the detailed feedback and the suggestion. We have provided headers for the files of the interacting residues defined by ΔSASA.

      (5) Line 233: "AACDB's data processing pipeline supports mmCIF files" → The meaning and implications of this statement are not obvious to me, and are mentioned nowhere else in the paper. Do you mean that in AACDB there are structure entries that the RCSB PDB database only has in mmCIF file format, and not .pdb format? So, effectively, there are some entries in AACDB that are not in any other antibody-specific database? I checked and, as of Dec 3rd, 2024, there are 41 structures in AACDB that are NOT in SAbDab. Manually checking 5 of those 41 structures, none are mmCIF-only structures.

      We thank the reviewer for the valuable comment. Some entries contain structures that are too large to be represented in a single PDB-format file because of the number of atoms and polymer chains they contain. The PDB therefore distributes these structures only as mmCIF files. In AACDB, 47 entries, such as 7SOF, 7NKT, 7B27, and 6T9D, are available from the PDB only in mmCIF format. The “.pdb” and “.cif” files contain atomic coordinates in distinct text formats, and the structure files are split automatically based on the manually annotated antibody-antigen chains. We have therefore built support for both formats into our file processing pipeline, enabling fully automated splitting. Additionally, we employed Naccess to calculate solvent-accessible surface area. Because this software accepts only .pdb files as input, we also converted all split .cif files into .pdb format within the automated pipeline. We apologize for the lack of clarity in the original manuscript and have included a more detailed explanation in the "2.2 PDB Splitting" section of the revised manuscript.
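The size constraint mentioned here comes from the fixed-width legacy PDB format, whose atom serial field is five columns wide and whose chain identifier is a single character. A minimal Python sketch of such a check follows; the function name and the example inputs are hypothetical, and this is not the authors' pipeline.

```python
MAX_PDB_ATOM_SERIAL = 99_999  # the atom serial number field is 5 columns wide


def fits_legacy_pdb(n_atoms, chain_ids):
    """Return True if a structure can be written as a single legacy PDB file.

    Two format limits are checked: the atom count must fit the 5-column
    serial field, and every chain identifier must be a single character.
    """
    if n_atoms > MAX_PDB_ATOM_SERIAL:
        return False
    return all(len(chain_id) == 1 for chain_id in chain_ids)


if __name__ == "__main__":
    # A split antibody-antigen pair is usually small enough for .pdb output.
    print(fits_legacy_pdb(6_500, ["H", "L", "A"]))
    # A very large assembly, or one with multi-character chain IDs, is not.
    print(fits_legacy_pdb(150_000, ["A", "B"]))
    print(fits_legacy_pdb(6_500, ["AA", "B"]))
```

A split antibody-antigen pair taken from a large mmCIF-only entry will often pass this check once its chains are renamed to single characters, which is presumably what makes conversion to .pdb feasible after splitting.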

      Reviewer #2:

      (1) In SabDab and PDB, experimental binding affinities are also reported: could the authors comment on whether they also imported this information and double-checked it against the original paper? If it wasn't imported, that might discourage some users and should be considered as an extension for the future.

      We thank the reviewer for the comment and the suggestion. As noted in our response to the public review, due to current resource constraints, quantitative affinity data have not been incorporated into this release but are undergoing systematic curation. We explicitly recognize these limitations and propose a two-pronged strategy for future iterations: (1) broadening antigen diversity coverage through expanded structural sampling, and (2) integrating quantitative binding affinity measurements. In the Discussion section, we have included a description outlining the planned enhancements.

      (2) Line 49-50: the references mentioned in connection to deep learning methods for antibody-antigen predictions seem a bit limited given the amount of articles in this field, with 3 of 4 references on one method only (SEPPA), could the authors expand this list to reflect a bit more the state of the art?

      We thank the reviewer for the suggestion. We agree that more relevant studies should be listed and therefore more references are provided in the revised manuscript.

      When mentioning the limitations of the existing databases, it feels a bit that the criticism is not fully justified. For instance:

      Line 52-53: could the authors elaborate on the reasons why such an identification is challenging? (Isn't it possible to make an efficient database-filtered search? Or rather, should one highlight that a more focussed resource is convenient and why?)

      Thank you for the feedback. In this study, the keywords "antibody complex," "antigen complex," and "immunoglobulin complex" were employed during data collection. The PDB returned over 30,000 results, of which only one-tenth met our criteria after rigorous filtering. This demonstrates that keyword searches, while useful, have limited precision and introduce substantial redundancy, likely due to the PDB's search mechanism. This is why we highlighted the significant challenges in identifying antibody-antigen complexes among general protein structures in the PDB.

      Line 55: reading the website http://www.abybank.org/abdb/, it would be fairer to say that the web interface lacks updates, as the database and the code have gone through some updates. Could the authors provide a concrete example of the reason why: 'The AbDb database currently lacks proper organization and management of this valuable data.'?

      We thank the reviewer for highlighting this issue. In our original manuscript, the statement that the AbDb database "lacks proper organization and management" was based on the absence of an explicit statement regarding data updates on its official website at the time of submission, even though internal updates to its content may have occurred. We fully respect the long-standing contributions of AbDb to antibody structural research, and our comments were solely directed at the specific state of the database at that time. As the reviewer noted, following the release of our preprint, we have also taken note of AbDb's recent updates. To reflect the latest developments and avoid potential misinterpretation, we have revised the original statement in the revised manuscript.

      Also 'this rapid updating process may inadvertently overlook a significant amount of information that requires thorough verification,': it's difficult for me to understand what this means in practice. Could the authors clarify if they simply mean that SabDab collects information from PDB and therefore tends to propagate annotation errors from there? If yes, I think it's enough to state it in these terms, and for sure I agree that the reason is that correcting these annotation errors requires a substantial amount of work.

      We thank the reviewer for providing such detailed feedback on the manuscript. We acknowledge that SAbDab represents a highly valuable contribution to the field, and its rapid update mechanism has significantly advanced related research areas. As the reviewer suggests, our point is that SAbDab primarily relies on automated metadata extraction from the PDB for annotation, so its rapid update process inherits raw data from upstream sources. According to their paper, manual curation is applied only when the automated pipeline fails to resolve structural ambiguities. This workflow, which depends on PDB annotations with limited manual verification, may propagate errors present in the PDB, such as species misannotation and misinterpreted mutation status. We fully agree with the reviewer's observation that correcting errors in such cases necessitates labor-intensive manual curation, which is a core motivation for our study.

      Line 86: why 'Structures that consisted solely of one type of antibody were excluded'? Why exclude complexes with antigens shorter than 50 amino acids? These complexes are genuine antibody-antigen complexes.

      We thank the reviewer for the valuable question. The AACDB database is dedicated to curating structural data of antigen-antibody complexes. Structures featuring only a single antibody type are classified as free antibodies and systematically excluded from the database because they lack a bound protein partner. During data screening, we retained sequences shorter than 50 amino acids by categorizing them as peptides rather than eliminating them outright. The current release exclusively encompasses complexes with protein-based antigens. Meanwhile, complexes involving peptide, hapten, and nucleic acid antigens are undergoing systematic curation, with planned inclusion in future updates to broaden antigen category representation.

      Line 96 needs a capital letter at the beginning.

      Line 107: 'this would generate' → 'this generates' (given it is something that has been implemented, correct?).

      Line 124: missing an 'of'.

      Line 163: inspiring by -> inspired by.

      Thank you for the feedback. All of the above grammatical and spelling errors have been corrected in the manuscript.

      Line 109-111: apart from the example, it would be good to spell out the general rule applied to anti-idiotypic antibodies.

      We thank the reviewer for the valuable feedback. For anti-idiotypic antibody complexes, the partner antibody is treated as a dual-chain antigen, which necessitates individual evaluation of the heavy-chain and light-chain interactions with the anti-idiotypic component. We have given a general rule for anti-idiotypic antibodies in Section “2.2 PDB splitting” of the revised manuscript.

      Line 155-159: could the authors provide references for the two choices (based on sasa and any-atom distance) that they adopted to define interacting residues?

      We thank the reviewer for the comment and the suggestion. As in our response to Reviewer #1 in the public review, the definition of interacting residues and the thresholds chosen in the manuscript are based on the existing literature. We have added further supporting references in Section “1. Introduction”. Our resource does not provide a fixed amino acid list. Instead, all interacting residues are explicitly documented alongside their corresponding ΔSASA (solvent-accessible surface area changes) and intermolecular distances, allowing researchers to flexibly select residue pairs based on customized thresholds from downloadable datasets. Furthermore, to align with widely adopted criteria in the current literature, where interactions are defined by ΔSASA >1 Ų and atomic distances <6 Å, we have recalibrated our analysis in the revised version. Specifically, we replaced the previous 5 Å distance threshold with a 6 Å cutoff to recalculate interacting residues.

      Line 176-178: could the authors re-phrase this sentence to clarify what they mean by 'change in the distribution'?

      We thank the reviewer for the suggestion. Our search was conducted with an end date of November 2023. However, Figure 3B includes an entry dated 2024. Upon reviewing this record, we identified that the discrepancy arises from the supersession of the 7SIX database entry (originally released in December 2022) by the 8TM1 version in January 2024. This version update explains the apparent chronological inconsistency. We regret any lack of clarity in our original description and have revised the corresponding section in the manuscript to explicitly clarify this change of database.

      Caption Figure 3: please spell out all the acronyms in the figure. Provide the date when the last search was performed (i.e., the date of the last update of these statistics).

      We thank the reviewer for the comment. We have systematically expanded all acronyms and included update dates for statistics in the legend of Figure 3. Corresponding changes have also been made to the statistical pages on the website.

      Finally, it would be advisable to do a general check on the use of the English language (e.g. I noted a few missing articles). In Figure 5 DrugBank contains typos.

      We sincerely appreciate the reviewer's meticulous attention to linguistic precision. We have corrected the typographical error in Figure 5 and conducted a comprehensive review of the entire manuscript to ensure accuracy and clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate how the viscoelasticity of the fingertip skin can affect the firing of mechanoreceptive afferents and they find a clear effect of recent physical skin state (memory), which is different between afferents. The manuscript is extremely well-written and well-presented. It uses a large dataset of low threshold mechanoreceptive afferents in the fingertip, where it is particularly noteworthy that the SA-2s have been thoroughly analyzed and play an important role here. They point out in the introduction the importance of the non-linear dynamics of the event when an external stimulus contacts the skin, to the point at which this information is picked up by receptors. Although clearly correlated, these are different processes, and it has been very well-explained throughout. I have some comments and ideas that the authors could think about that could further improve their already very interesting paper. Overall, the authors have more than achieved their aims, where their results very much support the conclusions and provoke many further questions. This impact of the previous dynamics of the skin affecting the current state can be explored further in so many ways and may help us to better understand skin aging and the effects of anatomical changes of the skin.

      At the beginning of the Results, it states that FA-2s were not considered as stimuli did not contain mechanical events with frequency components high enough to reliably excite them. Was this really the case, did the authors test any of the FA-2s from the larger dataset? If FA-2s were not at all activated, this is also relevant information for the brain to signal that it is not a relevant Pacinian stimulus (as they respond to everything). Further, afferent receptive fields that were more distant to the stimulus were included, which likely fired very little, like the FA-2s, so why not consider them even if their contribution was low?

      Thank you for bringing this up. We have now clarified in the text that while FA-2s did respond at a low rate during the experiment, their responses were not reliably driven by the force stimuli. In the Methods section we have included the following text:

      “Initially, 10 FA-2 neurons were also included in the analysis. But their responsiveness during the experiment was remarkably low, and unlike the other neuron types, their responses were rarely affected by force stimuli. Specifically, only one of the observed FA-2 neurons responded during the force protraction phases. Due to the lack of clear stimulus-driven responses, FA-2 neurons were subsequently excluded from further analysis.”

      One question that I wondered throughout was whether you have looked at further past history in stimulation, i.e. not just the preceding stimulus, but 2 or 3 stimuli back? It would be interesting to know if there is any ongoing change that can be related back further. I do not think you would see anything as such here, but it would be interesting to test and/or explore in future work (e.g. especially with sticky, forceful, or sharp indentation touch). However, even here, it could be that certain directions gave more effects.

      This is a very interesting question! A discernible effect from the previous stimulus could persist at the end of the current stimulation (see Figure 4C), potentially influencing the next one—a 2-stimuli-back effect. Unfortunately, our experimental design did not allow for rigorous testing of this effect. While all possible pairs of stimulus directions were included in immediately consecutive trials, this was not the case for pairs separated by additional trials. Hence, the combination of a likely weak effect and limited variation in history precluded a thorough analysis of a 2-stimuli-back effect. Future work should delve into the time course of the viscoelastic effect in greater detail.

      Did the authors analyze or take into account the difference between receptive field locations? For example, did afferents more on the sides have lower responses and a lesser effect of history?

      An investigation into the potential impact of the relationship between the receptive field location on the fingertip skin and the primary contact site of the stimulus surface revealed no discernible influence for SA-1 and SA-2 neurons. In contrast, FA-1 neurons, particularly those predominantly sensitive to the previous stimulation or displaying mixed sensitivity, exhibited a tendency to terminate near the primary stimulation site. We have added these observations to the text:

      “We found no straightforward relationship between a neuron's sensitivity to current and previous stimulation and its termination site in fingertip skin. Specifically, there was no statistically significant effect of the distance between a neuron's receptive field center and the primary contact site of the stimulus surface on whether neurons signaled current, prior, or mixed information for SA-1 (Kruskal-Wallis test H(2)=3.86, p= 0.15) or SA-2 neurons (H(2)=0.75, p=0.69). However, a significant difference emerged for FA-1 neurons (H(2)=8.66, p=0.01), indicating that neurons terminating closer to the stimulation site on the flat part of the fingertip were more likely to signal past or mixed information.”
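For readers unfamiliar with the test quoted above, the sketch below computes the Kruskal-Wallis H statistic with the standard tie correction, using only the Python standard library. The group data are invented and the authors' actual software is not stated, so this is an illustration rather than their analysis. With three groups, H is compared against a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(-H/2); the reported p-values are consistent with this relation.

```python
import math
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction for a list of samples."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    counts = Counter(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = next_rank + (t - 1) / 2
        next_rank += t

    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Tie correction; equals 1 when all values are distinct.
    ties = sum(t**3 - t for t in counts.values())
    return h / (1 - ties / (n_total**3 - n_total))


def p_value_df2(h):
    """Upper-tail chi-square probability for 2 degrees of freedom (three groups)."""
    return math.exp(-h / 2)


if __name__ == "__main__":
    # Invented distances (mm) for neurons signaling current, prior, or mixed information.
    groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    h = kruskal_wallis_h(groups)
    print(h, p_value_df2(h))
    # Reported H(2) values from the quoted passage and the p-values they imply.
    for reported_h in (3.86, 0.75, 8.66):
        print(reported_h, round(p_value_df2(reported_h), 2))
```

The closed-form p-value applies only to the three-group case; for other numbers of groups a general chi-square survival function would be needed.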

      Was there anything different in the firing patterns between the spontaneous and non-spontaneously active SA-2s? For example, did the non-spontaneous show more dynamic responses?

      The firing patterns of both spontaneously and non-spontaneously active SA-2 neurons shared similarities in terms of adaptation and range of firing rate modulation in response to force stimuli, i.e., ‘dynamic response’. The distinction lay in the pattern of modulation of the firing rate associated with stimulus presentations. For spontaneously active SA-2 neurons, this modulation occurred around a significant background discharge, implying that a force stimulus could either decrease or increase the firing rate, depending on how it deformed the fingertip. This characteristic is well illustrated by the firing pattern of the neuron depicted in the lower panels of Figure 3D. Conversely, in non-spontaneously active SA-2 neurons, a force stimulus could only induce an increase in the firing rate or no change. Although the neuron depicted in the upper panels of Figure 3D exhibited some background activity, it serves to exemplify this characteristic. In the text, we have elucidated the dynamics of the SA-2 neuron response by highlighting that force stimulation can either decrease or increase the firing rate in neurons with spontaneous activity through the following addition/change:

      “This increased variability was most evident during the force protraction phase where most neurons exhibited the most intense responses. Increased variability was also observed in instances where the dynamic response to force stimulation involved a decrease in the firing rate (lower panels of Figure 3D). This phenomenon was observed in SA-2 neurons that maintained an ongoing discharge during intertrial periods (cf. Fig. 2A). In these cases, the response to a force stimulus constituted a modulation of the firing rate around the background discharge, signifying that a force stimulus could either decrease or increase the firing rate depending on the prevailing stimulus direction.”

      Were the spontaneously active SA-2 afferents firing all the time or did they have periods of rest - and did this relate to recent stimulation? Were the spontaneously active SA-2s located in a certain part of the finger (e.g. nail) or were they randomly spread throughout the fingertip? Any distribution differences could indicate a more complicated role in skin sensing.

      SA-2 neurons, in general, are well-known for undergoing significant post-stimulation depression (e.g., Knibestöl and Vallbo, 1970; Chambers et al., 1972; Burgess and Perl, 1973). In our force stimulations, this post-excitatory depression manifested as a reduced or absent response during the latter part of the stimulus retraction period for stimuli in directions that markedly excited the neuron. The excitability recovered when the fingertip relaxed during the subsequent intertrial period, and for "spontaneously active" neurons, the firing resumed (see examples in Figure 7A). Furthermore, some “spontaneously active” neurons could be silenced or exhibit a near-silent period during force stimulation for certain force directions, while the spontaneous firing returned during the upcoming intertrial period when the fingertip shape recovered (for example, see responses to stimulation in the proximal and especially ulnar directions in the top panel in Figure 7A).

      Regarding the location of the receptive field centres of spontaneously active and non-spontaneously active SA-2 neurons on the fingertip we did not observe any obvious spatial segregation. To illustrate this, we have revised Figure 1A by color-marking SA-2 neurons that exhibited ongoing activity in intertrial periods, and the figure caption has been modified accordingly:

      “Figure 1. Experimental setup. A. Receptive field center locations shown on a standardized fingertip for all first-order tactile neurons included in the study, categorized by neuron type. Purple symbols denote spontaneously active SA-2 neurons exhibiting ongoing activity without external stimulation.”

      Did the authors look to see if the spontaneous firing in SA-2s between trials could predict the extent to which the type 1 afferents encode the proceeding stimulus? Basically, does the SA-2 state relate to how the type 1 units fire?

      We found no clear indications that the responses of FA-1 and SA-1 could be readily anticipated based on the firing patterns of SA-2 neurons.

      In the discussion, it is stated that "the viscoelastic memory of the preceding loading would have modulated the pattern of strain changes in the fingertip differently depending on where their receptor organs are situated in the fingertip". Can the authors expand on this or make any predictions about the size of the memory effect and the distance from the point of stimulation?

      We have explored this topic further in the text, referring to recent studies modeling essential aspects of fingertip mechanics. However, in our view, current models lack the capability to predict the specific nature sought by the reviewer. These models should include a detailed understanding of the intricate networks of collagen fibers anchoring the pulp tissue at the distal phalangeal bone and the nail. They should also consider potential inherent directional preferences of the receptor organs, attributed to their microanatomy. The text modifications are as follows:

      “In addition to the receptor organ locations, the variation in sensitivity among neurons to fingertip deformations in response to both previous and current loadings would stem from the fingertip’s geometry and its complex composite material properties. Possible inherent directional preferences of the receptor organs, attributed to their microanatomy, could also be significant. However, mechanical anisotropy, particularly within the viscoelastic subcutaneous tissue of the fingertip induced by intricately oriented collagen fiber strands forming fat columns in the pulp (Hauck et al., 2004), are likely to play a crucial role. This anisotropy would shape the dynamic pattern of strain changes at neurons' receptor sites, intricately influencing a neuron's sensitivity not only to current but also to preceding loadings. Indeed, recent modeling efforts suggest that such mechanical anisotropy strongly influences the spatiotemporal distribution of stresses and strains across the fingertip (Duprez et al., 2024).”

      Relatedly, we have included additional text to provide a more comprehensive explanation of the “bulk deformation” of the fingertip that occurs during the loadings:

      “As pressure increases in the pulp, the pulp tissue bulges at the end and sides of the fingertip. Simultaneously, the tangential force component amplifies the bulging in the direction of the force while stretching the skin on the opposite side.”

      In the discussion, it would be good if the authors could briefly comment more on the diversity of the mechanoreceptive afferent firing and why this may be useful to the system.

      The diversity in responses among neurons is instrumental in enhancing the information transmitted to the brain by averting redundancy in information acquisition. This diversity thereby contributes to an overall increase in information. We've included a brief statement, along with several references, underscoring this concept:

      "The resulting diversity in the sensitivities of neurons might enhance the overall information collected and relayed to the brain by the neuronal population, facilitating the discrimination between tactile stimuli or mechanical states of the fingertip (see Rongala et al., 2024; Corniani et al., 2022; Tummala et al., 2023, for more extensive explorations of this idea)."

      Also, the authors could briefly discuss why this memory (or recency) effect occurs - is it useful, does it serve a purpose, or it is just a by-product of our skin structure? There are examples of memory in the other senses where comparisons could be drawn. Is it like stimulus adaptation effects in the other senses (e.g. aftereffects of visual motion)?

      We have expanded the concluding paragraph of the discussion, specifically delving into the question of whether the mechanical memory effect serves a deliberate purpose or is simply an incidental byproduct of our skin structure:

      “In any case, the viscoelastic deformability of the fingertips plays a pivotal role in supporting the diverse functions of the fingers. For example, it allows for cushioned contact with objects featuring hard surfaces and allows the skin to conform to object shapes, enabling the extraction of tactile information about objects' 3D shapes and fine surface properties. Moreover, deformability is essential for the effective grasping and manipulation of objects. This is achieved, among other benefits, by expanding the contact surface, thereby reducing local pressure on the skin under stronger forces and enabling tactile signaling of friction conditions within the contact surface for control of grasp stability. Throughout, continuous acquisition of information about various aspects of the current state of the fingertip and its skin by tactile neurons is essential for the functional interaction between the brain and the fingers. In light of this, the viscoelastic memory effect on tactile signaling of fingertip forces can be perceived as a by-product of an overall optimization process within prevailing biological constraints.”

      One point that would be nice to add to the discussion is the implications of the work for skin sensing. What would you predict for the time constant of relaxation of fingertip skin, how long could these skin memory effects last? Two main points to address here may be how the hydration of the skin and anatomical skin changes related to aging affect the results. If the skin is less viscoelastic, what would be the implications for the firing of mechanoreceptors?

      It is likely that the time constant depends to some extent on mechanical factors of the skin, which will likely change due to age or environmental factors. However, while these questions are intriguing, they fall outside the scope of the current study and we are not aware of studies that have addressed these issues directly in experiments either.

      How long does it take for the effect to end? Again, this will likely depend on the skin's viscoelasticity. However, could the authors use it in a psychophysical paradigm to predict whether participants would be more or less sensitive to future stimuli? In this way, it would be possible to test whether the direction modifies touch perception.

      Time constants for tissue viscoelasticity have been estimated to extend up to several seconds (see citations in the introduction). While direct perceptual effects could indeed be explored through psychophysical experimental paradigms, we are currently unaware of any studies specifically addressing the type of effect described in this study. In addition to the statement that, concerning manipulation and haptic tasks, "to our knowledge, a possible influence of fingertip viscoelasticity on task performance has not been systematically investigated," we have now also addressed tactile psychophysical tasks conducted during passive touch with the following sentence in the text:

      “Similarly, there is a lack of systematic investigation of potential effects of fingertip viscoelasticity on performance in tactile psychophysical tasks conducted during passive touch.”

      Reviewer #2 (Public Review):

      Summary:

      The authors sought to identify the impact skin viscoelasticity has on neural signalling of contact forces that are representative of those experienced during normal tactile behaviour. The evidence presented in the analyses indicates there is a clear effect of viscoelasticity on the imposed skin movements from a force-controlled stimulus. Both skin mechanics and evoked afferent firing were affected based on prior stimulation, which has not previously been thoroughly explored. This study outlines that viscoelastic effects have an important impact on encoding in the tactile system, which should be considered in the design and interpretation of future studies. Viscoelasticity was shown to affect the mechanical skin deflections and stresses/strains imposed by previous and current interaction force, and also the resultant neuronal signalling. The result of this was an impaired coding of contact forces based on previous stimulation. The authors may be able to strengthen their findings, by using the existing data to further explore the link between skin mechanics and neural signalling, giving a clearer picture than demonstrating shared variability. This is not a critical addition, but I believe would strengthen the work and make it more generally applicable.

      Strengths:

      - Elegant design of the study. Direct measurements have been made from the tactile sensory neurons to give detailed information on touch encoding. Experiments have been well designed and the forces/displacements have been thoroughly controlled and measured to give accurate measurements of global skin mechanics during a set of controlled mechanical stimuli.

      - Analytical techniques used. Analysis of fundamental information coding and information representation in the sensory afferents reveals dynamic coding properties to develop putative models of the neural representation of force. This advanced analysis method has been applied to a large dataset to study neural encoding of force, the temporal dynamics of this, and the variability in this.

      Weaknesses:

      - Lack of exploration of the variation in neural responses. Although there is a viscoelastic effect that produces variability in the stimulus effects based on prior stimulation, it is a shame that the variability in neural firing and force-induced skin displacements have been presented, and are similarly variable, but there has been no investigation of a link between the two. I believe with these data the authors can go beyond demonstrating shared variability. The force per se is clearly not faithfully represented in the neural signal, being masked by stimulation history, and it is of interest if the underlying resultant contact mechanics are.

      Thank you for this suggestion. We have added a new section investigating the link between skin deformation and neural firing in more depth via a simple neural model. Please see our answer below in the ‘Recommendations’ section for further details.

      Validity of conclusions:

      The authors have succeeded in demonstrating skin viscoelasticity has an impact on skin contact mechanics with a given force and that this impacts the resultant neural coding of force. Their study has been well-designed and the results support their conclusions. The importance and scope of the work is adequately outlined for readers to interpret the results and significance.

      Impact:

      This study will have important implications for future studies performing tactile stimulation and evaluating tactile feedback during motor control tasks. In detailed studies of tactile function, it illustrates the necessity to measure skin contact dynamics to properly understand the effects of a force stimulus on the skin and mechanoreceptors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (Very) minor comments

      - The authors say at the beginning of the Results that, "The fourth type of tactile neurons in the human glabrous skin, fast adapting type II neurons...". Although generally written that there are four types of afferent in the glabrous skin, it would be better to state that these are low-threshold A-beta myelinated mechanoreceptive afferents, at least one time, as there are other types of afferent in the glabrous skin that respond to mechanical stimulation (e.g. low and high threshold C-fibers).

      This is now clarified at the start of the Results section:

      “We recorded action potentials in the median nerve of individual low-threshold A-beta myelinated first-order human tactile neurons innervating the glabrous skin of the fingertip…”

      - Fig. 3: Could you add '(N)' as the measurement of force for Fig. 3A for Fx, Fy, and Fz? Also, please change 'Data was recorded' to 'Data were recorded' in the legend.

      Fixed.

      - At the beginning of the Methods, you say that your study conforms to the Declaration of Helsinki, which actually requires pre-registration in a database. If you did not pre-register your study, please can you add '... in accordance with the Declaration of Helsinki, apart from pre-registration in a database'.

      Thanks for making us aware of this. We have added the suggested qualifier to the ethics statement.

      Reviewer #2 (Recommendations For The Authors):

      The neural representation/encoding of the actual displacement vectors would be a useful addition to the analyses. These vectors have been demonstrated to systematically change with the condition in the irregular series (Figure 2E) and will thus significantly act on the dynamics of induced mechanical changes in the skin with a given interaction force. Thus, it could be examined how the neurons code the magnitude of displacements as well as their direction. An evaluation of the extent to which the imposed displacement magnitudes are encoded in the neural responses would be a useful addition in explaining the signalling of the force events and how the central nervous system decodes these. Evaluating an alternative displacement encoding for comparison to pure force encoding may reveal more about how contact events are represented in the tactile system, which must decode these variable afferent signals to reconstruct a percept of the interaction. It could then be explored how the central nervous system may then scale the dynamic afferent responses based on the background viscoelastic state likely to be present in the SA-II afferent signals (Figure 7) for a context in which to evaluate the dynamic contact forces. This may of course be a complex relationship for the type-I afferents, where the underlying mechanical events evoking the firing (microslips not represented in global forces) have not been measured here. Such a model could be more widely applicable, as the skin viscoelasticity and displacement magnitudes are a straightforward measurement metric and could perhaps be used as a better proxy for neural signalling. This would allow the investigation of a wider variety of forces, and the study of the timing of the viscoelastic effect, both of which have been fixed here. This would give the work a broader impact, rather than just highlighting that this effect produces variability, it could reveal if this mechanical feature is structured in the neural representation. The categorical encoding/decoding tested here is specific to the stimuli used (magnitudes, intervals), but there is the possibility that this may be more generally applicable (within the bounds of forces/speeds) if the underlying basis of the variability in the signalling produced by the viscoelasticity is identified. Since the time course of the viscoelasticity has not been measured here (fixed forces and intervals), further study is required to fully understand the implications this has for a wider variety of situations.

      We agree that a better understanding of how the mechanical deformations are reflected in the resulting spike trains would be valuable. While ultimately a full understanding will need precise measurements of skin deformation across the whole fingertip to account for mechanical propagation to mechanoreceptor locations, relating the deformations at the contact location with neural firing patterns directly can provide useful hints into which aspects of deformation are encoded and how. To this end, we ran a new analysis that aimed to predict the time-varying neural responses directly from the recorded mechanical movements of the contactor.

      Below we have reproduced the new results and methods text along with the additional figures for this analysis. Note that we have also added text in the Discussion to interpret these findings in the context of our other results.

      New section in Results titled Predicting neural responses from contactor movements: “The similarity in the history-dependent variation in neural firing and fingertip deformation at a given force stimulus suggests that neuronal firing is determined by how the fingertip deforms rather than the applied force itself. However, this similarity does not clarify the relationship between fingertip deformation dynamics and neural signaling. To investigate further, we fit cross-validated multiple linear regression models to evaluate how well distinct aspects of contactor movement could predict the time-varying firing rates of individual neurons during the protraction phases of the irregular sequence. The models used predictors based on (1) the three-dimensional position of the contactor, (2) its three-dimensional velocity, (3) a combination of position and velocity signals, and, finally, (4) position and velocity signals along with all possible two-way interactions between them, capturing potentially complex relationships between fingertip deformations and neural signaling.

      Comparing the variance explained (R<sup>2</sup>) by each regression model for each neuron type revealed clear differences between the models (Figure 5A). A two-way mixed design ANOVA, with regression model as a within-group effect and neuron type as a between-group effect, revealed a main effect of model on variance explained (F(3,462) = 815.5, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.84). Model prediction accuracy overall increased with the number of predictors, with the two-way interaction model outperforming all others (p < 0.001 for all comparisons, Tukey’s HSD). Additionally, a significant main effect of neuron type (F(2,154) = 29.8, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.28) and a significant interaction between regression model and neuron type were observed (F(6,462) = 50.8, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.40).

      For neuron type, model predictions were most accurate for SA-2 neurons, followed by SA-1 neurons, with FA-1 neurons showing the lowest accuracy (p < 0.003 for all comparisons, Tukey’s HSD). The interaction between model and neuron type revealed distinct patterns. For SA-1 and SA-2 neurons, position-only and velocity-only models had similar prediction accuracy (p ≥ 0.996, Tukey’s HSD) with no significant differences between these neuron types (p ≥ 0.552, Tukey’s HSD). FA-1 neurons performed poorly with the position-only model but showed higher accuracy with the velocity-only model (p < 0.001, Tukey’s HSD) and better than SA-1 neurons (p = 0.006, Tukey’s HSD). Models combining position and velocity predictors (without interactions) surpassed both position-only and velocity-only models for SA-1 and SA-2 neurons (p < 0.001, Tukey’s HSD). Overall, the differences between neuron types broadly match their tuning to static and dynamic stimulus properties.

      The two-way interaction model, accounting for most variance in neural responses, produced mean R<sup>2</sup> values of 0.75 for FA-1, 0.88 for SA-1, and 0.91 for SA-2 neurons (Figure 5A). To evaluate the contribution of the different predictors, we ranked them using the permutation feature importance method, focusing on the six most important ones. Regression analyses using only these variables explained almost all of the variance explained by the full model, with a median R<sup>2</sup> reduction of just 0.055 across all neurons. Across all neuron types, at least half included all three velocity components (dPx, dPy, dPz) among the top six, with FA-1 neurons showing the highest prevalence (Figure 5B). Interactions between normal position (Pz) and each velocity component were also frequently observed, while interactions involving tangential position and velocity components were less common. Interactions among velocity components were relatively well represented, followed by interactions limited to position components. Position signals were generally less represented, except for normal position (Pz) in slowly adapting neurons, where it appeared in 50% of SA-1 and 68% of SA-2 neurons. Despite these broad trends, important predictors varied widely across ranks even within a given neuron class (see Figure 5-figure supplement 1), and even the most frequent variables appeared in only a subset of cases, suggesting broad variability in sensitivity across neurons.”

      New methods paragraph titled Predicting time-varying firing rates from skin deformations:

      “This analysis was conducted in Python (v3.13) with pandas for data handling, numpy for numerical operations, and scikit-learn for model fitting and evaluation.

      To assess how well individual neurons' time-varying firing rates could be predicted from simultaneous contactor movements, we fitted multiple linear regression models (see Khamis et al., 2015, for a similar approach). This analysis focused on the force protraction phase of the irregular sequence, where neurons were most responsive and sensitive to stimulation history. Data from 100 ms before to 100 ms after the protraction phase (between -0.100 s and 0.225 s relative to protraction onset) were included for each trial. Neurons were included if they fired at least two action potentials during the force protraction phase and the following 100 ms in at least five of the 25 trials. This ensured sufficient variability in firing rates for meaningful regression analysis, resulting in 68 SA-1, 38 SA-2, and 51 FA-1 neurons being included.

      Contactor position signals digitized at 400 Hz were linearly interpolated to 1000 Hz. Instantaneous firing rates, derived from action potentials sampled at 12.8 kHz, were resampled at 1000 Hz to align with position signals. A Gaussian filter (σ = 10 ms, cutoff ~16 Hz) was applied to the firing rate as well as to the position signals before differentiation. To account for axonal conduction (8–15 ms) and sensory transduction delays (1–5 ms), firing rates were advanced by 15 ms to align approximately with independent variables.

      Regressions were performed using scikit-learn's Ridge and RidgeCV regressors, which apply L2 regularization to mitigate overfitting. Hyperparameter tuning for the regularization parameter (alpha) was performed using GridSearchCV with a predefined range (0.001–1000.0), incorporating five-fold cross-validation to select the best value. To minimize overfitting risks, model performance was further validated with independent five-fold cross-validation (KFold), and R<sup>2</sup> scores were computed using cross_val_score.

      We constructed four linear regression models with increasing complexity: (1) Position-only, using three-dimensional contactor positions (Px, Py, Pz); (2) Velocity-only, using three-dimensional velocities (dPx, dPy, dPz); (3) Combined, including all position and velocity signals (6 predictors); and (4) Interaction, including all signals and their two-way interactions (21 predictors). All features were standardized using StandardScaler to improve regularization and model convergence. PolynomialFeatures generated second-order interaction terms for the interaction model. Feature importance was evaluated with permutation_importance, and simpler models were built using the most important features. These models were validated through cross-validation to assess retained explanatory power.”

      Minor:

      - It would be useful to add a brief description of the material aspects of the contactor tip to the methods (as per Birznieks 2001).

      We have added the following statement:

      “To ensure that friction between the contactor and the skin was sufficiently high to prevent slips, the surface was coated with silicon carbide grains (50–100 μm), approximating the finish of smooth sandpaper.”

      - The axes labelling on Figure 3A and legend description is ambiguous, probably placing the Px, Py, and Pz labels on the far left axes and the Fx, Fy, and Fz on the right side of the far right axes would make this clearer.

      Label placement has been improved along with some other minor fixes.

      - For the quasi-static phase analysis, the phrase "absence of loading" used in reference to the interstimulus period and SA-II afferents does not seem to be a correct description. The finger is still loaded (at least in the normal direction), with a magnitude of imposed displacement that counteracts the viscoelastic force exerted by the skin mechanics of the fingertip. Although there is a zero net-force load, a mechanical stimulus is still being actively applied to the skin.

      We have changed the wording throughout the text and now consistently refer either to the “interstimulus period” directly or to an “absence of externally applied stimulation” to avoid confusion.

    1. If teachers and students can meet each other's needs, a comfortable life for all is the reward. Sizer believed that when one or the other breaks this unspoken contract, trouble is likely to follow.

      This passage captures the "tacit understanding" found in many classrooms: if students don't cause trouble, the teacher can get through the class easily, and neither side makes things difficult for the other. I felt this atmosphere in high school. Some students in my class didn't study much, but as long as they didn't disturb others, the teacher would let them "slack off" by default. It was like an unspoken rule of "we don't undermine each other." The author quotes Sizer's "Let's Make a Deal" to satirize this kind of classroom, which seems calm but lacks in-depth communication. I think this "transactional" teaching atmosphere may seem easy in the short term, but in the long run it drains teaching of its real challenge and meaning.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This short report shows that the transcription factor gene mirror is specifically expressed in the posterior region of the butterfly wing imaginal disk, and uses CRISPR mosaic knock-outs to show it is necessary to specify the morphological features (scales, veins, and surface) of this area.

      Strengths:

      The data and figures support the conclusions. The article is swiftly written and makes an interesting evolutionary comparison to the function of this gene in Drosophila. Based on the data presented, it can now be established that mirror likely has a similar selector function for posterior-wing identity in a plethora of insects.

      We thank the reviewer for their feedback.

      Weaknesses:

      This first version has minor terminological issues regarding the use of the terms "domains" and "compartment".

      We acknowledge that the terminologies “domains” and “compartments” might lead to confusion. To avoid confusion we have removed the term “compartment” from the manuscript.

      Reviewer #2 (Public Review):

      This is a short and unpretentious paper. It is an interesting area and therefore, although much of this area of research was pioneered in flies, extending basic findings to butterflies would be worthwhile. Indeed, there is an intriguing observation but it is technically flawed and these flaws are serious.

      The authors show that mirror is expressed at the back of the wing in butterflies (as in flies). They present some evidence that it is required for the proper development of the back of the wing in butterflies (a region dubbed the vannus by the ancient guru Snodgrass). But there are problems with that evidence. First, concerning the method, using CRISPR they treat embryos and the expectation is that the mirror gene will be damaged in groups of cell lineages, giving a mosaic animal in which some lines of cells are normal for mirror and others are not. We do not know where the clones or patches of cells that are defective for mirror are because they are not marked. Also, we do not know what part of the wing is wild type and what part is mutant for mirror. When the mirror mutant cells colonise the back of the wing and that butterfly survives (many butterflies fail to develop), the back of the wing is altered in some selected butterflies. This raises a second problem: we do not know whether the rear of the wing is missing or transformed. From the images, the appearance of the back of the wing is clearly different from the wild type, but is that due to transformation or not? And then I believe we need to know specifically what the difference is between the rear of the wing and the main part. What we see is a silvery look at the back that is not present in the main part, is it the structure of the scales? We are not told.

      Thank you for this feedback. We appreciate that many readers may not be accustomed to looking at mosaic knockouts. As discussed in a previous review article (Zhang & Reed 2017), we rely on a combination of contralateral asymmetry and replicates to infer mutant phenotypes. For many genes (e.g. pigmentation enzymes) mutant clones are obvious, but for other types of genes (e.g. ligands) clone boundaries are sometimes not directly diagnosable. It is simply a limitation of our study system. Nonetheless, you can see for yourself that “the back of the wing is altered in some butterflies” – the effects of deleting mirror are clear and repeatable.

      In terms of interpreting mutant phenotypes, we agree that the paper would benefit from a better description of the specific effects. Therefore, we have included an improved, more systematic description of phenotypes, along with better-annotated figures showing changes in wing shape and venation, scale coloration, and color pattern transformation (e.g. posterior elongation of the orange marginal stripes).

      There are other problems. Mirror is only part of a group of genes in flies and in flies both iroquois and mirror are needed to make the back of the wing, the alula (Kehl et al). What is known about iro expression in butterflies?

      In Drosophila, mirror, araucan, and caupolican comprise the so-called Iroquois Complex of genes. As denoted in Figure S4 and in Kerner et al (doi: https://doi.org/10.1186/1471-2148-9-74) the divergence of araucan and caupolican into two separate paralogs is restricted to Drosophila. As in most insects, butterflies have only two Iroquois Complex genes: araucan and mirror. We tested the role of araucan in Junonia coenia as shown in our pre-print: https://doi.org/10.1101/2023.11.21.568172. Its expression appears to be restricted to early pupal wings where it is transcribed in all scale-forming cells. Mosaic araucan KOs resulted in a change in scale iridescent coloration associated with changes in the laminar thickness of scale cells.

      In flies, mirror regulates a late and local expression of dpp that seems to be responsible for making the alula. What happens in butterflies? Would a study of the expression of Dpp in wildtype and mirror compromised wings be useful?

      We thank the reviewer for the proposal and agree that a future study comparing Dpp in wild-type versus mirror KO butterflies would be useful to clarify the mechanism of Dpp signalling in wing development. It is not clear, however, that the results of a Dpp experiment would change the conclusions of our current study; therefore, we decided not to undertake these additional experiments for our revision.

      Thus, I find the paper to be disappointing for a general journal as it does little more than claim what was discovered in Drosophila is at least partly true in butterflies. 

      We respect that the reviewer does not have a strong interest in the comparative aspects of this study. Fair enough. This report is primarily aimed at biologists interested in the evolutionary history of insect wings.

      Also, it fails to explain what the authors mean by "wing domains" and "domain specification". They are not alone, butterfly workers, in general, appear vague about these concepts, their vagueness allowing too much loose thinking.

      A domain is “a region distinctively marked by some physical feature”. This term is used extensively in the developmental biology literature (e.g. “expression domain”, “embryonic domain”, “tissue domain”, “domain specification”) and is found throughout popular textbooks (e.g. Alberts et al. “The Cell”, Gilbert “Developmental Biology”). We prefer the term “domain” because of its association in the Drosophila literature with transcription factors that define fields of cells. We specifically avoided using the term “compartment” because of its association with cell lineage, which we have not tested. 

      Since these matters are at the heart of the purpose and meaning of the work reported here, we readers need a paper containing more critical thought and information. I would like to have a better and more logical introduction and discussion.

      We would like the very same thing, of course, and we hope the reviewer finds our revised manuscript to be more satisfying to read.

      The authors do define what they mean by the vannus of the wing. In flies the definition of compartments is clear and abundantly demonstrated, with gene expression and requirement being limited precisely to sets of cells that display lineage boundaries. It is true that domains of gene expression in flies, for example of the iroquois complex, which includes mirror, can only be related to patterns with difficulty. Some recap of what is known plus the opinion of the authors on how they interpret papers on possible lineage domains in butterflies might also be useful, as the reader is no wiser about what the authors might mean at the end of it!

      We thank the reviewer for this suggestion. However, our experiments have little to contribute to the topic of cell lineage compartmentalization. We have therefore opted to avoid speculating on this topic to prevent confusion and to keep the manuscript focused on our experimental results.

      The references are sometimes inappropriate. The discovery of the AP compartments should not be referred to Guillen et al 1995, but to Morata and Lawrence 1975. Proofreading is required.

      We thank the reviewer for suggesting this important reference. We have included it in our revision.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Chatterjee et al. examines the role of the mirror locus in patterning butterfly wings. The authors examine the pattern of mirror expression in the common buckeye butterfly, Junonia coenia, and then employ CRISPR mutagenesis to generate mosaic butterflies carrying clones of mirror mutant cells. They find that mirror is expressed in a well-defined posterior sector of final-instar wing discs from both hindwings and forewings and that CRISPR-injected larvae display a loss of adult wing structures presumably derived from the mirror expressing region of hindwing primordium (the case for forewings is a bit less clear since the mirror domain is narrower than in the hindwing, but there also do seem to be some anomalies in posterior regions of forewings in adults derived from CRISPR injected larvae). The authors conclude that the wings of these butterflies have at least three different fundamental wing compartments, the mirror domain, a posterior domain defined by engrailed expression, and an anterior domain expressing neither mirror nor engrailed. They speculate that this most posterior compartment has been reduced to a rudiment in Drosophila and thus has not been adequately recognized as such a primary regional specialization.

      Critique:

      This is a very straightforward study and the experimental results presented support the key claims that mirror is expressed in a restricted posterior section of the wing primordium and that mosaic wings from CRISPR-injected larvae display loss of adult wing structures presumably derived from cells expressing mirror (or at least nearby). The major issue I have with this paper is the strong interpretation of these findings that lead the authors to conclude that mirror is acting as a high-level gene akin to engrailed in defining a separate extreme posterior wing compartment. To place this claim in context, it is important in my view to consider what is known about engrailed, for which there is ample evidence to support the claim that this gene does play a very ancestral and conserved function in defining posterior compartments of all body segments (including the wing) across arthropods.

      (1) Engrailed is expressed in a broad posterior domain with a sharp anterior border in all segments of virtually all arthropods examined (broad use of a very good pan-species anti-En antibody makes this case very strong).

      (2) In Drosophila, marked clones of wing cells (generated during larval stages) strictly obey a straight anterior-posterior border indicating that cells in these two domains do not normally intermix, thus, supporting the claim that a clear A/P lineage compartment exists.

      In my opinion, mirror does not seem to be in the same category of regulator as engrailed for the following reasons:

      (1) There is no evidence that I am aware of, either from the current experiments, or others that the mirror expression domain corresponds to a clonal lineage compartment. It is also unclear from the data shown in this study whether engrailed is co-expressed with mirror in the posterior-most cells of J. coenia wing discs. If so, it does not seem justified to infer that mirror acts as an independent determinant of the region of the wing where it is expressed.

      (2) Mirror is not only expressed in a posterior region of the wing in flies but also in the ventral region of the eye. In Drosophila, mirror mutants not only lack the alula (derived approximately from cells where mirror is expressed), but also lack tissue derived from the ventral region of the eye disc (although this ventral tissue loss phenotype may extend beyond the cells expressing mirror).

      In summary, it seems most reasonable to me to think of mirror as a transcription factor that provides important development information for a diverse set of cells in which it can be expressed (posterior wing cells and ventral eye cells) but not that it acts as a high-level regulator as engrailed.

      Recommendation:

      While the data provided in this succinct study are solid and interesting, it is not clear to me that these findings support the major claim that mirror defines an extreme posterior compartment akin to that specified by engrailed. Minimally, the authors should address the points outlined above in their discussion section and greatly tone down their conclusion regarding mirror being a conserved selector-like gene dedicated to establishing posterior-most fates of the wing. They also should cite and discuss the original study in Drosophila describing the mirror expression pattern in the embryo and eye and the corresponding eye phenotype of mirror mutants: McNeill et al., Genes & Dev. 1997. 11: 1073-1082; doi:10.1101/gad.11.8.1073.

      We thank the reviewer for their summary, critique, and recommendations. We agree with everything the reviewer says. Honestly, however, we were surprised by these comments because we took great care in the paper to never refer to mirror as a compartmentalization gene or claim it has a function in cell lineage compartmentalization like engrailed. As pointed out, we lack clonal analyses to test for compartmentalization. This is why we used the term “domain” instead of “compartment” in the title and throughout the manuscript. Nevertheless, we have recrafted the discussion in the manuscript, including completely removing the term “compartment”, to better avoid implications that mirror plays a role in cell lineage compartmentalization. 

      We also thank the reviewer for recommending the paper about the role of mirror in eye development. For the sake of keeping the paper focused, however, we decided not to broach the topic of mirror functions outside the context of wing development.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have minor comments for improvement.

      The abstract and introductions are terminologically problematic when they refer to the concept of compartment and compartment boundaries. Allegedly this confusion has previously propagated in several articles related to butterfly wing development, which keeps alienating this literature from being taken seriously by fly specialists, for example. So it is important to use the right terms. I will try to explain point by point here, but I would appreciate it if the authors could undertake a significant rewrite taking these comments into account. The authors use the terms compartment and compartment boundary. This has a very specific use in developmental genetics: mitotic clones never cross a boundary (or compartment). I think the authors can keep referring to the equivalent of the A-P boundary, which is situated somewhere between M1-M2 based on unpublished data from the Patel Lab, and is not very well defined (Engrailed expression moves a little bit during development in this area). Domain is a looser term and can be used more liberally to describe genetically defined regions.

      - "Classical morphological work subdivides insect wings into several distinct domains along the antero-posterior (AP) axis, each of which can evolve relatively independently." Yes. This concept of domain and individuation seems important. You could make a proposed link to selector genes here.

      - "There has been little molecular evidence, however, for AP subdivision beyond a single compartment boundary described from Drosophila melanogaster." Incorrect, and this conflates "domain" and "compartment".

      Flies have wing AP domains too, that pattern their veins (see the cited Banerjee et al). 

      - "Our results confirm that insect wings can have more than one posterior developmental domain, and support models of how selector genes may facilitate evolutionarily individuation of distinct AP domains in insect wings". Yes, and I like the second part of the sentence. Still, I would recommend simply deleting "confirm that insect wings can have more than one posterior developmental domain, and" because this is neglecting previous work on AP genetic regionalization in both flies (vein literature) and butterflies (e.g. McKenna and Nijhout, Banerjee et al).

      - "Analyses of wing pattern diversity across butterflies, considering both natural variation and genetic mutants, suggest that wings can be subdivided into at least five AP domains, bounded by the M1, M3, Cu2, and 2A veins respectively, within each of which there are strong correlations in color pattern variation and wing morphology (Figure 1A)". Yes, and I would recommend emphasizing they correspond to well-defined gene expression domains as mentioned in Banerjee et al, or McKenna and Nijhout.

      - "The anterior-most of these domains, bordered by the M1 vein, appears to correspond to an AP compartment boundary originally described by cell lineage tracing in Drosophila melanogaster, and later supported in butterfly wings by expression of the Engrailed transcription factor. Interestingly, however, D. melanogaster work has yet to reveal clear evidence for additional AP domain boundaries in the wing." Confusing, because the first sentence is about compartments while the second is about AP domains. I also think the claim that Dmel has no other known AP domains is dubious because Spalt is highly regionalized in flies.

      - "Previous authors have proposed the existence of such individuated domains, and speculated that they may be specified by selector genes.5,10 Our data provide experimental support for this model, and now motivate us to identify factors that specify other domain boundaries between the M1 and A2 veins." Yes, I completely agree with this way to emphasize the selector effect, and to link it to the concept of "individuated domain"

      We cannot thank the reviewer enough for the time and thought they devoted to giving helpful suggestions to improve our manuscript. We have applied all of the above recommendations to the revision.

      Fig. S1: the field needs to move away from Red/Green microscopy images, for accessibility reasons.

      The easiest fix here would be to change the red channels to magenta.

      Green/Magenta provides excellent contrast and accessibility in general in 2-channel images.

      We thank the reviewer for this suggestion. We have improved the color accessibility of Fig. S1.
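      The red-to-magenta change suggested above can be sketched in a few lines. This is a minimal illustration only, assuming a two-channel image stored as rows of 8-bit (red, green, blue) tuples with an unused blue channel; the function names are hypothetical and this is not necessarily how Fig. S1 was actually revised.

```python
# Hypothetical sketch: recolor a red/green two-channel image as magenta/green.
# Assumption: the blue channel of the original image carries no signal.

Pixel = tuple[int, int, int]


def red_to_magenta(pixel: Pixel) -> Pixel:
    """Copy the red intensity into blue so that red signal renders as magenta."""
    red, green, _blue = pixel
    return (red, green, red)


def recolor(image: list[list[Pixel]]) -> list[list[Pixel]]:
    """Apply the red-to-magenta mapping to every pixel of a row-major image."""
    return [[red_to_magenta(pixel) for pixel in row] for row in image]


# Toy 1x3 image: pure red, pure green, and co-localized red + green signal.
toy_image = [[(200, 0, 0), (0, 150, 0), (120, 120, 0)]]
recolored = recolor(toy_image)
```

      With this mapping, green-only pixels are unchanged, and pixels where both channels overlap appear white or grey rather than yellow.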

    1. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.

      definition memex

    2. One cannot hope thus to equal the speed and flexibility with which the mind follows an associative trail,

      One cannot hope thus to equal the speed and flexibility with which the mind follows an associative trail, but it should be possible to beat the mind decisively in regard to the permanence and clarity of the items resurrected from storage.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kv2 subfamily potassium channels contribute to delayed rectifier currents in virtually all mammalian neurons and are encoded by two distinct types of subunits: Kv2 alpha subunits that have the capacity to form homomeric channels (Kv2.1 and Kv2.2), and KvS or silent subunits (Kv5, 6, 8, 9) that can assemble with Kv2.1 or Kv2.2 to form heteromeric channels with novel biophysical properties. Many neurons express both types of subunits and therefore have the capacity to make both homomeric Kv2 channels and heteromeric Kv2/KvS channels. Determining the contributions of each of these channel types to native potassium currents has been very difficult because the differences in biophysical properties are modest and there are no Kv2/KvS-specific pharmacological tools. The authors set out to design a strategy to separate Kv2 and Kv2/KvS currents in native neurons based on their observation that Kv2/KvS channels have little sensitivity to the Kv2 pore blocker RY785 but are blocked by the Kv2 VSD blocker GxTx. They clearly demonstrate that Kv2/KvS currents can be differentiated from Kv2 currents in native neurons using a two-step strategy to first selectively block Kv2 with RY785, and then block both with GxTx. The manuscript is beautifully written; it takes a very complex problem and strategy and breaks it down so both channel experts and the broad neuroscience community can understand it.
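      The arithmetic behind the two-step strategy can be sketched as follows. This is an illustrative reading of the summary above, not the authors' analysis code: it assumes that the current removed by RY785 is Kv2-like, that the additional current removed by GxTx is KvS-like, and that the KvS-like fraction is taken relative to the sum of the two. The function name, field names, and amplitudes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubtractionResult:
    kv2_like: float      # current removed by RY785 (attributed to Kv2 homomers)
    kvs_like: float      # further current removed by GxTx (attributed to Kv2/KvS heteromers)
    kvs_fraction: float  # kvs_like / (kv2_like + kvs_like)


def dissect_currents(i_control: float, i_ry785: float, i_ry785_gxtx: float) -> SubtractionResult:
    """Partition outward current using two sequential subtractions.

    i_control:     amplitude before either blocker
    i_ry785:       amplitude remaining in RY785
    i_ry785_gxtx:  amplitude remaining in RY785 plus GxTx
    """
    kv2_like = i_control - i_ry785
    kvs_like = i_ry785 - i_ry785_gxtx
    total = kv2_like + kvs_like
    if total <= 0:
        raise ValueError("no blocker-sensitive current to partition")
    return SubtractionResult(kv2_like, kvs_like, kvs_like / total)


# Hypothetical amplitudes (nA), for illustration only.
result = dissect_currents(i_control=10.0, i_ry785=7.0, i_ry785_gxtx=2.0)
```

      Current that survives both blockers is excluded from the denominator, so the fraction describes only the blocker-sensitive component.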

      Strengths:

      The compounds the authors use are highly selective and unlikely to have significant confounding cross-reactivity to other channel types. The authors provide strong evidence that all Kv2/KvS channels are resistant to RY785. This is a strength of the strategy - it can likely identify Kv2/KvS channels containing any of the 10 mammalian KvS subunits and thus be used as a general reagent on all types of neurons. The limitation then of course is that it can't differentiate the subtypes, but at this stage, the field really just needs to know how much Kv2/KvS channels contribute to native currents and this strategy provides a sound way to do so.

      Weaknesses:

      The authors are very clear about the limitations of their strategy, the most important of which is that they can't differentiate different subunit combinations of Kv2/KvS heteromers. This study is meant to be a start to understanding the roles of Kv2/KvS channels in vivo. As such, this is a minor weakness, far outweighed by the potential of the strategy to move the field through a roadblock that has existed since its inception.

      The study accomplishes exactly what it set out to do: provide a means to determine the relative contributions of homomeric Kv2 and heteromeric Kv2/KvS channels to native delayed rectifier K+ currents in neurons. It also does a fabulous job laying out the case for why this is important to do.

      Reviewer #2 (Public Review):

      Summary:

      Silent Kv subunits and the channels containing these Kv subunits (Kv2/KvS heteromers) are in the process of discovery. It is believed that these channels fine-tune the voltage-activated K+ currents that repolarize the membrane potential during action potentials, with a direct effect on cell excitability, mostly by determining action potential firing frequency.

      Strengths:

      What makes silent Kv subunits even more important is that, by being expressed in specific tissues and cell types, different silent Kv subunits may have the ability to fine-tune the delayed rectifying voltage-activated K+ currents that are one of the currents that crucially determine cell excitability in these cells. The present manuscript introduces a pharmacological method to dissect the voltage-activated K+ currents mediated by Kv2/KvS heteromers as a means of starting to unveil their importance, together with Kv2-only channels, to the cells where they are expressed.

      Weaknesses:

      While the method is effective in quantifying these currents in any isolated cell under an electric voltage clamp, it is ineffective as a modulating maneuver to perhaps address these currents in an in vivo experimental setting. This is an important point but is not a claim made by the authors.

      We agree. We have now stated in the introduction that this study does not address the roles of Kv2/KvS currents in an in vivo setting.

      Manuscript revisions:

      While this study does not address the impact of GxTX or RY785 on action potentials or in vivo, the distinct pharmacology of Kv2/KvS heteromers presented here suggests that KvS conductances could be targeted to selectively modulate discrete subsets of cell types.  

      There are other caveats with the methods and data:

      (i) The need for a 'cocktail' of blockers to supposedly isolate Kv2 homomers and Kv2/KvS heteromers' currents from others may introduce errors in the quantification of Kv2/KvS heteromer-mediated K+ currents, owing to possible off-target effects of the blockers.

      We now point out that it is possible that off-target effects of blockers may introduce errors, include references that identify the selectivity of the blockers used in the cocktail, and specifically note that 4-aminopyridine in the cocktail is expected to block 2% of Kv2 homomers yet have a lesser impact on Kv2/KvS heteromers. Additionally, to test whether the KvS isolation strategy requires the cocktail in neurons, we performed new experiments on a different subclass of nociceptors without the blocker cocktail and identified a substantial KvS-like component (new Fig 7 Supplement 3).

      Manuscript revisions:

      “After whole-cell voltage clamp was established, non-Kv2/KvS conductances were suppressed by changing to an external solution containing a cocktail of inhibitors: 100 nM alpha-dendrotoxin (Alomone) to block Kv1 (Harvey and Robertson, 2004), 3 μM AmmTX3 (Alomone) to block Kv4 (Maffie et al., 2013; Pathak et al., 2016), 100 μM 4-aminopyridine to block Kv3 (Coetzee et al., 1999; Gutman et al., 2005), 1 μM TTX to block TTX-sensitive Nav channels, and 10 μM A803467 (Tocris) to block Nav1.8 (Jarvis et al., 2007). It is possible that off-target effects of blockers may introduce errors in the quantification of Kv2/KvS heteromer-mediated K<sup>+</sup> currents. For example, 4-aminopyridine is expected to block a small fraction, 2%, of Kv2 homomers and have a lesser impact on Kv2/KvS heteromers (Post et al., 1996; Thorneloe and Nelson, 2003; Stas et al., 2015) which could result in a slight overestimation of the ratio of Kv2/KvS heteromers to Kv2 homomers.”
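      The size of the bias described in the quoted passage can be estimated with a short calculation. This sketch assumes the stated 2% block of Kv2 homomers and, as a limiting case, no block of Kv2/KvS heteromers; the function name and the unit amplitudes are hypothetical.

```python
def measured_ratio(true_kvs: float, true_kv2: float,
                   block_kv2: float = 0.02, block_kvs: float = 0.0) -> float:
    """Ratio of KvS-like to Kv2-like current remaining after off-target block.

    block_kv2 and block_kvs are the fractions of each current removed by
    4-aminopyridine. block_kvs = 0.0 is an assumed limiting case; the quoted
    text only states that the impact on heteromers is smaller than on homomers.
    """
    remaining_kvs = true_kvs * (1.0 - block_kvs)
    remaining_kv2 = true_kv2 * (1.0 - block_kv2)
    return remaining_kvs / remaining_kv2


# Hypothetical case: equal true amplitudes, so the true ratio is 1.
true_ratio = 1.0
biased_ratio = measured_ratio(true_kvs=1.0, true_kv2=1.0)
overestimate_percent = (biased_ratio / true_ratio - 1.0) * 100.0
```

      Under these assumptions the ratio is inflated by a factor of 1/0.98, or roughly 2%, which is consistent with the "slight overestimation" described above. Any block of the heteromers would shrink the bias further.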

      “We also tested the other major mouse C-fiber nociceptor population, peptidergic nociceptors, to determine if this subpopulation also has conductances resistant to RY785 yet sensitive to GxTX. We voltage clamped DRG neurons from a CGRP<sup>GFP</sup> mouse line that expresses GFP in peptidergic nociceptors (Gong et al., 2003). Deep sequencing has identified mRNA transcripts for Kv6.2, Kv6.3, Kv8.1 and Kv9.3 present in GFP+ neurons, an overlapping but distinct set of KvS subunits from the Mrgprd<sup>GFP</sup> non-peptidergic population (Zheng et al., 2019). In GFP+ neurons from CGRP<sup>GFP</sup> mice, we found that a fraction of outward current was inhibited by 1 µM RY785 and additional current inhibited by 100 nM GxTX (Fig 7 Supplement 3 A-C). In these experiments, 58 ± 2% (mean ± SEM) was KvS-like (Fig 7 Supplement 3 D) identifying that KvS-like conductances are present in these peptidergic nociceptors. For CGRP<sup>GFP</sup> neurons we did not include the Kv1, Kv3, Kv4, Nav and Cav channel inhibitor cocktail used for other neuron experiments, indicating that the cocktail of inhibitors is not required to identify KvS-like conductances.”
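      A summary statistic of the form "mean ± SEM" can be computed from per-cell fractions as sketched below. The per-cell values are invented for illustration and are not data from the study; the helper name is hypothetical.

```python
import math
import statistics


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the SEM")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical KvS-like fractions from three cells (not measured data).
kvs_fractions = [0.5, 0.6, 0.7]
mean_fraction, sem_fraction = mean_sem(kvs_fractions)
```

      The SEM here uses the sample standard deviation (n − 1 denominator), which is the usual convention for reporting variability across cells.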

      (ii) During the electrophysiology experiments, the authors use a holding potential that is not as negative as it is needed for the recording of the full population of the Kv2/KvS channels. Depolarized holding potentials lead to a certain level of inactivation of the channels, that vary according to the KvS involved/present in that specific population of channels. As a reminder, some KvS promote inactivation and others prevent inactivation. Therefore, the data must be interpreted as such.

      We agree. We now point out that the physiological holding potentials used are insufficiently negative to relieve inactivation from all Kv2/KvS heteromeric channels. We also note that the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.

      Manuscript revisions:

      “Neurons were held at a membrane potential of –74 mV to mimic a physiological resting potential. KvS subunits can profoundly shift the voltage-inactivation relation (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and this potential is likely insufficiently negative to relieve inactivation from all Kv2/KvS heteromeric channels. Also, the activation membrane potential is close to the half-maximal point of Kv2/KvS conductances. Thus the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.”
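      Why a fixed holding potential can under-sample some heteromers is easier to see with a Boltzmann description of steady-state inactivation. The sketch below is a generic textbook form, not the authors' model; the midpoints and slope are hypothetical values chosen only to contrast a channel population with a depolarized inactivation midpoint against one with a strongly hyperpolarized midpoint.

```python
import math


def fraction_available(v_hold_mv: float, v_half_mv: float, slope_mv: float) -> float:
    """Steady-state fraction of channels not inactivated at a holding potential.

    Uses a Boltzmann function: 1 / (1 + exp((V - V_half) / k)), where V_half is
    the inactivation midpoint and k is a positive slope factor (both in mV).
    """
    return 1.0 / (1.0 + math.exp((v_hold_mv - v_half_mv) / slope_mv))


HOLDING_MV = -74.0  # holding potential stated in the quoted passage

# Hypothetical midpoints (mV) and slope (mV), for illustration only.
available_depolarized_midpoint = fraction_available(HOLDING_MV, v_half_mv=-50.0, slope_mv=8.0)
available_hyperpolarized_midpoint = fraction_available(HOLDING_MV, v_half_mv=-90.0, slope_mv=8.0)
```

      With these assumed parameters, most channels in the first population are available at –74 mV while most in the second are inactivated, which illustrates why the measured ratio of Kv2-like to KvS-like conductance is expected to depend on the voltage protocol.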

      (iii) The analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. Also, in dealing with a heterogenous population of Kv2/KvS heteromers, heterogenous K+ conductance deactivation kinetics is a must. Indeed, different KvS may significantly relate to different deactivation kinetics as well.

      We now discuss that the bi-exponential fit of tail currents is likely inadequate to capture the deactivation kinetics of all underlying components of a heterogenous population of Kv2/KvS heteromers.

      Manuscript revisions:

      “We note that the analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. We expect that inactivation of Kv2/KvS conductances during the 200 ms pre-pulse is minimal (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and did not notice inactivation during the activation pulse. Also, deactivation kinetics can vary in a heterogenous population of Kv2/KvS heteromers. While analysis of tail currents could skew the quantification of total Kv2-like and KvS-like conductances, our data supports that mouse nociceptors and human neurons have tail currents that are resistant to RY785 and sensitive to GxTX consistent with the presence of Kv2/KvS heteromers.”

      (iv) Silent Kv subunits may be retained in the ER, in heterologous systems like CHO cells. This aspect may lead to underestimation of their expression in these systems. Nevertheless, the authors show similar data in CHO cells and in primary neurons.

      We agree. We now note that in heterologous systems, including CHO cells, transfection of KvS subunits can result in KvS subunits that are retained intracellularly.

      Manuscript revisions:

      “While a fraction of KvS subunits appear to be retained intracellularly, immunofluorescence for Kv5.1, Kv9.3 and Kv2.1 also appeared localized to the perimeter of transfected Kv2.1-CHO cells (Figure 1 Supplement).”

      (v) The hallmark of silent Kv subunits is their effect on the time inactivation of K+ currents. As such, data should be shown throughout, preferably, from this perspective, but it was only done so in Figure 4G.

      Indeed, effects on inactivation are a hallmark of KvS subunits. However, quantifying inactivation of Kv2/KvS channels requires steps to positive voltages for approximately 10 seconds. In neurons steps this long usually resulted in irreversible changes in leak currents/input resistance that degraded the accuracy of RY785/GxTX subtraction currents. Consequently, we did not acquire inactivation data in neurons, and we now explain in the manuscript why such data was not obtained.

      Manuscript revisions:

      “While changes in inactivation are prominent with KvS subunits, we did not investigate inactivation in neurons because the lengthy depolarizations required often resulted in irreversible leak current increases that degraded the accuracy of RY785/GxTX subtraction current quantification.”

      (vi) Functional characterization of currents only, as suggested by the authors as a bona fide of Kv2 and Kv2/KvS currents, should not be solely trusted to classify the currents and their channel mediators.

      We agree, and now state explicitly that functional characterization alone cannot be trusted to classify the channel mediators of conductances, and we try to be clear about this throughout the manuscript by using soft terms such as "KvS-like" when identity is uncertain.

      Manuscript revisions:

      “As functional characterization alone cannot be trusted to classify their channel mediators of conductances, we define conductances consistent with Kv2/KvS heteromers as 'KvS-like' and conductances consistent with Kv2 homomers as 'Kv2-like'.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There is not a lot to do here - this was a real pleasure to read and very easy to understand, as written. Here are a few minor things to consider:

      (1) The naming of the KvS subunits has always been confusing - it is not clear that Kv5,6,8,9 are members of the Kv2 subfamily from the names. KvS does a good job of differentiating them by assembly phenotype and has been used a lot in the literature, but it doesn't solve the misconception of what subfamily they belong to. This might not matter so much for mammals, where all KvS channels are in the Kv2 subfamily, but it makes it impossible to extend the naming system to other animals where subunits requiring heteromeric assembly are common in most subfamilies. How about trying the name Kv2S? It would have continuity with KvS in the reader's mind, make it clear that they are Kv2 subfamily, and make a naming system that could be extended beyond vertebrates. This is not a problem the authors created - just a completely optional suggestion on how to solve it if so inclined.

      We agree that naming conventions for these subunits are problematic, and agonized quite a bit about nomenclature. In the end we chose to stick with the precedent of KvS.

      (2) Another naming issue they should definitely change is the use of "subfamily" for the different KvS subtypes (Kv5, Kv6, Kv8, and Kv9). This really creates confusion with the higher-order subfamilies that have a very clear functional definition: a subfamily of Kv genes is a group of related genes that have assembly compatibility. Those are Kv1, Kv2, Kv3 and Kv4. KvS genes are assembly compatible with Kv2, evolutionarily derived from the Kv2 lineage, and thus clearly a part of the Kv2 subfamily. Using a subfamily for the next lower level of the naming hierarchy confuses this. The authors should use different terms like sub-type or class or subgroups for the divisions within KvS.

      Thank you. We have standardized to Kv2/KvS as a subfamily; Kv5, Kv6, Kv8, and Kv9 as subtypes; and individual proteins, e.g. Kv8.1, as subunits.

      (3) When you discuss whether the KvS subunit directly disrupts RY785 binding in the pore or works allosterically and you said you know which KvS residues point into the pore from models, I thought that maybe you could tell from a sequence alignment whether the KvS channels you didn't test look the same in the conduction pathway as the ones you did test. If so, you could mention that if the binding site is the pore, they should all be resistant. Alternatively, if one you didn't test looks fundamentally more similar to the Kv2s in this region, then maybe it could be fingered as a possible exception that needs to be tested later.

      Great ideas. We now assess KvS sequence variability near the proposed RY785 binding site in all KvS subunits. We generated structural models of RY785 docking to Kv2.1 and Kv2.1/Kv8.1 and found that residues near RY785 are different in all KvS subunits.

      Manuscript revisions:

      “We analyzed computational structural models of RY785 docked to a Kv2.1 homomer and a 3:1 Kv2.1:Kv8.1 heteromer (Fig 9) to gain structural insight into how KvS subunits might interfere with RY785 binding. We used Rosetta to dock RY785 to a cryo-EM structure of a Kv2.1 homomer in an apparently open state (Fernández-Mariño et al., 2023). The top-scoring docking pose has RY785 positioned below the selectivity filter and off-axis of the pore (Fig 9 A), similar to a stable pose observed in molecular dynamic simulations (Zhang et al., 2024). In this pose, RY785 contacts a collection of Kv2.1 residues that vary in every KvS subtype (Fig 9 B,D,E). Notably, RY785 bound similarly to a 3:1 model of Kv2.1/Kv8.1, in contact with the three Kv2.1 subunits, yet avoided the Kv8.1 subunit (Fig 9C). This is consistent with RY785 binding less well to Kv2.1/Kv8.1 heteromers, and also suggests that a 3:1 Kv2:KvS channel could retain a RY785 binding site when open.”

      (4) Future suggestion or tip - not for this paper. Your data shows your isolation strategy works really well on Kv6 channels, and these are also the Kv2/KvS channels that have the most pronounced biophysical changes. Working on neurons that have a prominent Kv2/Kv6 component would really show how well the strategy outlined here works to describe the physiology of native neurons. The highest KvS expression I have seen in public data in a well-studied cell type is Kv6.4 in spinal motor neurons.

      Wonderful tip, thank you. We are indeed very interested in Kv6.4 in spinal motor neurons.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript makes a good contribution to the identification of Kv2/KvS channels in primary cells. The pharmacological method proposed by the authors to dissect the currents in an experimental setting seems proper. Although meritorious in themselves, the findings are heavily phenomenological in the opinion of this reviewer. The manuscript should be improved with some level of mechanistic data and/or the demonstration of different levels of expression in different cell types.

      Thank you for the suggestions. This manuscript now demonstrates strikingly higher levels of the KvS-like component of Kv2 currents in somatosensory (DRG nonpeptidergic and peptidergic nociceptor) versus autonomic (SCG) neuron types. The mechanistic question of what electrophysiological properties the KvS subunits are providing to the neuronal circuit is an exciting one that we are pursuing separately.

      Manuscript revisions:

      “While we found only RY785-sensitive Kv2-like conductances in SCG neurons, Kv2/KvS heteromer-like conductances were dominant in DRG neurons.”

      At present, the manuscript says that the combination of RY785 and guangxitoxin-1E can be used to define Kv2/KvS-mediated K+ currents. Importantly, this method cannot be used in a way that one can functionally determine the function of Kv2/KvS channels, since it depends on the pre-blocking of Kv2-mediated K+ currents prior. In the opinion of this reviewer, this fact decreases the attention of a potential reader.

      Indeed, our study is focused on revealing KvS heteromers by voltage clamp, and we now clarify in the introduction that we do not determine the function of Kv2/KvS channels in this study, so as not to lead the reader to expect studies of neuronal signaling.

      However, the selective pharmacology we identify suggests RY785 application could reveal the function of Kv2 homomers, and for RY785-insensitive signaling, GxTX application could reveal the function of Kv2/KvS heteromers. We now mention these possible applications in the Discussion.

      Manuscript revisions:

      “While this study does not address the impact of GxTX or RY785 on action potentials or in vivo, the distinct pharmacology of Kv2/KvS heteromers presented here suggests that KvS conductances could be targeted to selectively modulate discrete subsets of cell types.”

      Please find below suggestions for improving the manuscript:

      (1) The term "Kv2/KvS heteromers" should be used throughout instead of variations such as "Kv2/KvS channels", "Kv2/KvS" and others. Standardization of the term to refer to heteromers would make the manuscript easier to read.

      Thank you. We have standardized terms to consistently refer to Kv2/KvS heteromers.

      (2) Confusing terms like KvS conductances, KvS-like conductances, KvS-like (RY785-resistant, GxTX-sensitive) currents, and KvS channels should be avoided because they disregard the current belief that KvS cannot form functional homomeric channels. The term KvS-containing channels, and Kv2/KvS channels, seem more accurate. Uniformization in this regard will also make the manuscript more easily readable.

      Thank you. We have standardized terms to Kv2/KvS heteromers and KvS-containing channels when channel subunits are known, and use the terms KvS-like and Kv2-like for functionally identified endogenous conductances with unknown channel subunits.

      (3) Referring to KvS as a regulatory subunit is inaccurate. It is clear that KvS is part of, and it makes up the alpha pore. KvS therefore is a part of the conductive pathway and not a regulatory (suggesting accessory) subunit. KvS take part in selectivity filter (fully conserved), but they also make up an important part of the conducting pathway with non-conserved amino acid residues.

      We felt it important to include the descriptor “regulatory” to connect our nomenclature with prior use of the descriptor in the literature, and now only use the term at the start of the introduction.

      Manuscript revisions:

      “A potential source of molecular diversity for Kv2 channels are a group of Kv2-related proteins which have been referred to as regulatory, silent, or KvS subunits.”

      (4) The use of a cocktail of channel inhibitors may affect the quantification of Kv2/KvS heteromers-mediated K+ currents because they may interact with RY785 and/or GxTx or they may even interact with the sites for these two drugs on Kv2-containing channels.

      This is an interesting point worth considering, thank you. We now alert readers to this possibility in the discussion when considering the limitations of our approach.

      Manuscript revisions:

      “Also, the cocktail of inhibitors used in most neuron experiments here could potentially alter RY785 or GxTX action against KvS/Kv2 channels.”

      (5) The graphical representation of fractional blocking and other parameters (e.g., Fig 1D), is hard to read in these slim plots. In my opinion, tall bars would be more meaningfully visualized.

      Thank you for pointing out that the graphs were hard to read. We have made them easier to read and added tall bars.

      (6) Vehicle control for IHC and electrophysiology. Please state what is the vehicle used in the electrophysiology experiments.

      Thank you. The composition of vehicle has now been stated in the methods.

      Manuscript revisions:

      “All RY785 solutions contained 0.1% DMSO. Vehicle control solutions also contained 0.1% DMSO but lacked RY785.”

      “Sections were incubated in vehicle solution (4% milk, 0.2% triton diluted in PB) for 1 hr at RT.”

      (7) The reference Trapani & Korn, 2003 (?) is not included in the list. This reference is important since it sets what are the Kv2.1-CHO cells. In this regard it is also important to mention, even better to address, the expressing qualities of this system in the face of a co-expression with a plasmid-based expression of silent Kv subunits. Are these two ways of expressing Kv subunits, meant to come together (or not) in heteromers, balanced? This question is critical here. Still, in regard to Kv2.1-CHO cells, it was not clear in the manuscript if the term "transfection" refers only to the plasmids used to temporarily induce the expression of silent Kv subunits and potentially Kv channels accessory subunits.

      We now include the Trapani & Korn, 2003 reference (thank you for pointing out this accidental omission), and better explain expression methods. The benefit of the inducible Kv2.1 expression is control of Kv conductance densities, which can otherwise become so large as to be refractory to voltage clamp. The beauty of the expression system is that cells recently transfected with KvS subunits can be induced to express just enough Kv2.1 to get a substantial but not clamp-overwhelming RY785-resistant Kv2/KvS conductance. We also discuss that our expression methods are distinct from past studies. We stop short of comparing the expression systems, as this is beyond the scope of what we set out to study.

      Manuscript revisions: See next response

      (8) Kv2.1-CHO cells transfection procedures, induction, and validation are unclear. This validation is important here.

      We have clarified transfection procedures, induction, and validation in the methods section.

      Manuscript revisions:

      “The CHO-K1 cell line transfected with a tetracycline-inducible rat Kv2.1 construct (Kv2.1-CHO) (Trapani and Korn, 2003) was cultured as described previously (Tilley et al., 2014).”

      “Transfections were achieved with Lipofectamine 3000 (Life Technologies, L3000001). 1 μl Lipofectamine was diluted, mixed, and incubated in 25 μl of Opti-MEM (Gibco, 31985062).”

      “Concurrently, 0.5 μg of KvS or AMIGO1 or Navβ2, 0.5 μg of pEGFP, 2 μl of P3000 reagent and 25 μl of Opti-MEM were mixed. DNA and Lipofectamine 3000 mixtures were mixed and incubated at room temperature for 15 min. This transfection cocktail was added to 1 ml of culture media in a 24 well cell culture dish containing Kv2.1-CHO cells and incubated at 37 °C in 5% CO2 for 6 h before the media was replaced. Immediately after media was replaced, Kv2.1 expression was induced in Kv2.1-CHO cells with 1 μg/ml minocycline (Enzo Life Sciences, ALX380-109-M050), prepared in 70% ethanol at 2 mg/ml. Voltage clamp recordings were performed 12-24 hours later. We note that the expression method of Kv2/KvS heteromers used here is distinct from previous studies which show that the KvS:Kv2 mRNA ratio can affect the expression of functional Kv2/KvS heteromers (Salinas et al., 1997b; Pisupati et al., 2018). We validated the functional Kv2/KvS heteromer expression using voltage clamp to establish distinct channel kinetics and the presence of RY785-resistant conductance in KvS-transfected cells and using immunohistochemistry to label apparent surface localization of KvS subunits (Figure 4, Figure 1 Supplement, Figure 1 and Figure 5).”

      (9) It is important for readers to add some context to Kv2.1/Kv8.1 channels (and other Kv2/KvS heteromers) used to test the combination of RY785 and GxTx. In my opinion, this enriches the discussion.

      Good idea. We have added context about each of the KvS subunits we test.

      Manuscript revisions:

      “To test the pharmacological response of KvS we began with Kv8.1, a subunit that creates heteromers with biophysical properties distinct from Kv2 homomers (Salinas et al., 1997a), and modulates motor neuron vulnerability to cell death (Huang et al., 2024).

      Each of these KvS subunits create Kv2/KvS heteromers that have distinct biophysical properties (Kramer et al., 1998; Kerschensteiner and Stocker, 1999; Bocksteins et al., 2012). Kv5.1/Kv2.1 heteromers play an important role in controlling the excitability of mouse urinary bladder smooth muscle (Malysz and Petkov, 2020), mutations in Kv6.4 have been shown to influence human labor pain (Lee et al., 2020b), and deficiency of Kv9.3 disrupts parvalbumin interneuron physiology in mouse prefrontal cortex (Miyamae et al., 2021).”

      (10) In general, the membrane potential used to activate Kv2 only channels and Kv2/KvS channels is too close to the activation V1/2. In case the comparing curves are displaced in their relative voltage dependence and voltage sensitivity, using that range of membrane potential may introduce a crucial error in the estimation of the conductance's relative amplitudes.

      We now note that the relative conductances of Kv2-only vs Kv2/KvS channels are expected to vary with voltage protocol, as KvS inclusion results in channels with altered voltage responses.

      Manuscript revisions:

      “…the activation membrane potential is close to the half-maximal point of Kv2/KvS conductances. Thus the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.”

      (11) The use of tail currents to estimate conductance is problematic if i) lack of current inactivation is not assured, and ii) if the different currents, with possible different deactivation kinetics at the used membrane potential (e.g., mV), are not assured. Why was the activation peak used at times, and at different elapsed times the tail currents were used instead? These aspects of conductance's amplitude estimation methods should be well defined.

      In CHO cells peak currents were analyzed because outward currents seem to offer the best signal/noise. In neurons, we restricted analysis to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. We have clarified this analysis in the methods section.

      Manuscript revisions:

      “In CHO cells peak currents were analyzed because outward currents seem to offer the best signal/noise. In neurons, we restricted analysis to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. In neurons, voltage gated currents remained in the toxin cocktail + RY785 and GxTX, that were sometimes unstable. To minimize complications from these currents, we restricted analysis of RY785 and GxTX subtraction experiments to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. We note that the analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. We expect that inactivation of Kv2/KvS conductances during the 200 ms pre-pulse is minimal (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and did not notice inactivation during the activation pulse. Also, deactivation kinetics can vary in a heterogenous population of Kv2/KvS heteromers. While analysis of tail currents could skew the quantification of total Kv2 like and KvS-like conductances, our data supports that mouse nociceptors and human neurons have tail currents that are resistant to RY785 and sensitive to GxTX consistent with the presence of Kv2/KvS heteromers.”

      (12) Were the experiments including different conditions such as control, RY, and RY+GxTx done pair-wised? This could potentially better the statistics and strengthen the data and the conclusions drawn from them.

      The control, RY, and RY+GxTX in neurons were done pairwise and the statistical tests performed for these experiments were pairwise tests. We have clarified this in the figure legends.

      Manuscript revisions:

      “Wilcoxon rank tests were paired, except the comparison of RY785 to vehicle which was unpaired.”
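
      To illustrate what a paired comparison of this kind involves, the following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test on before/after measurements from the same cells. It is not the authors' analysis code: the function name and the tail-current amplitudes are hypothetical, and the sketch assumes no tied absolute differences.

```python
from itertools import product

def signed_rank_exact(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumes no ties among the absolute differences (illustrative only).
    Returns (W+, p), where W+ is the sum of ranks of positive differences.
    """
    # Paired differences; zero differences are dropped by convention.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)

    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely,
    # so enumerate all 2**n patterns and count those at least as extreme.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            extreme += 1
    return w_plus, extreme / 2 ** n

# Hypothetical tail-current amplitudes (arbitrary units) from six cells,
# each measured before and after drug application.
before_drug = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
after_drug = [6.5, 9.0, 7.0, 10.5, 9.5, 8.0]
w_plus, p_value = signed_rank_exact(before_drug, after_drug)
```

      Because each cell serves as its own control, the test uses only the within-cell differences. An unpaired comparison, such as RY785 versus vehicle in separate cells, would instead rank the pooled values from both groups.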

      (13) The holding potential of the experiments, mostly -89 mV, may be biasing the estimation of Kv2 only channels vs. Kv2/KvS channels conductances. Figure 4I exemplifies this concern.

      We agree. Figure 4I reveals that a holding potential of -89 mV vs -129 mV reduces conductance of Kv2.1/Kv8.1 heteromers vs Kv2.1 homomers in CHO cells by ~20%. We have now alerted readers that the ratio of Kv2 only channels vs. Kv2/KvS conductances can vary with holding voltage.

      Manuscript revisions:

      “Under these conditions, 58 ± 3 % (mean ± SEM) of the delayed rectifier conductance was resistant to RY785 yet sensitive to GxTX (KvS-like) (Fig 7 F). We note that the ratio of KvS- to Kv2-like conductances is expected to vary with holding potential, as KvS subunits can change the degree and voltage-dependence of steady state inactivation (e.g. Fig 4I).”

      (14) It is possible that Figure 6A (control trace) and Figure 6C ("Kv2-like" trace) are the same, by mistake, since their noise pattern looks too similar.

      Indeed, the noise patterns of the Figure 6A (control trace) and Figure 6C ("Kv2-like" trace) are related because they draw on the same trace: the Figure 6C ("Kv2-like" trace) is the Figure 6A (control trace) minus the Figure 6A (+RY trace).
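
      The shared noise follows directly from the subtraction, as the following minimal sketch shows. The traces are synthetic and every amplitude, time constant, and noise level is an assumption chosen for illustration; none of these values comes from the recordings.

```python
import math
import random

def subtract(trace_a, trace_b):
    """Point-by-point difference of two equally sampled current traces."""
    return [a - b for a, b in zip(trace_a, trace_b)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

random.seed(0)
n_samples = 5000

# Synthetic activation time course shared by both traces (assumed shape).
signal = [1.0 - math.exp(-i / 500.0) for i in range(n_samples)]

# Independent recording noise on each trace (assumed Gaussian, equal SD).
control_noise = [random.gauss(0.0, 0.05) for _ in range(n_samples)]
ry_noise = [random.gauss(0.0, 0.05) for _ in range(n_samples)]

control = [s + e for s, e in zip(signal, control_noise)]
plus_ry = [0.4 * s + e for s, e in zip(signal, ry_noise)]  # 40% of current remains

# "Kv2-like" current obtained by subtraction: control minus +RY.
kv2_like = subtract(control, plus_ry)

# Noise left in the subtracted trace once the noiseless difference is removed.
kv2_like_noise = [d - 0.6 * s for d, s in zip(kv2_like, signal)]

# The subtracted trace inherits the control trace's noise, so the two correlate.
shared = pearson(control_noise, kv2_like_noise)
```

      With independent noise of equal variance on the two input traces, the expected correlation between the control noise and the noise in the subtracted trace is 1/√2, so a visibly similar noise pattern is expected rather than a sign of duplicated data.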

      (15) For example, in Figure 7A, what is the identity of the current remaining after the RY+GxTx application? In Figure 7B, a supposed outlier in the group of data referring to "veh" in the right panel is what possibly is making this group different from +RY in the left panel (p=0.02, Wilcoxon rank test). I would recommend parametric tests only since the data is essentially quantitative.

      In Figure 7A, we do not know the identity of the current remaining after the RY+GxTX application. The kinetics of the residual current appeared distinct from the Kv2/KvS-like currents blocked by RY or GxTX, but we did not analyze these.

      The data in Figure 7B were indeed from the positive outlier in the group of data referring to "veh" in the right panel, and this outlier contributes to the p-value, but we saw no reason to exclude it. We have now replaced the representative trace in 7B with a non-outlier trace. We respectfully disagree with the suggestion to use parametric statistical tests, as we do not know the distribution underlying the variance of our data.

      Manuscript revisions:

      “Subsequent application of 100 nM GxTX decreased tail currents by 68 ± 5% (mean ± SEM) of their original amplitude before RY785. We do not know the identity of the outward current that remains in the cocktail of inhibitors + RY785 + GxTX.”

      (16) Please state the importance of using nonpeptidergic neurons to study silent Kv5.1 and Kv9.1 subunits. RNA data may not necessarily work to probe function or protein abundance, which is crucial in heteromeric complexes.

      We have now more thoroughly explained our rationale for choosing the nonpeptidergic neurons.

      RNA is not predictive of protein abundance, and we have not yet been successful in measuring KvS protein abundance in these neurons, so we've probed KvS abundance by assessing RY785 resistance.

      Manuscript revisions:

      “Mouse dorsal root ganglion (DRG) somatosensory neurons express Kv2 proteins (Stewart et al., 2024), have GxTX-sensitive conductances (Zheng et al., 2019), and express a variety of KvS transcripts (Bocksteins et al., 2009; Zheng et al., 2019), yet transcript abundance does not necessarily correlate with functional protein abundance. To record from a consistent subpopulation of mouse somatosensory neurons which has been shown to contain GxTX-sensitive currents and have abundant expression of KvS mRNA transcripts (Zheng et al., 2019), we used a Mrgprd<sup>GFP</sup> transgenic mouse line which expresses GFP in nonpeptidergic nociceptors (Zylka et al., 2005; Zheng et al., 2019). Deep sequencing identified that mRNA transcripts for Kv5.1, Kv6.2, Kv6.3, and Kv9.1 are present in GFP+ neurons of this mouse line (Zheng et al., 2019) and we confirmed the presence of Kv5.1 and Kv9.1 transcripts in GFP+ neurons from Mrgprd<sup>GFP</sup> mice using RNAscope (Fig 7 Supplement 1).”

      (17) In Figure 8B, were +RY data different from veh data? The figure shows no Wilcoxon (nonparametric) comparison and this is important to be stated. What conductance(s) is the vehicle solution blocking or promoting? What is RY dissolved in, DMSO? What is the DMSO final concentration?

      We now state that in Figure 8B, +RY amplitudes were not statistically different from veh data in this limited data set. However, the RY-subtraction currents always had Kv2-like biophysical properties, whereas vehicle-subtraction currents had variable properties precluding biophysical analysis for Fig 8D.

      In Figure 8B, we do not know what conductance(s) the vehicle solution is affecting. We think the changes observed are likely merely time dependent or due to the solution exchange itself. The RY stock is in DMSO. All recording solutions have a final DMSO concentration of 0.1%; this is now noted in the methods.

      Manuscript revisions:

      “Unlike mouse neurons, we did not detect a significant difference in tail currents of RY785 versus vehicle controls. However, RY785-subtracted currents always had Kv2-like biophysical properties whereas vehicle-subtraction currents had variable properties that precluded the same biophysical analysis. Overall, these results show that human DRG neurons can produce endogenous voltage-gated currents with pharmacology and gating consistent with Kv2/KvS heteromeric channels.”

      “All RY785 solutions contained 0.1% DMSO. Vehicle control solutions also contained 0.1% DMSO but lacked RY785.”

      (18) METHODS. The electrophysiology approach should be unified in all aspects as applicable and possible.

      We have unified the mouse dorsal root ganglion and mouse superior cervical ganglion methods sections. We have kept CHO cells and mouse/human neurons section separate because the methods were substantially different.

      (19) DISCUSSION. The discussion section spends half of its space trying to elaborate on possible blocking/inhibiting/modulating mechanisms for RY785. The present manuscript shows no data, at least not that I have noticed, that would evoke such discussion.

      We have shortened this section and enhanced the discussion with structural models (new Fig 9) and our functional data indicating perturbed RY785 interaction with Kv2.1/8.1.

      Manuscript revisions:

      “In this pose, RY785 contacts a collection of Kv2.1 residues that vary in every KvS subtype (Fig 9 B,D,E). Notably, RY785 bound similarly to a 3:1 model of Kv2.1/Kv8.1, in contact with the three Kv2.1 subunits, yet avoided the Kv8.1 subunit (Fig 9C). This is consistent with RY785 binding less well to Kv2.1/Kv8.1 heteromers, and also suggests that a 3:1 Kv2:KvS channel could retain a RY785 binding site when open. However, the RY785 resistance of Kv2/KvS heteromers may primarily arise from perturbed interactions with the constricted central cavity of closed channels. In homomeric Kv2.1, RY785 becomes trapped in closed channels and prevents their voltage sensors from fully activating, indicating that RY785 must interact differently with closed channels (Marquis and Sack, 2022). Here we found that Kv2.1/Kv8.1 current rapidly recovers following washout of RY785, suggesting that Kv2.1/Kv8.1 heteromers do not readily trap RY785 (Figure 2 Supplement). Overall, the structural modeling suggests that KvS subunits sterically interfere with RY785 binding to the central cavity, while functional data suggest KvS subunits disrupt RY785 trapping in closed states.”

      (20) DISCUSSION. Topics like ER retention and release upon certain conditions would be a better enrichment for the manuscript in my opinion.

      ER retention of KvS subunits is indeed an important topic! However, we have opted not to delve into it here.

      (21) DISCUSSION. Speculation about the binding site for RY on Kv2/KvS channels is also not touched by the data shown in the manuscript.

      We have shortened this section of the discussion, and now present it with structural models of RY785 docked to a Kv2.1 homomer and a 3:1 Kv2.1:Kv8.1 heteromer (new Fig 9) to ground speculations. See manuscript changes noted in response to comment (19) above.

      (22) DISCUSSION. An important reference is missing in regard to stoichiometry: Bocksteins et al., 2017. This work is the only one using a non-optical technique to add knowledge to that question.

      Good point, and an excellent study we didn’t realize we’d not included before. We now include Bocksteins et al. 2017 as a reference in the Introduction.

      (23) In my opinion, allosterism and orthosterism are concepts not yet useful for the discussion of RY binding sites without even a general piece of data.

      We now include structural models of RY785 docked to a Kv2.1 homomer and a 3:1 Kv2.1:Kv8.1 heteromer (new Fig 9) to ground blocking speculations. See manuscript changes noted in response to comment (19).

      (24) The term "homogeneously susceptible" associated with a Hill slope close to 1 needs to be more elaborated.

      Thank you, we have elaborated.

      Manuscript revisions:

      “Also, the degree of resistance to RY785 may vary if Kv2:KvS subunit stoichiometry varies. With high doses of RY785, we found that the concentration-response characteristics of Kv2.1/Kv8.1 in CHO cells revealed hallmarks of a homogenous channel population with a Hill slope close to 1 (Fig 2B). However, other KvS subunits might assemble in multiple stoichiometries and result in pharmacologically-distinct heteromer populations.”
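
      As a rough illustration of why a Hill slope near 1 is read as a sign of a homogeneous population, the sketch below evaluates the standard Hill equation for inhibition. The function names are hypothetical and the IC50 is an arbitrary placeholder, not a value from the study.

```python
def fraction_remaining(conc, ic50, hill):
    """Fraction of current remaining at inhibitor concentration `conc`."""
    return 1.0 / (1.0 + (conc / ic50) ** hill)

def conc_for_inhibition(inhibited_fraction, ic50, hill):
    """Concentration that inhibits the given fraction of the current.

    Inverts the Hill equation: (conc / ic50) ** hill = f / (1 - f).
    """
    odds = inhibited_fraction / (1.0 - inhibited_fraction)
    return ic50 * odds ** (1.0 / hill)

def span_10_to_90(hill):
    """Fold change in concentration needed to go from 10% to 90% inhibition."""
    return conc_for_inhibition(0.9, 1.0, hill) / conc_for_inhibition(0.1, 1.0, hill)

# With a Hill slope of 1, the 10%-90% range spans an 81-fold change in concentration.
span_single_site = span_10_to_90(1.0)

# A shallower apparent slope stretches the same range over a wider span.
span_shallow = span_10_to_90(0.5)
```

      A single population with one binding site per channel gives a slope of 1 and an 81-fold span. A mixture of populations with different affinities generally produces a shallower, broader curve, so a slope close to 1 is consistent with, though not proof of, a single pharmacological population.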

      (25) Stating the KvS are resistant to RY785 is not proper in my opinion. This opinion relates to the fact that the RY binding site in the channels is certainly not restricted to a binding site residing only on the Kv subunit.

      Good point. We have now changed phrasing to convey that KvS subunits are a component of a heteromer that imbues RY785 resistance.

      Manuscript revisions:

      “These results show that voltage-gated outward currents in cells transfected with members from each KvS subtype have decreased sensitivity to RY785 but remain sensitive to GxTX. While we did not test every KvS subunit, the ubiquitous resistance suggests that all KvS subunits may provide resistance to 1 μM RY785 yet remain sensitive to GxTX, and that RY785 resistance is a hallmark of KvS-containing channels.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of the melanocortin system in puberty onset. They conclude that POMC neurons within the arcuate nucleus of the hypothalamus provide important but differing input to kisspeptin neurons in the arcuate or rostral hypothalamus.

      Strengths:

      Innovative and novel

      Technically sound

      Well-designed

      Thorough

      Weaknesses:

      There were no major weaknesses identified.

      Reviewer #2 (Public review):

      Summary:

      This interesting manuscript describes a study investigating the role of MC4R signalling on kisspeptin neurons. The initial question is a good one. Infertility associated with MC4 mutations in humans has typically been ascribed to the consequent obesity and impaired metabolic regulation. Whether there is a direct role for MC4 in regulating the HPG axis has not been thoroughly examined. Here, the researchers have assembled an elegant combination of targeted loss of function and gain of function in vivo experiments, specifically targeting MC4 expression in kisspeptin neurons. This excellent experimental design should provide compelling evidence for whether melanocortin signalling directly affects arcuate kisspeptin neurons to support normal reproductive function. There were definite effects on reproductive function (irregular estrous cycle, reduced magnitude of LH surge induced by exogenous estradiol). However, the magnitude of these responses and the overall effect on fertility were relatively minor. The mice lacking MC4R in kisspeptin neurons remained fertile despite these irregularities. The second part of the manuscript describes a series of electrophysiological studies evaluating the pharmacological effects of melanocortin signalling in kisspeptin cells in ex vivo brain slices. These studies characterised interesting differential actions of melanocortins in two different populations of kisspeptin neurons. Collectively, the study provides some novel insights into how direct actions of melanocortin signalling via the MC4 receptor in kisspeptin neurons contribute to the metabolic regulation of the reproductive system. Importantly, however, it is clear that other mechanisms are also at play.

      Strengths:

      The loss of function/gain of function experiments provides a conceptually simple but hugely informative experimental design. This is the key strength of the current paper - especially the knock-in study that showed improved reproductive function even in the presence of ongoing obesity. This is a very convincing result that documents that reproductive deficits in MC4R knockout animals (and humans with deleterious MC4R gene variants) can be ascribed to impaired signalling in the hypothalamic kisspeptin neurons and not necessarily caused as a consequence of obesity. As concluded by the authors: "reproductive impairments observed in MC4R deficient mice, which replicate many of the conditions described in humans, are largely mediated by the direct action of melanocortins via MC4R on Kiss1 neurons and not to their obese phenotype." This is important, as it might change how such fertility problems are treated.

      I would like to see the validation experiments for the genetic manipulation studies given greater prominence in the manuscript because they are critical to interpretation. Presently, only single unquantified images are shown, and a much more comprehensive analysis should be provided.

      Weaknesses:

      (1) Given that mice lacking MC4R in kisspeptin neurons remained fertile despite some reproductive irregularities, this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system. This is now appropriately covered in the discussion.

      (2) The mechanistic studies evaluating melanocortin signalling in kisspeptin neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter how they respond to hormones and neuropeptides. Eliminating this variable makes interpretation difficult, but the authors have justified this as a reductionist approach to evaluate estradiol actions specifically. However, this does not reflect the actual complexity of reproductive function.

      For example, the authors focus on a reduced LH response to exogenous estradiol in ovariectomised mice as evidence that there might be a sub-optimal preovulatory LH surge. However, the preovulatory LH surge (in intact animals) was not measured.

      They have not assessed why some follicles ovulated, but most did not. They have focused on the possibility that the ovulation signal (LH surge) was insufficient rather than asking why some follicles responded and others did not. This suggests some issue with follicular development, likely due to changes in gonadotropin secretion during the cycle and not simply due to an insufficient LH surge.

      Reviewer #3 (Public review):

      The manuscript by Talbi R et al. generated transgenic mice to assess the reproduction function of MC4R in Kiss1 neurons in vivo and used electrophysiology to test how MC4R activation regulated Kiss1 neuronal firing in ARH and AVPV/PeN. This timely study is highly significant in neuroendocrinology research for the following reasons.

      (1) The authors' findings are significant in the field of reproductive research. Despite the known presence of MC4R signaling in Kiss1 neurons, the exact mechanisms of how MC4R signaling regulates different Kiss1 neuronal populations in the context of sex hormone fluctuations are not entirely understood. The authors reported that knocking out Mc4r from Kiss1 neurons replicates the reproductive impairment of MC4RKO mice, and Mc4r expression in Kiss1 neurons in the MC4R null background partially restored the reproductive impairment. MC4R activation excites Kiss1 ARH neurons and inhibits Kiss1 AVPV/PeN neurons (except for elevated estradiol).

      (2) Reproduction dysfunction is one of obesity comorbidities. MC4R loss-of-function mutations cause obesity phenotype and impaired reproduction. However, it is hard to determine the causality. The authors carefully measured the body weight of the different mouse models (Figure 1C, Figure 2A, Figure 3B). For example, the Kiss1-MC4RKO females showed no body weight difference at puberty onset. This clearly demonstrated the direct function of MC4R signaling in reproduction but was not a consequence of excessive adiposity.

      (3) Gene expression findings in the "KNDy" system align with the reproduction phenotype.

      (4) The electrophysiology results reported in this manuscript are innovative and provide more details of MC4R activation and Kiss1 neuronal activation.

      Overall, the authors have presented sufficient background in a clear, logical, and organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      Comments on revisions:

      The authors have addressed my comments.

      Recommendations for the authors:

      The reviewers noted that they received comments in response to their concerns, and some improvements have been made to the manuscript. However, as described below, in some cases, a rebuttal was provided, but changes were not made to the manuscript. It is suggested that these issues be addressed to improve the quality of the manuscript.

      We thank the reviewers and editor for the assessment of the manuscript and recommendations for its improvement. We have addressed the remaining comments from reviewer #2 below, and hope that they find our revisions satisfactory.

      Reviewer #2 (Recommendations for the authors):

      The manuscript convincingly shows that MC4R in kisspeptin-producing cells can influence reproductive function. This suggests that fertility problems associated with melanocortin mutations are likely due to direct effects on the reproductive systems rather than simply being side effects of the resultant obesity.

      We are pleased that this reviewer finds the data convincing and thank them for the careful review of the manuscript, which has helped to improve its published version.

      The authors have responded to the reviewer's comments and made several improvements to the manuscript.

      The authors are correct in pointing out that the POMC-Cre animals should be fine for studies involving the administration of AAVs to adult animals. I have misinterpreted how these mice were being used, and this concern is fully addressed.

      Unfortunately, in some cases, the authors rebutted the reviewer's comments but did not change the manuscript. I suggest addressing several issues in the manuscript (after all, it is not the reviewer's opinion that counts; this process is about improving the manuscript).

      (1) Validation of the KO is insufficiently reported. From the methods, it appears that this was done thoroughly, but currently, only a single image of the arcuate nucleus is shown, and no image of the AVPV is shown. There is no quantitative information provided. The authors can keep these data as supplementary material, but they should be comprehensive and convincing, as so much depends on the degree of knockout in this model. One cannot assume complete KO based simply on the relevant genetics, as there are examples in this system where different Cre lines produce different outcomes with various floxed genes in the two major populations of kisspeptin neurons. This figure should show the quantitation of the RNAscope analysis from each of the two regions regarding the percentage of kisspeptin cells showing expression of MC4R mRNA. In addition, the lack of MC4 labelling in the arcuate nucleus, outside of kisspeptin neurons, is a concern. One would expect to see AgRP or POMC cells at this level, but are they still showing expression of MC4? A single image is insufficient to be convinced of the model's efficacy.

      We appreciate the reviewer’s concerns regarding the validation of the MC4RKO model. Below, we provide clarification and additional justification for our approach.

      (1) Quantification of MC4R in the Arcuate Nucleus (ARC): As noted by the reviewer, we were unable to detect sufficient MC4R signal in the ARC of KO mice to perform meaningful quantification. This is consistent with the expected outcome of a successful MC4R deletion. Given the low endogenous expression levels of MC4R in this region, even in control animals, and the technical limitations of RNAscope in detecting very low-abundance transcripts, especially for receptors, the absence of MC4R signal in the ARC of KO mice strongly supports effective deletion. Moreover, the MC4R loxP mouse has been published and validated by many labs, including Brad Lowell’s lab, which has done extensive work using these mice for selective deletion of Mc4r from various neuronal populations such as Sim1 and Vglut2 neurons (Shah et al., 2014; de Souza Cordeiro et al., 2020). To further strengthen our validation, we provide additional images from another animal (Fig_S1) to illustrate the consistency of the MC4R KO in the ARC. These will be included as supplementary material, as suggested.

      Regarding AgRP and POMC neurons, MC4R is not highly expressed in these neurons (as per previous literature, e.g., Garfield et al., Nat Neurosci. 2015; Padilla SL et al, Endocrinology 2012; Henry et al, Nature, 2015). Instead, MC4R is predominantly found in downstream neurons in the paraventricular nucleus (PVN) and other hypothalamic regions (which is intact in our KO mice as shown in our validation figure). Thus, the absence of MC4R labeling in AgRP or POMC cells in our images aligns with known expression patterns and does not contradict the validity of our model.

      (2) MC4R Expression in the AVPV and OVX Effect on Kiss1 Expression: We acknowledge the reviewer’s request for MC4R expression analysis in the anteroventral periventricular nucleus (AVPV). However, due to the timing of tissue collection after ovariectomy (OVX), Kiss1 expression in the AVPV is significantly suppressed, making it technically unfeasible to perform co-staining of MC4R with Kiss1 in this region. This is a well-documented effect of estrogen depletion following OVX (Smith et al., 2005; Lehman et al., 2010). While we acknowledge that an ideal validation would include AVPV co-labeling, the experimental constraints related to OVX preclude this analysis in our dataset.

      Given these considerations and validations, we are confident that the KO is effective and specific.

      (2) Line 88: "... however, conflicting reports exist". Expand on this sentence to describe what these conflicting reports show. The authors responded to my comment but made no changes to the introduction. As a reader, I dislike being told there are conflicting reports, but then I have to go and look up the reference to see what that actual point of conflict is.

      By “conflicting reports” we meant that other studies have shown no association between MC4R and reproductive disorders. This has now been included in the revised manuscript (Line 89).

      (3) Could the authors explain how a decrease in AgRP would be interpreted as a "decrease in hypothalamic melanocortin tone" in line 142 and line 364? These overly simplistic interpretations of qPCR data detract from the overall quality of the paper.

      The phrase “decrease in melanocortin tone” referred to the decrease in the expression of melanocortin receptor signaling. This has been clarified in the revised manuscript (lines 142 and 360).

      (4) Please show the individual cycle patterns for all animals, as in Figure 2B. This can be a supplemental figure, but the current bar charts are not informative.

      We respectfully disagree that the bar charts are not informative, as they include the critical statistical analysis. We have now included all individual estrous cycle data in a new, separate supplemental figure (Sup. Figure 3). Accordingly, we have removed the representative cycles from the main figures, as they now appear in the new supplemental figure, and we have changed the order of the figures in the text.

      (5) In their rebuttal, the authors state: "Mice lack true follicular and luteal phases, and therefore, it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult." I disagree, but the authors can take this position if they wish. However, they should not report the responses to exogenous estradiol in an ovariectomised mouse as a "preovulatory LH surge" (line 380). An ovariectomised mouse cannot ovulate, and the estrogen-induced LH surge is significantly different in magnitude and timing from the endogenous preovulatory LH surge (likely due to the actions of progesterone). One goal of these studies is to understand why the ovulation rate appears to be low in the MC4-KO animals. Hence, evaluating whether the preovulatory LH surge is typical is important. This has not been done. The authors have shown that the response to exogenous estradiol is sub-normal. Such an effect might lead to a reduced preovulatory LH surge, but this has not been measured.

      We appreciate this reviewer’s concern about the nature of the preovulatory LH surge. We have clarified this in the revised manuscript and described it as “an induced LH surge” throughout the text (Lines 163, 533, 6560).

      (6) I believe that the ovulation process should be considered "all or none," and I do not quite understand the rebuttal discussion. The authors describe that "numerous follicles mature at the same time....". That is not disputed. My point was that each mature follicle will receive the identical endocrine ovulatory signal (correct? Or do the authors believe something different?). If it were sufficient for one follicle to ovulate, then all of those mature follicles (the number of which will be variable between animals and between cycles) would be expected to undergo ovulation. The fact that they do not raises several possibilities. One that the authors favor is that an insufficient ovulatory signal might approach a threshold where some follicles ovulate and others do not. This possibility is supported by the apparent increase in cystic follicles, which might be preovulatory follicles that did not complete the ovulation process. Such variation might be stochastic, within normal variation for sensitivity to LH. However, it is also possible that the follicles have not matured at the same rate, perhaps influenced by abnormal secretion of LH or FSH during earlier phases of the cycle, and hence are not in the appropriate condition to respond to the ovulation signal when it arrives. Some may even have matured prematurely due to the elevated gonadotropins reported in this study. Given the data and the partial fertility, the most likely explanation is that the genetic manipulation has resulted in fewer follicles being available for ovulation due to changes in follicular development rather than a deficit of the ovulation signal, although the latter mechanism might also contribute. A third possibility is that genetic manipulation has directly affected the ovary. The authors did not answer whether Kiss1 and MC4 are co-expressed in the ovary. I think the authors might want to rule this out by showing no change in MC4R expression in the ovary.

      We thank the reviewer for this thoughtful comment and agree that these are possible outcomes. We have now acknowledged them in the Discussion.

      To answer the reviewer’s question, we have not investigated the co-expression of Kiss1 and Mc4r in the ovary. While MC4R has indeed been documented in the ovary (Chen et al. Reproduction, 2017), the changes in gonadotropin release and supporting in vitro data included in this manuscript clearly document a central effect. However, an additional effect at the level of the ovary cannot be completely ruled out. This has now been added to the discussion (Line 378-387).

      (7) Lines 390, 454 " impaired LH pulse" What was the evidence for impaired LH pulse (see figure 2D)?

      Thank you for pointing this out. This comment referred to augmented LH release. This has been corrected in the revised manuscript (Line 394).

      The paper's strengths remain, as outlined in my original review. The authors have addressed what I perceived to be weaknesses, predominantly by changing the tone of discussion and interpretation of the data. This is appropriate. I consider the focus on the LH surge as the primary mechanism too narrow, and the authors should be considering how other changes during the cycle might influence ovarian function.

      We sincerely appreciate the reviewer’s thoughtful evaluation of our manuscript and their constructive feedback. We are pleased that our revisions have addressed the perceived weaknesses and that the adjustments to the discussion and interpretation were deemed appropriate.

      We acknowledge the reviewer’s perspective on broadening the discussion beyond the LH surge to consider additional cycle-dependent influences on ovarian function. While our current study focuses on this specific mechanism, we recognize that ovarian function is influenced by multiple physiological changes throughout the cycle. We have refined our discussion to reflect this broader context and appreciate the suggestion to consider these additional factors in future studies.

      We have addressed all of the reviewer’s comments to the best of our ability and hope they find the revised manuscript satisfactory.

    1. we carefully considered and addressed the question of reliance, and whatever one may think about the extent of the legitimate reliance in that case, it is not in the same league as that present here. Abood had held that a public sector employer may require non-union members to pay a portion of the dues collected from union members.
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: In this paper, the authors perform a screen by feeding C. elegans different E. coli genetic mutants and examining the effect on the expression of fat-7, a stearoyl-CoA 9-desaturase, which has been associated with longevity. They identify 26 E. coli strains that decrease fat-7 expression, all of which slow development and increase lifespan. RNA sequencing of worms treated with 4 of these strains identified genes involved in defense against oxidative stress among those genes that are commonly upregulated. Feeding C. elegans these 4 bacterial strains results in increased ROS and activation of the mitochondrial unfolded protein response, which appears to contribute to lifespan extension as these bacterial strains do not increase lifespan when the mitochondrial unfolded protein response transcription factor ATFS-1 is disrupted. Finally, the authors demonstrate a role for iron levels in mediating these phenotypes: iron supplementation inhibits the phenotypes caused by the identified bacterial strains, while iron chelation mimics these phenotypes.

      Response: We thank the reviewer for an excellent summary of our work.

      Major comments: The proposed model involves an increase in ROS levels activating the UPRmt and then leading to lifespan extension. If the elevation in ROS levels is contributing, then treatment with antioxidants should prevent UPRmt activation and lifespan extension.

      Response: This is an excellent point. We will treat the FAT-7-suppressing diets with antioxidants and observe the effect on C. elegans UPRmt activation and lifespan.

      The authors suggest that iron depletion may disrupt iron-sulfur cluster proteins. The Rieske iron-sulfur protein ISP-1 from mitochondrial electron transport chain complex III has previously been associated with lifespan. Point mutations affecting the function of ISP-1 or RNAi decreasing the levels of ISP-1 both result in increased lifespan (PMID 20346072, 11709184). Thus, iron depletion may be increasing ROS, activating UPRmt and increasing lifespan through decreasing ISP-1 levels.

      Response: The reviewer has raised an intriguing possibility that the increased lifespan on the FAT-7-suppressing diets could be because of perturbation of ISP-1 function. While ISP-1 levels may not be directly affected by the mutant diets, ISP-1 function might be perturbed on these diets as ISP-1 function requires iron-sulfur clusters. Therefore, we will study the lifespan of isp-1(qm150) mutant on the FAT-7-suppressing diets to explore whether the lifespan extension on these diets is ISP-1 dependent.

      All of the Kaplan-Meier survival plots are missing statistical analyses. Please add p-values.

      Response: The p-values for all the survival plots are included in the respective figure legends.
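      For readers who want to see how such a p-value can be obtained, the following is a minimal sketch of a two-sample log-rank test, a common choice for comparing Kaplan-Meier curves. The response does not state which test or software the authors used, so this is an assumption for illustration only; the function name `logrank_test` and the lifespan values are hypothetical and are not data from the study.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test.

    times_*  : day of death or censoring for each animal
    events_* : 1 if death was observed, 0 if the animal was censored
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    # Pool all observations as (time, event flag, group) tuples.
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Animals still at risk just before time t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Deaths observed exactly at time t, per group.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical lifespans in days (all deaths observed); not data from the study.
control_days = [14, 15, 16, 17, 18, 19]
test_diet_days = [18, 20, 21, 22, 23, 25]
chi2, p = logrank_test(control_days, [1] * 6, test_diet_days, [1] * 6)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

      In practice an established survival-analysis package would normally be preferred over a hand-written test; the sketch only makes the calculation behind a reported p-value explicit.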

      It would be helpful to include a model diagram of the proposed mechanisms in the main figures.

      Response: We will make a model diagram after completing the experiments suggested by the reviewers.

      Minor comments: Rather than "mutant diets" it would be more informative to call these "FAT-7-decreasing diets"

      Response: We have changed “mutant diets” to “FAT-7-suppressing diets” throughout the manuscript.

      Is it surprising that none of the bacterial strains increased FAT-7 levels? Why do you think this is?

      Response: Yes, it was indeed surprising to find only bacterial strains that reduced FAT-7 levels and none that increased them. One possible explanation is that these bacterial mutants may not directly regulate fat-7 expression. Instead, they might alter the overall dietary composition, which is known to influence fat-7 levels. It appears that none of the tested mutants modified the diet in a manner that would lead to fat-7 upregulation.

      Page 5. "We hypothesized that diets reducing FAT-7 might elevate oleic acid levels". Since FAT-7 converts stearic acid to oleic acid, wouldn't decreasing FAT-7 levels decrease oleic acid levels and increase stearic acid levels?

      Response: FAT-7 expression is regulated by a feedback mechanism and is sensitive to the fatty acid composition within host cells; elevated levels of unsaturated fatty acids, such as oleic acid, suppress FAT-7 expression. There are two possible ways bacterial mutants could lead to reduced FAT-7 levels: (1) by directly inhibiting FAT-7 expression, which would be expected to result in increased stearic acid levels; or (2) by supplying higher amounts of oleic acid through their composition, thereby suppressing FAT-7 expression via feedback regulation. We focused on the second possibility, as elevated oleic acid levels—like those seen with FAT-7-suppressing diets—are known to promote C. elegans lifespan. To avoid confusion, we have revised the statement to: “We hypothesized that bacterial diets might reduce FAT-7 expression because they have elevated levels of oleic acid”.

      Page 6. The authors cite Bennett et al. 2014 for the statement that "Activation of the UPRmt has been associated with lifespan extension". This paper reaches the opposite conclusion "Activation of the mitochondrial unfolded protein response does not predict longevity in Caenorhabditis elegans". Also, in the Bennett paper and PMID 34585931, it is shown that constitutive activation of ATFS-1 decreases lifespan. Thus, the relationship between the UPRmt and lifespan is not straightforward. These points should be mentioned.

      Response: The reviewer has raised an important point. We have now included a paragraph in the discussion to highlight these points. The revised manuscript reads: “All 26 FAT-7-suppressing diets identified in our study elevated hsp-6p::GFP expression and extended C. elegans lifespan. Although UPRmt activation and lifespan extension were consistently observed across these diets, there was no strong correlation between hsp-6p::GFP levels and the degree of lifespan extension. The role of the UPRmt in promoting longevity remains controversial (Bennett et al., 2014; Soo et al., 2021; Wu et al., 2018). For instance, gain-of-function mutations in atfs-1 have been shown to reduce lifespan (Bennett et al., 2014; Soo et al., 2021). However, a recent study demonstrated that mild UPRmt activation can extend lifespan, whereas strong activation has the opposite effect (Di Pede et al., 2025). These findings suggest that UPRmt contributes to longevity only under specific conditions and at specific activation levels. In our study, lifespan extension on FAT-7-suppressing diets was dependent on ATFS-1, indicating that UPRmt activation was necessary for this effect.”

      Page 6. "Our transcriptomic analysis suggested elevated ROS". Rather than refer to gene expression, it would be better to refer to the ROS measurements that were performed.

      Response: We have changed it to the following sentence: “Our ROS measurement analysis suggested elevated ROS levels in worms fed FAT-7-suppressing diets.”

      The long-lived mitochondrial mutants isp-1 and nuo-6 have increased ROS, UPRmt activation and increased lifespan. Multiple studies have examined gene expression in these long-lived mutant strains. How does gene expression in these mutants compare to worms treated with the FAT-7-decreasing E. coli mutants? While not necessary for this publication, it would be interesting to see whether the FAT-7-decreasing E. coli strains can increase isp-1 and nuo-6 lifespan.

      Response: We will compare the gene expression changes observed in isp-1 and nuo-6 mutants with the gene expression changes observed in worms exposed to FAT-7-suppressing diets. Additionally, we will examine the lifespan of isp-1 mutants on the mutant diets. These data will be included in the revised manuscript.

      SEK-1 is also involved in the p38-mediated innate immune signaling pathway, which has been shown to contribute to longevity in C. elegans. In fact, disruption of sek-1 using RNAi decreased the lifespan of several long-lived mutant strains (PMID 36514863).

      Response: We thank the reviewer for highlighting this point. We have now added that the role of SEK-1 in regulating lifespan on FAT-7-suppressing diets could also be because of its role in innate immunity. The revised manuscript reads: “Notably, SEK-1 also regulates innate immunity and is essential for the extended lifespan observed in several long-lived C. elegans mutants (Soo et al., 2023). Therefore, its effect on lifespan in response to FAT-7-suppressing diets may also stem from its role in innate immune regulation.”

      Figure 2. Why were cyoA and ycbk chosen to show the full Kaplan-Meier survival plot?

      Response: These were selected randomly to show the range of the lifespan phenotype observed.

      Figure 2, panel D. A better title may be "Mean Survival (Percent increase from control)"

      Response: We have made this change.

      While not necessary for this paper, it would be interesting to determine whether the FAT-7-decreasing E. coli strains alter resistance to oxidative stress.

      Response: We will study the survival of worms on these diets upon supplementation with paraquat.

      Figure 4. It may be interesting to include a correlation plot comparing hsp-6::GFP fluorescence and lifespan. It looks like the magnitudes of increase for each phenotype are not correlated.

      Response: We have added a new Figure (Figure S4) to show the correlation between hsp-6::GFP fluorescence levels and percent change in mean lifespan. Indeed, there is no correlation between these phenotypes.
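      As an illustration of the comparison described here, the sketch below computes the percent change in mean lifespan relative to control and a Pearson correlation coefficient between reporter fluorescence and that percent change. The choice of Pearson correlation, the function names, and all numbers are assumptions made for illustration; the response does not specify how the correlation in Figure S4 was calculated.

```python
import math


def percent_change(value, control):
    """Percent change of `value` relative to `control`."""
    return 100.0 * (value - control) / control


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five diets; not data from the study.
control_mean_lifespan = 16.0                           # days on the control diet
mean_lifespans = [18.5, 19.0, 20.5, 18.0, 21.0]        # days on each test diet
gfp_fold_change = [2.1, 3.4, 1.8, 2.9, 2.5]            # reporter signal vs control

lifespan_change = [percent_change(m, control_mean_lifespan) for m in mean_lifespans]
r = pearson_r(gfp_fold_change, lifespan_change)
print(f"Pearson r = {r:.2f}")
```

      A coefficient near zero across the diets would correspond to the absence of correlation reported in the response.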

      Reviewer #1 (Significance (Required)):

      Overall, this is an interesting paper and the experiments are rigorously performed. The bacterial screen was comprehensive and was followed up by careful mechanistic experiments. This paper will be of interest to researchers studying the biology of aging. A diagram of the working model of the underlying mechanisms would enhance the paper.

      Response: We thank the reviewer for highlighting the significance of the study. We will include a model in the revised manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Das et al. investigate how different bacterial mutants affect the lifespan of C. elegans. The authors screened a library of E. coli mutants using a fat-7 reporter and identified 26 strains that reduce fat-7 expression, cause developmental delay, induce the mitochondrial unfolded protein response (using hsp-6 reporter), and increase worm lifespan. Among these, they focused on four strains and demonstrated that the effects of these mutants on developmental delay, fat-7 expression, and hsp-6 induction could be suppressed by iron supplementation. Furthermore, they showed that iron depletion alone is sufficient to induce fat-7 expression in worms. The lifespan extension observed in worms fed these mutant bacterial strains depends on SKN-1, SEK-1, and HLH-30. Overall, this is a well-written manuscript that highlights the role of iron in regulating fat-7 expression. However, the findings from the initial screen do not significantly expand upon what is already known in the literature. Many of the identified hits overlap with those reported by Zhang et al. (2019), which also highlighted the role of iron in developmental delay and hsp-6 induction. While the lifespan data and the role of fat-7 are novel aspects of this study, the authors have not conducted detailed mechanistic investigations to address key questions, such as: 1) How does the deletion of these bacterial genes alter the metabolic state of the diet? 2) How do these metabolic changes influence fat-7 expression in worms? 3) How does the downregulation of fat-7 contribute to longevity? Addressing these points would strengthen the mechanistic insights of the study.

      Response: We thank the reviewer for a thoughtful summary of our work and for the valuable feedback provided to improve the manuscript. We would like to emphasize that the screening conditions and objectives of our study were fundamentally different from those of Zhang et al. (2019). Furthermore, Zhang et al. (2019) did not investigate the effects of the bacterial mutants identified in their screens on C. elegans lifespan. Notably, the 26 bacterial mutants identified in our screen do not overlap with those reported in previous studies that examined bacterial strains promoting C. elegans longevity. As detailed below, we will address the points raised by the reviewer that will certainly strengthen the mechanistic insights of the study.

      Here are my detailed comments: 1. Suppressing FAT-7 levels in C. elegans does not inherently increase lifespan. To directly attribute this effect to FAT-7, it would be important to attempt a rescue experiment to restore FAT-7 expression and assess whether the lifespan extension persists. Additionally, measuring oleic acid levels in these mutants would help determine whether a high-oleic-acid diet is suppressing FAT-7 expression. The role of oleic acid cannot be ruled out using fat-2 mutants (Fig. 3B), as fat-2 mutants accumulate oleic acid when fed WT bacteria, but this may not translate to endogenous oleic acid accumulation in conditions where FAT-7 is suppressed.

      Response: We thank the reviewer for these useful suggestions. We will overexpress FAT-7 under a pan-tissue promoter (eft-3) and study lifespan on FAT-7-suppressing diets. Moreover, to explore whether oleic acid has any role in enhancing lifespan on FAT-7-suppressing diets, we will study the lifespan of worms on these diets upon supplementing with oleic acid along with wild-type bacterium control.

      To understand the host-microbe interaction in this study, it is important to determine what specific changes in the bacteria contribute to the observed phenotypes in worms. Identifying these bacterial factors will provide a clearer picture of their role in influencing worm stress signaling and lifespan.

      Response: The phenotypes observed in C. elegans across all the identified bacterial mutants are remarkably consistent, including increased UPRmt activation, reduced FAT-7 levels, delayed development, and extended lifespan. This consistency suggests that a common underlying factor is driving these effects. Although the bacterial mutants appear genetically diverse, gene expression data from C. elegans, along with comparisons to the findings of Zhang et al. (2019), indicate that elevated levels of reactive oxygen species (ROS) may represent this shared factor. These results suggest that bacterial ROS play a central role in mediating the host-microbe interactions underlying the observed phenotypes. To further support this hypothesis, we will directly measure ROS levels in the identified bacterial mutants. Additionally, we will test whether antioxidant treatment can suppress the C. elegans phenotypes, thereby establishing a causal role for bacterial ROS.

      It is important to rule out any changes in food consumption in worms fed these bacterial mutants, as differences in feeding amount could contribute to the observed lifespan effects.

      Response: We will carry out pharyngeal pumping rate measurements to study whether there is any difference in food consumption in worms fed these bacterial mutants.

      In Figures 5A to 5G, please include the same-day controls to help clarify how iron supplementation affects these phenotypes relative to the control. For example, in Fig. 5F, it appears that iron extends the lifespan of worms fed the control diet. It would be clearer if appropriate controls were included in all of these figures or summarized in a table to help understand the impact of iron.

      Response: We will include these controls in the revised manuscript.

      How does iron depletion affect the levels of fat-7, and how does this contribute to the activation of the longevity pathways discussed in the manuscript?

      Response: This is an intriguing question. There are at least two possible explanations: (1) oxidative stress may directly downregulate fat-7 expression, and (2) iron depletion could reduce ferroptosis, which in turn may influence fatty acid metabolism. In the revised manuscript, we will include data on how oxidative stress affects FAT-7 expression.

      Minor comments 1. Please include a detailed table of the lifespan data for all replicates as a supplementary table.

      Response: We have included the details of survival curves for all the data in the new Table S2.

      In the Methods section, specify at what stage the worms were exposed to iron and the iron chelator for the lifespan experiments.

      Response: The L1-synchronized worms were exposed to iron and iron chelator plates and allowed to develop till the late L4 stage before being transferred to lifespan assay plates that also contained the respective supplements. This information is now included in the Methods section.

      Please clarify whether equal optical density (O.D.) of cells was seeded for both the WT and mutant strains, and mention if the mutants exhibit any growth defects.

      Response: We have examined the growth of the bacterial mutants and found that they do not exhibit growth defects. Therefore, for all the assays, NGM plates were seeded with saturated cultures of all the bacterial strains. We have now included the growth curve data in the manuscript (Figure S4).

      Reviewer #2 (Significance (Required)):

      Significance General Assessment: This study by Das et al. explores the impact of bacterial mutants on C. elegans lifespan. A key strength of the study is the identification of bacterial mutants that influence the expression of the gene encoding fatty acid desaturase (fat-7) and lifespan in C. elegans. Furthermore, the study highlights the role of iron in regulating fat-7 expression, suggesting that iron imbalance may play a crucial role in modulating fatty acid metabolism. However, the study's main limitation is that it does not significantly extend the current understanding of the microbial modulation of host metabolism and aging, as many of the identified bacterial hits overlap with those previously reported in Zhang et al. (2019). The manuscript would benefit from more in-depth mechanistic exploration, especially with regard to how specific bacterial factors influence the metabolic state of the worms and how these changes ultimately modulate fat-7 expression and longevity.

      Response: We thank the reviewer for highlighting the significance of our study. Once again, we would like to emphasize that the screening conditions and objectives of our study differed fundamentally from those of Zhang et al. (2019). Furthermore, Zhang et al. did not investigate the impact of the bacterial mutants identified in their screen on C. elegans lifespan. As outlined above, we will address the reviewer’s comments, which will undoubtedly strengthen the mechanistic insights of our study.

      Advance: This study presents a conceptual advance by exploring the iron-dependent regulation of fat-7 expression and lifespan in C. elegans, linking bacterial mutations with key longevity pathways (SKN-1, SEK-1, and HLH-30). The novelty lies in the direct investigation of the bacterial-induced changes in fat-7 expression, though the role of iron in these mutants for development and induction of mito-UPR was previously shown in the literature. This study also adds to the growing body of work on C. elegans as a model for studying aging and host-microbe interactions, particularly in understanding how diet and microbial exposure affect metabolic processes and lifespan.

      Response: We thank the reviewer for highlighting the advancement made by our study.

      Audience: This research will primarily interest specialized audiences in aging research, microbiology, and metabolism, especially those focused on host-microbe interactions. Keywords of my expertise: Host-microbe interactions, metabolism, system biology, C. elegans, aging.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this paper, the authors perform a screen by feeding C. elegans different E. coli genetic mutants and examining the effect on the expression of fat-7, a stearoyl-CoA 9-desaturase, which has been associated with longevity. They identify 26 E. coli strains that decrease fat-7 expression, all of which slow development and increase lifespan. RNA sequencing of worms treated with 4 of these strains identified genes involved in defense against oxidative stress among those genes that are commonly upregulated. Feeding C. elegans these 4 bacterial strains results in increased ROS and activation of the mitochondrial unfolded protein response, which appears to contribute to lifespan extension as these bacterial strains do not increase lifespan when the mitochondrial unfolded protein response transcription factor ATFS-1 is disrupted. Finally, the authors demonstrate a role for iron levels in mediating these phenotypes: iron supplementation inhibits the phenotypes caused by the identified bacterial strains, while iron chelation mimics these phenotypes.

      Major comments:

      The proposed model involves an increase in ROS levels activating the UPRmt and then leading to lifespan extension. If the elevation in ROS levels is contributing, then treatment with antioxidants should prevent UPRmt activation and lifespan extension.

      The authors suggest that iron depletion may disrupt iron-sulfur cluster proteins. The Rieske iron-sulfur protein ISP-1 from mitochondrial electron transport chain complex III has previously been associated with lifespan. Point mutations affecting the function of ISP-1 or RNAi decreasing the levels of ISP-1 both result in increased lifespan (PMID 20346072, 11709184). Thus, iron depletion may be increasing ROS, activating UPRmt and increasing lifespan through decreasing ISP-1 levels.

      All of the Kaplan-Meier survival plots are missing statistical analyses. Please add p-values.

      It would be helpful to include a model diagram of the proposed mechanisms in the main figures.

      Minor comments:

      Rather than "mutant diets" it would be more informative to call these "FAT-7-decreasing diets"

      Is it surprising that none of the bacterial strains increased FAT-7 levels? Why do you think this is?

      Page 5. "We hypothesized that diets reducing FAT-7 might elevate oleic acid levels". Since FAT-7 converts stearic acid to oleic acid, wouldn't decreasing FAT-7 levels decrease oleic acid levels and increase stearic acid levels?

      Page 6. The authors cite Bennett et al. 2014 for the statement that "Activation of the UPRmt has been associated with lifespan extension". This paper reaches the opposite conclusion "Activation of the mitochondrial unfolded protein response does not predict longevity in Caenorhabditis elegans". Also, in the Bennett paper and PMID 34585931, it is shown that constitutive activation of ATFS-1 decreases lifespan. Thus, the relationship between the UPRmt and lifespan is not straightforward. These points should be mentioned.

      Page 6. "Our transcriptomic analysis suggested elevated ROS". Rather than refer to gene expression, it would be better to refer to the ROS measurements that were performed.

      The long-lived mitochondrial mutants isp-1 and nuo-6 have increased ROS, UPRmt activation and increased lifespan. Multiple studies have examined gene expression in these long-lived mutant strains. How does gene expression in these mutants compare to worms treated with the FAT-7-decreasing E. coli mutants? While not necessary for this publication, it would be interesting to see whether the FAT-7-decreasing E. coli strains can increase isp-1 and nuo-6 lifespan.

      SEK-1 is also involved in the p38-mediated innate immune signaling pathway, which has been shown to contribute to longevity in C. elegans. In fact, disruption of sek-1 using RNAi decreased the lifespan of several long-lived mutant strains (PMID 36514863).

      Figure 2. Why were cyoA and ycbk chosen to show the full Kaplan-Meier survival plot?

      Figure 2, panel D. A better title may be "Mean Survival (Percent increase from control)"

      While not necessary for this paper, it would be interesting to determine whether the FAT-7-decreasing E. coli strains alter resistance to oxidative stress.

      Figure 4. It may be interesting to include a correlation plot comparing hsp-6::GFP fluorescence and lifespan. It looks like the magnitudes of increase for each phenotype are not correlated.

      Significance

      Overall, this is an interesting paper and the experiments are rigorously performed. The bacterial screen was comprehensive and was followed up by careful mechanistic experiments. This paper will be of interest to researchers studying the biology of aging. A diagram of the working model of the underlying mechanisms would enhance the paper.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Chitin is a critical component of the extracellular matrix of arthropods and plays an essential role in the development and protection of insects. There are two chitin synthases in insects: Type A (exoskeletons) and Type B (for the peritrophic matrix in the gut). The study aims to investigate the specificity and mechanisms of the two chitin synthases in D. melanogaster and to clarify whether they are functionally interchangeable. Various genetic manipulations and fluorescence-based labeling were used to analyze the expression, localization, and function of Kkv and Chs2 in different tissues. Chs2 is expressed in the PR cells of the proventriculus and is required for chitin deposition in the peritrophic matrix. Kkv can deposit chitin in ectodermal tissues but not in the peritrophic matrix, whereas Chs2 can deposit chitin in the peritrophic matrix but not in ectodermal tissues. The subcellular localization of chitin synthases is specific to the tissues in which they are expressed. Kkv localizes apically in ectodermal tissues, whereas Chs2 localizes apically in the PR cells of the proventriculus. Altogether, Kkv and Chs2 cannot replace each other. The specificity of chitin synthases in D. melanogaster relies on distinct cellular and molecular mechanisms, including intracellular transport pathways and the specific molecular machinery for chitin deposition.


      Congratulations on this incredible story and manuscript, which is straightforward and well-written. However, I have some comments that may help to improve it.

      We thank the reviewer for this very positive comment. We have addressed all comments to clarify and improve our manuscript.

      Major comments: 1.) Funny thing: the Chs2 mutant larva shows a magenta staining below the chitin accumulation of the esophagus, which looks like a question mark in 1H but cannot be found in control. Is that trachea reaching the pv?

      We assume that the reviewer refers to Fig 1N. As the reviewer suspects, this corresponds to a piece of trachea. Figure 1N shows a single section, making it difficult to identify what this staining corresponds to. We are providing below a projection of several sections where it is easier to identify the staining as tracheal tissue (arrow).

      We are now marking this pattern as trachea (tr) in the manuscript Figure 1N

      2.) Also, though it is evident that the PM chitin is lost in Chs2 mutants, could it be that the region is disturbed and cells express chitin somewhere else? There are papers by Fuß and Hoch (e.g., Mech. Dev. 79, 1998; Josten, Fuß et al., Dev. Biol. 267, 2004) using markers such as Dve, Fkh, Wg, Delta, and Notch for precisely marking the endodermal/ectodermal region in the embryonic foregut/proventriculus. It would be beneficial to show, along with chitin and Chs expression patterns, the ectoderm/endoderm cells. This is particularly important as the authors report endodermal expression of Chs2 in embryos but don't use co-markers of the endodermal cells.

      We agree with the reviewer that this is an important issue and we note that Reviewer 2 also raised the same point. Therefore, we have addressed this issue.

      We obtained an antibody against Dve, kindly provided by Dr. Hideki Nakagoshi. Dve marks the endodermal region in the proventriculus (Fuss and Hoch, 1998, Fuss et al., 2004, Nakagoshi et al., 1998). This antibody worked nicely in our dissected L3 digestive tracts and allowed us to mark the endodermal region. We also obtained an antibody against Fkh, kindly provided by Dr. Pilar Carrera. Fkh marks the ectodermal foregut cells (Fuss and Hoch, 1998, Fuss et al., 2004). While, in our hands, this antibody performed well in embryonic tissues, we observed no staining in our dissected L3 digestive tracts. The reason for this is unclear, but we suspect technical limitations may be responsible (the ectodermal region of the proventriculus is very internal, potentially hindering antibody penetration). To circumvent this inconvenience, we tested an FkhGFP-tagged allele available from the Bloomington Stock Center. Fortunately, we were able to detect GFP in ectodermal cells of L3 carrying this allele. Using this approach, we conducted experiments to detect Fkh and Dve in the wild type or in Df(Chs2) conditions (Fig S1). In addition, we used these markers to map the expression of Kkv and Chs2 in the proventriculus (Fig 4).

      Altogether the results using these endodermal/ectodermal markers confirmed the presence of a cuticle adjacent to the FkhGFP-positive cells and a PM adjacent to the PR cells, marked by Dve. This PM is absent in Df(Chs2) L3 escapers, however, the general pattern of Fkh/Dve expression is not affected. Finally, we show that Chs2-expressing cells are positive for Dve while Kkv-expressing cells are not. We were unable to conduct an experiment demonstrating Kkv and Fkh co-expression due to technical incompatibilities, as both genes require the use of GFP-tagged alleles to visualise their expression. However, we believe that our imaging of Dve/Kkv clearly shows that Kkv expressing cells lack Dve expression and are localised in the internal (ectodermal) region of the proventriculus (Fig 4E).

      3.) The origin of midgut chitin accumulation is unclear. Chitin can come from yeast paste. Can the authors check kkv and chs2 mutants for food passage and test starving L1 larvae to detect chitin accumulation in the midgut without feeding them?

      This is a very interesting point that has also intrigued us.

      We observed that, in addition to the PM layer lining the midgut epithelium, CBP staining also revealed a distinct luminal pattern. Our initial hypothesis was that this pattern corresponded to the PM. However, its presence in Df(Chs2) larval escapers clearly indicates that this is not the case. Unfortunately, we cannot assess this pattern in kkv mutants, as these die at eclosion and do not proceed to larval stages.

      As the reviewer suggests, a likely possibility is that the luminal pattern originates from components in the food. These could correspond to yeast, as suggested by the reviewer, or possibly remnants of dead larvae present in the media (although Drosophila is considered herbivorous in the absence of nutritional stress).

      To assess whether the luminal pattern originates from the food, we conducted two independent experiments. In experiment 1, we collected larvae reared under normal food conditions. Newly emerged L3 larvae were transferred in small numbers, to minimise cannibalism (Ahmad et al., 2015), to new Petri plates containing moist paper. Larvae were starved for 3, 4 or 5 days. Larvae starved for more than 5 days did not survive. We then dissected the guts and analysed CBP staining. We observed the presence of luminal CBP staining in these larvae, along with the typical PM signal in the proventriculus and along the midgut. In experiment 2, we collected larvae directly on plates containing only agar (without yeast or any other nutrients). We allowed the larvae to develop. These larvae showed minimal growth. We dissected the guts of these small larvae (which were challenging to dissect) and analysed CBP staining. Again, we detected the presence of luminal CBP staining.

      These experiments indicate that, despite starvation, a luminal chitin pattern is still detected, suggesting that it is unlikely to originate from food. However, we cannot unequivocally rule out the possibility that the cannibalistic, detritivorous or carnivorous behavior of the nutritionally stressed larvae (Ahmad et al., 2015) in our experiments may influence the results. Therefore, more experiments would be required to address this point.

      In summary, while we cannot provide a definitive answer to the reviewer's question, nor fully satisfy our own curiosity, we would like to note that this specific observation is unrelated to the main focus of our study, as we have confirmed that the luminal pattern is not dependent on Chs2 function.

      Portions of midgut of starved larvae under the regimes indicated, stained for chitin (CBP, magenta). Note the presence of the luminal chitin pattern in the midgut

      4.) Subcellular localization assays require improved analysis, such as a co-marker for the apical membrane and statistical analysis with co-localization tools, showing the overlap at the membrane and intracellularly with membrane co-markers and KDEL.

      We have addressed the point raised by the reviewer. To analyse and quantify Chs2 subcellular localisation, particularly considering the observed pattern, we decided to use both a membrane and an ER marker. As a membrane marker we used srcGFP expressed in tracheal cells (see answer to point 7 of Reviewer 1) and as an ER marker we used KDEL. In this analysis, tracheal cells also expressed Chs2, which was visualised using the Chs2 antibody generated in the lab.

      To assess the colocalisation of Chs2 with each marker we used the JACoP plugin in Fiji. We analysed individual cells from different embryos stained for membrane/ER/Chs2 using single confocal sections (to avoid artificial colocalisation). Images were processed as described in Materials and Methods. We obtained the Pearson's correlation coefficient (r), which measures the degree of colocalisation, for Chs2/srcGFP and Chs2/KDEL, n=36 cells from 9 different embryos. The average r value for Chs2/srcGFP was 0.064, while the average for Chs2/KDEL was around 0.7. r ranges between -1 and 1, where 1 indicates perfect correlation, 0 no correlation, and -1 perfect anti-correlation. Typically, an r value of 0.7 and above is considered a strong positive correlation, whereas a value below 0.1 is regarded as very weak or no correlation. Thus, our colocalisation analysis supports the hypothesis that Chs2 is primarily retained in the ER when expressed in non-endogenous tissues, likely unable to reach the membrane.
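
      For readers unfamiliar with the metric, the Pearson's correlation coefficient used in this kind of colocalisation analysis can be sketched in a few lines. This is only an illustration of the formula, not the plugin's implementation or our analysis pipeline: the function name `pearson_r` and all pixel intensities below are hypothetical, and a real analysis would run on thresholded single confocal sections.

```python
import math


def pearson_r(channel_a, channel_b):
    """Pearson's r between two equally sized lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Sum of co-deviations (numerator) and sums of squared deviations (denominator).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    ss_a = sum((a - mean_a) ** 2 for a in channel_a)
    ss_b = sum((b - mean_b) ** 2 for b in channel_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("r is undefined for a channel with constant intensity")
    return cov / math.sqrt(ss_a * ss_b)


# Hypothetical intensities for six pixels of one cell (illustration only).
chs2_signal = [10, 40, 80, 120, 200, 30]
er_marker = [12, 35, 90, 110, 190, 25]       # rises and falls with the Chs2 signal
membrane_marker = [100, 90, 100, 95, 105, 98]  # roughly flat, unrelated to Chs2

r_er = pearson_r(chs2_signal, er_marker)
r_membrane = pearson_r(chs2_signal, membrane_marker)
print(f"Chs2/ER r = {r_er:.3f}, Chs2/membrane r = {r_membrane:.3f}")
```

      In this toy example the ER-like channel yields an r close to 1, whereas the flat membrane-like channel yields a much lower value, which is the kind of contrast the averaged r values above summarise.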

      We have reorganised the figures and now present an example of Chs2/srcGFP/KDEL subcellular localisation in tracheal cells and the colocalisation analysis in Fig 5H. The colocalisation analysis is described in the Materials and Methods section.

      Minor comments:

      5.) The authors used "L3 larval escapers." It would be interesting to know if the lack of Chs2 and the peritrophic matrix cause any physiological defects or lethality.

      The point raised by the reviewer is very interesting and relevant. The peritrophic matrix is proposed to play several important physiological roles, including the spatial organisation of the digestive process, increasing digestive efficiency, protection against toxins and pathogens, and serving as a mechanical barrier. Therefore, it is expected that the absence of chitin in the PM of the Df(Chs2) larval escapers may cause various physiological effects.

      Analysing these effects is a complex task, and it constitutes an entire research project on its own. In addressing the physiological requirements of the PM, we aim to analyse adult flies and assess various parameters, including viability, digestive transit dynamics, gut integrity, resistance to infections, fitness and fertility.

      A critical initial challenge in conducting a comprehensive analysis of the physiological requirements of the PM is identifying a suitable condition to evaluate the absence of Chs2. In this work we are using a combination of two overlapping deficiencies that uncover Chs2, along with a few additional genes (as indicated in Fig S1F). This deficiency condition presents two major inconveniences: first, the observed defects could be caused or influenced by the absence of genes other than Chs2, preventing us from conclusively attributing the defects to Chs2 loss (unless we rescued the defects by adding Chs2 back as we did in the manuscript). Second, the larva escapers, which are rare, do not survive to adulthood (indicating lethality but preventing us from analysing specific physiological aspects).

      To overcome these limitations, we are currently working to identify a genetic condition in which we can specifically analyse the absence of Chs2. We have identified several available RNAi lines and we are testing their efficiency in preventing chitin deposition in the PM. Additionally, we are characterising a putative null Chs2 allele, Chs2CR60212-TG4.0. This stock contains a Trojan-GAL4 gene trap sequence in the third intron, inserted via CRISPR/Cas9. As described in Flybase (https://flybase.org/), the inserted cassette contains a 'Trojan GAL4' gene trap element composed of a splice acceptor site followed by the T2A peptide, the GAL4 coding sequence and an SV40 polyadenylation signal. When inserted in a coding intron in the correct orientation, the cassette should result in truncation of the trapped gene product and expression of GAL4 under the control of the regulatory sequences of the trapped gene. We already know that, when crossed to a reporter line (e.g. UAS-GFP or UAS-nlsCherry), this line reproduces the Chs2 expression pattern, suggesting that the insertion may generate a truncated Chs2 protein. This line would represent an ideal tool to assess the absence of Chs2, and we are currently characterising it for further analysis.

      In summary, we fully agree with the reviewer that investigating the physiological requirements of the PM is a compelling area of research, and we are actively addressing this question. However, this investigation constitutes a substantial and independent research effort that we believe is beyond the scope of the current manuscript at this stage.

      6.) The order identifiers are missing for materials and antibodies, e.g., anti-GFP (Abcam), but Abcam provides several anti-GFP antibodies; which was used? Please provide order numbers that guarantee the repeatability for others.

      We have now added all identifiers for materials and reagents used, in the materials and methods section.

      7.) Figure S5C, C', what marks GFP (blue) in the trachea? Maybe I have overlooked the description. What is UASsrcGFP? What is the origin of this line?

      We apologise for not providing a more detailed description of the UASsrcGFP line. This line corresponds to RRID BDSC#5432, as now indicated in Materials and Methods section.

      In this transgene, the UAS regulatory sequences drive the expression of GFP fused to Tag:Myr(v-src). As described in Flybase (https://flybase.org/), the P(UAS-srcEGFP) construct contains the 14 aa myristylation domain of v-src fused to EGFP. This tag is commonly used to target proteins of interest to the plasma membrane. The construct was generated by Eric Spana and is available in Drosophila stock centers.

      We typically use this transgene as a plasma membrane marker to outline cell membrane contours. In our experiments, srcGFP, under the control of the btlGal4 promoter, was used to visualise the membrane of tracheal cells in relation to Chs2 accumulation. As indicated in point 4, we have now transferred the images of srcGFP/Chs2/KDEL to the main Figures and used it for colocalisation analyses.

      8.) The authors claim that they validated the anti-Chs2 antibody. However, they show only that it recognizes a Chs2 epitope via ectopic expression. For more profound validation, immune staining is required in deletion mutants, upon knockdown, or upon expression of recombinant proteins, which is not shown.

      We generated an antibody against Chs2. We found that the antibody does not reliably detect the endogenous Chs2 protein, and so we find no pattern in the proventriculus or any other tissue in our immunostainings. It is very possible that the combination of low endogenous levels of Chs2 with a sub-optimal antibody (or low titer) leads to this result. In any case, as the antibody does not detect endogenous Chs2, it cannot be validated by analysing the expression upon Chs2 knockdown. In contrast, our antibody clearly detects specific staining in various tissues (e.g. trachea, salivary glands, gut) when Chs2 is expressed using the Gal4/UAS system, confirming its specificity for Chs2. It is worth pointing out that it is not unusual to find antibodies that are not sensitive enough to detect endogenous proteins but can detect overexpressed proteins (e.g. Lebreton and Casanova, 2016).

      As an additional way to validate the specificity of our antibody, we have used the chimeras generated, as suggested by the reviewer. As indicated in the Materials and Methods section, the Anti-Chs2 was generated against a region comprising 1222-1383 aa in Chs2, with low homology to Kkv. This region is present in the kkv-Chs2GFP chimera but absent in Chs2-KkvGFP (see Fig 7A). Accordingly, our antibody recognises kkv-Chs2GFP but does not recognise Chs2-KkvGFP (Fig S7).

      We have revised the text in chapter 6 (6. Subcellular localisation of Chs2 in endogenous and ectopic tissues) to clarify these points and we have added the validation of the antibody using the chimeras in chapter 8 (8. Analysis of Chs2-Kkv chimeras) and Fig S7.

      9.) The legend and text explaining Fig. 4 D-E' can be improved. The authors used the Crimic line, which is integrated into the third ("coding") intron. This orientation can lead to the expression of Gal4 and cause a truncated version of the protein (according to Flybase). Is Chs2 expression reduced in the Crimic mutant? If the mutation causes expression of a truncated version, the Chs2 antibody may not be able to detect it, as it recognizes a fragment between 1222 and 1383 aa. Also, I'm unsure whether the Chs2 antibody or GFP was used to detect expression in PR cells. The authors describe using Chs2CR60212>SrcGFP together with Chs2+ specific antibodies.

      We apologise for the confusion.

      As the reviewer points out, Chs2CR60212-TG4.0 contains a Trojan-GAL4 gene trap sequence in the third intron, inserted via CRISPR/Cas9. As described in Flybase (https://flybase.org/), the inserted cassette contains a 'Trojan GAL4' gene trap element composed of a splice acceptor site followed by the T2A peptide, the GAL4 coding sequence and an SV40 polyadenylation signal. When inserted in a coding intron in the correct orientation, the cassette should result in truncation of the trapped gene product and expression of GAL4 under the control of the regulatory sequences of the trapped gene.

      We found that when crossed to UAS-GFP or UAS-nlsCherry, this line reproduces an expression pattern that must correspond to Chs2. As the antibody that we generated is not suitable for detecting Chs2 endogenous expression, we resorted to using this combination, Chs2CR60212-TG4.0 crossed to a reporter line (such as UAS-GFP or UAS-nlsCherry), to visualise Chs2 expression by staining for GFP/Cherry in the intestinal tract and in the embryo (Figures 4 and S4).

      We realise that the Figure labelling we used in our original submission is very misleading, and we apologise for this. In the original figures we had labelled the staining combination with Kkv, Chs2, Exp as if we had used these antibodies. However, in all cases, we used GFP to visualise the pattern of these proteins in the genetic combinations indicated in the figures. We have corrected this in our revised version. We have also updated the text (Chapter 5), figures and figure legends.

      As the reviewer points out, the insertion in Chs2CR60212-TG4.0 is likely to generate a truncated Chs2 protein. We cannot confirm this using the Chs2 antibody we generated because it does not recognise the endogenous Chs2 pattern. Nevertheless, as indicated in point 5, we are currently characterising this line. Our preliminary results indicate a high complexity of effects from this allele that require thorough analysis, as it may be acting as a dominant negative.

      Reviewer #1 (Significance (Required)):

      Significance: The manuscript's strength and most important aspects are the genetic analysis, expression, and localization studies of the two Chitin synthases in Drosophila embryos and larvae. However, beyond this manuscript, the development of mechanistic details, such as interaction partners that trigger secretion and action at the apical membranes and the role of the coiled-coil domain, will be interesting.

      The manuscript uses "first-class" genetics to describe the different roles of the two Chitin synthases in Drosophila, comparing ectodermal chitin (tracheal and epidermal chitin) with endodermal (midgut) chitin. Such a precise analysis has not been investigated before in insects. Therefore, the study deeply extends knowledge about the role of Chitin synthases in insects.

      The audience will be specialists in basic research in zoology, developmental biology, and cell biology who are interested in how the different chitin synthases produce chitin. Nevertheless, as chitin is relevant to material research and medical and immunological aspects, the manuscript will be fascinating beyond the specific field and thus for a broader audience.

      I'm working on chitin in the tracheal system and epidermis in Drosophila.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Drosophila have two different chitin synthase enzymes, Kkv and Chs2, and due to unique expression patterns and mutant phenotypes, it is relatively clear that they have different functions in producing either the cuticle-related chitin network (Kkv) or the chitin associated with the peritrophic matrix (PM). However, what is unknown is whether the different functions in making cuticle vs PM chitin are related to differences in cellular expression and/or enzyme properties within the cell. The authors exploit the genetic tractability of Drosophila and their ability to image cuticle vs PM chitin production to examine whether these 2 enzymes can substitute for each other. They conclude that these two proteins are not equivalent in their capacity to generate chitin. The data are convincing; however, they are currently presented in a subjective fashion, which makes them difficult to interpret. Additionally, in my opinion some of the interpretation requires softening or an alternative framing.

      We are pleased that the reviewer finds our data convincing. However, we acknowledge the reviewer's concern that our data was presented in a subjective manner, and we apologise for this. In response, we have carefully reviewed the entire manuscript and revised our data presentation to ensure a more objective tone. Numerous changes (including additional quantifications, new experiments and clarifications) have been incorporated throughout the text. These revisions are highlighted in the marked-up version. We hope that this revision provides a more accurate and objective presentation of our work.

      Major Comments:

      1- While the imaging is lovely, there are some things that are difficult to see in the figures. For example, the "continuous, thin and faint 'chitin' layer that lined the gut epithelium" is very difficult to visualise in the control images. Can they increase the contrast to help the reader appreciate this layer? This is particularly important as we are asked to appreciate a loss of this layer in the absence of Chs2.

      We have tried to improve the figures so that the PM layer in the midgut region is more clearly visible. We have added magnifications of small sections at the midgut lumen/epithelium border in grey to help visualise the PM. These improvements have been made in Figures 1,2,S1,S2,S3 and we believe that they better illustrate our results.

      2- All the mutant analysis is presented subjectively. For example, the authors state that they "found a consistent difference of CBP staining when they compared the 'Chs2' escapers to the controls". How consistent is consistent? Can this be quantified? What is the penetrance of this phenotype? They say that the thin layer is absent in the midgut and the guts are thinner. Could they provide more concrete data?

      As indicated above, we have reviewed the text to provide a more objective description of the phenotypes.

      We have quantified the defects in the Df(Chs2) mutant conditions. For this quantification we dissected intestinal tracts of control and Df(Chs2) larva escapers. We fixed, stained and mounted them together. The control guts expressed GFP in the midgut region as a way to distinguish control from mutants. We analysed the presence or absence of chitin in the PM. We found absence of chitin in the proventricular lumen and in the midgut in all Df(Chs2) guts and presence of chitin there in all control ones (n=12 Df(Chs2) guts, n=9 control guts, from 5 independent experiments). The results indicate a fully penetrant phenotype of lack of chitin in Df(Chs2) larva escapers (100% penetrance). We have added this quantification in the text, chapter 2 (2. Chs2 deposits chitin in the PM).

      To quantify the thickness of the guts, we took measurements of the diameter in control and Df(Chs2) guts at two comparable distance positions from the proventriculus (position 1, position 2, see image). Our quantifications indicated thinner tubes in mutant conditions.

      Image shows the anterior part of the intestinal tract, with the proventriculus encircled in white. Positions 1 and 2 indicate where the diameter quantifications were taken. Scatter plots quantifying the diameter at the two different positions in control and Chs2 larval escapers. Bars show mean ± SD. p = p value of an unpaired, two-tailed t test with Welch's correction.

      However, we are aware that our analysis of the thickness of the gut is not accurate, because we have not used markers to precisely measure at the same position in all guts and because we have not normalised the measurement position in relation to the whole intestinal tract (mainly due to technical issues).

      In relation to the fragility, we noticed that the guts of Chs2 larval escapers tended to break more easily during dissection than control guts, however, we have not been able to quantify this parameter in a reliable and objective manner.

      Since we consider that the requirement of Chs2 for PM deposition is sufficiently demonstrated, and that aspects such as gut morphology or fragility relate to the physiological requirements of the PM, which we are beginning to address as a new independent project (see our response to point 5 of Reviewer 1), we have decided to remove the sentence "We also noticed that the guts of L3 escapers were thinner and more fragile at dissection." from the manuscript to avoid subjectivity.

      3- They state that Chs2 was able to restore accumulation of chitin in the PM of the proventriculus and the midgut. Please quantify. Additionally, does this restore the morphology of the guts (related to the comment above on the thinner guts in the absence of Chs2)?

      We have quantified the rescue of chitin deposition in the PM when Chs2 is expressed in PR cells in a Df(Chs2) mutant background. For this quantification we used the following genetic cross: PRGal4/Cyo; Df(Chs2)/TM6dfdYFP (females) crossed to UASChs2GFP or UASChs2/Cyo; Df(Chs2)/TM6dfdYFP. We selected Df(Chs2) larval escapers by the absence of TM6 (recognisable by the body shape). Among these larval escapers, we identified the presence of Chs2 in PR cells by the expression of GFP or Chs2. We found absence of chitin in the proventriculus and in the midgut in all Df(Chs2) guts that did not express Chs2 in PR cells (n=8/8 Df(Chs2)). In contrast, chitin was present in those intestinal tracts where Chs2 expression was detected in PR cells (n=8/8 PRGal4-UASChs2; Df(Chs2) guts, from 5 independent experiments). The results indicate a full rescue of chitin deposition by Chs2 expression in PR cells in Df(Chs2) mutant larvae. We have added this quantification in the text, chapter 2 (2. Chs2 deposits chitin in the PM).

      As requested by the reviewer, we have also conducted measurements to quantify gut thickness. We performed an analysis similar to the one described in point 2, this time comparing the diameter of Df(Chs2) and PRGal4-UASChs2;Df(Chs2) guts at positions 1 and 2 (see image in point 2 of Reviewer 2). Our quantifications indicated that guts were thicker when Chs2 is expressed in the PR region in Df(Chs2) larval escapers.

      As discussed in point 2, we have decided not to include these results in the manuscript, as this type of analysis requires a more comprehensive investigation.

      Scatter plots quantifying the diameter at the two different positions in Chs2 larval escapers and Chs2 larval escapers expressing Chs2 in PR cells. Bars show mean ± SD. p = p value of an unpaired, two-tailed t test with Welch's correction.
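
      As a reminder of what the unpaired t test with Welch's correction named in the captions computes, the following is a minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom. It is an illustration only: the function name `welch_t` and the diameter values are hypothetical and are not the measurements behind the scatter plots. Obtaining the two-tailed p value additionally requires the t distribution, which a statistics package provides (for example, `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 denominator) scaled by sample size.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    # Unlike Student's t test, the variances are not pooled.
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical gut diameters in arbitrary units (illustration only).
control_diameters = [1.0, 2.0, 3.0, 4.0, 5.0]
mutant_diameters = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, dof = welch_t(control_diameters, mutant_diameters)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

      The correction matters when the two groups differ in variance or size, as is likely with rare larval escapers, because the degrees of freedom are then lower than in the pooled-variance test.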

      4- This may be beyond the scope of this paper, but I find it interesting that the PM chitin is deposited in the proventricular lumen. Yet it forms a thin layer that lines the entire midgut? Any idea how this presumably dense chitin network gets transported throughout the midgut to line the epithelium? I imagine that this is unlikely due to diffusion, especially if they see an even distribution across the midgut. Do they see any evidence of a graded lining (i.e. is it denser in the midgut towards the proventriculus and does this progressively decrease as you look through the midgut?)?

      Insect peritrophic matrices have been classified into Type I and II (with some variations) depending on their origin (extensively reviewed in Peters, 1992; Hegedus et al., 2019). Type I PMs are typically produced by delamination as concentric lamellae along the length of the midgut. Type II PMs, in contrast, are produced in a specialised region of the midgut that corresponds to the proventriculus and are typically more organised than Type I. In Type II PMs, distinct layers originate from distinct cell clusters in the proventriculus. It has been proposed that as food passes, it becomes encased by the extruded PM, which then slides down to ensheath the midgut. Drosophila larvae have been proposed to secrete a type II PM: through PM implantation experiments, Rizki proposed that the proventriculus is required to generate the PM in Drosophila larvae (Rizki, 1956). Our experiments confirmed this hypothesis: we show that expressing Chs2 exclusively in PR cells is sufficient to produce a PM along the midgut. Furthermore, we also show that expressing Chs2 in the midgut is not sufficient to produce a PM layer lining the midgut, at least at larval stages.

      The type II PM in Drosophila is proposed to be fully organised into four layers in the proventricular region (also referred to as the PM formation zone) before reaching the midgut (Peters, 1992; King, 1988; Rizki, 1956; Zhu et al., 2024). However, the mechanism by which the PM is subsequently transported into the midgut remains unclear. Posterior movement of the PM is thought to depend on the pressure exerted by continuous secretion of PM material (Peters, 1992). Early work by Wigglesworth (1929, 1930) proposed that the PM is secreted into the proventricular lumen, becomes fully organised, and is then pushed down by a press mechanism involving the apposed ectodermal/endodermal walls of the proventriculus. Rizki suggested that muscular contractions of the proventriculus walls may play a role, and that peristaltic movements of the gut add a pulling force to push the PM into the midgut (Rizki, 1956). Nevertheless, to our knowledge, the exact mechanism is still not fully understood.

      In response to the reviewer's question, the level of resolution of our analysis does not allow us to determine whether there is a graded PM lining along the midgut. However, available data using electron microscopy approaches suggest that the PM is a fully organised structure composed of four layers that is secreted and transported to line the midgut (King, 1988, Zhu et al., 2024).

      5- The authors state that expression of kkv in tracheal cells of kkv mutants perfectly restores accumulation of chitin in the luminal filaments. Is this really 100% restoration? They also reference a paper here, which may have quantified this result.

      We previously reported that the expression of kkv in tracheal cells restores chitin deposition in kkv mutants (Moussian et al., 2015). However, our previous study did not quantify this rescue. As requested by the reviewer, we have now quantified the extent of the rescue.

      To perform this quantification, we used the following genetic cross:

      btlGal4/(Cyo); kkv/TM6dfdYFP (females) crossed to +/+; kkv UASkkvGFP/TM6dfdYFP (males)

      We stained the resulting embryos with CBP (to detect chitin) and GFP. GFP staining allowed us to identify the kkv mutants (by the absence of dfdYFP marker) and to simultaneously identify the embryos that expressed kkvGFP in tracheal cells (through btlGal4-driven expression). Since btlGal4 is homozygous viable, most females carried two copies of btlGal4.

      We compared the following embryo populations across 4 independent experiments:

      1. Cyo/+; kkv/kkv UASkkvGFP (kkv mutants not expressing kkv in the trachea)
      2. btlGal4/+; kkv/kkv UASkkvGFP (kkv mutants expressing kkv in the trachea)

      Results:

      1. Cyo/+; kkv/kkv UASkkvGFP: 0/6 embryos deposited chitin in trachea
      2. btlGal4/+; kkv/kkv UASkkvGFP: 27/27 embryos deposited chitin in trachea

      These results indicate complete restoration of chitin deposition in kkv mutants when kkv is expressed in tracheal cells (100% rescue).
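
      As an illustration only (this test is not part of the quantification reported above), embryo counts of this kind could be compared with a two-sided Fisher's exact test. The following is a minimal pure-Python sketch; the function name `fisher_exact_two_sided` is hypothetical, and pooling embryos across the independent experiments is an assumption made for the example.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are genotypes; columns are (chitin deposited, no chitin).
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability that the first row contains x "deposited" embryos.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Counts taken from the text: 0/6 embryos without the driver,
# 27/27 embryos with btlGal4-driven kkv expression.
p_rescue = fisher_exact_two_sided(0, 6, 27, 0)
print(f"kkv rescue vs. no driver: p = {p_rescue:.3g}")
```

      The same function applies to any pair of "deposited / not deposited" counts. When no embryo in either group deposits chitin, only one table is possible and the test returns p = 1.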

      To further investigate whether Chs2 can compensate for kkv function in ectodermal tissues, we performed a similar quantification using the following genetic cross:

      btlGal4/(Cyo); kkv/TM6dfdYFP (females) crossed to UASChs2GFP/UASChs2GFP; kkv UASkkvGFP/TM6dfdYFP (males)

      We compared the following embryo populations across 2 independent experiments:

      1. Cyo/UASChs2GFP; kkv/kkv (kkv mutants not expressing Chs2 in the trachea)
      2. btlGal4/UASChs2GFP; kkv/kkv (kkv mutants expressing Chs2 in the trachea)

      Results:

      1. Cyo/UASChs2GFP; kkv/kkv: 0/4 embryos deposited chitin in trachea
      2. btlGal4/UASChs2GFP; kkv/kkv: 0/16 embryos deposited chitin in trachea

      These results indicate no restoration of chitin deposition in kkv mutants expressing Chs2 in the trachea (0% rescue).

      We have now incorporated these quantifications in the text, chapter 4 (4. Chs2 cannot replace Kkv and deposit chitin in ectodermal tissues.)

      6- They ask whether Kkv overexpression in the proventriculus can rescue Chs2 mutants... and vice versa, whether Chs2 overexpression in ectodermal cells can rescue kkv mutants. They show that kkv overexpression leads to an intracellular accumulation of chitin in the proventriculus. However, Chs2 overexpression in the trachea did not lead to any accumulation of chitin in the cells. They tailored their experiments and the associated discussion to address the hypothesis that there is potentially some difference in trafficking of these components. However, another possibility, which they have not ruled out, is that the different ability of kkv and Chs2 to produce chitin inside cells of the proventriculus and ectoderm, respectively, is potentially related to different enzymatic activities and cofactors required for chitin formation in these different cell types. Is this another potential explanation for the differences that they observe?

      We note that Kkv overexpression in any cell type (e.g. ectoderm, endoderm) consistently leads to chitin polymerisation. In ectodermal tissues, Kkv expression, in combination with Exp/Reb activity, results in extracellular chitin deposition. In the absence of Exp/Reb, Kkv expression leads to the accumulation of intracellular chitin punctae (De Giorgio et al., 2023; Moussian et al., 2015; this work). This correlates with the accumulation of Kkv at the apical membrane and the presence of Kkv-containing vesicles, regardless of the presence of Exp/Reb (De Giorgio et al., 2023; Moussian et al., 2015; Figure 6, S6). In endodermal tissues, regardless of the presence of Exp/Reb, Kkv cannot deposit chitin extracellularly and instead produces intracellular chitin punctae. This correlates with a diffuse accumulation of Kkv in the endodermal cells (PR cells, or gut cells in the embryo) together with the presence of Kkv-containing vesicles (Figure 6, S6).

      In previous work we showed that Kkv's ability to polymerise chitin is completely abolished when it is retained in the ER. Indeed, we found that a mutation in a conserved WGTRE region leads to ER retention, the absence of Kkv-containing vesicles in the cell, and absence of intracellular chitin punctae or chitin deposition (De Giorgio et al., 2023).

      These findings indicate a correlation between Kkv subcellular localisation and chitin polymerisation/extrusion. Therefore, we hypothesise that intracellular trafficking and subsequent subcellular localisation play a crucial role in regulating Kkv activity (De Giorgio et al., 2023; this work).

      We find that Chs2 is expressed in PR cells (Figure 4) and observe that only in these PR cells does Chs2 localise apically (Fig 5A-D, S5A,B). This localisation correlates with the ability of Chs2 to deposit chitin in the PM and the presence of intracellular chitin punctae in PR cells (Fig 1F). When Chs2 is expressed in other cell types, we detect it primarily in the ER and observe no Chs2-containing vesicles (vesicles are suggestive of trafficking). This localisation correlates with the inability of Chs2 to produce intracellular chitin punctae or extracellular chitin deposition.

      Again, these results suggest a correlation between Chs2 subcellular localisation and chitin polymerisation/extrusion, aligning with the results observed for Kkv. Therefore, we hypothesise in this work that the intracellular trafficking and subsequent subcellular localisation of Chs2 play a crucial role in regulating its activity.

      Our hypothesis is consistent with seminal work on yeast chitin synthases, which has demonstrated the critical role of intracellular trafficking, and particularly ER exit, in regulating chitin synthase activity (reviewed in Sanchez and Roncero, 2022).

      That said, we cannot exclude other explanations that are also compatible with the observed results. As pointed out by the reviewer, it is possible that Chs2 and Kkv require different enzymatic activities and/or cofactors for chitin polymerisation/deposition, which may be specific to different cell types. Indeed, we know that the auxiliary proteins Exp/Reb are specifically expressed in certain ectodermal tissues (Moussian et al., 2015). These mechanisms could act jointly or in parallel with the regulation of intracellular trafficking, or could even regulate this intracellular trafficking itself.

      Identifying the exact mechanisms controlling Kkv and Chs2 intracellular trafficking would be necessary to determine whether additional mechanisms (specific cofactors or enzymatic activities) are also involved or even serve as the primary regulatory elements.

      We have introduced these additional possibilities in the discussion section.

      7- They co-express Chs2 and Reb and show that this does not lead to chitin production or secretion. In the discussion they conclude that Chs2 does not "seem to be dependent on 'Reb' activity". I think that this statement potentially needs softening. They show that Reb is not sufficient to induce Chs2 chitin production in cells that do not normally make a PM. However, they do not show that it is not essential in cells that normally express Chs2 and make PM.

      We fully agree with the reviewer's observation and thank her/him for pointing it out.

      As indicated by the reviewer, we show that co-expression of Reb and Chs2 in different tissues does not lead to an effect distinct from that observed with Chs2 expression alone. In addition, in the discussion we mention that we could not detect expression of reb/exp in PR cells, which aligns with the findings from Zhu et al., 2024, indicating no expression of reb/exp in the midgut cells of the adult proventriculus, as assessed by scRNAseq. We found that exp is expressed in the ectodermal cells of the larval proventriculus (Fig S4D), correlating with kkv expression in this region and cuticle deposition. These findings led us to propose that Chs2 does not seem to be dependent on Exp/Reb activity.

      However, in our original manuscript, we did not directly address whether Exp/Reb are required in the cells that normally express Chs2. As a result, we could not conclude that Chs2 relies on a set of auxiliary proteins different from Exp/Reb, and therefore a different molecular mechanism to that of Kkv in regulating chitin deposition.

      To address this specific point, we have conducted a new experiment to test Exp/Reb requirement in PR cells. We co-expressed RNAi lines for Exp/Reb in these cells and found that chitin deposition in the PM was not prevented. This further supports the hypothesis that Exp/Reb activity is not necessary for Chs2 function. We have added this experiment to Chapter 4 and Fig S3I,J.

      8- They looked at the endogenous expression pattern of kkv and Chs2 and say that they found accumulation of Kkv in the proventriculus and no accumulation in the midgut. Similarly, they look at the expression of Chs2 and detect it in cells of the proventriculus. Are there markers of these different cell types that they could use to colocalize these enzymes?

      We agree with the reviewer that this is an important issue and we note that Reviewer 1 also raised the same point. Therefore, we have addressed this issue.

      We obtained an antibody against Dve, kindly provided by Dr. Hideki Nakagoshi. Dve marks the endodermal region in the proventriculus (Fuss and Hoch, 1998; Fuss et al., 2004; Nakagoshi et al., 1998). This antibody worked well in our dissected L3 digestive tracts and allowed us to mark the endodermal region. We also obtained an antibody against Fkh, kindly provided by Dr. Pilar Carrera. Fkh marks the ectodermal foregut cells (Fuss and Hoch, 1998; Fuss et al., 2004; Nakagoshi et al., 1998). While, in our hands, this antibody performed well in embryonic tissues, we observed no staining in our dissected L3 digestive tracts. The reason for this is unclear, but we suspect technical limitations may be responsible (the ectodermal region of the proventriculus is very internal, potentially hindering antibody penetration). To circumvent this problem, we tested a FkhGFP tagged allele available from the Bloomington Stock Center. Fortunately, we were able to detect GFP in ectodermal cells of L3 larvae carrying this allele. Using this approach, we conducted experiments to detect Fkh and Dve in relation to chitin accumulation in the wild type (Fig S1). In addition, we used these markers to map the expression of Kkv and Chs2 in the proventriculus (Fig 4). Our results using these endodermal/ectodermal markers confirmed the presence of a cuticle adjacent to the FkhGFP-positive cells and a PM adjacent to the PR cells, marked by Dve. Additionally, we show that Chs2-expressing cells are positive for Dve while Kkv-expressing cells are not. We could not conduct an experiment showing Kkv and Fkh co-expression due to technical incompatibilities, as we have to use GFP tagged alleles for both Kkv and Fkh to reveal their expression. However, we believe that our imaging of Dve/Kkv clearly shows that Kkv-expressing cells lack Dve expression and localise in the internal (ectodermal) region of the proventriculus (Fig 4E).

      9- They overexpress Chs2 in cells of the midgut and see that it colocalises with an ER marker. They conclude that it is retained in the ER, which again, for them suggests that it has a trafficking problem in these cells. However, they are overexpressing it in these cells and this strong accumulation that they observe in the ER could simply be due to the massive expression levels. Additionally, they cannot conclude that it doesn't get out of the ER at all. They could be correct in thinking that there may be a trafficking issue, but this experiment does not conclusively show that Chs2 is entirely retained in the ER when expressed in ectopic tissues. I wonder if their interpretation needs softening or whether they should potentially address alternative hypotheses.

      The reviewer raises two distinct issues: 1) the localisation of overexpressed proteins and 2) Chs2 ER retention.

      We agree that massive overexpression can lead to artifactual subcellular localisation due to saturation of the secretory pathway, causing ER accumulation. In our experiments, we overexpressed Kkv and Chs2 in different tissues (trachea, salivary glands, embryonic gut, and larval proventriculus), inducing high levels of both chitin synthases.

      For Kkv, we observed distinct subcellular localisation patterns in ectodermal versus endodermal tissues (illustrated in new Fig S6). In ectodermal tissues such as the trachea, large amounts of KkvGFP were detected, most of it localising apically. We also detected a more general KkvGFP distribution throughout the cell, including the ER, particularly at early stages. Additionally, we observed many KkvGFP-positive vesicles, reflecting exocytic and endocytic trafficking, as described previously (De Giorgio et al., 2023). The presence of these vesicles (as well as the apical localisation) indicates that KkvGFP is able to exit the ER. Indeed, our previous work demonstrated that when Kkv is retained in the ER, it does not localise apically or appear in vesicles (De Giorgio et al., 2023). In endodermal tissues, as described in our manuscript, KkvGFP did not exhibit polarised apical localisation and instead showed a diffuse pattern with some cortical enrichment. However, the presence of KkvGFP-containing vesicles still suggests that the protein is capable of exiting the ER in these endodermal tissues as well.

      We observed a different subcellular pattern when we overexpressed Chs2GFP. In tissues where Chs2 is not normally expressed (e.g., trachea, salivary gland, embryonic gut), we did not detect apical or membrane accumulation (see Fig. 5, S5, S6 and response to point 4 of Reviewer #1). Nor did we observe accumulation of Chs2GFP in intracellular vesicles. Instead, Chs2GFP showed strong colocalisation with an ER marker (see Fig. 5, S5, S6 and response to point 4 of Reviewer #1). In contrast, when overexpressed in PR cells, we detected apical enrichment (Fig 5A-D, S5A,B). This indicates that despite massive expression levels, Chs2 can exit the ER in particular tissues.

      Taken together, our results strongly suggest that overexpressed Kkv can exit the ER in the different tissues analysed, whereas most Chs2GFP is retained in the ER in tissues other than PR cells. This correlates with the ability of overexpressed KkvGFP to polymerise chitin (either in intracellular puncta or deposited extracellularly depending on the presence of Exp/Reb) in all analysed tissues. Conversely, Chs2 was unable to polymerise chitin (either in intracellular puncta or extracellularly regardless of Exp/Reb presence) in tissues other than PR cells.

      Nevertheless, we acknowledge that we cannot definitively conclude that all Chs2 protein is entirely retained in the ER. We have included this caveat in our revised manuscript (Chapter 6 and Discussion section).

      Minor Comments:

      • No mention of Fig 3I in the results section and the order discussed in the results does not match the order in the figure.

      We apologise for these inconsistencies. We have addressed this issue in the text, figure legend, and the image order in Figure 3 and Figure S3.

      • In the results please provide some information on what the CRIMIC collection is and how it allows you to see Chs2 expression for non-experts.

      We have addressed this point in chapter 5 in the revised version, and we now provide a more detailed explanation of the CRIMIC Chs2CR60212-TG4.0 allele.

      Further details of this allele are also provided in our responses to points 5 and 9 of Reviewer 1.

      Reviewer #2 (Significance (Required)):

      Drosophila produce different types of chitinous structures that are required for either the exoskeleton of the animal or for proper gut function (peritrophic matrix). Additionally, most insects have two enzymes involved in the production of chitin and current data suggests that they have unique roles in producing either the exoskeleton or the peritrophic matrix. However, it is unclear whether their different functions are due to differences in cell type expression or differences in physiological activity of the enzymes. The authors exploit Drosophila to drive these 2 enzymes in different cell types that are known to produce the exoskeleton or the peritrophic matrix to determine whether they can functionally substitute mutant backgrounds. Their results give us a hint that these enzymes are not equivalent. What the authors were unable to address is why they are not equivalent. They hypothesise that the different physiological functions of the enzymes may be related to trafficking differences within their respective cell types. While this is an interesting hypothesis, the data are not really clear yet to make this conclusion.

      This work will be of interest to anyone interested in chitinous structures in insects and the cell biology of chitin-related enzymes.

      Literature


      AHMAD, M., CHAUDHARY, S. U., AFZAL, A. J. & TARIQ, M. 2015. Starvation-Induced Dietary Behaviour in Drosophila melanogaster Larvae and Adults. Sci Rep, 5, 14285.

      DE GIORGIO, E., GIANNIOS, P., ESPINAS, M. L. & LLIMARGAS, M. 2023. A dynamic interplay between chitin synthase and the proteins Expansion/Rebuf reveals that chitin polymerisation and translocation are uncoupled in Drosophila. PLoS Biol, 21, e3001978.

      FUSS, B. & HOCH, M. 1998. Drosophila endoderm development requires a novel homeobox gene which is a target of Wingless and Dpp signalling. Mech Dev, 79, 83-97.

      FUSS, B., JOSTEN, F., FEIX, M. & HOCH, M. 2004. Cell movements controlled by the Notch signalling cascade during foregut development in Drosophila. Development, 131, 1587-95.

      HEGEDUS, D. D., TOPRAK, U. & ERLANDSON, M. 2019. Peritrophic matrix formation. J Insect Physiol, 117, 103898.

      KING, D. G. 1988. Cellular organization and peritrophic membrane formation in the cardia (proventriculus) of Drosophila melanogaster. J Morphol, 196, 253-82.

      LEBRETON, G. & CASANOVA, J. 2016. Ligand-binding and constitutive FGF receptors in single Drosophila tracheal cells: Implications for the role of FGF in collective migration. Dev Dyn, 245, 372-8.

      MOUSSIAN, B., LETIZIA, A., MARTINEZ-CORRALES, G., ROTSTEIN, B., CASALI, A. & LLIMARGAS, M. 2015. Deciphering the genetic programme triggering timely and spatially-regulated chitin deposition. PLoS Genet, 11, e1004939.

      NAKAGOSHI, H., HOSHI, M., NABESHIMA, Y. & MATSUZAKI, F. 1998. A novel homeobox gene mediates the Dpp signal to establish functional specificity within target cells. Genes Dev, 12, 2724-34.

      PETERS, W. 1992. Peritrophic Membranes, Springer Berlin, Heidelberg.

      RIZKI, M. T. M. 1956. The secretory activity of the proventriculus of Drosophila melanogaster. Journal of Experimental Zoology, 131, 203-221.

      SANCHEZ, N. & RONCERO, C. 2022. Chitin Synthesis in Yeast: A Matter of Trafficking. Int J Mol Sci, 23.

      ZHU, H., LUDINGTON, W. B. & SPRADLING, A. C. 2024. Cellular and molecular organization of the Drosophila foregut. Proc Natl Acad Sci U S A, 121, e2318760121.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Drosophila have two different chitin synthase enzymes, Kkv and Chs2, and due to unique expression patterns and mutant phenotypes, it is relatively clear that they have different functions in producing either the cuticle-related chitin network (Kkv) or the chitin associated with the peritrophic matrix (PM). However, what is unknown is whether the different functions in making cuticle vs PM chitin are related to differences in cellular expression and/or enzyme properties within the cell. The authors exploit the genetic tractability of Drosophila and their ability to image cuticle vs PM chitin production to examine whether these 2 enzymes can substitute for each other. They conclude that these two proteins are not equivalent in their capacity to generate chitin. The data are convincing; however, they are currently presented in a subjective fashion, which makes them difficult to interpret. Additionally, in my opinion, some of the interpretation requires softening or consideration of alternative explanations.

      Major Comments:

      • While the imaging is lovely, there are some things that are difficult to see in the figures. For example, the "continuous, thin and faint 'chitin' layer that lined the gut epithelium" is very difficult to visualise in the control images. Can they increase the contrast to help the reader appreciate this layer? This is particularly important as we are asked to appreciate a loss of this layer in the absence of Chs2.
      • All the mutant analysis is presented subjectively. For example, the authors state that they "found a consistent difference of CBP staining when they compared the 'Chs2' escapers to the controls". How consistent is consistent? Can this be quantified? What is the penetrance of this phenotype? They say that the thin layer is absent in the midgut and the guts are thinner. Could they provide more concrete data?
      • They state that Chs2 was able to restore accumulation of chitin in the PM of the proventriculus and the midgut. Please quantify. Additionally, does this restore the morphology of the guts (related to the comment above on the thinner guts in the absence of Chs2)?
      • This may be beyond the scope of this paper, but I find it interesting that the PM chitin is deposited in the proventricular lumen. Yet it forms a thin layer that lines the entire midgut? Any idea how this presumably dense chitin network gets transported throughout the midgut to line the epithelium? I imagine that this is unlikely due to diffusion, especially if they see an even distribution across the midgut. Do they see any evidence of a graded lining (i.e. is it denser in the midgut towards the proventriculus and does this progressively decrease as you look through the midgut?)?
      • The authors state that expression of kkv in tracheal cells of kkv mutants perfectly restores accumulation of chitin in the luminal filaments. Is this really 100% restoration? They also reference a paper here, which may have quantified this result.
      • They ask whether Kkv overexpression in the proventriculus can rescue Chs2 mutants... and vice versa, whether Chs2 overexpression in ectodermal cells can rescue kkv mutants. They show that kkv overexpression leads to an intracellular accumulation of chitin in the proventriculus. However, Chs2 overexpression in the trachea did not lead to any accumulation of chitin in the cells. They tailored their experiments and the associated discussion to address the hypothesis that there is potentially some difference in trafficking of these components. However, another possibility, which they have not ruled out, is that the different ability of kkv and Chs2 to produce chitin inside cells of the proventriculus and ectoderm, respectively, is potentially related to different enzymatic activities and cofactors required for chitin formation in these different cell types. Is this another potential explanation for the differences that they observe?
      • They co-express Chs2 and Reb and show that this does not lead to chitin production or secretion. In the discussion they conclude that Chs2 does not "seem to be dependent on 'Reb' activity". I think that this statement potentially needs softening. They show that Reb is not sufficient to induce Chs2 chitin production in cells that do not normally make a PM. However, they do not show that it is not essential in cells that normally express Chs2 and make PM.
      • They looked at the endogenous expression pattern of kkv and Chs2 and say that they found accumulation of Kkv in the proventriculus and no accumulation in the midgut. Similarly, they look at the expression of Chs2 and detect it in cells of the proventriculus. Are there markers of these different cell types that they could use to colocalize these enzymes?
      • They overexpress Chs2 in cells of the midgut and see that it colocalises with an ER marker. They conclude that it is retained in the ER, which again, for them suggests that it has a trafficking problem in these cells. However, they are overexpressing it in these cells and this strong accumulation that they observe in the ER could simply be due to the massive expression levels. Additionally, they cannot conclude that it doesn't get out of the ER at all. They could be correct in thinking that there may be a trafficking issue, but this experiment does not conclusively show that Chs2 is entirely retained in the ER when expressed in ectopic tissues. I wonder if their interpretation needs softening or whether they should potentially address alternative hypotheses.

      Minor Comments:

      • No mention of Fig 3I in the results section and the order discussed in the results does not match the order in the figure.
      • In the results please provide some information on what the CRIMIC collection is and how it allows you to see Chs2 expression for non-experts.

      Significance

      Drosophila produce different types of chitinous structures that are required for either the exoskeleton of the animal or for proper gut function (peritrophic matrix). Additionally, most insects have two enzymes involved in the production of chitin and current data suggests that they have unique roles in producing either the exoskeleton or the peritrophic matrix. However, it is unclear whether their different functions are due to differences in cell type expression or differences in physiological activity of the enzymes. The authors exploit Drosophila to drive these 2 enzymes in different cell types that are known to produce the exoskeleton or the peritrophic matrix to determine whether they can functionally substitute mutant backgrounds. Their results give us a hint that these enzymes are not equivalent. What the authors were unable to address is why they are not equivalent. They hypothesise that the different physiological functions of the enzymes may be related to trafficking differences within their respective cell types. While this is an interesting hypothesis, the data are not really clear yet to make this conclusion.

      This work will be of interest to anyone interested in chitinous structures in insects and the cell biology of chitin-related enzymes.

    1. Author response:

      The following is the authors’ response to the original reviews

      ANALYTICAL

      (1) A key claim made here is that the same relationship (including the same parameter) describes data from pigeons by Gibbon and Balsam (1981; Figure 1) and the rats in this study (Figure 3). The evidence for this claim, as presented here, is not as strong as it could be. This is because the measure used for identifying trials to criterion in Figure 1 appears to differ from any of the criteria used in Figure 3, and the exact measure used for identifying trials to criterion influences the interpretation of Figure 3***. To make the claim that the quantitative relationship is one and the same in the Gibbon-Balsam and present datasets, one would need to use the same measure of learning on both datasets and show that the resultant plots are statistically indistinguishable, rather than simply plotting the dots from both data sets and spotlighting their visual similarity. In terms of their visual characteristics, it is worth noting that the plots are in log-log axis and, as such, slight visual changes can mean a big difference in actual numbers. For instance, between Figure 3B and 3C, the highest information group moves up only "slightly" on the y-axis but the difference is a factor of 5 in the real numbers. Thus, in order to support the strong claim that the quantitative relationships obtained in the Gibbon-Balsam and present datasets are identical, a more rigorous approach is needed for the comparisons.

      ***The measure of acquisition in Figure 3A is based on a previously established metric, whereas the measure in Figure 3B employs the relatively novel nDKL measure that is argued to be a better and theoretically based metric. Surprisingly, when r and r2 values are converted to the same metric across analyses, it appears that this new metric (Figure 3B) does well but not as well as the approach in Figure 3A. This raises questions about why a theoretically derived measure might not be performing as well on this analysis, and whether the more effective measure is either more reliable or tapping into some aspect of the processes that underlie acquisition that is not accounted for by the nDKL metric.

      Figure 3 shows that the relationship between learning rate and informativeness for our rats was very similar to that shown with pigeons by Gibbon and Balsam (1981). We have used multiple criteria to establish the number of trials to learn in our data, with the goal of demonstrating that the correspondence between the data sets was robust. In the revised Figure 3, specifically 3C and 3D, we have plotted trials to acquisition using decision criteria equivalent to those used by Gibbon and Balsam. The criterion they used (at least one peck at the response key on at least 3 out of 4 consecutive trials) cannot be directly applied to our magazine entry data because rats make magazine entries during the inter-trial interval (whereas pigeons do not peck at the response key in the inter-trial interval). Therefore, evidence for conditioning in our paradigm must involve comparison between the response rate during the CS and the baseline response rate, rather than just counting responses during the CS. We have used two approaches to adapt the Gibbon and Balsam criterion to our data. One approach, plotted in Figure 3C, uses a non-parametric signed rank test for evidence that the CS response rate exceeds the pre-CS response rate, and adopts a statistical criterion equivalent to Gibbon and Balsam’s 3-out-of-4 consecutive trials (p < .3125). The second method (Figure 3D) estimates the nDkl for the criterion used by Gibbon and Balsam and then applies this criterion to the nDkl for our data. To estimate the nDkl of Gibbon and Balsam’s data, we have assumed there are no responses in the inter-trial interval and the response probability during the CS must be at least 0.75 (their criterion of at least 3 responses out of 4 trials). The nDkl for this difference is 2.2 (odds ratio 27:1). We have then applied this criterion to the nDkl obtained from our data to identify when the distribution of CS response rates has diverged by an equivalent amount from the distribution of pre-CS response rates. These two analyses have been added to the manuscript to replace those previously shown in Figures 3B and 3C.
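
      The threshold of .3125 quoted above coincides with the binomial probability of observing a response on at least 3 of 4 trials when each trial is a fair coin flip. The short sketch below checks that arithmetic. It assumes that this binomial equivalence is what the criterion is based on; the function name and the chance level of 0.5 are illustrative choices, not taken from the manuscript.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Chance probability of "at least 3 out of 4" under p = 0.5
# (p = 0.5 is an assumption made for this illustration).
criterion_alpha = binomial_tail(3, 4)
print(criterion_alpha)  # 0.3125
```

      Under that assumption, a one-sided test with alpha = .3125 is as lenient as the 3-out-of-4 rule would be for a subject responding at chance.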

      (2) Another interesting claim here is that the rates of responding during ITI and the cue are proportional to the corresponding reward rates with the same proportionality constant. This too requires more quantification and conceptual explanation. For quantification, it would be more convincing to calculate the regression slope for the ITI data and the cue data separately and then show that the corresponding slopes are not statistically distinguishable from each other. Conceptually, it is not clear why the data used to test the ITI proportionality came from the last 5 conditioning sessions. What were the decision criteria used to decide on averaging the final 5 sessions as terminal responses for the analyses in Figure 5? Was this based on consistency with previous work, or based on the greatest number of sessions where stable data for all animals could be extracted?

      If the model is that animals produce response rates during the ITI (a period with no possible rewards) based on the overall rate of rewards in the context, wouldn't it be better to test this before the cue learning has occurred? Before cue learning, the animals would presumably only have attributed rewards in the context to the context and thus, produce overall response rates in proportion to the contextual reward rate. After cue learning, the animals could technically know that the rate of rewards during ITI is zero. Why wouldn't it be better to test the plotted relationship for ITI before cue learning has occurred? Further, based on Figure 1, it seems that the overall ITI response rate reduces considerably with cue learning. What is the expected ITI response rate prior to learning based on the authors' conceptual model? Why does this rate differ from pre and post-cue learning? Finally, if the authors' conceptual framework predicts that ITI response rate after cue learning should be proportional to contextual reward rate, why should the cue response rate be proportional to the cue reward rate instead of the cue reward rate plus the contextual reward rate?

      A single regression line, as shown in Figure 5, is the simplest possible model of the relationship between response rate and reinforcement rate and it explains approximately 80% of the variance in response rate. Fixing the log-log slope at 1 yields the maximally simple model. (This regression is done in the logarithmic domain to satisfy the homoscedasticity assumption.) When transformed into the linear domain, this model assumes a truly scalar relation (linear, intercept at the origin) and assumes the same scale factor and the same scalar variability in response rates for both sets of data (ITI and CS). Our plot supports such a model. Its simplicity is its own motivation (Occam’s razor).
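
      To make the one-parameter model concrete, the sketch below fits a line with slope fixed at 1 in the log-log domain, where the only free quantity is the scale factor k, and contrasts it with an ordinary free-slope fit. The data values and function names are hypothetical and chosen only for illustration; this is not the authors' analysis code.

```python
import math


def fit_scalar_model(rates, responses):
    """Fit log(response) = log(k) + log(rate), i.e. slope fixed at 1.

    The least-squares estimate of log(k) is the mean of log(response / rate).
    Returns (k, r_squared), with r_squared computed in the log domain.
    """
    log_x = [math.log(x) for x in rates]
    log_y = [math.log(y) for y in responses]
    n = len(log_x)
    log_k = sum(ly - lx for lx, ly in zip(log_x, log_y)) / n
    mean_y = sum(log_y) / n
    ss_res = sum((ly - (log_k + lx)) ** 2 for lx, ly in zip(log_x, log_y))
    ss_tot = sum((ly - mean_y) ** 2 for ly in log_y)
    return math.exp(log_k), 1.0 - ss_res / ss_tot


def fit_free_line(rates, responses):
    """Ordinary least squares in the log-log domain (free slope and intercept)."""
    log_x = [math.log(x) for x in rates]
    log_y = [math.log(y) for y in responses]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical reinforcement rates and response rates that are exactly
# proportional (response = 3 * rate), so both fits should agree.
example_rates = [0.01, 0.02, 0.05, 0.1, 0.2]
example_responses = [3.0 * r for r in example_rates]

k_hat, r2 = fit_scalar_model(example_rates, example_responses)
slope_hat, intercept_hat = fit_free_line(example_rates, example_responses)
print(k_hat, r2, slope_hat, intercept_hat)
```

      With real data the two fits will differ; the question raised in the text is whether the extra parameters buy enough explained variance to justify the added flexibility.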

      If separate regression lines are fitted to the CS and ITI data, there is a small increase in explained variance (R² = 0.82). These regression lines have been added to the plot in the revised manuscript (Figure 5). We leave it to further research to determine whether such a complex model, with 4 parameters, is required. However, we do not think the present data warrant comparing the simplest possible model, with one parameter, to any more complex model for the following reasons:

      · When a brain—or any other machine—maps an observed (input) rate to a rate it produces (output rate), there is always an implicit scalar. In the special case where the produced rate equals the observed rate, the implicit scalar has value 1. Thus, there cannot be a simpler model than the one we propose, which is, in and of itself, interesting.

      · The present case is an intuitively accessible example of why the MDL (Minimum Description Length) approach to model complexity (Barron, Rissanen, & Yu, 1998; Grünwald, Myung, & Pitt, 2005; Rissanen, 1999) can yield a very different conclusion from the conclusion reached using the Bayesian Information Criterion (BIC) approach. The MDL approach measures the complexity of a model when given N data specified with precision of B bits per datum by computing (or approximating) the sum of the maximum likelihoods of the model’s fits to all possible sets of N data with B precision per datum. The greater the sum over the maximum likelihoods, the more complex the model, that is, the greater its measured wiggle room, its capacity to fit data. Recall that von Neumann remarked to Fermi that with 4 parameters he could fit an elephant. His deeper point was that multi-parameter models bring neither insight nor predictive power; they explain only post hoc, after one has adjusted their parameters in the light of the data. For realistic data sets like ours, the sums of maximum likelihoods are finite but astronomical. However, just as the Stirling approximation allows one to work with astronomical factorials, it has proved possible to develop readily computable approximations to these sums, which can be used to take model complexity into account when comparing models. Proponents of the MDL approach point out that the BIC is inadequate because models with the same number of parameters can have very different amounts of wiggle room. A standard illustration of this point is the contrast between a logarithmic model and a power-function model. Log regressions must be concave, whereas power-function regressions can be concave, linear, or convex, yet they have the same number of parameters (one or two, depending on whether one counts the scale parameter that is always implicit). The MDL approach captures this difference in complexity because it measures wiggle room; the BIC approach does not, because it only counts parameters.

      · In the present case, one is comparing a model with no pivot and no vertical displacement at the boundary between the black dots and the red dots (the 1-parameter unilinear model) to a bilinear model that allows both a change in slope and a vertical displacement for both lines. The 4-parameter model is superior if we use the BIC to take model complexity into account. However, the 4-parameter model has ludicrously more wiggle room. It will provide excellent fits (high maximum likelihood) to data sets in which the red points have slope > 1, slope 0, or slope < 0 and in which it is also true that the intercept for the red points lies well below or well above the black points (non-overlap in the marginal distribution of the red and black data). The 1-parameter model, on the other hand, will provide terrible fits to all such data (very low maximum likelihoods). Thus, we believe the BIC does not properly capture the immense actual difference in complexity between the 1-parameter model (unilinear with slope 1) and the 4-parameter model (bilinear with neither the slope nor the intercept fixed in the linear domain).

      · In any event, because the pivot (change in slope between black and red data sets), if any, is small and likewise for the displacement (vertical change), it suffices for now to know that the variance captured by the 1-parameter model is only marginally improved by adding three more parameters. Researchers using the properly corrected measured rate of head poking to measure the rate of reinforcement a subject expects can therefore assume that they have an approximately scalar measure of the subject’s expectation. Given our data, they won’t be far wrong even near the extremes of the values commonly used for rates of reinforcement. That is a major advance in current thinking, with strong implications for formal models of associative learning. It implies that the performance function that maps from the neurobiological realization of the subject’s expectation is not an unknown function. On the contrary, it’s the simplest possible function, the scalar function. That is a powerful constraint on brain-behavior linkage hypotheses, such as the many hypothesized relations between mesolimbic dopamine activity and the expectation that drives responding in Pavlovian conditioning (Berridge, 2012; Jeong et al., 2022; Y.  Niv, Daw, Joel, & Dayan, 2007; Y. Niv & Schoenbaum, 2008).
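
      The idea of measuring complexity as a sum of maximum likelihoods over all possible data sets, described in the MDL bullet above, can be computed exactly in a toy case. The sketch below does this for binary data of length n, comparing a Bernoulli model with one adjustable parameter against a model with the parameter fixed at 0.5. It is a deliberately simple illustration with hypothetical function names; it is not the computation that would be needed for the regression models discussed here.

```python
from math import comb


def sum_of_max_likelihoods(n: int) -> float:
    """Sum, over every binary data set of length n, of the likelihood at the
    best-fitting Bernoulli parameter p_hat = k / n (k = number of ones).

    comb(n, k) counts the data sets that share the same k.
    """
    total = 0.0
    for k in range(n + 1):
        p_hat = k / n
        total += comb(n, k) * p_hat**k * (1 - p_hat) ** (n - k)
    return total


def sum_for_fixed_model(n: int, p: float = 0.5) -> float:
    """Same sum for a model with no adjustable parameter (p is fixed)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


for n in (2, 10, 100):
    print(n, sum_of_max_likelihoods(n), sum_for_fixed_model(n))
```

      The fixed model always sums to 1 because it is a single probability distribution over data sets. The adjustable model sums to more than 1, and the excess grows with n: that excess is the wiggle room the bullet refers to.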

      The data in Figures 4 and 5 are taken from the last 5 sessions of training. The exact number of sessions was somewhat arbitrary but was chosen to meet two goals: (1) to capture asymptotic responding, which is why we restricted this to the end of the training, and (2) to obtain a sufficiently large sample of data to estimate reliably each rat’s response rate. We have checked what the data look like using the last 10 sessions, and can confirm it makes very little difference to the results. We now note this in the revised manuscript. The data for terminal responding by all rats, averaged over both the last 5 sessions and last 10 sessions, can be downloaded from https://osf.io/vmwzr/

      Finally, as noted by the reviewers, the relationship between the contextual rate of reinforcement and ITI responding should also be evident if we had measured context responding prior to introducing the CS. However, there was no period in our experiment when rats were given unsignalled reinforcement (such as is done during “magazine training” in some experiments). Therefore, we could not measure responding based on contextual conditioning prior to the introduction of the CS. This is a question for future experiments that use an extended period of magazine training or “poor positive” protocols in which there are reinforcements during the ITIs as well as during the CSs. The learning rate equation has been shown to predict reinforcements to acquisition in the poor-positive case (Balsam, Fairhurst, & Gallistel, 2006).

      (3) There is a disconnect between the gradual nature of learning shown in Figures 7 and 8 and the information-theoretic model proposed by the authors. To the extent that we understand the model, the animals should simply learn the association once the evidence crosses a threshold (nDKL > threshold) and then produce behavior in proportion to the expected reward rate. If so, why should there be a gradual component of learning as shown in these figures? In terms of the proportional response rule to the rate of rewards, why is it changing as animals go from 10% to 90% of peak response? The manuscript would be greatly strengthened if these results were explained within the authors' conceptual framework. If these results are not anticipated by the authors' conceptual framework, this should be explicitly stated in the manuscript.

      One of us (CRG) has earlier suggested that responding appears abruptly when the accumulated evidence that the CS reinforcement rate is greater than the contextual rate exceeds a decision threshold (C.R. Gallistel, Balsam, & Fairhurst, 2004). The new, more extensive data require a more nuanced view. Evidence about the manner in which responding changes over the course of training is to some extent dependent on the analytic method used to track those changes. We presented two different approaches. The approach shown in Figures 7 and 8 (now 6 and 7), extending that developed by Harris (2022), assumes a monotonic increase in response rate and uses the slope of the cumulative response rate to identify when responding exceeds particular milestones (percentiles of the asymptotic response rate). This analysis suggests a steady rise in responding over trials. Within our theoretical model, this might reflect an increase in the animal’s certainty about the CS reinforcement rate with accumulated evidence from each trial. While this method should be able to distinguish between a gradual change and a single abrupt change in responding (Harris, 2022), it may not distinguish between a gradual change and multiple step-like changes in responding, and it cannot account for decreases in response rate.

      The other analytic method we used relies on the information-theoretic measure of divergence, the nDkl (Gallistel & Latham, 2023), to identify each point of change (up or down) in the response record. With that method, we discern three trends. First, the onset tends to be abrupt in that the initial step up is often large (an increase in response rate by 50% or more of the difference between its initial value and its terminal value is common, and there are instances where the initial step is to the terminal rate or higher). Second, there is marked within-subject variability in the response rate, characterized by large steps up and down in the parsed response rates following the initial step up, but this variability tends to decrease with further training (there tend to be fewer and smaller steps in both the ITI response rates and the CS response rate as training progresses). Third, the overall trend, seen most clearly when one averages across subjects within groups, is to a moderately higher rate of responding later in training than after the initial rise. We think that the first tendency reflects an underlying decision process whose latency is controlled by diminishing uncertainty about the two reinforcement rates and hence about their ratio. We think that decreasing uncertainty about the true values of the estimated rates of reinforcement is also likely to be an important part of the explanation for the second tendency (decreasing within-subject variation in response rates). It is less clear whether diminishing uncertainty can explain the trend toward a somewhat greater difference in the two response rates as conditioning progresses. It is perhaps worth noting that the distribution of the estimates of the informativeness ratio is likely to be heavy tailed and have peculiar properties (witness, for example, the distribution of the ratio of two gamma distributions with arbitrary shape and scale parameters), but we are unable at this time to propound an explanation of the third trend.
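
      The remark about ratios of gamma-distributed quantities can be illustrated by simulation. The sketch below draws the ratio of two independent gamma variates and compares the sample median with the sample maximum. The shape parameters, sample size, and seed are arbitrary assumptions made for illustration; they are not estimates from the data, and the sketch is not a model of how subjects estimate informativeness.

```python
import random
import statistics


def sample_gamma_ratios(n_samples, shape_num=2.0, shape_den=2.0, seed=0):
    """Draw ratios X / Y with X ~ Gamma(shape_num, 1), Y ~ Gamma(shape_den, 1).

    X and Y are independent. All parameter values are illustrative only.
    """
    rng = random.Random(seed)
    return [
        rng.gammavariate(shape_num, 1.0) / rng.gammavariate(shape_den, 1.0)
        for _ in range(n_samples)
    ]


ratios = sample_gamma_ratios(20000)
median_ratio = statistics.median(ratios)
largest_ratio = max(ratios)
print(median_ratio, largest_ratio)
```

      With equal shape parameters the median sits near 1, yet occasional draws are many times larger, because small values of the denominator inflate the ratio. That is the sense in which such a distribution is heavy tailed.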

      (4) Page 27, Procedure, final sentence: The magazine responding during the ITI is defined as the 20 s period immediately before CS onset. The range of ITI values (Table 1) always starts as low as 15 s in all 14 groups. Even in the case of an ITI on a trial that was exactly 20 s, this would also mean that the start of this period overlaps with the termination of the CS from the previous trial and delivery (and presumably consumption) of a pellet. It should be indicated whether the definition of the ITI period was modified on trials where the preceding ITI was < 20 s, and if any other criteria were used to define the ITI. Were the rats exposed to the reinforcers/pellets in their home cage prior to acquisition?

      There was an error in the description provided in the original text. The pre-CS period used to measure the ITI responding was 10 s rather than 20 s. There was always at least a 5-s gap between the end of the previous trial and the start of the pre-CS period. The statement about the pre-CS measure has been corrected in the revised manuscript.

      (5) For all the analyses, the exact models that were fit and the software used should be provided. For example, it is not necessarily clear to the reader (particularly in the absence of degrees of freedom) that the model discussed in Figure 3 fits on the individual subject data points or the group medians. Similarly, in Figure 6 there is no indication of whether a single regression model was fit to all the plotted data or whether tests of different slopes for each of the conditions were compared. With regards to the statistics in Figure 6, depending on how this was run, it is also a potential problem that the analyses do not correct for the potentially highly correlated multiple measurements from the same subjects, i.e. each rat provides 4 data points which are very unlikely to be independent observations.

      Details about model fitting have been added to the revision. The question about fitting a single model or multiple models to the data in Figure 6 (now 5) is addressed in response 2 above. In Figure 5, each rat provides 2 behavioural data points (ITI response rate and CS response rate) and 2 values for reinforcement rate (1/C and 1/T). There is a weak but significant correlation between the ITI and CS response rates (r = 0.28, p < 0.01; log transformed to correct for heteroscedasticity). By design, there is no correlation between the log reinforcement rates (r = 0.06, p = .404).

      CONCEPTUAL

      (1) We take the point that where traditional theories (e.g., Rescorla-Wagner) and rate estimation theory (RET) both explain some phenomenon, the explanation in terms of RET may be preferred as it will be grounded in aspects of an animal's experience rather than a hypothetical construct. However, like traditional theories, RET does not explain a range of phenomena - notably, those that require some sort of expectancy/representation as part of their explanation. This being said, traditional theories have been incorporated within models that have the representational power to explain a broader array of phenomena, which makes me wonder: Can rate estimation be incorporated in models that have representational power; and, if so, what might this look like? Alternatively, do the authors intend to claim that expectancy and/or representation - which follow from probabilistic theories in the RW mould - are unnecessary for explanations of animal behaviour?***

      It is important for the field to realize that the RW model cannot be used to explain the results of Rescorla’s (Rescorla, 1966; Rescorla, 1968, 1969) contingency-not-pairing experiments, despite what was claimed by Rescorla and Wagner (Rescorla & Wagner, 1972; Wagner & Rescorla, 1972) and has subsequently been claimed in many modelling papers and in most textbooks and reviews (Dayan & Niv, 2008; Y. Niv & Montague, 2008). Rescorla programmed reinforcements with a Poisson process. The defining property of a Poisson process is its flat hazard function; the reinforcements were equally likely at every moment in time when the process was running. This makes it impossible to say when non-reinforcements occurred and, a fortiori, to count them. The non-reinforcements are causal events in the RW algorithm and subsequent versions of it. Their effects on associative strength are essential to the explanations proffered by these models. Non-reinforcements (failures to occur, updates when reinforcement is set to 0, hence also the lambda parameter) can have causal efficacy only when the successes may be predicted to occur at specified times (during “trials”). When reinforcements are programmed by a Poisson process, there are no such times. Attempts to apply the RW formula to reinforcement learning soon foundered on this problem (Gibbon, 1981; Gibbon, Berryman, & Thompson, 1974; Hallam, Grahame, & Miller, 1992; L.J. Hammond, 1980; L. J. Hammond & Paynter, 1983; Scott & Platt, 1985). The enduring popularity of the delta-rule updating equation in reinforcement learning depends on “big-concept” papers that don’t fit models to real data and discretize time into states while claiming to be real-time models (Y. Niv, 2009; Y. Niv, Daw, & Dayan, 2005).

      The information-theoretic approach to associative learning, which sometimes historically travels as RET (rate estimation theory), is unabashedly and inescapably representational. It assumes a temporal map and arithmetic machinery capable in principle of implementing any implementable computation. In short, it assumes a Turing-complete brain. It assumes that whatever the material basis of memory may be, it must make sense to ask of it how many bits can be stored in a given volume of material. This question is seldom posed in associative models of learning, nor by neurobiologists committed to the hypothesis that the Hebbian synapse is the material basis of memory. Many, including the new Nobelist Geoffrey Hinton, would agree that the question makes no sense. When you assume that brains learn by rewiring themselves rather than by acquiring and storing information, it makes no sense.

      When a subject learns a rate of reinforcement, it bases its behavior on that expectation, and it alters its behavior when that expectation is disappointed. Subjects also learn probabilities when they are defined. They base some aspects of their behavior on those expectations, making computationally sophisticated use of their representation of the uncertainties (Balci, Freestone, & Gallistel, 2009; Chan & Harris, 2019; J. A. Harris, 2019; J.A. Harris & Andrew, 2017; J. A. Harris & Bouton, 2020; J. A. Harris, Kwok, & Gottlieb, 2019; Kheifets, Freestone, & Gallistel, 2017; Kheifets & Gallistel, 2012; Mallea, Schulhof, Gallistel, & Balsam, 2024 in press).

      (2) The discussion of Rescorla's (1967) and Kamin's (1968) findings needs some elaboration. These findings are already taken to mean that the target CS in each design is not informative about the occurrence of the US - hence, learning about this CS fails. In the case of blocking, we also know that changes in the rate of reinforcement across the shift from stage 1 to stage 2 of the protocol can produce unblocking. Perhaps more interesting from a rate estimation perspective, unblocking can also be achieved in a protocol that maintains the rate of reinforcement while varying the sensory properties of the US (Wagner). How does rate estimation theory account for these findings and/or the demonstrations of trans-reinforcer blocking (Pearce-Ganesan)? Are there other ways that the rate estimation account can be distinguished from traditional explanations of blocking and contingency effects? If so, these would be worth citing in the discussion. More generally, if one is going to highlight seminal findings (such as those by Rescorla and Kamin) that can be explained by rate estimation, it would be appropriate to acknowledge findings that challenge the theory - even if only to note that the theory, in its present form, is not all-encompassing. For example, it appears to me that the theory should not predict one-trial overshadowing or the overtraining reversal effect - both of which are amenable to discussion in terms of rates.

      I assume that the signature characteristics of latent inhibition and extinction would also pose a challenge to rate estimation theory, just as they pose a challenge to Rescorla-Wagner and other probability-based theories. Is this correct?

      The seemingly contradictory evidence of unblocking and trans-reinforcer blocking by Wagner and by Pearce and Ganesan cited above will be hard for any theory to accommodate. It will likely depend on what features of the US are represented in the conditioned response.

      RET predicts one-trial overshadowing, as anyone may verify in a scientific programming language, because it has no free parameters and hence no wiggle room. Overtraining reversal effects appear to depend on aspects of the subjects’ experience other than the rate of reinforcement. It seems unlikely that RET can proffer an explanation of them.

      Various information-theoretic calculations give pretty good quantitative fits to the relatively few parametric studies of extinction and the partial-reinforcement extinction effect (see Gallistel (2012, Figs 3 & 4); Wilkes & Gallistel (2016, Fig 6); and Gallistel (2025, under review, Fig 6)). The approach has not been applied to latent inhibition, in part for want of parametric data. However, clearly one should not attribute a negative rate to a context in which the subject had never been reinforced. An explanation, if it exists, would have to turn on the effect of that long period on initial rate estimates AND on evidence of a change in rate, as of the first reinforcement.

      Recommendations for authors:

      MINOR POINTS

      (1) It is not clear why Figure 3C is presented but not analyzed, and why the data presented in Figure 4 to clarify the spread of the distribution of the data observed across the plots in Figure 3 uses the data from Figure 3C. This would seem like the least representative data to illustrate the point of Figure 4. It also appears that the data plotted in Figure 4 corresponds to Figure 3A and 3B rather than the odds 10:1 data indicated in the text.

      Figure 3 has changed as already described. The data previously plotted in Figure 4 are now shown in 3B and correspond to those plotted in Figure 3A.

      (2) Log(T) was not correlated with trials to criterion. If trials to criterion is inversely proportional to log(C/T) and C is uncorrelated with T, shouldn't trials to criterion be correlated with log(T)? Is this merely a matter of low statistical power?

      Yes. There is a small, but statistically non-significant, correlation between log(T) and trials to criterion, r = 0.35, p = .22. That correlation drops to .08 (p = .8) after factoring out log(C/T), which demonstrates that the weak correlation between log(T) and trials to criterion is based on the correlation between log(T) and log(C/T).
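
      "Factoring out" a third variable corresponds to a partial correlation. The sketch below shows the standard first-order formula on a small constructed example in which two variables correlate only because both depend on a third. The data and names are hypothetical and exist only to show the mechanics; they are not the values from the experiment, and the authors' exact procedure is not specified here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical data: x and y each equal z plus noise, and the two noise
# series are uncorrelated with z and with each other.
z = [1, 2, 3, 4, 5, 6, 7, 8]
noise_x = [1, -1, -1, 1, 1, -1, -1, 1]
noise_y = [1, 1, -1, -1, -1, -1, 1, 1]
x = [a + e for a, e in zip(z, noise_x)]
y = [a + e for a, e in zip(z, noise_y)]

raw_r = pearson_r(x, y)
adjusted_r = partial_r(x, y, z)
print(raw_r, adjusted_r)
```

      In this constructed case the raw correlation is substantial while the partial correlation is essentially zero, the same qualitative pattern as the drop from .35 to .08 reported above.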

      (3) The rationale for the removal of the high information condition samples in the Fig 8 "Slope" plot appears to be weak. Can the authors justify this choice better? If all data are included, the relationship is clearly different from that shown in the plot.

      We have now reported correlations that include those 3 groups but noted that the correlations are largely driven by the much lower slope values of those 3 groups which is likely an artefact of their smaller number of trials. We use this to justify a second set of correlations that excludes those 3 groups.

      (4) The discussion states that there is at most one free parameter constrained by the data - the constant of proportionality for response rate. However, there is also another free parameter constrained by data-the informativeness at which expected trials to acquisition is 1.

      I think this comment is referring to two different sets of data. The constant of proportionality of the response rate refers to the scalar relationship between reinforcement rate and terminal response rate shown in Figure 5. The other parameter, the informativeness when trials to acquisition equals 1, describes the intercept of the regression line in Figure 1 (and 3).

      (5) The authors state that the measurement of available information is not often clear. Given this, how is contingency measurable based on the authors' framework?

      (6) Based on the variables provided in Supplementary File 3, containing the acquisition data, we were unable to reproduce the values reported in the analysis of Figure 3.

      Figure 3 has changed, using new criteria for trials to acquisition that attempt to match the criterion used by Gibbon and Balsam. The data on which these figures are based has been uploaded into OSF.

      GRAPHICAL AND TYPOGRAPHICAL

      (1) Y-axis labels in Figure 1 are not appropriately placed. 0 is sitting next to 0.1. 0 should sit at the bottom of the y-axis.

      If this comment refers to the 0 sitting above an arrow in the top right corner of the plot, this is not misaligned. The arrow pointing to zero is used to indicate that this axis approaches zero in the upward direction. 0 should not be aligned to a value on the axis since a learning rate of zero would indicate an infinite number of learning trials. The caption has been edited to explain this more clearly.

      (2) Typo, Page 6, Final Paragraph, line 4. "Fourteen groups of rats were trained with for 42 session"

      Corrected. Thank you.

      (3) Figure 3 caption: Typo, should probably be "Number of trials to acquisition"?

      This change has now been made. The axis shows reinforcements to acquisition to be consistent with Gibbon and Balsam, but trials and number of reinforcements are identical in our 100% reinforcement schedule.

      (4) Typo Page 17 Line 1: "Important pieces evidence about".

      Corrected. Thank you.

      (5) Consider consistent usage of symbols/terms throughout the manuscript (e.g. Page 22, final paragraph: "iota = 2" is used instead of the corresponding symbol that has been used throughout).

      Changed.

      (6) Typo Page 28, Paragraph 1, Line 9: "We used a one-sample t-test using to identify when this".

      This section of text has been changed to reflect the new analysis used for the data in Figure 3.

      (7) Typo Page 29, Paragraph 1, Line 2: "problematic in cases where one of both rates are undefined" either typo or unclear phrasing.

      “of” has been corrected to “or”

      (8) Typo Page 30: Equation 3 appears to have an error and is not consistent with the initial printing of Equation 3 in the manuscript.

      The typo in the initial expression of Eq 3 (page 23) has been corrected.

      (9) Typo Page 33, Line 5: "Figures 12".

      Corrected.

      (10) Typo Page 34, Line 10: "and the 5 the increasingly"? Should this be "the 5 points that"?

      Corrected.

      (11) Typo Page 35, Paragraph 2: "estimate of the onset of conditioned is the trial after which".

      Corrected.

      (12) Clarify: Page 35, final paragraph: it is stated that four-panel figures are included for each subject in the Supplementary files, but each subject has a six-panel figure in the Supplementary file.

      The text now clarifies that the 4-panel figures are included within the 6-panel figures in the Supplementary materials.

      (13) It is hard to identify the different groups in Figure 2 (Plot 15).

      The figure is simply intended to show that responding across seconds within the trial is relatively flat for each group. Individuation of specific groups is not particularly important.

      (14) It appears that the numbering on the y-axis is misaligned in Figure 2 relative to the corresponding points on the scale (unless I have misunderstood these values and the response rate measure to the ITI can drop below 0?).

      The numbers on the Y axes had become misaligned. That has now been corrected.

      (15) Please include the data from Figure 3A in the spreadsheet supplementary file 3. If it has already been included as one of the columns of data, please consider a clearer/consistent description of the relevant column variable in Supplementary File 1.

      The data from Figure 3 are now available from the linked OSF site, referenced in the manuscript.

      (16) Errors in supplementary data spreadsheets such that the C/T values are not consistent with those provided in Table 1 (C/T values of 4.5, 54, 180, and 300 are slightly different values in these spreadsheets). A similar error/mismatch appears to have occurred in the C/T labels for Figures (e.g. Figure 10) and the individual supplementary figures.

      The C/T values on the figures in the supplementary materials have been corrected and are now consistent with those in Table 1.

      (17) Currently the analysis and code provided at https://osf.io/vmwzr/ are not accessible without requesting access from the author. Please consider making these openly available without requiring a request for authorization. As such, a number of recommendations made here may already have been addressed by the data and code deposited on OSF. Apologies for any redundant recommendations.

      Data and code are now available at the OSF site, which has been made public and no longer requires an access request.

      (18) Please consider a clearer and more specific reference to supplementary materials. Currently, the reader is required to search through 4 separate supplementary files to identify what is being discussed/referenced in the text (e.g. Page 18, final line: "see Supplementary Materials" could simply be "see Figure S1").

      We have added specific page numbers in references to the Supplementary Materials.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes a novel magnetic steering technique to target human adipose-derived mesenchymal stem cells (hAMSC) or induced pluripotent stem cell derivatives (iPSC-TM) to the TM. The authors show that delivery of the stem cells lowered IOP, increased outflow facility, and increased TM cellularity.

      Strengths:

      The technique is novel and shows promise as a novel therapeutic to lower IOP in glaucoma. hAMSC are able to lower IOP below the baseline as well as increase outflow facility above baseline with no tumorigenicity. These data will have a positive impact on the field and will guide further research using hAMSC in glaucoma models.

      Weaknesses:

      The transgenic mouse model of glaucoma the authors used did not show ocular hypertensive phenotypes at 6-7 months of age as previously reported. Therefore, if there is no pathology in these animals the authors did not show a restoration of function, but rather a decrease in pressure below normal IOP.

      We appreciate the reviewer’s feedback and agree with the statement of weakness. Accordingly, we have revised the language to improve clarity. Specifically, all references to "restoration of IOP" or "restoration of conventional outflow function" have been replaced with more precise phrases, in the following locations: 

      • lines 2-3 (title): Magnetically steered cell therapy for reduction of intraocular pressure as a treatment strategy for open-angle glaucoma

      • lines 36-8 (abstract): We observed a 4.5 [3.1, 6.0] mmHg or 27% reduction in intraocular pressure (IOP) for nine months after a single dose of only 1500 magnetically-steered hAMSCs, explained by increased conventional outflow facility and associated with higher TM cellularity.

      • lines 45-6 (one-sentence summary): A novel magnetic cell therapy provided effective intraocular pressure reduction in mice, motivating future translational studies.

      • lines 123-4 (introduction): Despite the absence of ocular hypertension in our MYOC<sup>Y437H</sup> mice, our data demonstrate sustained IOP lowering and a significant benefit of magnetic cell steering in the eye, particularly for hAMSCs, strongly indicating further translational potential.

      • line 207 (results): The observed reductions in IOP and increases in outflow facility after delivery of both cell types suggested functional changes in the conventional outflow pathway.

      • line 509-10 (discussion): In summary, this work shows the effectiveness of our novel magnetic TM cell therapy approach for long-term IOP reduction through functional changes in the conventional outflow pathway.

      It is very important to note that at the 23rd annual Trabecular Meshwork Study Club meeting (San Diego, December 2024), Dr. Zode, the lead author of reference 26 originally describing the transgenic myocilin mouse model, announced during his talk that this model no longer demonstrates the glaucomatous phenotype in his hands, which incidentally has motivated him to create a new, CRISPR MYOC mouse model. Dr. Zode also stated that he was uncertain of the reason for this loss of phenotype. His observation is consistent with our report. However, other investigators continue to observe the desired phenotype in their colonies of this mouse (Dr. Wei Zhu, personal communication). Continued use of this mouse model should therefore be approached with caution. 

      Reviewer #2 (Public review):

      Summary:

      This observational study investigates the efficacy of intracameral injected human stem cells as a means to re-functionalize the trabecular meshwork for the restoration of intraocular pressure homeostasis. Using a murine model of glaucoma, human adipose-derived mesenchymal stem cells are shown to be biologically safer and functionally superior at eliciting a sustained reduction in intraocular pressure (IOP). The authors conclude that the use of human adipose-derived mesenchymal stem cells has the potential for long-term treatment of ocular hypertension in glaucoma.

      Strengths:

      A noted strength is the use of a magnetic steering technique to direct injected stem cells to the iridocorneal angle. An additional strength is the comparison of efficacy between two distinct sources of stem cells: human adipose-derived mesenchymal vs. induced pluripotent cell derivatives. Utilizing both in vivo and ex vivo methodology coupled with histological evidence of introduced stem cell localization provides a consistent and compelling argument for a sustainable impact exogenous stem cells may have on the refunctionalization of a pathologically compromised TM.

      Weaknesses:

      A noted weakness of the study, as pointed out by the authors, includes the unanticipated failure of the genetic model to develop glaucoma-related pathology (elevated IOP, TM cell changes). While this is most unfortunate, it does temper the conclusion that exogenous human adipose derived mesenchymal stem cells may restore TM cell function. Given that TM cell function was not altered in their genetic model, it is difficult to say with any certainty that the introduced stem cells would be capable of restoring pathologically altered TM function. A restoration effect remains to be seen. 

      We acknowledge that the phrase “restoration of TM function” is not fully supported by our results, given the absence of ocular hypertension in our animal model. Accordingly, we have revised the language to more precisely describe our findings. For specific details regarding these changes, please refer to our response to Reviewer 1’s public comments above.

      Another noted complication to these findings is the observation that sham intracameral-injected saline control animals all showed elevated IOP and reduced outflow facility, compared to WT or Tg untreated animals, which allowed for more robust statistically significant outcomes. Additional comments/concerns that the authors may wish to address are elaborated in the Private Review section.

      We agree that sham-injected animals tended to have higher average IOPs than transgenic animals in our study. However, these differences did not reach statistical significance and therefore remain inconclusive. Further, an increase in IOP following placebo injection has been previously reported (Zhu et al., 2016). 

      Prompted by the Referee’s comments and also a private comment from Referee 1, we further investigated this effect by analyzing IOP in uninjected contralateral eyes at the mid-term time point and comparing the IOPs in these eyes to other cohorts, as now presented as additional data in Supplementary Tables 1 and 2 and Supplementary Figure 4 (see below). In brief, the uninjected contralateral transgenic eyes (10 months old) showed an IOP of 16.5 [15.9, 17.1] mmHg, which was intermediate between the IOP levels of the 6–7-month-old Tg group (15.4 [14.7, 16.1] mmHg) and the sham group (16.9 [15.5, 18.2] mmHg). However, none of these differences reached statistical significance. Additionally, we cannot rule out potential contralateral effects induced by the injections.

      Regarding the best way to assess the effect of cell treatment, we feel very strongly that the most relevant IOP comparison is between cell-injected eyes and control (vehicle)-injected eyes, since this provides the most direct accounting for the effects of injection itself on IOP. Other comparisons, such as WT or untreated Tg eyes vs. cell-treated eyes, are interesting but harder to interpret. However, in response to the referee’s comment, we have added comparisons between cell-treated groups and untreated Tg eyes to Table 2, adjusting the post-hoc corrections accordingly. All hAMSC-treated groups show a statistically significant decrease in IOP even compared to untreated Tg eyes, whereas the iPSC-TM-treated groups fail to reach significance.

      The following changes were made to the manuscript:

      Lines 326 et seq.: Eyes subjected to saline injection exhibited marginally higher IOPs and lower outflow facilities on average, in comparison to the transgenic animals at baseline. However, due to the lack of statistical significance in these differences and the inherent age difference between the saline-injected animals and the non-injected controls at baseline, no conclusive inference can be drawn regarding the effect of saline injection. To investigate this phenomenon further, we also analyzed IOPs in uninjected contralateral eyes at the mid-term time point (Supplementary Tables 1 and 2, Supplementary Figure 4). The uninjected contralateral transgenic eyes (10 months old) showed an IOP of 16.5 [15.9, 17.1] mmHg, which was intermediate between the IOP levels of the 6–7-month-old Tg group (15.4 [14.7, 16.1] mmHg) and the sham-injected group (16.9 [15.5, 18.2] mmHg). However, none of these differences reached statistical significance. Of note, contralateral hypertension has been previously reported after subconjunctival and periocular injection of dexamethasone-loaded nanoparticles (34), and we similarly cannot definitively rule out potential contralateral effects induced by our stem cell injections. Thus, we cannot draw any definite conclusions from these additional IOP comparisons at this time.

      Reviewer #3 (Public review):

      Summary:

      The purpose of the current manuscript was to investigate a magnetic cell steering technique for efficiency and tissue-specific targeting, using two types of stem cells, in a mouse model of glaucoma. As the authors point out, trabecular meshwork (TM) cell therapy is an active area of research for treating elevated intraocular pressure as observed in glaucoma. Thus, further studies determining the ideal cell choice for TM cell therapy are warranted. The experimental protocol of the manuscript involved the injection of either human adipose-derived mesenchymal stem cells (hAMSCs) or induced pluripotent cell derivatives (iPSC-TM cells) into a previously reported mouse glaucoma model, the transgenic MYOC<sup>Y437H</sup> mice and wild-type littermates, followed by magnetic cell steering. Numerous outcome measures were assessed and quantified, including IOP, outflow facility, TM cellularity, retention of stem cells, and the inner wall BM of Schlemm's canal.

      Strengths:

      All of these analyses were carefully carried out and appropriate statistical methods were employed. The study has clearly shown that the hAMSCs are the cells of choice over the iPSC-TM cells, the latter of which caused tumors in the anterior chamber. The hAMSCs were shown to be retained in the anterior segment over time and this resulted in increased cellular density in the TM region, a reduction in IOP, and an increase in outflow facility. These are all interesting findings and there is substantial data to support them.

      Weaknesses:

      However, where the study falls short is in the MYOC<sup>Y437H</sup> mouse model of glaucoma that was employed. The authors clearly state that a major limitation of the study is that this model, in their hands, did not exhibit glaucomatous features as previously reported, such as a significant increase in IOP, which was part of the overall purpose of the study. The authors state that it is possible that "the transgene was silenced in the original breeders". The authors did not show PCR, western blot, or immuno of angle tissue of the tg to determine transgenic expression (increased expression of MYOC was shown in the angle tissue of the transgenics in the original paper by Zode et al., 2011). This should be investigated given that these mice were rederived. Thus, it is clearly possible that these are not transgenic mice.

      All MYOC mice that were used in this study were genotyped and confirmed to carry the transgene, as noted in the original version of the paper (see lines 590-2). However, the transgene seems not to have been active, based on the lack of ocular hypertension as well as the lack of differences in supporting endpoints such as outflow facility and TM cellularity. While it would have been possible to carry out the recommended assays to investigate the root cause of this loss of phenotype, this was not an objective of our study. We therefore focus here simply on communicating the observed loss of phenotype to readers. We also refer the referee to the final paragraph of our response to Referee 1.

      If indeed they are transgenics, the authors may want to consider the fact that in the Zode paper, the most significant IOP elevation in the mutant mice was observed at night and thus this could be examined by the authors. 

      This is a good point. However, while the dark-phase IOP does exhibit a distinctly larger elevation (as previously observed in hypertonic saline sclerosis), Zode et al. also reported a notable 3 mmHg IOP increase during the light phase. The complete absence of such daytime (light phase) IOP elevation in our animals diminished our enthusiasm for pursuing dark-phase IOP measurements.

      Other glaucomatous features of these mice could also have been investigated such as loss of RGCs, to further determine their transgenic phenotype. 

      We agree that these other phenotypes could be studied, but in the absence of any detectable IOP elevation (and thus lack of mechanical insult on RGC axons), loss of retinal ganglion cells (RGCs) is extremely unlikely. We also note that the loss of RGCs in the Myocilin model remains a subject of controversy. For example, despite a significant increase in IOP (>10 mmHg) in this model across four mouse strains, three, including C57BL/6J, did not exhibit any signs of optic nerve damage (McDowell et al., 2012). In contrast, Zhu et al. observed considerable nerve damage in this model, which was reversed following iPSC-TM cell transplantation (Zhu et al., 2016). Given these conflicting findings, we directed our efforts toward outcome measures directly related to aqueous humor dynamics.

      Finally, while increased cellular density in the TM region was observed, proliferative markers could be employed to determine if the transplanted cells are proliferating.

      We agree that identifying the source of the increased trabecular meshwork (TM) cellularity we observed is interesting and we plan to pursue that in future studies. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The sham-injected transgenic animals showed elevated IOP 3-4 weeks after the baseline measurements in the transgenic mice. The authors justify this may be due to the increase in age in these animals. However, this seems unlikely due to the short duration of time between measurement of the baseline IOP and the Short time point (3-4 weeks). The authors do not provide IOP data for any WT sham injected eyes or naïve Tg eyes at these time points. These data are essential to determine if the elevation is due to the sham injection, age, or the transgene. Could it be that the IOP in this cohort of Tg mice didn't increase until 7-8 months of age instead of 6-7 months of age? The methods state only unilateral injections of the stem cells were done so it is assumed the contralateral eye was uninjected. What was the IOP in these eyes? These data would clarify the confusion in the data from sham-injected animals compared to baseline (naive) measurements.

      We agree that the average IOP in saline-injected groups is higher than in WT or non-treated Tg mice, although the difference is inconclusive due to a lack of statistical significance. It is important to note, however, that this difference is subtle and not comparable to the 3 mmHg light-phase IOP elevation previously observed in this model (Zode et al., 2011). 

      We appreciate the reviewer’s suggestion to include IOP data from the contralateral uninjected eyes, and we have now provided this information along with the comparative statistics in the supplementary materials. Additional details can be found in our response to a similar comment from Reviewer 2’s public review. In summary, the IOP difference in contralateral non-injected ten-month-old transgenic eyes was even smaller than in the original Tg group. IOP elevation following saline injection in mice has been reported previously (Zhu et al., 2016). As a potential confounding factor, we highlight possible contralateral effects of the injection itself (which is why we initially did not analyze IOP in the contralateral eyes).

      The hAMSC-treated eyes appear to lower IOP even from baseline (although stats were only provided compared to the sham-injected eyes, which as stated above appear to have increased).

      However, the iPSC-TM-treated eyes had IOPs equal to that of the baseline measurements taken 3 weeks prior. The significance is coming from the "sham-treated" eyes which had elevated IOPs. The controls listed above should be included to make these conclusions.

      The reviewer makes an astute observation. Please refer to our response to a similar observation by Reviewer 2 under public reviews, where we provide and discuss the comparative statistics noted by the reviewer. However, we feel very strongly that the most relevant IOP comparison is between cell-injected eyes and control-injected eyes. 

      If the transgenic mouse model truly did not have a phenotype, then the authors are testing the ability of the stem cells to lower IOP from baseline normal pressures. Therefore, the authors are not "restoring function of the conventional outflow pathway" as there is no damage to begin with. The language in the manuscript should be corrected to reflect this if the transgenics have no phenotype.

      We agree and have adjusted the language accordingly. For further details, please refer to our response to your public review.

      The authors noted in the iPSC-TM-treated eyes there was a high rate of tumorigenicity. If the magnetic steering of these cells is specific and targeted to the TM, why do the tumors form near the central iris?

      While magnetic steering is more specific to the trabecular meshwork (TM) than previously used approaches (Bahrani Fard et al., 2023), it is not perfect, and a modest amount of off-target delivery to the iris, including its central portion, still occurs. Apparently, it took only a few misdirected iPSC-TM cells to lead to tumors in this work, which is a serious concern for future translational approaches.

      Reviewer #2 (Recommendations for the authors):

      (1) It appears that mice were injected unilaterally (Line 590). I may have missed this, but was the companion un-injected eye analyzed in this study? If not analyzed, was there a confounding concern or limitation that necessitated omitting this possible control option?

      Contralateral effects, such as hypertension in the untreated eye after subconjunctival and periocular injection of dexamethasone-loaded nanoparticles, have previously been reported in the literature (Li et al., 2019) and also reported anecdotally by other leaders in the field to the senior authors, which is why we did not initially analyze contralateral eyes in this study. However, prompted by this comment and others, we have now included the IOP measurements for contralateral uninjected ten-month-old transgenic eyes in the supplementary materials. For further details, please refer to our response to your public review.

      (2) Were all these mice the same gender? Would gender be expected to alter the findings of this study?

      Animals of both sexes were randomly chosen and included in the study. We added the following statement to the Materials and Methods section (line 530): After breeding and genotyping, mice, regardless of sex, were maintained to age 6-7 months, when transgenic animals were expected to have developed a POAG phenotype.

      (3) As noted in the public review, the use of PBS for a control seems to have resulted in a slight elevation in IOP (Figure 2) as well as a reduction in outflow facility (Figure 3B) when compared to WT or Tg mice. Was this difference statistically significant? 

      The differences between the sham (saline)-injected groups at any time point and untreated Tg mice did not reach statistical significance for IOP, facility, or TM cellularity and, for facility, did not even show clear trends. For example, WT mice had, on average, 0.2 mmHg higher IOP and 0.6 nl/min/mmHg greater facility than the Tg group. Meanwhile, on a similar scale, the long-term sham group exhibited 0.4 nl/min/mmHg higher facility than the Tg group. As the statistical tests indicate, these differences are better interpreted as noise than as meaningful signal.

      If so, then it should be noted as to whether the observed decrease in IOP following stem cell injection remained statistically significant when compared to these un-injected control animals. If significance was lost, then this should be appropriately noted and discussed. It is not apparently obvious why sham controls should have elevated IOP. This is a design and statistical concern.

      Please refer to our response to a similar observation by Reviewer 1. We believe that comparing the treatment (cell suspension in saline) with its age-matched vehicle (saline) is the appropriate approach, as it maintains rigor by most directly accounting for the effects of injection.

      (4) The tonicity of the PBS used as a vehicle control was not stated and I did not see within the methods whether the stem cells were suspended using this same PBS vehicle. I assume isotonic phosphate buffered saline was used and that the stem cells were resuspended using the same sterile PBS. 

      Thanks for catching this. We added “sterile PBS (1X, Thermo Fisher Scientific, Waltham, MA)” to the Methods section of the manuscript (line 567). 

      With regards to using PBS as an injection control, I wonder if a better comparable control might have been to use mesenchymal stem cells that were rendered incapable of proliferating prior to intracameral injection. This, of course, addresses the unexplained mechanism(s) by which mesenchymal stem cells elicit a decrease in IOP.

      This is an interesting idea and represents another level of control. However, we explicitly chose not to use non-proliferating hAMSCs as a control, for several reasons. First, a saline injection is the simplest control and, in this initial study with multiple groups, we did not feel another experimental group should be added. Second, this control would not rule out paracrine effects from injected cells, which our data suggest play an important role. Third, rendering injected cells truly non-proliferative could introduce unwanted/unknown phenotypes in these cells that would need to be carefully characterized. That being said, if an efficient method could be developed to render an entire population of these cells irreversibly non-proliferating, the reviewer’s suggestion would be worth pursuing to better understand the mechanism of TM cell therapies.

      (5) As noted in Figure 4C, TM cellular density as quantified was not altered in the sham control, so a loss of cellular density can not explain the elevated IOP with this group. Injecting viable (not determined?) mesenchymal stem cells did show, over the short term, a noted increase in TM cellular density. 

      Thank you for noting this. We agree that changes in cell density do not explain the mild IOP elevation in the sham group. As the referee certainly is aware, there are multiple reasons that IOP can be elevated (e.g., changes in trabecular meshwork extracellular matrix or in trabecular meshwork stiffness) that are not necessarily related to cell density. Since we do not know definitively the cause of this mild elevation, we would prefer not to speculate about it in the manuscript.

      Thanks for pointing out our omission of a statement about injected cell viability. We have now included the following statement in the Materials and Methods section (lines 564-566): “For all the experiments where animals received hAMSC, cell count and >90% viability was verified using a Countess II Automated Cell Counter (Thermo Fisher Scientific, Waltham, MA).”

      I'm confused: as clearly stated (Lines 431-432), mesenchymal stem cells accumulated close to, but not within, the TM. How is it that TM cellular density increased if these stem cells did not enter the TM? The authors may wish to clarify this distinction. Given that mesenchymal stem cells did not increase the risk of tumorigenicity, do the authors have any evidence that these cells actually proliferated post-injection, or did they undergo senescence, thereby displaying a senescence-associated secretory phenotype as a source of paracrine support?

      As the reviewer correctly noted, our observations show that hAMSCs primarily accumulated close to, but outside, the TM (likely caught up in the pectinate ligaments). Based on observations of increased TM cellularity, we think that the most likely explanation of these findings is paracrine signaling, as the reviewer suggests and which was discussed at length in the original version of the manuscript (lines 453-477). 

      We agree that, despite observing little signal from hAMSCs within the TM, labeling with proliferation markers (e.g., Ki-67) and searching for co-localization with exogenous cells, and/or labeling for senescence markers would have provided more mechanistic information. This is an excellent topic for future study, which we plan to pursue, but was outside the scope of this study. 

      (6) As noted in the public review, I think it is a bit of a stretch to even suggest that the findings of this study support stem cell restoration of TM function given that the model apparently did not produce TM cell dysfunction as anticipated. A restoration effect remains to be seen.

      We agree and have adjusted the language accordingly. For further details, please refer to our response to Reviewer 1’s public comment.

      Reviewer #3 (Recommendations for the authors):

      (1) Show PCR, western blot, or immuno of angle tissue of the MYOC tg to confirm transgenic expression.

      (2) Examine the IOP of mice at night.

      (3) Investigate other glaucomatous features in the mice to determine if they have any of the transgenic phenotypes previously reported.

      (4) Examine proliferative markers in the TM region of angles injected with stem cells.

      Please see our responses to all four of these comments in the public section.

      Bibliography (for this response letter only)

      Bahrani Fard, M.R., Chan, J., Sanchez Rodriguez, G., Yonk, M., Kuturu, S.R., Read, A.T., Emelianov, S.Y., Kuehn, M.H., Ethier, C.R., 2023. Improved magnetic delivery of cells to the trabecular meshwork in mice. Exp. Eye Res. 234, 109602. https://doi.org/10.1016/j.exer.2023.109602

      Li, G., Lee, C., Agrahari, V., Wang, K., Navarro, I., Sherwood, J.M., Crews, K., Farsiu, S., Gonzalez, P., Lin, C.-W., Mitra, A.K., Ethier, C.R., Stamer, W.D., 2019. In vivo measurement of trabecular meshwork stiffness in a corticosteroid-induced ocular hypertensive mouse model. Proc. Natl. Acad. Sci. U. S. A. 116, 1714–1722. https://doi.org/10.1073/pnas.1814889116

      Zhu, W., Gramlich, O.W., Laboissonniere, L., Jain, A., Sheffield, V.C., Trimarchi, J.M., Tucker, B.A., Kuehn, M.H., 2016. Transplantation of iPSC-derived TM cells rescues glaucoma phenotypes in vivo. Proc. Natl. Acad. Sci. U. S. A. 113, E3492–E3500.

      Zode, G.S., Kuehn, M.H., Nishimura, D.Y., Searby, C.C., Mohan, K., Grozdanic, S.D., Bugge, K., Anderson, M.G., Clark, A.F., Stone, E.M., Sheffield, V.C., 2011. Reduction of ER stress via a chemical chaperone prevents disease phenotypes in a mouse model of primary open angle glaucoma. J. Clin. Invest. 121, 3542–3553. https://doi.org/10.1172/JCI58183

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      The authors attempted to replicate previous work showing that counterconditioning leads to more persistent reduction of threat responses, relative to extinction. They also aimed to examine the neural mechanisms underlying counterconditioning and extinction. They achieved both of these aims and were able to provide some additional information, such as how counterconditioning impacts memory consolidation. Having a better understanding of which neural networks are engaged during counterconditioning may provide novel pharmacological targets to aid in therapies for traumatic memories. It will be interesting to follow up by examining the impact of varying amounts of time between acquisition and counterconditioning phases, to enhance replicability to real-world therapeutic settings.

      Major strengths

      · This paper is very well written and attempts to comprehensively assess multiple aspects of counterconditioning and extinction processes. For instance, the addition of memory retrieval tests is not core to the primary hypotheses but provides additional mechanistic information on how episodic memory is impacted by counterconditioning. This methodical approach is commonly seen in animal literature, but less so in human studies.

      · The Group x Cs-type x Phase repeated measure statistical tests with 'differentials' as outcome variables are quite complex, however, the authors have generally done a good job of teasing out significant F test findings with post hoc tests and presenting the data well visually. It is reassuring that there is a convergence between self-report data on arousal and valence and the pupil dilation response. Skin conductance is a notoriously challenging modality, so it is not too concerning that this was placed in the supplementary materials. Neural responses also occurred in logical regions with regard to reward learning.

      · Strong methodology with regards to neuroimaging analysis, and physiological measures.

      · The authors are very clear on documenting where there were discrepancies from their pre-registration and providing valid rationales for why.

      We thank reviewer 1 for the positive feedback and for pointing out the strengths of our work. We agree that future research should investigate varying times between acquisition and counterconditioning to assess its success in real-life applications.

      Major Weaknesses

      (1) The statistics showing that counterconditioning prevents differential spontaneous recovery are the weakest p values of the paper (and using one-tailed tests, although this is valid due to directions being pre-hypothesized). This may be due to a relatively small number of participants and some variability in responses. It is difficult to see how many people were included in the final PDR and neuroimaging analyses, with exclusions not clearly documented. Based on Figure 3, there are relatively small numbers in the PDR analyses (n=14 and n=12 in counterconditioning and extinction, respectively). Of these, each group had 4 people with differential PDR results in the opposing direction to the group mean. This perhaps warrants mention as the reported effects may not hold in a subgroup of individuals, which could have clinical implications.

      General exclusion criteria are described on page 17. We have added more detailed information on the reasons for exclusion (see page 17). All exclusions were in line with pre-registered criteria. For the analysis the reviewer is referring to (the PDR analysis that investigated whether CC can prevent the spontaneous recovery of differential conditioned threat responses), 18 participants were excluded: 2 participants did not show evidence of successful threat acquisition, as already indicated on page 17, and 16 participants were excluded due to (partially) missing data. We now explicitly mention the exclusion of the additional 16 participants on page 7 and have updated Figure 3 to improve the visibility of the individual data points. Therefore, for this analysis both experimental groups consisted of 15 participants (total N=30).
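      The exclusion counts in this response can be tallied as a quick consistency check. The sketch below uses only the numbers stated above (2 and 16 exclusions, 15 participants per group); the variable names are illustrative, and the implied starting sample is derived by addition rather than quoted from the manuscript.

```python
# Tally of exclusions for the PDR spontaneous-recovery analysis,
# using only the counts given in the response letter.
excluded_no_acquisition = 2   # no evidence of successful threat acquisition
excluded_missing_data = 16    # (partially) missing PDR data

# Final group sizes as stated in the response (15 per group).
final_group_sizes = {"counterconditioning": 15, "extinction": 15}

total_excluded = excluded_no_acquisition + excluded_missing_data
final_n = sum(final_group_sizes.values())

# Derived by addition; this figure is an assumption, not stated in the letter.
implied_initial_n = final_n + total_excluded

print(f"excluded: {total_excluded}, analyzed: {final_n}, "
      f"implied starting sample: {implied_initial_n}")
```

      If the manuscript's reported sample size differs from the implied starting sample, the difference would point to exclusions applied before this analysis step.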

      It is true that in both groups a few participants show the opposite pattern. Although this may also be due to measurement error, we agree that it is relevant to further investigate this in future studies with larger sample sizes. It will be crucial to identify who will respond to treatments based on the principles of standard extinction or counterconditioning. We have added this point in the discussion on page 14.

      Reviewer #2:

      Summary:

      The present study sets out to examine the impact of counterconditioning (CC) and extinction on conditioned threat responses in humans, particularly looking at neural mechanisms involved in threat memory suppression. By combining behavioral, physiological, and neuroimaging (fMRI) data, the authors aim to provide a clear picture of how CC might engage unique neural circuits and coding dynamics, potentially offering a more robust reduction in threat responses compared to traditional extinction.

      Strengths:

      One major strength of this work lies in its thoughtful and unique design - integrating subjective, physiological, and neuroimaging measures to capture the various aspects of counterconditioning (CC) in humans. Additionally, the study is centered on a well-motivated hypothesis and the findings have the potential to improve the current understanding of pathways associated with emotional and cognitive control. The data presentation is systematic, and the results on behavioral and physiological measures fit well with the hypothesized outcomes. The neuroimaging results also provide strong support for distinct neural mechanisms underlying CC versus extinction.

      We thank reviewer 2 for the feedback and for valuing the thoughtfulness that went into designing the study.

      Weaknesses:

      (1) Overall, this study is a well-conducted and thought-provoking investigation into counterconditioning, with strong potential to advance our understanding of threat modulation mechanisms. Two main weaknesses concern the scope and decisions regarding analysis choices. First, while the findings are solid, the topic of counterconditioning is relatively niche and may have limited appeal to a broader audience. Expanding the discussion to connect counterconditioning more explicitly to widely studied frameworks in emotional regulation or cognitive control would enhance the paper's accessibility and relevance to a wider range of readers. This broader framing could also underscore the generalizability and broader significance of the results. In addition, detailed steps in the statistical procedures and analysis parameters seem to be missing. This makes it challenging for readers to interpret the results in light of potential limitations given the data modality and/or analysis choices.

      In this updated version of the manuscript, we included the notion that extinction has been interpreted as a form of implicit emotion regulation. In addition to our discussion on active coping (avoidance), we believe that our discussion has an important link to the more general framework of emotion regulation, while remaining within the scope of relevance. Please see pages 14 and 15 for the changes. In addition to being informative to theories of emotion regulation, our findings are also highly relevant for forms of psychotherapy that build on principles of counterconditioning (e.g. the use of positive reinforcement in cognitive behavioral therapy), as we point out in the introduction. We believe this relevance shows that counterconditioning is more than a niche topic. In line with the recommendation from reviewer 2, we added more details and explanations to the statistical procedures and analyses where needed (see responses to recommendations).

      Reviewer #3:

      Summary:

      In this manuscript, Wirz et al use neuroimaging (fMRI) to show that counterconditioning produces a longer lasting reduction in fear conditioning relative to extinction and appears to rely on the nucleus accumbens rather than the ventromedial prefrontal cortex. These important findings are supported by convincing evidence and will be of interest to researchers across multiple subfields, including neuroscientists, cognitive theory researchers, and clinicians.

      In large part, the authors achieved their aims of giving a qualitative assessment of the behavioural mechanisms of counterconditioning versus extinction, as well as investigating the brain mechanisms. The results support their conclusions and give interesting insights into the psychological and neurobiological mechanisms of the processes that underlie the unlearning, or counteracting, of threat conditioning.

      Strengths:

      · Mostly clearly written with interesting psychological insights

      · Excellent behavioural design, well-controlled and tests for a number of different psychological phenomena (e.g. extinction, recovery, reinstatement, etc).

      · Very interesting results regarding the neural mechanisms of each process.

      · Good acknowledgement of the limitations of the study.

      We thank reviewer 3 for the detailed feedback and suggestions.

      Weaknesses:

      (1) I think the acquisition data belongs in the main figure, so the reader can discern whether or not there are directional differences prior to CC and extinction training that could account for the differences observed. This is particularly important for the valence data which appears to differ at baseline (supplemental figure 2C).

      Since our design is quite complex with a lot of results, we left the fear acquisition results as a successful manipulation check in the Supplementary Information so as not to overload the reader with information that is not the main focus of this manuscript. If the editor would like us to add the figure to the main text, we are happy to do so. During fear acquisition, both experimental groups showed comparable differential conditioned threat responses as measured by PDRs and SCRs. Subjective valence ratings indeed differed depending on CS category. Importantly, however, the groups differed only in their ratings of the CS- category, not the CS+ category, which suggests that the strength of the acquired fear is similar between the groups. To make sure that these baseline differences cannot account for the differences in valence after CC/Ext, we ran an additional group comparison with differential valence ratings after fear acquisition added as a covariate. Results show that despite the baseline difference, the group difference in valence after CC/Ext is still significant (main effect Group: F<sub>(1,43)</sub>=7.364, p=0.010, η<sup>2</sup>=0.146). We have added this analysis to the manuscript (see page 7).
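
      As a quick consistency check on the effect size reported above, the sketch below recovers a partial eta squared from the F statistic and its degrees of freedom. It assumes that the reported η<sup>2</sup> is a partial eta squared; the function name is illustrative and not taken from the manuscript.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # proportional to SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# Values reported for the main effect of Group: F(1,43) = 7.364
eta_p2 = partial_eta_squared(7.364, 1, 43)
print(round(eta_p2, 3))  # 0.146
```

      Under that assumption, the recomputed value agrees with the reported η<sup>2</sup> of 0.146.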

      (2) I was confused in several sections about the chronology of what was done and when. For instance, it appears that individuals went through re-extinction, but this is just called extinction in places.

      We understand that the complexity of the design may require a clearer description. We therefore made some changes throughout the manuscript to improve understanding. Figure 1 is very helpful in understanding the design and we therefore refer to that figure more regularly (see pages 6-7). We also added the time between tasks where appropriate (e.g. see page 7). Re-extinction after reinstatement was indeed mentioned once in the manuscript. Given that the reinstatement procedure was not successful (see page 9), we could not investigate re-extinction and it is therefore indeed not relevant to explicitly mention and may cause confusion. We therefore removed it (see page 12).

      (3) I was also confused about the data in Figure 3. It appears that the CC group maintained differential pupil dilation during CC, whereas extinction participants didn't, and the authors suggest that this is indicative of the anticipation of reward. Do reward-associated cues typically cause pupil dilation? Is this a general arousal response? If so, does this mean that the CSs become equally arousing over time for the CC group whereas the opposite occurs for the extinction group (i.e. Figure 3, bottom graphs)? It is then further confusing as to why the CC group lose differential responding on the spontaneous recovery test. I'm not sure this was adequately addressed.

      Indeed, reward and reward anticipation also evoke an increase in pupil dilation. This was an important reason for including a separate valence-specific response characterization task. Independently from the conditioning task, this task revealed that both threat and reward-anticipation induced strong arousal-related PDRs and SCRs. This was also reflected in the explicit arousal ratings, which were stronger for both the shock-reinforced (negative valence) and reward-reinforced (positive valence) stimuli. Therefore, it is not surprising that reward anticipation leads to stronger PDRs for CS+ (which predict reward) compared to CS- stimuli (which do not predict reward) during CC, but is reduced during extinction due to a decrease in shock anticipation. During the spontaneous recovery test, a return of stronger PDRs for CS+ compared to CS- stimuli in the standard extinction group can only reflect a return of shock anticipation. Importantly, the CC group received no rewards during the spontaneous recovery task and was aware of this, so it is to be expected that the effect is weakened in the CC group. However, CS+ and CS- items were still rated of similar valence and PDRs did not differ between CS+ and CS- items in the CC group, whereas the Ext group rated the CS+ significantly more negative and threat responses to the CS+ did return. It therefore is reasonable to conclude that associating the CS+ with reward helps to prevent a return of threat responses. We have added some clarifications and conclusions to this section on page 8.

      (4) I am not sure that the memories tested were truly episodic

      In line with previous publications from Dunsmoor et al.[1-4], our task allows for the investigation of memory for elements of a specific episode. In the example of our task, retrieval of a picture probes retrieval of the specific episode in which the picture was presented. In contrast, fear retrieval relies on the retrieval of the category-threat association, which does not rely on retrieval of these specific episodic elements, but could be semantic in nature, as retrieval takes place at a conceptual level. We have added a small note on what we mean by episodic in this context on page 4. We do agree that we cannot investigate other aspects of episodic memories here, such as context, as this was not manipulated in this experiment.

      (5) Twice as many female participants than males

      It is indeed unfortunate that there is no equal distribution between female and male participants. Investigating sex differences was not the goal of this study, but we do hope that future studies with the appropriate sample sizes are able to investigate this specifically. We have added this to the limitations of this study on page 17.

      (6) No explanation as to why shocks were varied in intensity and how (pseudo-randomly?)

      The shock determination procedure is explained on pages 18-19 (Peripheral stimulation). As is common in fear conditioning studies in humans (see references), an ascending staircase procedure was used. The goal of this procedure is to try and equalize the subjective experience of the electrical shocks to be “maximally uncomfortable but not painful”.

      Recommendations for the authors:

      Reviewer #1:

      Very well written. No additional comments

      We thank reviewer 1 for valuing our original manuscript version. To further improve the manuscript, we adapted the current version based on the reviewer’s public review (see response to reviewer #1 public review comment 1).

      Reviewer #2:

      (1) I feel that more justification/explanation is needed on why other regions highly relevant to different aspects of counterconditioning (e.g., threat, memory, reward processing) were not included in the analyses.

      We first performed whole-brain analyses to get a general idea of the different neural mechanisms of CC compared to Ext. Clusters revealing significant group differences were then further investigated by means of preregistered ROI analyses. We included regions that have previously been shown to be most relevant for affective processing/threat responding (amygdala), memory (hippocampus), reward processing (NAcc) and regular extinction (vmPFC). We restricted our analyses to these most relevant ROIs as preregistered to prevent inflated or false-positive findings[5]. Beyond these preregistered ROIs, we applied appropriate whole-brain FWE corrections. The activated regions are listed in Supplementary Table 1 and include additional regions that were expected, such as the ACC and insula.

      (2) Were there observed differences across participants in the experiment? Any information on variance in the data such as how individual differences might influence these findings would provide a richer understanding of counterconditioning and increase the depth of interpretation for a broad readership.

      We agree that investigating individual differences is crucial to gain a better understanding of treatment efficacy in the framework of personalized medicine. Specifically, future research should aim to identify factors that help predict which treatment will be most effective for a particular patient. The results of this study provide a good basis for this, as we could show that the vmPFC, in contrast to its role in regular extinction, is not required in CC to improve the retention of safety memory. Therefore, this provides a viable option for patients who are not responding to treatments that rely on the vmPFC. In addition, as noted by Reviewer 1, in both groups a few participants show the opposite pattern (see Figure 3). It will be crucial to identify who will respond to treatments based on the principles of standard extinction or counterconditioning. We have added this point in the discussion on page 14.

      (3) While most figures are informative and clear, Figure 3 would benefit from detailed axis labels and a more descriptive caption. Currently, it is challenging to navigate the results presented to support the findings related to differential PDRs. A supplementary figure consolidating key patterns across conditions might also further facilitate understanding of this rather complicated result.

      We have made some changes to the figure to improve readability and understanding. Specifically, we changed the figure caption to “Change from last 2 trials CC/Ext to first 2 trials Spontaneous recovery test”, to give more details on what exactly is shown here. We also simplified the x-axis labels to “counterconditioning”, “recovery test” and “extinction”. With the addition of a clearer figure description, we hope to have improved understanding and do not think that another supplemental figure is needed.

      (4) Additional details on the statistical tests are needed. For example, please clarify whether p-values reported were corrected across all experimental conditions. Also, it would be helpful for the authors to discuss why for example repeated measures ANOVA or mixed-effects conditions were not used in this study. Might those tests not capture variance across participants' PDRs and SCRs over time better?

      We added that significant interactions were followed by Bonferroni-adjusted post-hoc tests where applicable (see page 21). We used repeated measures ANOVAs to capture early versus late phases of acquisition and CC/extinction, as well as to compare late CC/extinction (last 2 trials) with early spontaneous recovery (first 2 trials), as is often done in the literature. A trial-level factor in a small sample would cost too many degrees of freedom and is not expected to provide more information. We have added this information and our reasoning to the methods section on page 21.
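
      For readers less familiar with the correction mentioned above, the following is a minimal sketch of a Bonferroni adjustment for a family of post-hoc tests. The p-values are hypothetical placeholders, not results from this study, and the function name is illustrative.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests in the family, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values from three post-hoc comparisons (illustration only)
raw_p = [0.004, 0.020, 0.300]
adjusted_p = bonferroni_adjust(raw_p)

# A comparison counts as significant if its adjusted p-value is below alpha
alpha = 0.05
significant = [p < alpha for p in adjusted_p]
print(adjusted_p, significant)
```

      In this made-up family, only the first comparison survives the correction, which illustrates why the adjustment is conservative when many post-hoc tests are run.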

      Reviewer #3:

      (1) Suggest putting acquisition data into the main figures. In fact many of the supplemental figures could be integrated into the main figures in my opinion.

      See response to reviewer #3 public review comment 1.

      (2) Include explanations for why shock intensity was varied

      See response to reviewer #3 public review comment 6.

      (3) Include a better explanation for the change in differential responding from training to spontaneous recovery in the CC group (I think the loss of such responding in extinction makes more sense and is supported by the notion of spontaneous recovery, but I'm not sure about the loss in the CC group. There is some evidence from the rodent literature - which I am most familiar with - regarding a loss in contextual gradient across time which could account for some loss in specificity, could it be something like this?).

      See response to reviewer #3 public review comment 3.

      If we understand the reviewer correctly in that we see a loss of differential responding due to a generalization to the CS-, this would imply an increase in responding to the CS-, which is not what we see. Our data should therefore be correctly interpreted as a loss of the specific response to the CS+ from the CC phase to the recovery test. Therefore, there is no spontaneous recovery in the CC group, and also not a non-specific recovery. To clarify this we relabeled Figure 3 by indicating “recovery test” instead of “spontaneous recovery”.

      (4) Is there a possibility that baseline differences, particularly that in Supplemental Figure 2C, could account for later differences? If differences persist after some transformation (e.g. percentage of baseline responding) this would be convincing to suggest that it doesn't.

      See response to reviewer #3 public review comment 1.

      (5) As I mentioned, I got confused by the chronology as I read through. Maybe mention early on when reporting the spontaneous recovery results that testing occurred the next day and that participants were undergoing re-extinction when talking about it for the second time.

      See response to reviewer #3 public review comment 2.

      (6) Page 8 - I was confused as to why it is surprising that the CC group were more aroused than the extinction group, the latter have not had CSs paired with anything with any valence, so doesn't this make sense? Or perhaps I am misunderstanding the results - here in text the authors refer back to Figure 2B, but I'm not sure if this is showing data from the spontaneous recovery test or from CC/extinction. If it is the latter, as the caption suggests, why are the authors referring to it here?

      Participants in the CC group showed increased differential self-reported arousal after CC, whereas arousal ratings did not differ between CS+ and CS- items after extinction. We interpret this in line with the valence and PDR results as an indication of reward-induced arousal. At the start of the next day, however, participants from the CC and extinction groups gave comparable ratings. It may therefore be surprising that participants in the CC group do not still show stronger ratings, since nothing happened between these two ratings besides a night’s sleep (see design overview in Figure 1A). We removed the “surprisingly” to prevent any confusion.

      (7) I suggest that the authors comment on whether there were any gender differences in their results.

      See response to reviewer #3 public review comment 5.

      (8) The study makes several claims about episodic memory, but how can the authors be sure that the memories they are tapping into are episodic? Episodic has a very specific meaning - a biographical, contextually-based memory, whereas the information being encoded here could be semantic. Perhaps a bit of clarification around this issue could be helpful.

      See response to reviewer #3 public review comment 4.

      References

      (1) Dunsmoor, J. E. & Kroes, M. C. W. Episodic memory and Pavlovian conditioning: ships passing in the night. Curr Opin Behav Sci 26, 32-39 (2019). https://doi.org/10.1016/j.cobeha.2018.09.019

      (2) Dunsmoor, J. E. et al. Event segmentation protects emotional memories from competing experiences encoded close in time. Nature Human Behaviour 2, 291-299 (2018). https://doi.org/10.1038/s41562-018-0317-4

      (3) Dunsmoor, J. E., Murty, V. P., Clewett, D., Phelps, E. A. & Davachi, L. Tag and capture: how salient experiences target and rescue nearby events in memory. Trends Cogn Sci 26, 782-795 (2022). https://doi.org/10.1016/j.tics.2022.06.009

      (4) Dunsmoor, J. E., Murty, V. P., Davachi, L. & Phelps, E. A. Emotional learning selectively and retroactively strengthens memories for related events. Nature 520, 345-348 (2015). https://doi.org/10.1038/nature14106

      (5) Gentili, C., Cecchetti, L., Handjaras, G., Lettieri, G. & Cristea, I. A. The case for preregistering all region of interest (ROI) analyses in neuroimaging research. Eur J Neurosci 53, 357-361 (2021). https://doi.org/10.1111/ejn.14954

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewer #1:

      First, I thank the authors for clarifying some of the confusion I had in the previous comment and I appreciate the efforts the authors put into improving the quality of the manuscript. However, my concerns about the lack of novelty of the key findings are not perfectly addressed and there is no additional analysis done in this revision. Currently in this version of the manuscript, asserting that a p-value of 10<sup>-6</sup> is close to genome-wide significance may be considered an overstatement. Further analysis focusing on finding novel and additional discovery is very necessary.

      We thank the reviewer for their comments. Reviewer #2 also made a comment regarding the genome-wide threshold, “However, it remains unclear why the authors found it appropriate to apply STEAM to the LAAA model, a joint test for both allele and ancestry effects, which does not benefit from the same reduction in testing burden.” The reviewers have correctly identified our oversight - we have amended the manuscript as follows:

      (1) The abstract, “We identified a suggestive association peak (rs3117230, p-value = 5.292 x 10<sup>-6</sup>, OR = 0.437, SE = 0.182) in the HLA-DPB1 gene originating from KhoeSan ancestry.”

      (2) From line 233 to 239: “The R package STEAM (Significance Threshold Estimation for Admixture Mapping) (Grinde et al., 2019) was used to determine the admixture mapping significance threshold given the global ancestral proportions of each individual and the number of generations since admixture (g = 15). For the LA model, a genome-wide significance threshold of p-value < 2.5 x 10<sup>-6</sup> was deemed significant by STEAM. The traditional genome-wide significance threshold of 5 x 10<sup>-8</sup> was used for the GA, APA and LAAA models, as recommended by the authors of the LAAA model (Duan et al., 2018).”

      (3) We excluded the results for the signal on chromosome 20, since this also did not reach the LAAA model genome-wide significance threshold.  

      (4) From line 296 to 308: “LAAA models were successfully applied for all five contributing ancestries (KhoeSan, Bantu-speaking African, European, East Asian and Southeast Asian). However, no variants passed the threshold for statistical significance. Although no variants reached genome-wide significance, a suggestive peak was identified in the HLA-II region of chromosome 6 when using the LAAA model and adjusting for KhoeSan ancestry (Figure 3). The QQ-plot suggested minimal genomic inflation, which was verified by calculating the genomic inflation factor (λ = 1.05289) (Supplementary Figure 1). The lead variants identified using the LAAA model whilst adjusting for KhoeSan ancestry in this region on chromosome 6 are summarised in Table 3. The suggestive peak encompasses the HLA-DPA1/B1 (major histocompatibility complex, class II, DP alpha 1/beta 1) genes (Figure 4). It is noteworthy that without the LAAA model, this suggestive peak would not have been observed for this cohort. This highlights the importance of utilising the LAAA model in future association studies when investigating disease susceptibility loci in admixed individuals, such as the SAC population.”
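
      The amendments above rest on two simple computations: comparing a lead p-value with the relevant significance threshold, and estimating the genomic inflation factor. The sketch below illustrates both under stated assumptions. It uses the conventional median-based definition of the inflation factor with 1-degree-of-freedom chi-square statistics, which may differ from the exact procedure used in the manuscript, and all function and variable names are illustrative.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455)
EXPECTED_MEDIAN_CHI2_1DF = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2_1df(p_value: float) -> float:
    """Convert a p-value (0 < p <= 1) to the matching 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p_value / 2.0)
    return z * z


def genomic_inflation_factor(p_values) -> float:
    """Median observed chi-square statistic divided by its expected median."""
    observed = [p_to_chi2_1df(p) for p in p_values]
    return median(observed) / EXPECTED_MEDIAN_CHI2_1DF


# Thresholds quoted in the amended text
LA_THRESHOLD = 2.5e-6         # STEAM-derived threshold for the LA model
TRADITIONAL_THRESHOLD = 5e-8  # used for the GA, APA and LAAA models

lead_p = 5.292e-6  # rs3117230, as reported in the amended abstract
passes_traditional = lead_p < TRADITIONAL_THRESHOLD
passes_la = lead_p < LA_THRESHOLD
print(passes_traditional, passes_la)  # False False
```

      The lead variant falls short of both thresholds, which is consistent with describing the peak as suggestive. An inflation factor close to 1, such as the reported 1.05289, indicates that the bulk of the test statistics follow the null distribution.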

      We acknowledge that our results are not statistically significant. However, our study advances this area of research by identifying suggestive African-specific ancestry associations with TB in the HLA-II region. These findings build upon the work of the ITHGC, which did not identify any significant associations or suggestive peaks in their African-specific analyses. We have included this argument in our manuscript (from lines 425 to 432):

      “The ITHGC did not identify any significant associations or suggestive peaks in their African ancestry-specific analyses. Notably, the suggestive peak in the HLA-DPB1 region was only captured in our cohort using the LAAA model whilst adjusting for KhoeSan local ancestry. This underscores the importance of incorporating global and local ancestry in association studies investigating complex multi-way admixed individuals, as the genetic heterogeneity present in admixed individuals (produced as a result of admixture-induced and ancestral LD patterns) may cause association signals to be missed when using traditional association models (Duan et al., 2018; Swart, van Eeden, et al., 2022).”

      We appreciate the comment regarding additional analyses. We acknowledge that we did not validate our SNP peak in the HLA-II region through fine-mapping due to the lack of a suitable reference panel (see lines 490 to 500). Our long-term goal is to develop a HLA-imputation reference panel incorporating KhoeSan ancestry; however, this is beyond the scope and funding allowances of this study.

      Reviewer #2 (Recommendations for the authors):

      The authors we think have done an excellent job with their responses and the manuscript has been substantially improved.

      Thank you for taking the time to help us improve our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the public reviewers and editors for their insightful comments on the manuscript. We have made the following changes to address their concerns and think the manuscript is stronger as a result. Specifically, we have 1) added RNA FISH data of specific STB-2 and STB-3 RNA markers to confirm their distribution changes between STB<sup>in</sup> and STB<sup>out</sup> TOs, 2) removed language throughout the text that refers to STB-3 as a terminally differentiated nuclear subtype, and 3) generated CRISPR-mediated knock-outs of two genes identified by network analysis and validated their roles in mediating STB nuclear subtype gene expression.

      Reviewer #1 (Public review): 

      Strengths: 

      The study offers a comprehensive SC- and SN-based characterization of trophoblast organoid models, providing a thorough validation of these models against human placental tissues. By comparing the older STB<sup>in</sup> and newer STB<sup>out</sup> models, the authors effectively demonstrate the improvements in the latter, particularly in the differentiation and gene expression profiles of STBs. This work serves as a critical resource for researchers, offering a clear delineation of the similarities and differences between TO-derived and primary STBs. The use of multiple advanced techniques, such as high-resolution sequencing and trajectory analysis, further enhances the study's contribution to the field. 

      Thank you for your thoughtful review—we appreciate your recognition of our efforts to comprehensively validate trophoblast organoid models and highlight key advancements in STB differentiation and gene expression.

      Weaknesses: 

      While the study is robust, some areas could benefit from further clarification. 

      (1) The importance of the TO model's orientation and its impact on outcomes could be emphasized more in the introduction. 

      We agree that TO orientation may significantly influence STB nuclear subtype differentiation. As the STB is critical for both barrier formation and molecular transport in vivo, lack of exposure to the surrounding media in STB<sup>in</sup> TOs in vitro could compromise these functions and the associated environmental cues that influence STB nuclear differentiation. We have added text to the introduction to highlight this point (lines 117-120).

      (2) The differences in cluster numbers/names between primary tissue and TO data need a clearer explanation, and consistent annotation could aid in comparison. 

      Thank you for highlighting that the comparisons and cluster annotations need clarification. In Figure 1, we did not aim to directly compare CTB and STB nuclear subtypes between TOs and tissue. Each dataset was analyzed independently, with clusters determined separately and with different resolutions decided via a clustering algorithm (Zappia and Oshlack, 2018). For example, for the STB, this approach identified seven subtypes in tissue but only two in TOs, making direct comparison challenging. To address this challenge, we integrated the SN datasets from TOs and tissue in Figure 6. This integration allowed us to directly compare gene expression between the sample types and examine the proportions within each STB subtype. Similarly, in Figure 2, direct comparison of individual CTB or STB clusters across the separate datasets is challenging (Figures 2A-C) due to differences in clustering. To overcome this, we integrated the datasets to compare cluster gene expression and relative proportions (Figures 2D-E). Nonetheless, to address the reviewer's concern we have added text to the results section to clarify that subclusters of CTB and STB between datasets should not be directly compared until the datasets are integrated in Figure 2D-E and Figure 6 (lines 166-167).

      (3) The rationale for using SN sequencing over SC sequencing for TO evaluations should be clarified, especially regarding the potential underrepresentation of certain trophoblast subsets. 

      This is an important point, as the challenges of studying a giant syncytial cell are often underappreciated by researchers who study mononucleated cells. We have added text to the introduction to clarify why traditional single cell RNA sequencing techniques were inadequate to collect and characterize the STB (lines 91-93).

      (4) Additionally, more evidence could be provided to support the claims about STB differentiation in the STB<sup>out</sup> model and to determine whether its differentiation trajectory is unique or simply more advanced than in STB<sup>in</sup>. 

      Our original conclusion that STB<sup>out</sup> nuclei are more terminally differentiated than STB<sup>in</sup> was based on two observations: (1) STB<sup>out</sup> TOs exhibit increased expression of STB-specific pregnancy hormones and many classic STB marker genes and (2) STB<sup>out</sup> nuclei show an enrichment of the STB-3 nuclear subtype, which appears at the end of the slingshot pseudotime trajectory. However, upon consideration of the reviewer comments, we agree that this evidence is not sufficient to definitively distinguish if STB<sup>out</sup> nuclei are more advanced or follow a unique differentiation trajectory dependent on new environmental cues. Pseudotime analyses provided only a predictive framework for lineage tracing, and these predictions must be experimentally validated. Real-time tracking of STB nuclear subtypes in TOs would require a suite of genetic tools beyond the scope of this study. Therefore, to address the reviewers' concerns we have removed language suggesting that STB-3 is a terminally differentiated subtype or that STB<sup>out</sup> nuclei are more differentiated than STB<sup>in</sup> nuclei throughout the text until the discussion. Therein we present both our original hypothesis (that STB nuclei are further differentiated in STB<sup>out</sup>) and alternative explanations like changing trajectories due to local environmental cues (lines 619-625).

      Reviewer #2 (Public review): 

      Strengths: 

      (1) The use of SN and SC RNA sequencing provides a detailed analysis of STB formation and differentiation. 

      (2) The identification of distinct STB subtypes and novel gene markers such as RYBP offers new insights into STB development. 

      Thank you for highlighting these strengths—we appreciate your recognition of our use of SN and SC RNA sequencing to analyze STB differentiation and the discovery of distinct STB subtypes and novel gene markers like RYBP.

      Weaknesses: 

      (1) Inconsistencies in data presentation. 

      We address the individual comments of reviewer 2 later in this response.

      (2) Questionable interpretation of lncRNA signals: The use of long non-coding RNA (lncRNA) signals as cell type-specific markers may represent sequencing noise rather than true markers. 

      We appreciate the reviewer’s attention to detail in noticing the lncRNA signature seen in many STB nuclear subtypes. However, we disagree that these molecules simply represent sequencing noise. In fact, many studies have rigorously demonstrated that lncRNAs have both cell- and tissue-specific gene expression (e.g., Zhao et al 2022, Isakova et al 2021, Zheng et al 2020). Further, they have been shown to be useful markers of unique cell types during development (e.g., Morales-Vicente et al 2022, Zhou et al 2019, Kim et al 2015) and can enhance clustering interpretability in breast cancer (Malagoli et al 2024). Many lncRNAs have also been demonstrated to play a functional role in the human placenta, including H19, MEG3, and MEG8 (Adu-Gyamfi et al 2023), and differences are even seen between nuclear subtypes in trophoblast stem cells (Khan et al 2021). Therefore, we prefer to keep these lncRNA signatures included and let future researchers test their functional role.

      To improve the study's validity and significance, it is crucial to address the inconsistencies and to provide additional evidence for the claims. Supplementing with immunofluorescence staining for validating the distribution of STB_in, STB_out, and EVT_enrich in the organoid models is recommended to strengthen the results and conclusions. 

      Each general trophoblast cell type (CTB, STB, EVT) has been visualized by immunofluorescence by the Coyne laboratory in their initial papers characterizing the STB<sup>in</sup>, STB<sup>out</sup>, and EVT<sup>enrich</sup> models (Yang et al, 2022 and 2023). We agree that it is important to validate the STB nuclear subtypes found in our genomic study. However, one challenge in studying a syncytium is that immunofluorescence may not be a definitive method when the nuclei share a common cytoplasm. This is because mRNAs transcribed in one nucleus are translated in the cytoplasm, and their protein products could diffuse beyond the site of transcription. Therefore, RNA fluorescence in situ hybridization (RNA-FISH) is needed instead. While a systematic characterization of the spatial distribution of the many marker genes found in each subtype is outside the scope of this study, we include RNA-FISH of one STB-2 marker (PAPPA2) and one STB-3 marker (ADAMTS6) in Figure 3F-G and Supplemental Figure 3.3. This demonstrates that there is an increase in STB-2 marker gene expression in STB<sup>in</sup> TOs and an increase in STB-3 marker gene expression in STB<sup>out</sup> TOs.

      Reviewer #3 (Public review):  

      The authors present outstanding progress toward their aim of identifying, "the underlying control of the syncytiotrophoblast". They identify the chromatin remodeler, RYBP, as well as other regulatory networks that they propose are critical to syncytiotrophoblast development. This study is limited in fully addressing the aim, however, as functional evidence for the contributions of the factors/pathways to syncytiotrophoblast cell development is needed. Future experimentation testing the hypotheses generated by this work will define the essentiality of the identified factors to syncytiotrophoblast development and function. 

      We thank the reviewer for their thoughtful assessment, constructive feedback, and encouraging comments. We acknowledge that the initial manuscript primarily presented analyses suggesting correlations between RYBP and other factors identified in the gene network analysis and STB function. Understanding how gene networks in the STB are formed and regulated is a long-term goal that will require many experiments with collaborative efforts across multiple research groups.

      Nonetheless, to address this concern we have knocked out two key genes, RYBP and AFF1, in TOs using CRISPR-Cas9-mediated gene targeting. Bulk RNA sequencing of STB<sup>in</sup> TOs from both wild-type (WT) and knockout strains revealed that deletion of either gene caused a statistically significant decrease in the expression of the pregnancy hormone human placental lactogen and an increase in the expression of several genes characteristic of the oxygen-sensing STB-2 subtype, including FLT-1, PAPPA2, SPON2, and SFXN3. These findings demonstrate that knocking out RYBP or AFF1 results in an increase in STB-2 marker gene expression and that these genes therefore play a role in inhibiting STB-2 marker expression in WT TOs (Figure 5D-E and Supplemental Figure 5.2). We also note that this is the first application of CRISPR-mediated gene silencing in a TO model.

      Future work will visualize the distribution of STB nuclear subtypes in these mutants and explore the mechanistic role of RYBP and AFF1 in STB nuclear subtype formation and maintenance. However, these investigations fall outside the scope of the current study.

      Localization and validation of the identified factors within tissue and at the protein level will also provide further contextual evidence to address the hypotheses generated. 

      We agree that visualizing STB nuclear subtype distribution is essential for testing the many hypotheses generated by our analysis. To address this, we have included RNA-FISH experiments for two STB subtype markers (PAPPA2 for STB-2 and ADAMTS6 for STB-3) in TOs. These experiments reveal an increase in PAPPA2 expression in STB<sup>in</sup> TOs and an increase in ADAMTS6 expression in STB<sup>out</sup> TOs (Figure 3F-G and Supplemental Figure 3.3). Genomic studies serve as powerful hypothesis generators, and we look forward to future work—both our own and that of other researchers—to validate the markers and hypotheses presented from our analysis.

      Recommendations for the authors: 

      Reviewing Editor Comments: 

      We strongly encourage the authors to further strengthen the study by addressing all reviewers' comments and recommendations, with particular attention to the following key aspects:

      (1) Clarifying the uniqueness of the STB differentiation trajectory between STB<sup>in</sup> and STB<sup>out</sup>, and determining whether STB<sup>out</sup> represents a more advanced stage of differentiation compared to STB<sup>in</sup>. It is also important to specify which developmental stage of placental villi the STB<sup>out</sup> and STB<sup>in</sup> are simulating. 

      We have revised the manuscript to remove definitive language claiming that STB-3 represents a terminally differentiated subtype or that STB<sup>out</sup> nuclei are more differentiated than STB<sup>in</sup> nuclei. Instead, we now present our hypothesis and alternative explanations in the discussion (lines 619-625), and emphasize the need for experimental validation of pseudotime predictions to test these hypotheses.

      (2) Utilizing immunofluorescence to validate the distribution of cell types in the organoid models. 

      The Coyne lab has previously performed immunofluorescence of CTB and STB markers in STB<sup>in</sup> and STB<sup>out</sup> TOs (Yang et al 2023). The syncytial nature of the STB complicates immunofluorescence-based validation of the STB nuclear subtypes because translated proteins all share a single common cytoplasm and can therefore diffuse and mix. Instead, we performed RNA-FISH for two STB subtype markers (PAPPA2 for STB-2 and ADAMTS6 for STB-3), which showed subtype-specific nuclear enrichment in STB<sup>in</sup> and STB<sup>out</sup> TOs, respectively (Figure 3F-G and Supplemental Figure 3.3).

      (3) Addressing concerns regarding the use of lncRNA as cell marker genes. Employing canonical markers alongside critical TFs involved in differentiation pathways to perform a more robust cell-type analysis and validation is recommended.  

      As discussed in detail above, we maintain that lncRNAs are valuable markers, supported by their demonstrated roles in cell and tissue specificity and placental function. These signatures provide important insights and hypotheses for future research, and we have clarified this rationale in the revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors have presented an extensive SC- and SN-based characterization of their improved trophoblast TO model, including a comparison to human placental tissues and the previous TO iteration. In this way, the authors' work represents an invaluable resource for investigators by providing thorough validation of the TO model and a clear description of the similarities and differences between primary and TO-derived STBs. I would suggest that the authors reshape the study to further highlight and emphasize this aspect of the study. 

      We thank the reviewer for their thoughtful recommendation and agree that our datasets will serve as an invaluable resource for comparing in vitro models to in vivo gene expression. However, extensive validation is required to make definitive conclusions about the extent to which these systems mirror one another and where they diverge. For this reason, in this manuscript, we have focused on characterizing STB subtypes to provide a foundational understanding of the model and this poorly characterized subtype.

      (2) Introduction, Paragraph 3: What is the importance of orientation for the trophoblast TO model? The authors may consider removing some of the less important methodologic details from this paragraph and including more emphasis on why their TO model is an improvement. 

      Text has been added to this paragraph to highlight the importance of outward facing STB orientation, which is essential to mirror the STB’s transport function in vitro (lines 118-120).

      (3) Results, Figure 1: In addition to the primary placental tissue plots showing all cell populations, it may be useful to have side-by-side versions of similar plots showing only the trophoblast subsets, so that the primary and TO data could be more easily compared visually. 

      This has been implemented and added to the Supplemental Figure 1.4.

      (4) Results, Figure 1: In simple terms, what is the reason for ending up with different cluster numbers/names from the primary tissue and TO? Would it be possible to apply the same annotation to each (at least for trophoblast types) and thus allow direct comparison between the two? 

      As described above, each dataset was analyzed separately, and its clusters were determined with an algorithm that selects the optimal clustering resolution. Therefore, the number of clusters cannot be directly compared between datasets until the SN TO and tissue datasets are integrated in Figure 6. We have added text to the manuscript to make it clear that, until this point, the datasets should be compared only in terms of bulk cluster number (lines 230-232).

      (5) Results, Figure 2: For subsequent evaluation of different in vitro TO conditions, did the authors use only SN sequencing because they wanted to focus on STB? Based on Figure 1, it seems some CTB subsets would be underrepresented if using only SN. Given that the authors look at both STB and CTB in their different TOs, is this an issue? 

      The CTB clusters that showed the greatest divergence between SC and SN datasets were those associated with mitosis and the cell cycle, likely due to nuclear envelope breakdown interfering with capture by the 10x microfluidics pipeline. While cytoplasmic gene expression provides valuable insights into CTB function, our manuscript focuses on the STB starting from Figure 2. Since the STB is captured exclusively by the SN dataset, we concentrated on this approach to streamline our analysis.

      (6) Results, Figure 3: What do the authors consider to be the primary contributing factors for why the STB subsets display differential gene expression between STB<sup>in</sup> and STB<sup>out</sup>? Is this due primarily to the cultural conditions and/or a result of the differing spatial arrangement with CTBs? 

      This is an intriguing question that is challenging to disentangle because the culture conditions are integral to flipping the orientation. The two primary factors that differ between STB<sup>in</sup> and STB<sup>out</sup> TOs are the presence of extracellular matrix in STB<sup>in</sup> and direct exposure to the surrounding media in STB<sup>out</sup>. We believe these environmental cues play a significant role in shaping the gene expression of STB subsets. Fully disentangling this relationship would require a method to alter the TO orientation without changing the culture conditions. While this is an exciting direction for future research, it falls outside the scope of the present study.

      (7) Results, Figure 4: The authors' analysis indicates that the STB nuclei from the STB<sup>out</sup> TO are likely "more differentiated" than those in STB<sup>in</sup> TO. Could the authors provide some qualitative or quantitative support for this? Is the STB<sup>out</sup> differentiated phenotype closer to what would be observed in a fully formed placenta? 

      As discussed earlier, we agree with the reviewers that this claim should be removed from the text outside of the discussion.

      (8) Results, Figure 5: Based on the trajectory analysis, do the authors consider that the STB from STB<sup>out</sup> TO are simply further along the differentiation pathway compared to those from STB<sup>in</sup> TO, or do the STB from STB<sup>out</sup> TO follow a differentiation pathway that is intrinsically distinct from STB<sup>in</sup> TO? 

      We think the idea of an intrinsically distinct pathway is a fascinating alternative hypothesis and have added it into the discussion. We do not find the pseudotime currently allows us to answer this question without additional experiments, so we have removed claims that the STB<sup>out</sup> STB nuclei are further along the differentiation pathway.

      (9) Results, Figure 6: A notable difference between the STB<sup>out</sup> TO and the term tissue is that the CTB subsets are much more prevalent. Is this simply a scale difference, i.e. due to the size of the human placenta compared to the limited STB nuclei available in the STB<sup>out</sup> TO? Or are there other contributing factors? 

      The proportion of STB to CTB nuclei in our term tissue (9:1) aligns with expectations based on stereological estimates. We believe the relatively low number of CTB nuclei in our dataset reflects the need for a larger sample size to capture more of this less abundant cell type. Since the primary focus of this paper is on the STB, and we analyzed over 4,000 STB nuclei, we do not view this as a limitation. However, future studies utilizing SN to investigate term tissue should account for the abundance of STB nuclei and plan their sampling carefully to ensure sufficient representation of CTB nuclei if this is a desired focus.

      Reviewer #2 (Recommendations for the authors): 

      (1) The color annotations for cell types in Figure 2 are inconsistent between the different panels, and the term "Prolif" in Figure 2E is not explained by the authors. 

      We chose colors to enhance visibility on the UMAP. We do not wish readers to make direct comparisons between the different CTB or STB subtypes of the sample types until the datasets are integrated in Figure 2D. This is because the clustering resolution was chosen algorithmically and independently for each dataset. Cluster proportions are better compared in the integrated datasets in Figure 2D. We have added text to the results section to make this clear to the reader (lines 166-167).

      (2) In Figure 3 and Supplementary Figures 1.3, the authors frequently present long non-coding RNA (lncRNA) signals as cell type-specific markers in the bubble plots. These signals are likely sequencing noise and may not accurately represent true markers for those cell types. It is recommended to revise this interpretation. 

      As referenced above, there are many examples of lncRNAs that have biological and pathological significance in the placenta (H19, MEG3, MEG8), and lncRNAs often have cell-type-specific expression that can enhance clustering. We prefer to keep these signatures included and let future researchers determine their biological significance.

      (3) In Figure 3C, the authors performed pathway enrichment analysis on the STB subtypes after integrating STB_in and STB_out organoids. The enrichment of the "transport across the blood-brain barrier" pathway in the STB-3 subtype does not align with the current understanding of STB cell function. Please provide corresponding supporting evidence. Additionally, please verify whether the other functional pathways represent functions specific to the STB subtypes. 

      Interestingly, many of the genes categorized under “transport across the blood-brain barrier” are transporters shared with “vascular transport.” These include genes involved in the transport of amino acids (SLC7A1, SLC38A1, SLC38A3, SLC7A8), molecules essential for lipid metabolism (SLC27A4, SLC44A1), and small molecule exchange (SLC4A4, SLC5A6). Given that the vasculature, the STB, and the blood-brain barrier all perform critical barrier functions, it is unsurprising that molecules associated with these GO terms are enriched in the STB-3 subtype, which expresses numerous transporter proteins. Since the transport of materials across the STB is a well-established function, we have not included additional supporting evidence but have clarified the genes associated with this GO term in the text (lines 392-394 and supplemental Table 9).

      (4) The pseudotime heatmap in Figure 4B is not properly arranged and is inconsistent with the differentiation relationships shown in Figure 4A. It is recommended to revise this. 

      We are uncertain which aspect of the heatmap in Figure 4A is perceived as inconsistent with Figure 4B. One distinction is that pseudotime in Figure 4A is normalized from 0 to 100 to fit the blue-to-yellow-to-red color scale, whereas in Figure 4B the color scale is not normalized and the color bar ranges from white to red. This difference reflects our intent to simplify Figure 4B-C, as the abundance of color between cell types and gene expression changes required a streamlined representation to ensure the figure remained clear and easy to interpret. This is standard practice in the field and consistent with the default code in the slingshot package.
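
      As a minimal illustration of the kind of 0-to-100 rescaling described above, min-max normalization can be sketched as follows. This is a sketch under stated assumptions, not the code used to generate Figure 4A: the function name `normalize_pseudotime` is hypothetical, and a plain linear rescaling is assumed.

```python
from typing import List, Sequence


def normalize_pseudotime(values: Sequence[float]) -> List[float]:
    """Linearly rescale pseudotime so the minimum maps to 0 and the maximum to 100.

    Hypothetical helper for illustration; assumes a simple min-max rescaling.
    """
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # All values are identical, so there is no ordering to preserve; map to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / span * 100.0 for v in values]
```

      Because the rescaling is linear, the ordering of nuclei along the trajectory is unchanged; only the numbers mapped onto the color scale differ.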

      (5) In Figures 4C and 4D, although RYBP is highly expressed in STB, it is difficult to support the conclusion that RYBP shows the most significant expression changes. It is recommended to provide additional evidence. 

      The claim that RYBP exhibits the most significant expression changes was based on p-value ordering of genes associated with pseudotime via the associationTest function in slingshot and not with immunofluorescence data. The text has been revised to make this distinction clear (lines 390-393).
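
      The p-value ordering mentioned above can be illustrated with a short sketch. The gene labels and p-values below are invented placeholders, and `rank_genes_by_pvalue` is a hypothetical helper; in the actual analysis the p-values come from the association test, which is not reproduced here.

```python
from typing import Dict, List


def rank_genes_by_pvalue(pvalues: Dict[str, float]) -> List[str]:
    """Return gene names ordered from the smallest to the largest p-value."""
    return sorted(pvalues, key=lambda gene: pvalues[gene])


# Placeholder values for illustration only (not results from the study).
toy_pvalues = {"GENE_A": 0.03, "GENE_B": 1e-8, "GENE_C": 0.5}
ranked = rank_genes_by_pvalue(toy_pvalues)
```

      A ranking of this kind reflects the strength of statistical association with pseudotime, not the magnitude of expression or protein-level staining.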

      (6) In Figure 4E, staining for CTB marker genes is missing, and in Figure 4F, CYTO is difficult to use as a classical STB marker. It is recommended to use the CGBs antibody from Figure 4E as a STB marker for staining to provide evidence.  

      We have revised Figure 5B-C to use e-Cadherin as a CTB marker in TOs and the CGB antibody as a marker of the STB.

      In tissue, however, obtaining a good STB marker that does not overlap with the RYBP antibody (rabbit) in term tissue is difficult as the STB downregulates hCG expression closer to term to initiate contractions. SDC1 is often used but only labels the plasma membrane so does not help in distinguishing the STB cytoplasm. We have added an image of cytokeratin, e-Cadherin, and the STB marker ENDOU to validate that our current approach with e-Cadherin and cytokeratin allows us to accurately distinguish between CTB and STB cells.

      (7) The velocity results in Figure 5A do not align with the differentiation relationships between cells and contradict the pseudotime results presented in Figure 4 by the authors. 

      The reviewer raises an interesting observation regarding the velocity map in Figure 5A, which appears to show a bifurcation into two STB subtypes. This observation aligns with similar findings reported in tissue by our colleagues (Wang et al., 2024). However, given the low number of CTB cells in our tissue dataset, we were cautious about making definitive conclusions about pseudotime without a larger sample size. Notably, the RNA velocity map closely resembles the pseudotime trajectory in TOs, with CTB transitioning into the CTB-pf subtype and subsequently into the STB. One potential explanation for discrepancies between tissue and TOs is the difference in nuclear age: nuclei in tissue can be up to nine months old, whereas those in TOs are only hours or days old. It is possible that the lineage in TOs could bifurcate if cultured for longer than 48 hours, but our current dataset captures only the early stages of the STB differentiation process. While exploring these hypotheses is fascinating, they are beyond the scope of this current study.  

      Reviewer #3 (Recommendations for the authors): 

      Amazing work - I greatly enjoyed reading the manuscript. Here are a few questions and suggestions for consideration: 

      Evidence presented throughout the results sections hints that the organoids may represent an earlier stage of placental development compared to the term. Increased hCG gene expression is observed, but as noted expression is decreased in term STB. STB:CTB ratios are also higher at term compared to the first trimester, etc. It was difficult to conclude definitively based on how data is presented in Fig 6 and discussed. Maybe there is no clear answer. Perhaps the altered cell type ratios in the organoid models (e.g., few STB in EVT enrich conditions) impact recapitulation of the in vivo local microenvironment signaling. As such, can the authors speculate on whether cell ratios could be strategically leveraged to model different gestational time points? 

      Along these same lines, syncytiotrophoblast in early implantation (before proper villi development) is often described as invasive and later at the tertiary villi stage defined by hormone production, barrier function, and nutrient/gas exchange. Do the authors think the different STB subtypes captured in the organoid models represent different stages/functions of syncytiotrophoblast in placental development? 

      Minor Comments 

      (1) Please clarify what the third number represents in the STB:CTB ratio (e.g., 1:3:1 and 2:5:1). EVT? 

      The first separator is a decimal point and not a colon (i.e., 1.3 and 2.5). These numbers should therefore be read as an STB:CTB ratio of 1.3 to 1 or 2.5 to 1.

      (2) Could consider co-localizing RYBP in term tissue with a syncytio-specific marker like CGB used for organoids (Fig 4F). 

      We addressed this concern in comment 6 to reviewer 2.

      (3) Recommend defining colors-which colors represent which module in Figure 5C in the legend and main body text. I see the labels surrounding the heatmap in 5B, but defining colors in text (e.g. cyan, magenta, etc.) would be helpful. Do the gray circles represent targets that don't belong to a specific module? Are the bolded factor names based on a certain statistical cutoff/defining criteria or were they manually selected? 

      The text of both the results and figure legends has been revised to clarify these points.

      (4) Data Availability: It would be helpful to provide supplemental table files for analyses (e.g., 5C to list the overlapping relationships in TGs for each TF/CR (5C) and 3E/6F to list DEG genes in comparisons). 

      Supplemental files for each analysis have been added (Supplemental Tables 8-14). In addition, the raw and processed data are available on GEO, and we have created an interactive Shiny App so that people without coding experience can interact with each dataset (lines 917-919).

      (5) “...and found that each sample expressed these markers (Figure 6D), suggesting..." Consider clarifying "these". 

      Text has been added to refer to a few of these marker genes within the text (line 540).

      Citations

      (1) Zappia L, Oshlack A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. GigaScience. 2018;7(7):giy083. PMCID: PMC6057528

      (2) Zhou J, Xu J, Zhang L, Liu S, Ma Y, Wen X, Hao J, Li Z, Ni Y, Li X, Zhou F, Li Q, Wang F, Wang X, Si Y, Zhang P, Liu C, Bartolomei M, Tang F, Liu B, Yu J, Lan Y. Combined Single-Cell Profiling of lncRNAs and Functional Screening Reveals that H19 Is Pivotal for Embryonic Hematopoietic Stem Cell Development. Cell Stem Cell. 2019;24(2):285-298.e5. PMID: 30639035

      (3) Malagoli G, Valle F, Barillot E, Caselle M, Martignetti L. Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach. Cancers. 2024;16(7):1350. PMCID: PMC11011054

      (4) Adu-Gyamfi EA, Cheeran EA, Salamah J, Enabulele DB, Tahir A, Lee BK. Long non-coding RNAs: a summary of their roles in placenta development and pathology†. Biol Reprod. 2023;110(3):431–449. PMID: 38134961

      (5) Zheng M, Hu Y, Gou R, Nie X, Li X, Liu J, Lin B. Identification three LncRNA prognostic signature of ovarian cancer based on genome-wide copy number variation. Biomed Pharmacother. 2020;124:109810. PMID: 32000042

      (6) Khan T, Seetharam AS, Zhou J, Bivens NJ, Schust DJ, Ezashi T, Tuteja G, Roberts RM. Single Nucleus RNA Sequence (snRNAseq) Analysis of the Spectrum of Trophoblast Lineages Generated From Human Pluripotent Stem Cells in vitro. Front Cell Dev Biol. 2021;9:695248. PMCID: PMC8334858

      (7) Isakova A, Neff N, Quake SR. Single-cell quantification of a broad RNA spectrum reveals unique noncoding patterns associated with cell types and states. Proc Natl Acad Sci U S A. 2021;118(51):e2113568118. PMCID: PMC8713755

      (8) Morales-Vicente DA, Zhao L, Silveira GO, Tahira AC, Amaral MS, Collins JJ, Verjovski-Almeida S. Single-cell RNA-seq analyses show that long non-coding RNAs are conspicuously expressed in Schistosoma mansoni gamete and tegument progenitor cell populations. Front Genet. 2022;13:924877. PMCID: PMC9531161

      (9) Kim DH, Marinov GK, Pepke S, Singer ZS, He P, Williams B, Schroth GP, Elowitz MB, Wold BJ. Single-Cell Transcriptome Analysis Reveals Dynamic Changes in lncRNA Expression during Reprogramming. Cell Stem Cell. 2015;16(1):88–101. PMCID: PMC4291542

      (10) Yang L, Liang P, Yang H, Coyne CB. Trophoblast organoids with physiological polarity model placental structure and function. bioRxiv. 2023;2023.01.12.523752. PMCID: PMC9882188

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements [optional]

      We thank the reviewers for finding the manuscript enjoyable and well-written, with experiments that were performed well, show solid results and provide useful data for the community. The reviewers have provided meaningful feedback to improve this study. We have addressed the comments point-by-point below. The main text will also be further modified to incorporate new analysis where it has not yet been done.

      2. Description of the planned revisions

      Reviewer 1:

      Summary: OTX2 is a pivotal transcription factor that regulates the fate choice between somatic and primordial germ cell (PGC) lineages in early mouse development. In the current study, the authors use in vitro stem cell models to demonstrate that OTX2 mediates this developmental fate decision through controlling chromatin accessibility, whereby OTX2 helps to activate putative enhancers that are associated with somatic fate. By extension, those somatic-associated regulatory regions therefore become inaccessible in cells adopting PGC identity in which Otx2 is downregulated.

      Comments: I enjoyed reading this manuscript. The experiments have been carried out well and for the most part the results provide convincing evidence to support the claims and conclusions in the manuscript. I particularly liked the experiments using the inducible Otx2 transgene to examine the acute changes in chromatin accessibility following restoration of OTX2.

      I include some suggestions below to the authors for additional analyses that I feel would further strengthen their study.

      I also felt that the authors focus almost exclusively on the subset of OTX2-bound sites that lose accessibility in the absence of OTX2. But, as they show in several figure panels, these sites tend to be the minority and most OTX2-occupied sites do not lose accessibility in Otx2-null cells (actually, more sites tend to gain accessibility). I encourage the authors to modify the text and some of the analyses to give a better balance to their study.

      We are pleased that this reviewer enjoyed our manuscript. As suggested by the reviewer, we included analyses on the regions that are bound by OTX2 but do not show an increase in accessibility (see section 3 reviewer 1 point 6). The text will be expanded to include the new data and to include the description of the subset of OTX2 sites that do not show accessibility changes in the absence of OTX2. We have responded to the other points they raised as detailed in the sections below.

      Figure 1: The authors write: "...OTX2 binds mostly to putative enhancers." Whether these distal sites are enhancers is not sufficiently evidenced in the manuscript, but it is important information to collect to support their model of OTX2 function. The authors should strengthen their analysis by examining whether OTX2 peaks are enriched at previously defined enhancer regions.

      We plan to compare OTX2 bound regions with defined lists of enhancers identified in ESCs grown in Serum/LIF (e.g. Whyte et al 2013) and, if available, in 2i/LIF and EpiLCs. We will also analyse publicly available datasets for H3K4me1 (enhancer marker) and H3K27ac (marker of active regulatory regions) at the regions bound by OTX2 in ESCs and EpiLCs.

      Figure 2: I'm still puzzled why the authors did not examine flow-sorted WT+cyto cells?

      We agree with the reviewer that it would be interesting to examine flow-sorted WT +cyto PGCLCs. Unfortunately, the expression of CD61 and SSEA1 only becomes visible from day 4 of PGCLC differentiation. Therefore, we were not able to isolate PGCLCs at day 2 from WT cells differentiated in the presence of cytokines. Instead, we used Otx2-/- cells at day 2 to model PGCLCs. This is based on the assumption that because day 6 Otx2-/- PGCLCs are transcriptionally similar to sorted day 6 WT cells (Zhang, Zhang et al Nature 2018), the same will be true at day 2. We will modify the text in the final version of this manuscript to clarify this point, which has also been raised by reviewers 2 and 3.

      Figure 3: I would be tempted to put Figure S3A and S3B into Figure 3. It would be better to show all 1246 DARs together, either ordered by OTX2 CUT&RUN signal, or presented in two pre-defined groups (OTX2-bound vs unbound). I also suggest that the authors show OTX2 signals and ATAC-seq signals for the 3028 DARs that gain accessibility in Otx2-null EpiLCs (this could be added to a supplemental figure).

      Although the analysis has been carried out and the figures have been amended, the main text will be modified in a future updated version of the manuscript to incorporate these results.

      Figure 3: What is special about the 8% of OTX2-bound site that lose accessibility, versus the 92% of sites that do not?

      *The 8% of the OTX2-bound regions that lose accessibility in the absence of OTX2 appear to be more sensitive to the loss of OTX2. One possible explanation is that the accessibility of the remaining OTX2-bound regions relies on other TFs, such as OCT4, that are expressed in EpiLCs. We will modify the main text to discuss this interesting point raised by the reviewer.*

      Figure 6F: If the 4221 sites are split into those bound by OTX2 versus those that are not (related to Figure 6C) then is there a difference? i.e. are the OTX2-bound sites opening up?

      We separated the 4,221 sites into OTX2-bound and unbound subsets. The result is reported below:

      *Although there is a slight increase in accessibility in the OTX2-bound subset, the average accessibility reaches less than ¼ of the accessibility of these regions when OTX2 is present from day 0 to day 4, while the OTX2-unbound regions do not show an increase in accessibility. Although we cannot rule out that a longer treatment with tamoxifen may lead to higher accessibility in the OTX2-bound subset, the dynamics are much slower than at the EpiLC regions, where accessibility reaches 50% of the d0-d4 sample in just 1 hour of tamoxifen treatment.*

      Is there any evidence that OTX2 binds and compacts PGCLC enhancers in somatic cells? I appreciate this is different to the main thrust of the authors' model, but being able to show that OTX2 does not compact these sites lends further support to their preferred model of OTX2 opening sites of somatic lineages.

      *Comparing the ATAC-seq in PGCLCs with ESCs and EpiLCs, we identify a subset of regions that are open in PGCLCs only (PGCLC-specific accessible regions, see below). These regions do not show binding of OTX2 in WT EpiLCs or the d0-d2 Tam sample, suggesting that OTX2 does not bind and compact PGCLC-specific enhancers.*

      • PGCLC-specific regions showing high accessibility only in PGCLCs.

      • OTX2 CUT&RUN signal in WT EpiLCs and in OTX2-ERT2 PGCLCs in the presence or absence of Tamoxifen, showing that OTX2 does not bind PGCLC-specific regions even when it is overexpressed in GK15 medium.

      *These analyses will be incorporated in the manuscript.*

      Discussion: Have prior studies established a connection between OTX2 and chromatin remodellers that can open chromatin? Or, if not, then perhaps this could be proposed as a line of future research.

      We thank the reviewer for suggesting that we expand the discussion of the possible connection between OTX2 and chromatin remodellers. Although there is no evidence in the literature of a direct interaction between OTX2 and chromatin remodellers, this cannot be excluded. The connection might also be indirect: OTX2 is known to interact with OCT4, which in turn interacts with BRG1, the catalytic subunit of the SWI/SNF complex, and recruits it to chromatin. This point will be discussed in a modified version of the manuscript.

      Reviewer 2:

      Barbieri and Chambers explore the role of OTX2 on mouse pluripotency and differentiation. To do so, they examine how the chromatin accessibility and OTX2 binding landscape changes across pluripotency, the exit of pluripotency towards formative and primed states, and through to PGCLC/somatic differentiation. The work mostly represents a resource for the community, with possible implications for our understanding of how OTX2 might mediate the germline-soma switch of fates. While the findings of the work are modest, the results seem solid and the manuscript is clear and well-written.

      *We are pleased that this reviewer found our results solid and the manuscript clear.*

      I have some comments as indicated below:

      1. The comparison between Otx2-/- cells in the presence of PGCLC cytokines compared to WT cells in the absence of cytokines seems like it is missing controls to me. I assume the authors wanted to enable homogeneous populations to facilitate their bulk sequencing methods, but it seems to me like they are comparing apples with oranges. It would have been better to have the reciprocal situations (Otx2-/- cells in basal differentiation medium, and WT cells in PGCLC cytokines) with a sorting strategy to better unpick the differences between the presence and absence of Otx2 in the 2 protocols. Having said that, the authors are careful not to draw many comparisons between those populations so I don't think this omission affects their current claims. They should however clarify whether the flow cytometry (Supp Fig2) was used for sorting cells or if all cells were taken for bulk sequencing.

      *We agree with the reviewer that it would be of interest to compare the PGCLC and somatic populations derived from the OTX2-/- cells in GK15 without cytokines with the same populations derived from WT cells differentiated in the presence of cytokines. Our work aims to identify what happens at the stages of PGCLC differentiation when cells are still competent for both germline and somatic differentiation. Previous work from the lab showed that this dual competence is lost after day 2; therefore, we focus our attention on this time of differentiation. Unfortunately, the two surface markers characteristic of PGCs (CD61 and SSEA1) are not expressed at day 2 and, therefore, we are not able to sort PGCLCs derived from OTX2-/- cells in GK15 without cytokines or WT cells differentiated in the presence of cytokines. As recognised by this reviewer, we aimed to obtain two homogeneous populations that can model PGCLCs and somatic cells. This is based on data obtained at day 6 when Otx2-/- PGCLCs show a similar transcriptome to sorted day 6 WT cells (Zhang, Zhang et al Nature 2018) and the assumption that the same will be true at day 2. We will clarify that the supplementary Figure 2 is not a sorting strategy. As this point has been raised by reviewers 1 and 3 as well, we will modify the text to clarify the choice and the assumption behind using OTX2-/- cells in the presence of cytokines and WT cells in the absence of cytokines to model PGCLCs and somatic cells respectively.*

      2. Throughout the text, the authors subject cells (WT / Otx2-/- /Otx2ER ) to different protocols to look at accessibility and Otx2 binding, but with no mention of the cell fate differences that occur in these different conditions. For instance, it is unclear to me to which fate the WT cells without PGCLC cytokines go - I presume this is neural but perhaps this is a mixed fate, given that they are in GK15 rather than N2B27. Likewise, the OTX2ER experiments may promote a mixed population between PGCLC/somatic fates, and this is never described. Ideally transcriptomic data would be collected, but failing that, qPCR data should be obtained to examine this more closely.

      *We are planning to generate RT-qPCR data for germ layer markers (ectoderm, endoderm and mesoderm) in WT cells in GK15 without cytokines at day 2, as well as OTX2-ERT2 cells with and without Tamoxifen at day 2 (no Tam, d0-d2) and day 4 (no Tam, d0-d4).*

      The authors also state that "OTX2 facilitates Fgf5 transcription" (page 10) but provide no transcriptional data to substantiate this claim. Again RT-qPCR would help make this point.

      *We will analyse the level of Fgf5 by RT-qPCR in OTX2-ERT2 EpiLCs treated for 1 hour and 6 hours with Tamoxifen to show the effect of OTX2 on Fgf5 transcription.*

      It is unclear to me what the 'increase[d] accessibility' (eg abstract final sentence, Figure 3E) really means at the cellular level. Does this indicate that more cells have this site open, and does this have implications for the heterogeneity of cell fates observed? Since the authors are concerned with fate decisions, this seems like an important consideration that should at least be discussed.

      The possibility that the increased accessibility is due to higher heterogeneity in the population is interesting and it will be included in the discussion in a revised version of the manuscript.

      Reviewer 3:

      In this manuscript, the authors perform OTX2 CUT&RUN and ATAC-seq in Otx2-null and WT ESCs, EpiLCs and PGCLCs to understand whether the role of OTX2 in restricting mouse germline entry that they previously described (Zhang Nature 2018) mechanistically depends on chromatin remodeling. They identify differentially accessible regions (DARs) between Otx2-null and WT cells at different stages of differentiation and show that many of these are OTX2 bound in WT. They then show using cells expressing OTX2-ER^T2 in Otx2-null Epiblast cells that when OTX2 is moved into the nucleus, the regions that were differentially closed in Otx2-null open within an hour, suggesting chromatin accessibility is directly controlled by OTX2 (rather than indirect effects involving transcription and translation which one would expect to take longer). The scope is narrow, but this is nice work and useful data for the mouse PGC field. However, there are a few places where the data could be strengthened, and the writing is a little confusing in places, for example by stating as fact in early sections what is not proven until later.

      We thank the reviewer for finding our work nice and useful for the mouse PGC field, and for the useful comments to improve the manuscript. We have included new analysis and modified the text as suggested to improve the writing, avoiding early statements that were not fully proven until later in the manuscript. We have responded to other points they raised as detailed below and in the next section.

      1) "we compared Otx2-/- cells cultured in the presence of PGC-promoting cytokines with wild-type cells cultured in the absence of PGC-promoting cytokines. Under these conditions Otx2-/- cells produce an essentially pure (>90%) CD61+/SSEA1+ population that we refer to as PGCLCs, while wild-type cells yield a cell population from which PGCLCs are absent"

      This is not a controlled comparison since one cannot separate the day 2 effect of cytokines from that of the Otx2 knockout. The manuscript would be strengthened if the authors include WT somatic and PGCLCs from the +cytokine conditions, which could be easily sorted out as shown in Supp. Fig. 2. Ideally they would also include Otx2-null somatic cells, although Supp. Fig. 2 shows those are rare under the conditions considered.

      *This work aimed to analyse early stages of EpiLC to PGCLC differentiation when cells are still competent for both somatic and germline differentiation. This stage has been described previously to be at day 2 of differentiation in GK15 + cytokines (PGCLC differentiation medium, Zhang, Zhang et al, Nature 2018). Unfortunately, CD61 and SSEA1 are not expressed at day 2 of PGCLC differentiation, and they start to be expressed on the cell surface by day 4. Consequently, it is impossible to sort cells at day 2 using the CD61+/SSEA1+ strategy. To overcome this problem, we used WT cells grown in GK15 without cytokines to model a population of somatic cells and OTX2-/- cells grown in GK15 + cytokines to model a homogeneous population of PGCLCs. As explained in response to a similar point raised by reviewers 1 and 2, we assumed that, as OTX2-/- cells grown in the presence of cytokines are transcriptionally similar to sorted WT cells at day 6 (Zhang, Zhang et al, Nature 2018), OTX2-/- cells at day 2 are similar to their WT counterpart at day 2. The main text will be modified to clarify that we are using homogeneous populations to model both PGCLCs and somatic cells and that Figure S2 does not show a sorting strategy.*

      3) "In ESCs, OTX2 binds

      We are planning to perform a statistical analysis to determine whether the small number of DARs bound by OTX2 are bound by chance.
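
      As an illustration only, one common way to ask whether an overlap of this kind exceeds chance is a hypergeometric tail probability. The sketch below is not the analysis planned by the authors; the function name `hypergeom_sf` and all counts in the usage example are hypothetical, and treating genomic regions as independent draws from a fixed universe is itself a simplifying assumption.

```python
from math import comb


def hypergeom_sf(k: int, universe: int, bound: int, drawn: int) -> float:
    """Return P(X >= k) for a hypergeometric variable X.

    universe: total number of candidate regions (assumed background set)
    bound:    regions in the universe that overlap a peak
    drawn:    number of DARs sampled from the universe
    k:        observed number of DARs that overlap a peak
    """
    if not (0 <= bound <= universe and 0 <= drawn <= universe):
        raise ValueError("bound and drawn must lie between 0 and universe")
    upper = min(bound, drawn)
    # Count the ways to obtain k or more bound regions among the drawn ones.
    tail = sum(
        comb(bound, i) * comb(universe - bound, drawn - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / comb(universe, drawn)


# Hypothetical counts, chosen only to show the call; they are not from the study.
p_value = hypergeom_sf(k=12, universe=10_000, bound=500, drawn=375)
print(f"P(overlap >= 12 by chance) = {p_value:.3g}")
```

      A permutation test that shuffles regions while preserving their genomic distribution is a frequent alternative when the independence assumption is doubtful.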

      4) It would be good if the discussion was broadened to include both human and other transcription factors that are involved. How much of these conclusions could one expect to carry over to human or other mammals? There is some work from the Surani lab considering OTX2 in human. One could even look at published ATAC or OTX2 chip-seq data in hPSCs and potentially learn something interesting. Furthermore, there are studies on other transcription factors modulating chromatin accessibility in the decision between germline and somatic cells, for example PRDM1, PRDM14 (refs in e.g. Tang et al Nat Rev Gen 2016) or TFAP2A (at least in human (Chen et al Cell Rep 2019)). Do these factors affect the same genes? Is a coherent picture emerging of their respective roles in germline entry?

      *As suggested by the reviewer, we will discuss the role of OTX2 in human PGCLC formation and include studies on PGC-specific transcription factors concerning changes in chromatin accessibility in germline and somatic cells. This will be included in a revised version of the manuscript.*

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer 1:

      1. Figure 1: The authors report in the methods that they performed OTX2 CUT&RUN in biological duplicates. It would strengthen their results if they showed in Figure S1 some representative data from each replicate separately to show the consistency.

      As suggested by the reviewer, to show consistency between replicates, two representative tracks of the two CUT&RUN replicates at the Tet2 (ESCs) and Fgf5 (EpiLCs) loci have been included in Figure S1A. The corresponding tracks of the average bigwig files are reported in Figure 1E. The main text (page 5) and the figure legends have been amended to incorporate the new panels.

      2. Figure 2: I think it would be helpful to remind the reader here that Otx2 is normally downregulated in PGCs, and that Otx2 expression is maintained (at least initially) in somatic cells. This would help explain the logic behind the choice of samples that were profiled.

      We modified the text with the following sentence, as suggested by the reviewer, emphasising the level of OTX2 in early somatic cells vs early PGCLCs: “Otx2 expression is rapidly downregulated in the EpiLC to PGCLC transition while its expression is maintained longer in cells entering the somatic lineage [8]” (page 7).

      Figure 2D: I appreciate that the highlighted region at the Tet2 locus is a DAR, but from the genome tracks it looks as though the region still has high accessibility. Are there any other examples to exemplify a more obvious DAR? Additionally, since twice as many DARs gain accessibility in Otx2-null ESCs compared to lose accessibility, why not show examples of these as well? The same is true of EpiLCs. (Or alternatively, provide a good explanation for why not to show these other categories)

      We substituted the Tet2 DAR with a clearer example of an ESC DAR located in the Hes1 locus that shows low accessibility in Otx2-/- ESCs versus WT ESCs. Examples of ESC DARs and EpiLC DARs that show higher accessibility in Otx2-/- vs WT cells have been added as new panels 2E (DAR in Pebp4 locus) and 2G (DAR in Tdh locus). We also simplified the panels, showing only ATAC-seq tracks in WT and OTX2-/- cells, either ESCs (2D-E) or EpiLCs (2G-H). Text and figure legends have been modified to accommodate the changes made in Figure 2.

      Figure 3: I would be tempted to put Figure S3A and S3B into Figure 3. It would be better to show all 1246 DARs together, either ordered by OTX2 CUT&RUN signal, or presented in two pre-defined groups (OTX2-bound vs unbound). I also suggest that the authors show OTX2 signals and ATAC-seq signals for the 3028 DARs that gain accessibility in Otx2-null EpiLCs (this could be added to a supplemental figure).

      Figures S3A and S3B have been moved to the main figure. Figure S3A is now part of Figure 3C, where all the 1,246 DARs are shown together, separated into two groups (OTX2-bound and -unbound). Figure S3B is now part of Figure 3F. A new heatmap showing the OTX2 and ATAC-seq signals for the 3,028 regions that gain accessibility in Otx2-/- EpiLCs has been added as new Figure S3B. Only 28 out of the 3,028 regions overlap an OTX2 peak, as shown in the new Figure S3A. These regions appear to be already open in ESCs (Figure S3C) and they do not fully close when OTX2 is absent. This can be explained by either a) the lack of expression of an OTX2 target gene that represses these regions or b) the continuous expression of a gene that is usually repressed by OTX2 in the transition to EpiLCs. In both cases, OTX2 does not directly repress these regions. Figure legends have been amended to incorporate the new panels. The main text will be modified to incorporate these results.
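
      The grouping into OTX2-bound and -unbound DARs described above amounts to an interval-overlap test. The following is a minimal sketch of that idea, not the authors' pipeline: the names `overlaps` and `split_by_binding`, the half-open coordinate convention, and the toy coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates, as in BED files.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_binding(
    dars: List[Interval], peaks: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Partition DARs into those overlapping any peak and those overlapping none."""
    bound: List[Interval] = []
    unbound: List[Interval] = []
    for dar in dars:
        if any(overlaps(dar, peak) for peak in peaks):
            bound.append(dar)
        else:
            unbound.append(dar)
    return bound, unbound


# Toy coordinates, purely hypothetical.
example_dars = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
example_peaks = [("chr1", 150, 160), ("chr2", 200, 300)]
bound_dars, unbound_dars = split_by_binding(example_dars, example_peaks)
print(len(bound_dars), "bound;", len(unbound_dars), "unbound")
```

      The quadratic scan is adequate for a few thousand regions; for genome-wide sets a sorted sweep or a dedicated interval tool would be the more usual choice.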

      Figure 6: Do the PGCLCs with OTX2 expression have chromatin accessibility profiles similar to somatic cells? Consider adding WT somatic cell data to Figure 6A, which could be an interesting comparison with the Tam d0-d2 samples.

      *The heatmap showing the ATAC-seq signal at the additional OTX2-induced regions in somatic cells has been added to Figure 6A. The data show that the regions induced by OTX2 are not open in somatic cells generated in GK15. One possible explanation is that the overexpression of OTX2 induces the opening of neural-associated regions, but neural differentiation is not fully supported in GK15 medium (see reviewer 2, point 3). As suggested by reviewer 2, we will perform RT-qPCR of germ layer markers to analyse the identity of somatic cells grown in GK15 (without cytokines) and somatic cells induced by OTX2 overexpression.*

      Reviewer 2:

      The authors focus solely on the activating role of Otx2 in their data, but given the substantial proportion of DARs that decrease following Otx2 depletion, I presume it is possible that it also has a repressive effect? Either way, this should be discussed.

      *As also suggested by reviewer 1 (point 6), we analysed the accessibility level and the OTX2 signal at the 3,028 regions that gain accessibility in Otx2-/- EpiLCs (new Figure S3A-C). These regions show high accessibility in ESCs, suggesting that these are ESC regions that do not close properly in the transition to EpiLCs in the absence of OTX2. OTX2 CUT&RUN shows a low to absent signal at these regions, with just 28 OTX2 peaks overlapping EpiLC DARs that show higher accessibility in Otx2-/- cells, suggesting that OTX2 does not have a direct suppressive effect on them.*

      The authors state that d2 PGCLCs "show an intermediate position between ESCs and EpiLCs" based on the PCA location. They should be careful to qualify that this is only in the first 2 principal components, because it may well be the case (and is likely) that in other components the PGCLC population is far removed from the pluripotent states.

      The text has been updated as follows: d2 PGCLCs “show an intermediate position between ESCs and EpiLCs on both PC1 and PC2”.

      Reviewer 2 – minor suggestions:

      1. Presumably the regions bound by OTX2 in Tet2, Mycn and Fgf5 (Fig1E) are called enhancers because these are known from existing literature. It would be helpful to cite the relevant references to this in the text for those unfamiliar with these.

      References (Whyte et al, Cell, 2013 for Tet2 and Mycn; Buecker et al, Cell Stem Cell, 2013 and Thomas et al, Mol Cell 2021 for Fgf5) have been added to the text and the figure legends.

      On page 13, the authors say "To determine whether OTX2 expression is essential to maintain chromatin accessibility in somatic cells..." but this does not seem to be what they test because they are using PGCLC medium. Perhaps I misunderstood, but this could be clarified.

      *Expression of OTX2 during the first 2 days of PGCLC differentiation leads to a block of germline differentiation as previously shown in Zhang, Zhang et al, Nature 2018. After 2 days of tamoxifen treatment, cells have acquired somatic fate and will undergo somatic differentiation even if tamoxifen is withdrawn after day 2. Nevertheless, we agree with the reviewer that the sentence is difficult to interpret, and we modified it as shown below and as reported in the updated manuscript: “To determine whether OTX2 expression is essential to maintain chromatin accessibility in cells differentiating in the presence of PGC-inducing cytokines after day 2” (page 12).*

      On page 14 the authors claim, "These results indicate that...the partner proteins that OTX2 act alongside differ...". While this may be the case, their results do not substantiate this, it is just speculation. Should be toned down.

      The text has been modified as follows: "These results suggest that...the partner proteins that OTX2 act alongside differ..."

      Page 18, PGCLC differentiation method sections needs to be described as such (ie. Add "For PGCLC differentiation..." before the second paragraph)

      *The text “For PGCLC differentiation” has been added at the beginning of the PGCLC differentiation method section.*

      It would be helpful to indicate time on the protocol schematics (eg Fig4A, 5A, 5D etc) as I had to keep checking the methods to find out how long the full differentiation time-course was.

      *Indication of time has been added to Figures 1, 2, 4, 5 and 6.*

      Since the authors compare between the Tam d0-d2 treatments assessed at d2 versus d4 (Figure5B vs 5E) it would be helpful to make the colourbars the same scale, for both ATAC and Cut&Run datasets.

      *The heatmap in Figure 5B has been modified. The colourbars of Figure 5B and 5E now use the same scale.*

      Reviewer 3:

      1) As a minor point related to this, the second sentence is confusing since it kind of sounds like Otx2-/- and WT cells are compared under the same conditions unless one carefully reads the previous sentence.

      The text has been modified to clarify the different medium conditions for WT and OTX2-/- cells, as follows: “In the presence of PGC-inducing cytokines, Otx2-/- cells produce an essentially pure (>90%) CD61+/SSEA1+ population that we refer to as PGCLCs, while wild-type cells differentiated in GK15 medium without cytokines yield a cell population from which PGCLCs are absent” (page 7).

      2) "This suggests that OTX2 acts as a pioneer TF to regulate the accessibility of enhancers E1, E2 and E3."

      This is from the text corresponding to Fig. 2. That data actually only shows that Otx2-null cells have DARs, so somehow OTX2 affects chromatin accessibility but it could be indirect by controlling transcription of genes that modify chromatin accessibility. It is not until figure 4 that the data suggests that OTX2 directly affects accessibility, perhaps as a pioneer TF.

      The authors continue to make many statements about the direct action of OTX2 before the data supporting this is shown, on which I got hung up as a reader. I suggest the authors edit the manuscript to improve this. E.g. "OTX2 may directly control accessibility at these sites (Figure 3E)." and the fact that in 3E and other figure, it says "DARs increased by OTX2 binding" which at that point is not proven, so would better say "Otx2-null vs WT DARs" or something like that.

      The sentence “This suggests that OTX2 acts as a pioneer TF to regulate...” has been removed from the text (page 9). The sentence “OTX2 may directly control accessibility at these sites” has been replaced with “suggesting that the presence of OTX2 affects accessibility at these sites” (page 9). The sentence “Together, these results suggest that OTX2 is required to open these chromatin regions” has been modified to “Together, these results suggest that OTX2 is required for the accessibility of these chromatin regions”.

      The subset of DARs that increase in WT EpiLC and are bound by OTX2 that was called “DARs increased by OTX2 binding” has been renamed as “DARs higher in WT with OTX2 binding”. For consistency, the subset of DARs showing increased accessibility in WT EpiLCs that are not bound by OTX2 are now called “DARs higher in WT without OTX2 binding” (Figure 3, Figure 4, main text and figure legends). We will further revise the manuscript to avoid statements or hypotheses that are not yet supported by data throughout the text.

      Reviewer 3 – minor comments:

      1) "Comparing wild-type and Otx2-/- ESCs identified 375 differentially accessible regions (DARs) with increased accessibility in wild-type cells, and 743 regions with higher accessibility in Otx2-/- ESCs (Figures 2C). An example of ESC DARs where accessibility is increased in cells expressing OTX2 is the intragenic enhancer of Tet2. Tet2 is expressed at high levels in ESCs but at low levels in EpiLCs."

      The authors compare Otx2-null and WT ESCs then proceed to give an example comparing ESCs to EpiLCs, instead of Otx2-null vs WT ESCs, which is confusing.

      Furthermore, here and in other places the authors describe ESCs as not expressing OTX2. However, they also show CUT&RUN data for OTX2 in ESCs etc, clearly indicating that it is expressed, just lower (otherwise how could one get anything?).

      *We originally chose the Tet2 enhancer as an example of the 375 ESC DARs with higher accessibility in WT vs Otx2-/- ESCs as it shows a slightly decreased level of accessibility and OTX2 binding in ESCs. Therefore, the sentence “where accessibility is increased in cells expressing OTX2” refers to WT cells (expressing OTX2) when compared to Otx2-/- cells (OTX2-null). The text has been changed to describe the new panel. The rest of the main text will be checked and modified where appropriate to avoid possible misinterpretations.*

      *We also appreciate that the change in accessibility is not clearly visible in the original Figure 2, as also pointed out by Reviewer 1 (point 6). In the updated Figure 2, we show a region in the Hes1 locus as an example of the 375 ESC DARs. Moreover, we simplified the panels showing ATAC-seq tracks of WT and OTX2-/- ESCs (Fig. 2D-E) or EpiLCs (Fig. 2G-H).*

      2) "In contrast, in EpiLCs, OTX2 binds almost 40% (446 out of 1,246) of the DARs that are more accessible in wild-type than in Otx2-/- cells (Figure 3B-C). Notably, these regions are mainly located distal to genes (91%, Figure 3D), despite the increased fraction of promoter regions bound by OTX2 in EpiLCs (Figure S1A)."

      Are the authors rounding percentages with 2 significant digits, as suggested by the "91%"? If so, 446/1246 ~ 36%, not 40%.

      *The text has been modified from “OTX2 binds almost 40%” to “OTX2 binds 36%”.*
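
      The corrected figure can be checked directly from the counts quoted in this response (446 of 1,246 DARs bound, 800 unbound, and 28 of 3,028 gained regions overlapping a peak). The helper name `percent_of` below is hypothetical and serves only as a quick arithmetic check.

```python
def percent_of(part: int, total: int, digits: int = 0) -> float:
    """Return part/total as a percentage rounded to the given number of decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * part / total, digits)


# Counts quoted in the reviews and responses above.
print(percent_of(446, 1246))      # OTX2-bound share of DARs higher in WT EpiLCs
print(percent_of(800, 1246, 1))   # remaining, unbound DARs
print(percent_of(28, 3028, 1))    # gained regions overlapping an OTX2 peak
```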

      3) The results in Figure 4 are nice and the real meat of the paper. One suggestion: It would be helpful if Fig. 4B were split up between the 446 and 800 genes instead of showing all 1246, and if the WT control was shown in the same figure as well.

      *Panels with the 446 and 800 regions have been added to Figure 4 instead of the panels with all 1,246 regions. The WT control has been inserted in Figure 4. The main text and the figure legends have been updated accordingly.*

      4) "Enforced OTX2 expression opens additional somatic regulatory regions" - it would be clearer to say "OTX2 overexpression opens additional somatic regulatory regions", since this is really about DARs between EpiLCs that already express OTX2 and those forced to express higher than WT endogenous levels by the OTX2-ER system?

      We thank the reviewer for their suggestion. The text has been modified (page 12).

    1. Author response:

      General Statements

      In our manuscript, we demonstrate for the first time that RNA Polymerase I (Pol I) can prematurely release nascent transcripts at the 5' end of ribosomal DNA transcription units in vivo. This achievement was made possible by comparing wild-type Pol I with a mutant form of Pol I, hereafter called SuperPol, previously isolated in our lab (Darrière et al., 2019). By combining in vivo analysis of rRNA synthesis (using pulse-labelling of nascent transcripts and cross-linking of nascent transcripts - CRAC) with in vitro analysis, we could show that SuperPol reduced premature transcript release due to altered elongation dynamics and reduced RNA cleavage activity. Such premature release could reflect regulatory mechanisms controlling rRNA synthesis. Importantly, this increased processivity of SuperPol is correlated with resistance to BMH-21, a novel anticancer drug inhibiting Pol I, showing the relevance of targeting Pol I during transcriptional pauses to kill cancer cells. This work offers critical insights into Pol I dynamics, rRNA transcription regulation, and implications for cancer therapeutics.

      We sincerely thank the three reviewers for their insightful comments and recognition of the strengths and weaknesses of our study. Their acknowledgment of our rigorous methodology, the relevance of our findings on rRNA transcription regulation, and the significant enzymatic properties of the SuperPol mutant is highly appreciated. We are particularly grateful for their appreciation of the potential scientific impact of this work. Additionally, we value the reviewer’s suggestion that this article could address a broad scientific community, including in transcription biology and cancer therapy research. These encouraging remarks motivate us to refine and expand upon our findings further.

      All three reviewers acknowledged the increased processivity of SuperPol compared to its wild-type counterpart. However, two out of three question our claim that premature termination of transcription can regulate ribosomal RNA transcription. This conclusion is based on the SuperPol mutant increasing rRNA production. Proving that modulation of early transcription termination is used to regulate rRNA production under physiological conditions is beyond the scope of this study. Therefore, we propose to change the title of this manuscript to focus on what we have unambiguously demonstrated:

      “Ribosomal RNA synthesis by RNA polymerase I is subjected to premature termination of transcription”.

      Reviewer 1's main criticism centers on the use of the CRAC technique in our study. While we address this point in detail below, we would like to emphasize that, although we agree with the reviewer’s comments regarding its application to Pol II studies, CRAC limits contamination with mature rRNA and therefore remains the only suitable method for studying Pol I elongation over the entire transcription unit. All other methods are massively contaminated with fragments of mature RNA, which prevents any quantitative analysis of read distribution within rDNA. This perspective is widely accepted within the Pol I research community, as CRAC provides a robust approach to capturing transcriptional dynamics specific to Pol I activity.

      We hope that these findings will resonate with the readership of your journal and contribute significantly to advancing discussions in transcription biology and related fields.

      (1) Description of the planned revisions

      Despite numerous text modifications (see below), we agree that one major point of discussion is the consequence of increased processivity in the SuperPol mutant on the “quality” of the rRNA produced. Reviewer 3 suggested comparisons with other processive alleles, such as the rpb1-E1103G mutant of the RNAPII subunit (Malagon et al., 2006). This comparison has already been addressed by the Schneider lab (Viktorovskaya OV, Cell Rep., 2013 - PMID: 23994471), which explored Pol II (rpb1-E1103G) and Pol I (rpa190-E1224G). The rpa190-E1224G mutant revealed enhanced pausing in vitro, highlighting key differences between Pol I and Pol II catalytic rate-limiting steps (see David Schneider's review on this topic for further details).

      Reviewers 2 and 3 suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Pol I mutants with decreased rRNA cleavage have been characterized previously and showed an increased error rate. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings remain preliminary and require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively. This could provide valuable insights into the mechanistic differences between SuperPol and the wild-type enzyme. SuperPol is the first Pol I mutant described with an increased processivity in vitro and in vivo, and we agree that this might come at the cost of a decreased fidelity.

      Regulatory aspect of the process:

      To address the reviewer’s remarks, we propose to test our model by performing experiments that would evaluate PTT levels in Pol I mutants or under different growth conditions. These experiments would provide crucial data to support our model, which suggests that PTT is a regulatory element of Pol I transcription. By demonstrating how PTT varies with environmental factors, we aim to strengthen the hypothesis that premature termination plays an important role in regulating Pol I activity.

      We propose revising the title and conclusions of the manuscript. The updated version will better reflect the study's focus and temper claims regarding the regulatory aspects of termination events, while maintaining the value of our proposed model.

      (2) Description of the revisions that have already been incorporated in the transferred manuscript

      Some very important modifications have now been incorporated:

      Statistical Analyses and CRAC Replicates:

      Unlike reviewers 2 and 3, reviewer 1 suggests that we did not analyze the results statistically. In fact, the CRAC analyses were conducted in biological triplicate, ensuring robustness and reproducibility. The statistical analyses are presented in Figure 2C, which highlights significant findings supporting the conclusion that the WT Pol I and SuperPol distribution profiles are different. The CRAC replicates exhibit a high correlation, and we confirmed a significant effect in each region of interest (5’ETS, 18S.2, 25S.1 and 3’ ETS, Figure 1), confirming consistency across experiments. Finally, we took care not to overinterpret the results, maintaining a rigorous and cautious approach in our analysis to ensure accurate conclusions.

      CRAC vs. Net-seq:

      Reviewer 1 asks us to comment on the differences between CRAC and Net-seq. Both methods complement each other but serve different purposes depending on the biological question and the context of the transcription analysis. Net-seq was originally designed for Pol II analysis. It captures nascent RNAs but does not eliminate mature ribosomal RNAs (rRNAs), leading to high levels of contamination. While this is manageable for Pol II analysis (in silico elimination of reads corresponding to rRNAs), it poses a significant problem for Pol I due to the dominance of rRNAs (60% of total RNAs in yeast), which share sequences with nascent Pol I transcripts. As a result, large Net-seq peaks are observed at mature rRNA extremities (Clarke 2018, Jacobs 2022). This limits the interpretation of the results to the short-lived pre-rRNA species. In contrast, CRAC has been specifically adapted by the laboratory of David Tollervey to map Pol I distribution while minimizing contamination from mature rRNAs (the CRAC protocol used exclusively recovers RNAs with 3′ hydroxyl groups, which represent endogenous 3′ ends of nascent transcripts, thus removing RNAs with a 3′ phosphate, as found in mature rRNAs). This makes CRAC more suitable for studying Pol I transcription, including polymerase pausing and distribution along rDNA, and provides a quantitative dataset for the entire rDNA gene.

      CRAC vs. Other Methods:

      Reviewer 1 suggests using GRO-seq or TT-seq, but the experiments in Figure 2 aim to assess the distribution profile of Pol I along the rDNA, which requires a method optimized for this specific purpose. While GRO-seq and TT-seq are excellent for measuring RNA synthesis and cotranscriptional processing, they rely on Sarkosyl treatment to permeabilize cellular and nuclear membranes. Sarkosyl is known to artificially induce polymerase pausing and to inhibit RNase activities that are involved in the process. CRAC analysis avoids these artifacts because it is a direct and fully in vivo approach. In a CRAC experiment, cells are grown exponentially in rich media and arrested via rapid cross-linking, providing precise and artifact-free data on Pol I activity and pausing.

      Pol I ChIP Signal Comparison:

      The ChIP experiments previously published in Darrière et al. lack the statistical depth and resolution offered by our CRAC analyses. The detailed results obtained through CRAC would have been impossible to detect using classical ChIP. The current study provides a more refined and precise understanding of Pol I distribution and dynamics, highlighting the advantages of CRAC over traditional methods in addressing these complex transcriptional processes.

      BMH-21 Effects:

      As highlighted by Reviewer 1, the effects of BMH-21 observed in our study differ slightly from those reported in earlier work (Ref Schneider 2022), likely due to variations in experimental conditions, such as methodologies (CRAC vs. Net-seq), as discussed earlier. We also identified variations in the response to BMH-21 treatment associated with differences in cell growth phases and/or cell density. These factors likely contribute to the observed discrepancies, offering a potential explanation for the variations between our findings and those reported in previous studies. In our approach, we prioritized reproducibility by carefully controlling BMH-21 experimental conditions to mitigate these factors. These variables can significantly influence results, potentially leading to subtle discrepancies. Nevertheless, the overall conclusions regarding BMH-21's effects on WT Pol I are largely consistent across studies, with differences primarily observed at the nucleotide resolution. This is a strength of our CRAC-based analysis, which provides precise insights into Pol I activity.

      We will address these nuances in the revised manuscript to clarify how such differences may impact results and provide context for interpreting our findings in light of previous studies.

      Minor points:

      Reviewer #1:

      •  In general, the writing style is not clear, and there are some word mistakes or poor descriptions of the results, for example: 

      •  On page 14: "SuperPol accumulation is decreased (compared to Pol I)". 

      •  On page 16: "Compared to WT Pol I, the cumulative distribution of SuperPol is indeed shifted on the right of the graph." 

      We clarified and improved the overall writing style in response to the reviewer's comment.

      •  There are also issues with the literature, for example: Turowski et al, 2020a and Turowski et al, 2020b are the same article (preprint and peer-reviewed). Is there any reason to include both references? Please, double-check the references.  

      This was corrected in this version of the manuscript.

      •  In the manuscript, 5S rRNA is mentioned as an internal control for TMA normalisation. Why are Figure 1C data normalised to 18S rRNA instead of 5S rRNA? 

      Data are indeed normalized relative to the 5S rRNA, but the value for the 18S rRNA is arbitrarily set to 100%.

      •  Figure 4 should be a supplementary figure, and Figure 7D doesn't have a y-axis labelling. 

      The presence of all Pol I specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity assays we performed. In the absence of these subunits (which can vary depending on the purification batch), Pol I pausing, cleavage and elongation are known to be affected. To strengthen our conclusion, we really wanted to show the subunit composition of the purified enzyme. This important control should be shown, but can indeed be shown in a supplementary figure if desired.

      The y-axis in Figure 7D is now correctly labelled.

      •  In Figure 7C, BMH-21 treatment causes the accumulation of ~140bp rRNA transcripts only in SuperPol-expressing cells that are Rrp6-sensitive (line 6 vs line 8), suggesting that BMH-21 treatment does affect SuperPol. Could the author comment on the interpretation of this result? 

      The 140 nt product is a degradation fragment resulting from trimming, which explains its lower accumulation in the absence of Rrp6. BMH-21 significantly affects WT Pol I transcription but also has a mild effect on SuperPol transcription. As a result, the 140 nt product accumulates under these conditions.

      Reviewer #2:

      •  pp. 14-15: The authors note local differences in peak detection in the 5'-ETS among replicates, preventing a nucleotide-resolution analysis of pausing sites. Still, they report consistent global differences between wild-type and SuperPol CRAC signals in the 5'ETS (and other regions of the rDNA). These global differences are clear in the quantification shown in Figures 2B-C. A simpler statement might be less confusing, avoiding references to a "first and second set of replicates" 

      As suggested by the reviewer, the statement has been simplified in this version of the manuscript.

      •  Figures 2A and 2C: Based on these data and quantification, it appears that SuperPol signals in the body and 3' end of the rDNA unit are higher than those in the wild type. This finding supports the conclusion that reduced pausing (and termination) in the 5'ETS leads to an increased Pol I signal downstream. Since the average increase in the SuperPol signal is distributed over a larger region, this might also explain why even a relatively modest decrease in 5'ETS pausing results in higher rRNA production. This point merits discussion by the authors. 

      We agree that this is a very important point for the discussion of our results. Transcription is a very dynamic process in which paused polymerases are easily detected using the CRAC assay. Elongating polymerases are distributed over a much larger gene body, and even a small amount of polymerase detected in the gene body can represent a very large amount of rRNA synthesis. This point is of paramount importance and, as suggested by the reviewer, is now discussed in detail.

      •  A decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Have the authors observed any evidence supporting this possibility? 

      The reviewer suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings remain preliminary and require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively.

      •  pp. 15 and 22: Premature transcription termination as a regulator of gene expression is well-documented in yeast, with significant contributions from the Corden, Brow, Libri, and Tollervey labs. These studies should be referenced along with relevant bacterial and mammalian research. 

      Following the reviewer's suggestion, we now reference these studies.

      •  p. 23: "SuperPol and Rpa190-KR have a synergistic effect on BMH-21 resistance." A citation should be added for this statement. 

      This statement is based on unpublished data from our lab. KR and SuperPol are the only two known mutants resistant to BMH-21. We observed that resistance between both alleles is synergistic, with a much higher resistance to BMH-21 in the double mutant than in each single mutant (data not shown). Comparing their resistance mechanisms is a very important point, and we could provide these data upon request. This was added to the statement.

      •  p. 23: "The released of the premature transcript" - this phrase contains a typo 

      This is now corrected.

      Reviewer #3:

      •  Figure 1B: it would be opportune to separate the technique's schematic representation from the actual data. Concerning the data, would the authors consider adding an experiment with rrp6D cells? Some RNAs could be degraded even in such short period of time, as even stated by the authors, so maybe an exosome depleted background could provide a more complete picture. Could also the authors explain why the increase is only observed at the level of 18S and 25S? To further prove the robustness of the Pol I TMA method could be good to add already characterized mutations or other drugs to show that the technique can readily detect also well-known and expected changes. 

      The precise objective of this experiment is to avoid the use of the Rrp6 mutant. Under these conditions, we prevent the accumulation of transcripts that would result from a maturation defect. While it is possible to conduct the experiment with the Rrp6 mutant, it would be impossible to draw reliable conclusions due to this artificial accumulation of transcripts.

      •  Figure 1C: the NTS1 probe signal is missing (it is referenced in Figure 1A but not listed in the Methods section or the oligo table). If this probe was unused, please correct Figure 1A accordingly. 

      We corrected Figure 1A.  

      •  Figure 2A: the RNAPI occupancy map by CRAC is hard to interpret. The red color (SuperPol) is stacked on top of the blue line, and we are not able to observe the signal of the WT for most of the position along the rDNA unit. It would be preferable to use some kind of opacity that allows to visualize both curves. Moreover, the analysis of the behavior of the polymerase is always restricted to the 5'ETS region in the rest of the manuscript. We are thus not able to observe whether termination events also occur in other regions of the rDNA unit. A Northern blot analysis displaying higher sizes would provide a more complete picture. 

      We addressed this point to make the figure more visually informative. In Northern Blot analysis, we use a TSS (Transcription Start Site) probe, which detects only transcripts containing the 5' extremity. Due to co-transcriptional processing, most of the rRNA undergoing transcription lacks its 5' extremity and is not detectable using this technique. We have the data, but it does not show any difference between Pol I and SuperPol. This information could be included in the supplementary data if asked.

      •  "Importantly, despite some local variations, we could reproducibly observe an increased occupancy of WT Pol I in 5'-ETS compared to SuperPol (Figure 1C)." should be Figure 2C. 

      Thanks for pointing out this mistake. It has been corrected.

      •  Figure 3D: most of the difference in the cumulative proportion of CRAC reads is observed in the region ~750 to 3000. In line with my previous point, I think it would be worth exploring also termination events beyond the 5'-ETS region. 

      We agree that such an analysis would have been interesting. However, with the exception of the pre-rRNA starting at the transcription start site (TSS) studied here, any cleaved rRNA at its 5' end could result from premature termination and/or abnormal processing events. Exploring the production of other abnormal rRNAs produced by premature termination is a project in itself, beyond this initial work aimed at demonstrating the existence of premature termination events in ribosomal RNA production.

      •  Figure 4: should probably be provided as supplementary material. 

      As mentioned earlier (see comments above), the presence of all Pol I specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity assays we performed. This important control should be shown, but can indeed be shown in a supplementary figure if desired.

      •  "While the growth of cells expressing SuperPol appeared unaffected, the fitness of WT cells was severely reduced under the same conditions." I think the growth of cells expressing SuperPol is slightly affected. 

      We agree with this comment and we modified the text accordingly.

      •  Figure 7D: the legend of the y-axis is missing as well as the title of the plot. 

      The y-axis legend and the plot title are now present.

      •  The statements concerning BMH-21, SuperPol and Rpa190-KR in the Discussion section should be removed, or data should be provided.

      This was discussed previously. See comment above.

      •  Some references are missing from the Bibliography, for example Merkl et al., 2020; Pilsl et al., 2016a, 2016b. 

      The bibliography is now fixed.

      (3) Description of analyses that authors prefer not to carry out

      Does the SuperPol mutant produce more functional rRNAs?

      As Reviewer 1 requested, we agree that this point requires clarification. In cells expressing SuperPol, a higher steady state of (pre)-rRNAs is only observed in the absence of the degradation machinery, suggesting that overproduced rRNAs are rapidly eliminated. We know that (pre)-rRNAs are unable to accumulate in the absence of ribosomal proteins and/or Assembly Factors (AF). Consequently, overproducing rRNAs would not be sufficient to increase ribosome content. This specific point is being further addressed in our lab but is beyond the scope of this article.

      Is premature termination coupled with rRNA processing?

      We appreciate the reviewer’s insightful comments. The suggested experiments regarding the UTP-A complex's regulatory potential are valuable and ongoing in our lab, but they extend beyond the scope of this study and are not suitable for inclusion in the current manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The study investigates the relationship between replication timing (RT) and transcription. While there is evidence that transcription can influence RT, the underlying mechanisms remain unclear. To address this, the authors examined a single genomic locus that undergoes transcriptional activation during differentiation. They engineered the Pln locus by inserting promoters of varying strengths to modulate transcription levels and assessed the impact on replication timing using Repli-seq. Key Findings:

      •  Figure 1C and 1D: The data show that higher transcription levels correlate with an advanced RT, suggesting that transcriptional activity influences replication timing.

      •  Figure 2: To determine whether transcription alone is sufficient to alter RT, the authors inserted an hPGK reporter at different genomic locations. However, given the findings in Figure 1, which suggest that this is not the primary mechanism,

      •  Figure 3: The authors removed the marker to examine whether the observed effects were due to the promoter-driven Pln locus, which is significantly larger than the marker.

      •  Figure 4: The study explores the effect of increased doxycycline (Dox) treatment at the TRE (tetracycline response element), further supporting the role of transcription in RT modulation.

      •  Figure 5: The findings demonstrate that Dox-induced RT advancement occurs rapidly, is reversible, and correlates with transcription levels, reinforcing the hypothesis that transcription plays a direct role in influencing replication timing.

      •  Figure 6: Shows that during differentiation transcription of Pln is not required for RT advancement.

      Overall, the study presents a compelling link between transcription and replication timing, though some experimental choices warrant further clarification. I have no major comments.

      Minor Comments: Overall, the results are convincing, and the study appears to be well-conducted. In Figure 2, the authors use the hPGK promoter. However, it is unclear why they did not use the constructs from the previous experiments. Given that the hPGK promoter did not advance RT in Figure 1, the results in Figure 2 may not be entirely unexpected.

      We took advantage of previously published cell lines using a PiggyBac Vector designed to pepper the reporter gene at random sites throughout the genome; the point of the experiment was to acquire supporting evidence for the hypothesis that any vector with its selectable marker driven by the hPGK promoter will not advance RT no matter where it is inserted. Since there are reports concluding that transcription per se is sufficient to advance RT, it was important to confirm that there was nothing unique about the particular vector or locus into which we inserted our panel of vectors.

      ACTION DONE: We have now added the following sentence to the results describing this experiment: “By analyzing RT in these lines, we could evaluate the effect of a different hPGK vector on RT when integrated at many different chromosomal sites.”

      Additionally, the study does not formally exclude the possibility that Pln protein expression itself influences RT. In Figure 1, readthrough transcription at the Pln locus could potentially drive protein expression. It would be useful to know whether the authors address this point in the discussion.

      NOT DONE FOR NEED OF CLARIFICATION: It is unclear why a secreted neural growth factor would have a direct effect on replication timing in embryonic stem cells and, in particular, only in cis (remember there is a control allele that is unaffected). We would be happy to address this in the Discussion if we understood the reviewers’ hypothesis. We cannot respond to this comment without understanding the hypothesis being tested as we do not know how a secreted protein could affect the RT of one allele without affecting the other.

      Regarding the mechanism, if transcription across longer genomic regions contributes to RT changes, could transcription-induced DNA supercoiling play a role? For instance, could negative supercoiling generated by active transcription influence replication timing?

      Yes, many mechanisms are possible.

      ACTION DONE: We have added the following sentence to the discussion, referencing a seminal paper on that topic by Nick Gilbert: “For example, long transcripts could remodel a large segment of chromatin, possibly by creating domains of DNA supercoiling (Naughton et al., 2013).”

      It remains puzzling why Pln transcription does not contribute to replication timing during differentiation. Is there any evidence of chromatin opening during this process? For example, are ATAC-seq profiles available that could provide insights into chromatin accessibility changes during differentiation?

      We thank the reviewer for asking this as we should have mentioned something very important here. Lack of necessity for transcription implies that independent mechanisms are functioning to elicit the RT switch. In other work (Turner et al., bioRxiv, provisionally accepted to EMBO J.), we have shown that specific cis elements (ERCEs) can function to maintain early replication in the absence of transcription.

      ACTION DONE: We now explicitly state in the Discussion: “This is not surprising, given that ERCEs can maintain early RT in the absence of transcription (Turner, bioRxiv).”

      ACTION TO BE DONE SOON: We will provide a new Figure 6D showing ATAC-seq changes upon differentiation of mESCs to mNPCs and their location relative to the promoter/enhancer deletion. As you will see, there is an ATAC-seq site that appears during differentiation, upstream of the deletion. We will hypothesize in the revised manuscript that these are the elements that drive the RT switch and that future studies need to investigate that hypothesis. We have also added the following sentences to the discussion after the sentence above, stating: “In fact, new sites of open chromatin, consistent with ERCEs, appear outside of the deleted Ptn transcription control elements after differentiation (soon to be revised Figure 6D). The necessity and sufficiency of these sites to advance RT independent of transcription will be important to follow up.”

      We also have preliminary data that are part of a separate project in the lab so they are not ready for publication, but are directly relevant to the reviewer’s question. This data shows evidence for a region upstream of the Ptn promoter/enhancer deletion described in Figure 6 that, when deleted, DOES have an effect on the RT switch during differentiation. This deletion overlaps an ATAC-seq site we will show in the new figure 6D.

      Reviewer #1 (Significance (Required)):

      This is a compelling basic single-locus study that systematically compares replication timing (RT) and transcription dynamics while measuring several key parameters of transcription.

      My relevant expertise lies in transcriptional regulation and understanding how noncoding transcription influences local chromatin and gene expression.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In the manuscript entitled "Transcription can be sufficient, but is not necessary, to advance replication timing", the authors use, as they state, a "reductionist approach" to address a long-standing question in the replication field: to what extent the process of transcription within a replication domain can alter the underlying replication timing of this domain. The authors use an elegant hybrid mouse embryonic stem cell line to discriminate the two allelic copies and focus on a specific replication domain harboring the neuronal Ptn gene that is only expressed upon differentiation. The authors first introduce four different promoters in the locus upstream of the Ptn gene that drive expression of small transgenes. Only the promoters with the highest transcriptional induction could advance RT. If the promoters are placed in such a way that they drive expression of the 96kb Ptn gene, then some of the weaker promoters can also drive RT advancement, suggesting that it is a combination of transcriptional strength and size of the transcribed domain that is important for RT changes. Using a DOX-inducible promoter, the authors show that this happens very fast (3-6h after transcription induction) and is reversible, as removal of DOX leads to slower RT again. Finally, deleting the promoter of the Ptn gene and driving cells into differentiation still advances RT, allowing the authors to conclude that "transcription can be sufficient but not necessary to advance replication timing."

      Major comments: Overall, this is a well designed study that includes all necessary controls to support the authors' conclusions. I think it is a very interesting system that the authors developed. The weakness of the manuscript is that there is no mechanistic explanation of how such RT changes are achieved on a molecular basis. But I'm confident that the system could indeed be used to further dissect the mechanistic basis for the transcription dependence of RT advancements.

      Therefore, I support publication of this manuscript if a few comments below can be addressed.

      1) Figure 4 shows a titration of different DOX concentrations and provides clear evidence that the degree of RT advancement tracks well with the level of transcription. As the doses of DOX are quite high in this experiment, have the authors checked on a global scale to what extent transcription might be deregulated in neighbouring genes or genome-wide?

      The DOX concentration that we use for all experiments other than the titration is 2 µg/ml, which is quite standard. The high concentrations (up to 16µg/ml) are only used in the titration experiments shown in Figure 4 to demonstrate that we have reached a plateau. In fact, we stated in Materials and Methods that high doses of Dox led to cell toxicity. Looking at the transcription datasets, there are no significant changes in transcription below 8µg/ml, a few dozen significant changes at 8 and more such changes at 16µg/ml of DOX. The tables of genome wide RT and transcription are provided in the manuscript for anyone wishing to investigate the effects of Dox on cellular physiology but at the concentration used in all other experiments (2µg/ml) there are no effects on transcription.

      ACTION DONE: We have now modified the statement in the Materials and Methods to read: “Mild toxicity and changes in genome-wide transcription were observed at 8µg/ml and more so at 16µg/ml”.

      2) One general aspect is that the whole study is only focused on the one single Ptn replication domain. Could the authors extend this rather narrow view a bit and also show RT data in the neighbouring domains. This would be particularly important for the DOX titration experiment that has the potential to induce transcriptional deregulation (see comment above).

      ACTION DONE: We have now added to revised Supplemental Figure 4 a zoom out of 10 Mb surrounding the Ptn gene showing no detectable effects on RT at any of the titration concentrations.

      ACTION TO BE DONE SOON: To address the generalization of the findings (length and strength matter), we have repeated the ESC to NPC differentiation and performed both Repli-seq and BrU-seq to evaluate RT changes relative to total genomic nascent transcriptional changes. The sequencing reads for this experiment are in our analyst’s hands so we expect this to be ready within a few weeks. We will provide a new Figure 7 comparing genome-wide changes in RT vs. transcription to determine the significance of length and strength of transcription induction to RT advances and the necessity of transcriptional induction for RT advances. We and other laboratories have performed many integrative analyses of RNA-microarray/RNA-seq data vs. RT changes, but not total genomic nascent transcription and not with a focus on the effect of length and strength of transcription. For example, outcomes that would be consistent with our reductionist findings at the Ptn locus would be if we find domains that are advanced for RT with no induction of transcription (transcription not necessary) and few to no regions showing significant induction of transcription without RT advances.

      3) Figure 5 shows that the full capacity to advance RT upon DOX induction of the Ptn gene is achieved after 3h to 6h of DOX induction, so substantially less than a full cell cycle in mESCs (12h). This result suggests that origin licensing/MCM loading cannot be the critical mechanism to drive the RT change because only a small fraction of the cells has undergone M/G1-phase where origins are starting to get loaded. As a large fraction of mESCs (60-70%) are S-phase cells in an asynchronous population, the mechanism is likely taking place directly in S-phase. Could the authors try to synchronize cells in G1/S using double-thymidine block, then induce DOX for 3h before allowing cells to reenter S-phase and then check replication timing of the domain? This can be compared to an alternative experiment where transcription is only induced for 3h upon release into S-phase. This could provide more mechanistic insights as to whether transcription is sufficient to drive RT changes in G1 versus S-phase cells.

      We agree that the timing of induction is such that it is very likely that alterations in RT can occur during S phase. The reviewer proposes a reasonable experiment that could be done, but it would require a long delay of this publication to develop and validate those synchronization protocols and we do not have personnel at this time to carry out the experiment. This would be a great initiating experiment for someone to pursue the mechanisms by which transcription can advance RT.

      ACTION DONE: We have added the following sentence to the Discussion section on mechanisms: “The rapid nature of the RT change after induction of transcription suggests that RT changes can occur after the functional loading of inactive MCM helicases onto chromatin in telophase/early G1 (Dimitrova, JCB, 1999; Okuno, EMBO J. 2001; Dimitrova, J. Cell Sci, 2002), and possibly after S phase begins.”

      Minor comments: • Figure 1B and Figure 6A. Quality of the genome browser snapshots could be improved and certain cryptic labelling such as "only Basic displayed by default" could be removed

      ACTION DONE: We have modified these figures.

      • The genome browser tracks appear a bit small across the figures and could be visually improved.

      ACTION DONE: We have modified the genome browser tracks to improve their presentation

      • In figure 1E we see an advancement in RT in Ptn gene caused by nearby enhanced Hyg-TK gene expression induced by mPGK promoter. However, in figure 3D we see mPGK promoter has reduced ability to advance RT of Ptn gene. It would be nice to address this discrepancy in the results.

      The reviewer’s point is well taken. We are not sure of the answer. You can see that the transcription is very low in both cases, while the RT shift is greater in one replicate vs. the other.

      ACTION DONE: We have, rather unsatisfactorily, added the following sentence to the results section describing Figure 3: “We do not know why the mPGK promoter was so poor at driving transcription in this context.”

      Reviewer #2 (Significance (Required)):

      In my point of view, this is an important study that unifies a large amount of literature into a conceptual framework that will be interesting to a broad audience working on the intertwined fields of gene regulation, transcription and DNA replication, as well as cell fate switching and development.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, "Transcription can be sufficient, but is not necessary, to advance replication timing," Vouzas et al. take a systematic and reductionist approach to investigate a late-replicating domain on chromosome VI. Here, they examine the effect of transcribing a single gene locus, Pleiotrophin, on replication timing. When inserting or manipulating promoters or transcript lengths using CRISPR-Cas9, replication timing was altered in mESCs as judged by a combination of Repli-Seq, Bru-Seq, and RNA-Seq. Importantly, they found that transcription can be sufficient to advance replication timing depending on the length and strength of the expression of an ectopically transcribed gene. Taken together, the manuscript presents a compelling argument that transcription can advance replication timing but is not necessary for it.

      Major comments • A schematic or conceptual model summarising the major findings of transcription-dependent and independent mechanisms of RT advancement should be included in the discussion to add to the conceptual framework

      NOT DONE: We discussed this at length between the two senior authors and the first author and we do not feel ready to draw a summary model. We do not know what is advancing RT when transcription is induced or not induced, and we are not comfortable choosing one possible model of many. We hope that the added speculations on mechanism in the Discussion will sufficiently convey the future research that we feel needs to be done.

      ACTIONS DONE: In addition to the speculation on mechanism that was already in our Discussion section, we have added the following. On mechanisms of rapid induction of RT change: “The rapid nature of the RT change after induction of transcription suggests that RT changes can occur after the functional loading of inactive MCM helicases onto chromatin in telophase/early G1 (Dimitrova, JCB, 1999; Okuno, EMBO J. 2001; Dimitrova, J. Cell Sci, 2002), and possibly after S phase begins.” And: “For example, long transcripts could remodel a large segment of chromatin, possibly by creating domains of DNA supercoiling (Naughton et al., 2013, PMID 23416946).” On mechanisms of RT advance in the absence of transcription: “This is not surprising, given that ERCEs can maintain early RT in the absence of transcription (Turner, bioRxiv). In fact, chromatin features with the properties of ERCEs do appear outside of the deleted Ptn transcription control elements after differentiation (soon to be revised Figure 6C). The necessity and sufficiency of these new chromatin features to advance RT independent of transcription will be important to follow up.”

      • Vouzas et al. spend a substantial part of the manuscript to delve into the requirements to advance RT and even use a Doxycycline-based titration for temporal advancement of RT. Yet, all conclusions come from the use of hybrid-genome mouse embryonic stem cells (mESCs). Therefore, it remains speculative if and whether findings can be generalized to other cell types or organisms. The authors could include another organism/ cell type to strengthen the relevance of their findings to a broader audience, particular as they identified promoters that drive ectopic gene expression without affecting RT. Showcasing this in other model organisms would be of great interest.

      NOT DONE: To set this system up in another cell type or species would take a very long time. We also do not have the personnel to carry out that approach.

      ACTION TO BE DONE SOON: As an alternative approach that partially addresses this reviewer’s concern, we will provide a new Figure 7 with an analysis of RT changes vs. transcriptional changes when mESCs are differentiated to neural precursor cells. As described above in response to Reviewer #2’s criticism #2, we have repeated the ESC to NPC differentiation and performed both Repli-seq and BrU-seq to evaluate RT changes relative to total genomic nascent transcriptional changes. The sequencing reads for this experiment are in our analyst’s hands so we expect this to be ready within a few weeks. We will compare genome-wide changes in RT vs. transcription to determine the significance of length and strength of transcription induction to RT advances and the necessity of transcriptional induction for RT advances. We and other laboratories have performed many integrative analyses of RNA-microarray/RNA-seq data vs. RT changes, but not total genomic nascent transcription and not with a focus on the effect of length and strength of transcription. For example, outcomes that would be consistent with our reductionist findings at the Ptn locus would be if we find domains that are advanced for RT with no induction of transcription (transcription not necessary) and little to no regions showing significant induction of transcription without RT advances.

      • OPTIONAL: as with the previous point, the authors went to great depth and length to show how ectopic manipulations affect RT changes on a single locus using genome-wide methods. In addition, the manuscript would benefit from the inclusion of other loci, particularly as transcription of the Ptn locus wasn't needed during differentiation to advance RT at all.

      NOT DONE: This rigorous reductionist approach is laborious, and setting it up one gene at a time at additional loci would be a huge effort taking quite a long time.

      ACTION TO BE DONE SOON: (same as response above) As an alternative approach that partially addresses this reviewer’s concern, we will provide a new Figure 7 with an analysis of RT changes vs. transcriptional changes when mESCs are differentiated to neural precursor cells. As described above in response to Reviewer #2’s criticism #2, we have repeated the ESC to NPC differentiation and performed both Repli-seq and BrU-seq to evaluate RT changes relative to total genomic nascent transcriptional changes. The sequencing reads for this experiment are in our analyst’s hands so we expect this to be ready within a few weeks. We will compare genome-wide changes in RT vs. transcription to determine the significance of length and strength of transcription induction to RT advances and the necessity of transcriptional induction for RT advances. We and other laboratories have performed many integrative analyses of RNA-microarray/RNA-seq data vs. RT changes, but not total genomic nascent transcription and not with a focus on the effect of length and strength of transcription. For example, outcomes that would be consistent with our reductionist findings at the Ptn locus would be if we find domains that are advanced for RT with no induction of transcription (transcription not necessary) and little to no regions showing significant induction of transcription without RT advances.

      • The same point of Ptn not needing to be transcribed to advance RT of the respective domain, albeit being a very interesting observation, disturbs the flow of the manuscript, as the whole case was built around transcription and this particular locus-containing domain. Maybe one can adapt the storytelling to fit better within the overall framework.

      We would argue that demonstrating that induction of Ptn, the only gene in this domain, is sufficient to induce early RT is a logical segue to asking whether, in the natural situation, induction is correlated with an advance in RT. Our results show that transcription is sufficient but not necessary, which is expected if there are other mechanisms that regulate RT.

      ACTION DONE: To make this transition smoother, we have added the following sentence to the beginning of the results section describing Figure 6: “This raises the question as to whether the natural RT advance that accompanies Ptn induction during differentiation requires Ptn transcription, or whether other mechanisms, such as ERCEs (Sima / Turner) can advance RT independent of transcription.”

      ACTION TO BE DONE SOON: To finish the workflow in a way that ties length and strength, and sufficiency but not necessity, into the theme of natural cellular differentiation, we will provide a new Figure 7 with an analysis of RT changes vs. transcriptional changes when mESCs are differentiated to neural precursor cells, as described above.

      Minor comments • While citations are thorough, some references (e.g., "need to add Wang, Klein, Mol. Cell 2021") are incomplete.

      ACTION TO BE DONE SOON: We apologize that some references were not incorporated into the reference manager Mendeley. Since we are still planning to add one more figure soon and we will need to add some references for the datasets that will be shown in future Figure 6D, after that draft is ready, we will comb the manuscript for any references that were not entered and correct them.

      • The text corresponding to Figure 1C could use more explanation for readers not familiar with the depiction of Repli-Seq data.

      ACTION DONE: “Repli-seq labels nascent DNA with BrdU, followed by flow cytometry to purify cells in early vs. late S phase based on their DNA content, then BrdU-substituted DNA from each of these fractions is immunoprecipitated, sequenced and expressed as a log2 ratio of early to late synthesized DNA (log2E/L). BrU-seq labels total nascent RNA, which is then immunoprecipitated and expressed as reads per million per kilobase (RPMK).”

      • Figure 1C needs labelling of the x-axes.

      ACTION DONE: We have now labeled the X axes.

      • Statistical analyses should be used consistently throughout the manuscript and explained in more detail, i.e. significance levels, tests, instead of "Significant differences....calculated using x".

      We used the same analysis for all the Repli-seq data and the same analysis for all the BrU-seq data. We agree that we did not present this consistently in the figure legends and methods.

      ACTION DONE: To correct the confusion we have clarified the statistical methods in the methods section and referred to methods in the figure legends as follows:

      The methods description of statistical significance for RT now reads: “Statistical significance of RT changes for all windows in each sample, relative to WT, were calculated using RepliPrint (Ryba et al., 2011), with a p-value of 0.01 used as the cut-off for windows with statistically significant differences.”

      The methods description of statistical significance for transcription now reads: “Differential expression analysis, including the calculation of statistically significant differences in expression, was conducted using the R package DESeq2. In Figure 1, statistical significance was calculated relative to HTK expression in the parental cell line, which is expected to be zero, since the parental line does not have an HTK insertion. In all other Figures significance was calculated relative to Ptn expression in the parental line, which is expected to be zero, since the parental line does not express Ptn.”

      The legend to Figure 1C now reads: “The red shading indicates 50kb windows with statistically significant differences in RT between WT castaneus and modified 129 alleles, determined as described in Methods.”

      The legend to Figure 1E now reads: “The asterisks indicate a significant difference in the levels of HTK expression relative to HTK expression in the parental cell line as described in Methods. There are no asterisks for the RT data, as statistical significance was calculated for individual 50kb windows as shown in panel (C).”

      Each time significance is measured in the subsequent legends, it is followed by the phrase “, determined as described in Methods” or “presented as in Figure 1C” or “presented as in Figure 1E” as appropriate.

      **Referees cross-commenting**

      Comment on Reviewer #1's review, comment mentioning ATAC-Seq: Another way to look at this could be to investigate for origin usage changes (BrdU-Seq or GLOE-Seq) of chromosome 6 during differentiation.

      NOT DONE: Unfortunately we could not find any studies comparing origin mapping in mESCs and mNPCs.

      Comment on Reviewer#2's review, major comment 3: I do agree with their statement that origin loading cannot be the driver of RT change, as MCM2-7 double hexamer loading is strictly uncoupled from origin firing. Hence, any mechanism responsible for RT advance must happen at the G1/S phase transition or during S-phase, most likely due to the regulated activity of DDK/CDK or the limitation and preferred recruitment of firing factors to early origins. This could be tested through overexpression of said factors.

      NOT DONE: We agree that manipulating these factors would be a reasonable next approach to sort out mechanism. Due to limited resources and personnel, we will not be able to do this in a short period of time. We also argue that these are experiments for the next chapter of the story, likely requiring an entire PhD thesis (or multiple) to sort out.

      ACTION DONE: We have added the following sentence to the Discussion section on mechanisms: “The rapid nature of the RT change after induction of transcription suggests that RT changes can occur after the functional loading of inactive MCM helicases onto chromatin in telophase/early G1 (Dimitrova, JCB, 1999; Okuno, EMBO J. 2001; Dimitrova, J. Cell Sci, 2002), and possibly after S phase begins.”

      Reviewer #3 (Significance (Required)):

      General: This manuscript presents a compelling study investigating the relationship between transcription and replication timing (RT) using a reductionist approach. The authors systematically manipulated transcriptional activity at the Ptn locus to dissect the elements of transcription that influence RT. The study's strengths lie in its rigorous experimental design, clear results, and the reconciliation of seemingly contradictory findings in the existing literature. However, some aspects could be improved, particularly in exploring the mechanistic details of transcription-independent RT regulation at the investigated domain, the generalisability of the findings to other cells/organisms, and enhancing the presentation of certain data (explanation of e.g. Figure 1c, dense figure arrangement, lack of a summary figure illustrating key findings (e.g., correlation between transcription rate, readthrough effects, and RT advancement)).

      Advance: The manuscript directly addresses and reconciles contradictory findings in the literature regarding the effect of ectopic transcription on RT. Previous studies have reported varying effects, with some showing that transcription advances RT (Brueckner et al., 2020; Therizols et al., 2014), while others have shown no effect or only partial effects depending on the insertion site (Gilbert & Cohen, 1990; Goren et al., 2008). The current study conceptually advances the field by systematically testing different promoters and transcript lengths at a single locus (mechanistic insight), demonstrating that the length and strength of transcription, as well as promoter context, influence RT. This presents a unifying concept on how RT can be influenced. The authors also present a tunable system (technical advance) that allows rapid and reversible alterations of RT, which will certainly be useful for future studies and the field.

      Audience: The primary audience will be specialised researchers in the fields of replication timing, epigenetics, and gene regulation. This study may be of interest beyond the specific field of replication timing, for example in cancer biology and developmental biology, particularly if a broader applicability of its tools and concepts can be shown.

      Expertise: origin licensing, origin activation, MCM2-7, yeast and human cell lines

    1. Another popular technique is called Wizard of Oz prototyping [1, 2]. This technique is useful when you’re trying to prototype some complex, intelligent functionality that does not yet exist or would be time consuming to create, and use a human mind to replicate it. For example, imagine prototyping a driverless car without driverless car technology: you might have a user sit in the passenger seat with a couple of designers in the back seat, while one of the designers in the back seat secretly drives the car by wire. In this case, the designer is the “wizard”, secretly operating the vehicle while creating the illusion of a self-driving car. Wizard of Oz prototypes are not always the best fidelity, because it may be hard for a person to pretend to act like a computer might. For example, here’s Kramer, from the sitcom Seinfeld, struggling to simulate a computer-based voice assistant for getting movie times:

      [1] Hoysniemi, J., Hamalainen, P., and Turkki, L. (2004). Wizard of Oz prototyping of computer vision based action games for children. Conference on Interaction Design and Children (IDC).

      [2] Hudson, S., Fogarty, J., Atkeson, C., Avrahami, D., Forlizzi, J., Kiesler, S., Lee, J. and Yang, J. (2003). Predicting human interruptibility with sensors: a Wizard of Oz feasibility study. ACM SIGCHI Conference on Human Factors in Computing (CHI).

      This is an intriguing technique that I have never heard of before. There are many ideas we can come up with, but we might not have the resources that we need to implement those ideas, so the Wizard of Oz technique can be extremely helpful. I think it will become more and more useful as designers try to create designs to get ahead. I feel like since technology is developing at such a fast rate, designers are looking for unique things that they can create that have never been done before, and it will require functionality that may not exist yet.

    1. Elizabeth R. Gordon Interviewed by Lilia Bierman TranscriptElizabeth R. Gordon Interviewed by Lilia Bierman00:00:00:00 - 00:00:37:24LILIA: Okay. I'm recording. ERG: Okay. As I'm scratching my head. Please edit that out. (Laughs)LILIA: (laughs) I will. Okay, our topic is on the transition from VCR, VHS, and DVD rentals to online streaming. The first question is, how old were you when VCR, VHS, and DVD became a thing, and later, when digital became a big thing? 00:00:37:24 - 00:01:05:21ERGSo, VCR, I was 14. Okay. DVD, I think, is probably like college. So maybe 21, 22. So that would have been like in 1993, but they still weren't affordable. Yeah. And then streaming. We probably didn't start streaming anything till about five years ago. I was in my late forties. 00:01:05:21 - 00:01:31:15LILIAOkay. What was your experience adapting to the transition to digital away from VHS, DVD, and VCR? And what did you think about these social changes?00:01:32:15 - 00:01:58:12ERGLike, when you have DVDs, when they get scratched, you would have to deal with that. And that was problematic. A lot of my videos are still on videotape. So my wedding is on tape. Oh my son, all his first moments are also on videotape.So I've got to get those transitioned—and then streaming and digital stuff. I mean like I said, because I came in the generation where we did not have personal computers in college. Everything has had to be self-taught. Luckily, my husband is very good about this, and he helps me out. But now I feel very confident in streaming and doing things like that and having apps on my phone—stuff like that.00:01:58:28 - 00:02:19:10Unknown(LILIA) Okay. (ERG) And then what was the second part of that. (Lilia) And what did you think about these social changes. (ERG) What do you mean by that. (LILIA)I mean it's just like how it it kind of ties into the next question, how it kind of changed your everyday lifestyle, if at all. 
If you noticed any changes, was it more difficult to adapt to.00:02:19:12 - 00:02:36:24ERGI mean, you made it easier because you didn't have to carry all this technology around. You have this I can stream Netflix on my phone now. And you don't have to keep up with X, Y and Z. It, I thought it made it very, it made it much easier and I definitely would not want to go backwards.00:02:38:18 - 00:03:09:11ERGBut I like my parents who are in their 80s. There's no way that they, they like the idea of probably have a Netflix or Amazon Prime, but there's no way that my dad could handle that. Yeah. He has a smartphone that, you know, it's, tech support. Yeah. Smartphone. LILIA Yep. I get it. Were there any challenges that you or others that you know, faced while adapting to these new technologies, whether it was learning it or just kind of want to throw your computer at the wall?00:03:09:16 - 00:03:30:01ERGYou know, because we didn't have any computer classes in high school. Yeah. I think they had one section. But the computers that we had or what we did, especially when I was in college, like I wanted C plus programing, I had never it was never taught like word processing Microsoft Word I learned how to type on a typewriter.00:03:30:22 - 00:03:51:21ERGSo again everything was self-taught. It was very hard to begin with and made me kind of nervous. I know a lot of people, think that they can mess something up and can't get it back, and, and there was a lot of anxiety, with that transition. But I feel, you know, again, like, I don't know everything.00:03:51:23 - 00:04:11:10ERGAnd I have children that can help me out, but, you know, I've had to learn a lot. My generation has had to learn a lot. Yeah. And most of us have adapted well, I think. Yes. I'm in Gen X, so that's 1965 to about 1980. And and we've learned a lot and adapted. You know. Yeah. The generation before us.00:04:11:12 - 00:04:38:29ERGNo they're not going to do that. No they're not. 
In retrospect what were the pros and cons of these shifts in technology. You can get more data on things. So I remember when I was writing my thesis in graduate school, and I was still we we didn't have a lot of memory on computers and had to save it on disks, and it took like 6 or 7 disks and it would be awful.00:04:38:29 - 00:05:01:07ERGAnd then I'd have to get another. So that was extremely frustrating. You know, being able to have things that are quicker and easier to access and knowing that I've got more space and understanding what a megabyte is, what a gigabyte is, and the storage, that is a lot, lot more helpful. But again, I, I, I've enjoyed the technology push.00:05:01:07 - 00:05:26:12ERGThe one thing I don't like about it is that, I'm glad that I raised my children before this. Because I think that kids that are now being raised, a lot of them, you know, this is, this is shoved in their direction in order to occupy them and they're missing out on reading books. They're missing out on dealing with time that you just have to entertain yourself.00:05:26:12 - 00:05:42:26ERGLike going to the doctor's office. We always read books, or we always did stories, or we always just talked about our day. And now I see, you know, like a two year old or one year old, the doctor's office and the parent says this. Yep, yep. And that is just. And then again, you know, my students, I say it's constant.00:05:42:28 - 00:06:08:01ERGYeah. They can't cut it all. No. Like you got to be professional and put it aside and make eye contact. So it's all like that. Yeah. No, I totally agree. Looking back, what are the biggest lasting impacts of this shift? I just like the fact that you have more information that's accessible.
You do have to decipher what is true and what's not true.00:06:08:02 - 00:06:29:26ERG Yeah, but, you know, if I have a question, instead of having to go to a library and find the book or and I would have I mean, I've taken graduate classes since the shift and my papers, I can find so much more information to write about. Because it's more accessible than half in a way on interlibrary loan or going over there and looking something up.00:06:29:28 - 00:06:54:27ERGSo I do like that quick access to information. I do like the portability of it. And I think that has really changed. And then I mean, things like exposure, like medical records. And when I make a doctor's appointment, the reminder will shift through my cell phone, or I'll shift through the app and then I can find out my test, my blood test for that rather quickly, and have to rely on somebody to call me and tell.00:06:55:00 - 00:07:04:29LILIAYeah, I totally agree. So I love all that. Yeah, it is very helpful. How would you describe this shift in one word?00:07:05:24 - 00:07:10:15ERGOne word?00:07:11:18 - 00:07:35:04ERGI think it's exciting. Yeah, I think it really is. I mean, again, I've embraced it because I've been forced to embrace it as an educator. As a parent. So I've everything about I've like except for again that this is just steering people away from having relationships. Yeah. And learning how to deal with, you know just empty time.00:07:35:04 - 00:07:56:10ERGYou've, you've got to, I think, a lot of parents are missing out on that. They definitely are. LILIAYeah, I totally agree. Do you miss VCR, VHS or DVD? And if so, what aspects specifically do you miss?00:07:56:13 - 00:08:19:09ERGCan't miss it if it's never gone. And I still have all my children's Pixar stuff. We lived on it. They had portable DVD players that would hook into the car. Yeah. We had 13-hour (car) rides to go with it. 
LILIAI mean, you can't argue about that.00:08:19:15 - 00:08:40:27ERGNo, you cannot, but no, I don't miss this at all. You know, I need to get the one thing that I'm really concerned about, which is that I need to get all my son's videos transferred over, and I'm about to send them to somebody. Yeah. And then my wedding video. I need to get that transferred into something. So, no, I don't miss it.00:08:40:29 - 00:09:01:17ERGNo, I still have a bunch, and I still have a DVD player. We got rid of the VCR a couple of years ago. Oh, maybe we haven't. So I can't watch my wedding videos anymore. But now I don't miss this at all. Okay, well that's fair. I don't blame you, since it does, and there's nothing in your computer, so, like.00:09:01:23 - 00:09:37:29ERGNo, I can't know. And there used to be some laptops where you could plug in CD's. Yeah, I remember that. And then like, you know, in the cars when I was 16, you had just, you had a radio and then you had a tape. And then like if you're real fancy, you had a plug in DVD and you plug in a CD player, but like when you went over a bob it was and then came you know they installed and I think my car right now it's like a 2016 I think it has a cassette and a DVD player.00:09:38:12 - 00:09:54:03ERGMay not have the cassette probably then, but yeah, it's just and then all that trying to figure out your song that you want, I mean it's just so much easier. Yeah. Just to plug something in or auto-connect it. It's fantastic. LILIAYeah. Okay. Well, that was all of my questions.Steven Hawk Interviewed by Colby Hawk TranscriptDr. Steven Hawk Interviewed by Colby Hawk00:00:00:00 - 00:00:28:08 Steven: Okay. Go ahead. You can introduce yourself. Yes. My name is Doctor Steven Hawk and I am a licensed K through 12 English teacher. And I've been teaching in the public schools for eight years now. Colby:  Cool. So, about how old were you? 
When, you know, you grew up with the, you know, VHS, VCR and everything, what was it like with that being a big thing back in the day?  00:00:28:08 - 00:00:48:04 Colby: What was your experiences with everyday life and having it having this technology?  Steven: Yeah. From, from the age where I was able to really watch movies, I was watching VHS tapes. So, I had a very small collection of VHS tapes and pretty much just rewatched the same 2 or 3 movies again and again and again and again.  00:00:48:04 - 00:01:06:24 Steven: As my mom would tell you, she would say, I wore out Land Before Time on VHS and Home Alone. Those are my two movies that I pretty much would play ‘em rewind ‘em, play ‘em, rewind ‘em. So as a child, that was my experience was just VHS tapes. You could go to a blockbuster and rent a VHS tape at that point.  00:01:06:26 - 00:01:29:22 Steven: But you owned very few and you were able to rent very few. If you were able to rent, it was usually like once a week. So, you didn't watch a lot of movies. And when you did, hopefully it was something you really liked, and you just watched it again and again and again.  Colby: Cool. Yeah. And having the technology and everything and, you know, the, you know, VHS mainly for you.  00:01:29:24 - 00:01:53:16 Colby: what was it like transitioning, to this digital, you know, internet age when you have iPhones in your pocket, MacBooks and streaming and all of that?  Steven: Yeah. So, the, the, the chain for me, was we went from VHS to DVD probably when I was about 13 years old, around 13. We, we had DVDs and that was a big deal.  00:01:53:19 - 00:02:15:11 Steven: And then DVDs evolved into Blu rays. So, the quality of the DVD DVDs got better. I remember it was my sophomore year of high school when MP3's became a thing. So no longer do we have to carry Walkmans to listen to music, but which is like a DVD, right? we transitioned to MP3's, and so the digital age kind of came upon us.  
00:02:15:15 - 00:02:42:09 Steven: It wasn't until I was probably 22 that I had my first iPhone. So growing up, you know, we didn't have internet for the most part of my life. We didn't have any kind of apps or streaming until I was in my probably early 20s. And so that was a huge change because of the amount of things that you could be, I guess, exposed to through streaming.  00:02:42:12 - 00:03:07:12 Steven: It went from having to have a physical copy of a movie or a disc for music to being able to just choose from a vast digital library of different genres and different artists, to then seek out things which isn't something you were able to do. No more than just going to blockbuster and looking through the shelves, could you really seek out different genres and different types of things.  00:03:07:12 - 00:03:29:03 Steven: So, it in a lot of ways it was very freeing because it introduced you to a lot of new things, and you were able to discover a lot of new, tastes, genres, artists, things like that. So, yeah, I would say I was probably about 22 when streaming really caught on in the United States.  00:03:29:05 - 00:03:49:05 Colby: Now, if when you were 22, when you were 22, you would have just gotten out of college. So when you were still at UTK, what was that like, you know, going, you know, if you wanted to go watch something with your friends or, you know, catch up on the newest whatever, what what was that experience like before you had access to all this?  00:03:49:06 - 00:04:11:11 Steven: Yeah. So it was still DVDs were still the thing. You know, when I was in college, we hadn't moved to streaming quite yet. We had the internet age where you were streaming games online with friends and multiplayer and stuff like that. But not really movies. Movies and TV were not mainstream stream. They were not streamed to the mainstream yet.  
00:04:11:14 - 00:04:33:23 Steven: And so for me, it was still going to the movies, you know, my friends and I, we would go to the movie theater if there was a movie coming out. You knew the release date and you would you would set a date and a time to go see the movie with your friends physically at a theater. So it wasn't like we stayed in our dorms or apartments and were able to stream the newest movie or TV show.  00:04:33:25 - 00:05:03:12 Steven: So, for me, that was it was still kind of what you would consider an old school experience. I know I've told you Facebook came out in 2005 when I first went to college. And, you know, so social media and the evolution of all streaming from internet, computer platforms, to digital media, for movies, and games, and music, that all really, you know, came mainstream after my college experience. Not during.  00:05:03:15 - 00:05:25:03 Colby: Now, the one big thing I think, and most everybody knows about right is blockbuster.  Steven  Yeah.  Colby  So, can you tell me a little bit more about your experiences with blockbuster? You know, was there like a membership program? Was there like certain deals that they had? What was it like going into one of these stores and renting and picking out your favorite flicks?  00:05:25:05 - 00:05:51:07 Steven: Yeah. If there was a membership program, I'm not aware. As a small child, I don't remember if there was a membership program. But what I do remember, and I tell people often, it was always like Christmas morning for me. I loved blockbuster. I think everyone kind of had the same experience where it was 1 or 2 times a week that you might be fortunate enough to go to a blockbuster and get to rent a new movie that you had never seen.  
00:05:51:10 - 00:06:09:23 Steven: It was usually a Friday night, and you've been going to school all week and you're just looking forward to Friday night, because that's the one time your parents get to take you to blockbuster and you walk in the store, and it was like toys R us. You have all these movies, and it was just the covers of the movies with a DVD behind it.  00:06:09:25 - 00:06:32:09 Steven: And if you wanted to watch that movie, you had to take the cover out of the way and see if the DVD was still left. And if there was no DVD, then someone had already rented that movie. And if there were enough left, then you got to take one home. But very often they'd already been rented, and so some, some nights you would go for a certain movie, a new release, and it wasn't there.  00:06:32:14 - 00:06:50:03 Steven: And you'd be a little bummed, but you would just go pick out another movie and you would be excited because you didn't get to watch movies, but maybe once or twice a week. like, at all. You didn't get to watch any more than 1 or 2 movies a week. And so, it was a big deal to watch a movie back then, and it was a lot of fun.  00:06:50:04 - 00:07:15:08 Steven: It was something you really look forward to for Monday. You look forward to getting to Friday and Saturday so you could watch a movie and, and so yeah. It was really special back then. And, it had its. Looking back, you could say it had its difficulties. Like I said, you know, the movie may not be there for you to rent, but we dealt with that disappointment really well, I think, and just say, hey, maybe it'll be back by tomorrow.  00:07:15:08 - 00:07:36:02 Steven: Maybe we could rent it on Saturday night. If not, maybe next week. That'll be the movie. So, you know, we didn't get mad about it. It was part of the deal when you went to blockbuster. So I feel like, you know, movies were so much more special back then because they were so much more rare, and they're not rare anymore.  
00:07:36:05 - 00:07:56:08 Steven: And so, you know, I miss I miss blockbuster, I miss the excitement of going into the store and the excitement of seeing if the DVD is still there and the excitement of taking it home and watching it. In the VHSs, you had to be kind and rewind is what you had to do. You know, you rewound the tape for the next person to use it.  00:07:56:15 - 00:08:14:18 Steven: When DVDs came along, it was special because you no longer had to rewind the movie. You could just return the disc. So that was a big deal for us. And then of course, as it moved to streaming, you could watch whatever you wanted whenever, you know, whatever day of the week. You didn't have to worry about rewinding or anything.  00:08:14:18 - 00:08:37:21 Steven: So, it was definitely an evolution. But, for me, blockbuster was really special. And not just blockbuster, but, you know, even Redbox later and, you know, any form of renting a movie during the week was really special.  Colby: Yeah. And, you're talking about how, you know, now it's not as you know, it's not special. You know, it's not, you know, you have easy access to everything.  00:08:37:21 - 00:09:10:19 Colby: And, kind of on that note, like looking back at your experiences having, you know, dealt with DVDs, VHS, all this stuff, and then having Disney+ and Netflix, and, whatever, Hulu, whatever. You know, how has that changed, like your lifestyle or, you know, just society today and, and like what what would you say or like in some of the pros and cons with having this easy access through, you know, the internet or whatever, you know.  00:09:10:24 - 00:09:35:04 Steven: Yeah. Definitely, it's a double edged sword. To kind of go back to say, Netflix started as a DVD subscription process, and then that turned into a digital streaming process. I didn't jump into that process, probably for a couple of years into when Netflix became a digital subscription service. Netflix was the first one that I subscribed to.  
00:09:35:06 - 00:09:54:08 Steven: It was fairly cheap, and I thought, hey, this seems pretty neat, and I gave it a try. And that was my first foray into the digital streaming world. And I enjoyed it. You know, my first experience was, or my first thought was this, this is nice. This is a lot better than having to, you know, get out of my house and drive to a store and it may or may not be there.  00:09:54:08 - 00:10:20:06 Steven: And so, there were some pros there. There were some benefits to that process. But I think as time went on, and this is a year's process, right? As more and more things started to become, digital based, streaming based platforms, news, TV, movies, eventually, taking you out of the theater, even, and just leaving you in your living room.  00:10:20:08 - 00:10:50:07 Steven: Then the layers with Covid. You know, people not getting out of their house. They marketed streaming really heavily during the Covid years, and the years to follow Covid, as something to keep you safe. So it was a marketing ploy to really get you to binge watch and stream. So like I said, it became over time, I believe more of a negative thing had a negative impact on my life because it's so addictive.   00:10:50:09 - 00:11:27:02 Steven: Right? That word binge is probably not a positively connotated word in any other setting. If you binge on food per se, that would not be good. But to binge on Netflix has been marketed as a culturally positive thing. It's something that's good to do. And while it may seem good and may seem fun, and you may find a show or, you know, a series of shows that have five, seasons, and you can watch all of them in a matter of two weeks, I’m not sure that that’s healthy.   00:11:27:10 - 00:11:53:13 Steven: And, in my own life, personally, I think, I think it has had a negative impact to be totally honest. It’s much easier after a hard day of work to go to my bedroom and shut the door away from my kids and silence the house and just consume right? 
To not give anymore, but to just consume, to binge.   00:11:53:15 - 00:12:16:00 Steven: And that's not good. And I know that that's not good. And so, I feel like now I'm having to self-police. I'm having to say this much is okay, but this much is dangerous. This is not good, not healthy. And so, there's it's a fine line. I'm not exactly sure where the line is now because it's all an evolving process.    00:12:16:02 - 00:12:54:07 Steven: But for me personally, I know it's taking time from my kids, taking time from me reading books and things that I used to do more of, perhaps taking time away from, you know, talking to my wife and communicating. Giving myself a pass when things have been difficult to just sit there and binge and to stream. So, while there have been good things, I think you are, you're probably, kind of like the genres of music. You’re able to discover more through streaming, things that you didn't know existed or things that you didn't know perhaps you were interested in.  00:12:54:10 - 00:13:20:01 Steven: But the negative effect, I think, perhaps outweighs the positive. And that's just my experience. I know some people would disagree.  Colby: Yeah, there's a lot of differing opinions on, streaming and everything. And I think, I mean, I don't even have time to binge these days anymore, which is probably a good thing.  Steven: Yeah, I think so.  Colby: So we talked, you know, you touched on, like, the society and the shift and changes.  00:13:20:01 - 00:13:51:08 Colby: That was very good. With online and all that. Were there any, I guess, you kind of talked about this maybe a little bit, but like any challenges that you or any others that you observed or faced with this challenge of going away from, you know, more analog, whatever, to digital?   Steven: Yeah. I mean, nothing, nothing dramatic or drastic, but I think the first challenge was, of course, going from DVD to streaming because we were in an in-between stage there for a while.  
00:13:51:13 - 00:14:07:23 Steven: You had streaming apps out there, and you had Netflix and things that you could, you know, sign up for and partake of, but it's like you kind of had a toe in that world, but you were still stuck to DVDs and you rented from, you know, once blockbuster went out, it was Redbox or, you know, stuff like that.  00:14:07:23 - 00:14:30:20 Steven: And then when I went full into streaming, then, I guess the challenge is, you know, part of its financial, to be totally honest. You’re, you're paying for things regularly that you didn't used to pay for, you know. Monthly, you're paying at a minimum, People are probably paying for one streaming app. Lots of people are paying for five or more streaming apps.  00:14:30:22 - 00:14:57:01 Steven: So what used to be free through cable is now charged through apps. So that's been a struggle. Just a financial struggle is like, where's the line between what's an appropriate amount to spend on this form of entertainment and what's not? What’s healthy, what's not? I know this was not for me, but for for some elderly people, there was a huge problem trying to transition to the digital streaming apps.  00:14:57:01 - 00:15:19:13 Steven: And, you know, they they had their TVs that they liked, but they weren't smart TVs. So, you know, they had to figure that they needed a new TV and how to work a new remote and how to download apps and work apps. And that wasn't a problem for me. But I did deal and try to help a lot of elderly people through that transition process to understand how to stream content.  00:15:19:16 - 00:15:40:17 Steven: But for me, you know, like I said, it was just kind of a. It was a learning phase then followed by a self-policing phase of what's. What do I need and what do I not need? Because everyone who develops a streaming app tells you that you need it. And it's kind of hard to select the right service, you know? Do you go with Hulu?  
00:15:40:17 - 00:15:59:22 Steven: Do you go with, you know, Comcast? Which one do you go with? There are just so many to choose from that I had to do my research before I landed on the one that I would pay for. Yeah.  Colby: So I think we've already talked about, like, looking back, what were the big impacts on that.  00:15:59:22 - 00:16:29:29 Colby: I think we already touched that. Steven:  Yeah.  Colby:  How would you describe that shift in one word? Or that shift or like actually three things. How do you describe the shift?  The time before the like the VHS DVDs, all that. And then the time now after this shift. Like three, I know upped it but three.  Steven: Yeah. I would say for the time past, nostalgic. Nostalgic is my word because I miss it.  00:16:30:01 - 00:16:51:15 Steven: It's it's something you didn't know that you would miss when it when when it went away. there was sadness when blockbuster went out of business, but there was also an acceptance that this is just the new way of things. And sometimes the more we get into the new way, the more I wish it could become the old way.  00:16:51:18 - 00:17:19:01 Steven: So nostalgic would be that one. For the transition, I would say exciting would be the word I would use for that. I can remember being the only, high schooler, on the way to a baseball team with a new iPod that streamed. Or not streamed but you know, had the MP3 downloaded music that I could just select from a playlist, while all my friends had a Walkman disc that would skip if, you know, they didn't hold it right.  00:17:19:01 - 00:17:47:03 Steven: And so for me, it was exciting. It was a new frontier. It was a new challenge to learn the technology of it. What was for for the, what was the last question for now? I would say the word is dangerous. For the reasons I've stated already, you know, the, mainly the social reasons. What is marketed to us is that we, again, should binge these things.  
00:17:47:09 - 00:18:15:27 Steven: We need these things. We can't live without these things. There's a lot of clever marketing that goes into it, and a lot of people that are persuaded by that marketing, including me to some extent. Right. Because I stream. I do watch shows and a lot of it, a lot more than I used to. What used to be one movie a week has turned into ten movies a week. And  20 episodes a week. And that's dangerous.  00:18:15:28 - 00:18:38:02 Steven: It’s dangerous because it's taking me from things that are more important. And it's giving me a pass when I'm tired to say I don't have to struggle with difficult things. I can just. I deserve this. To just sit quietly in my room, away from my children, away from my wife, away from whomever, and reward myself. I think that's a dangerous notion.  00:18:38:04 - 00:18:50:15 Steven: So dangerous, I think, would be the word. Colby: Cool. Yeah. And then. Yeah my battery’s giving me the warning. I think I've got 1 or 2. One more question.  00:18:50:15 - 00:19:10:24 Colby: Okay, so that two part thing, I guess if you could give me one more comment, like do you miss it? You know, do you miss the VHS? You know, rewinding and you know, having, you know, all that the blockbuster and what do you. What, if anything, would you change today? And then what were your favorite, you know, tapes? Or your.  00:19:10:28 - 00:19:34:01 Steven: Yeah. Yeah. Yeah. So I mentioned earlier, my two favorites when I was young was Land Before Time. The original Land Before time. The first one. Petrie, Longneck, and all the, Sharptooth. That was, I've watched that on repeat, I think. And, and then later when I was a little older, it was, Home Alone, the original Home Alone with Macaulay Culkin. And I just thought that was hilarious.  00:19:34:04 - 00:19:53:05 Steven: It’s kind of slapstick humor, you know? And so those are the two that were my favorite. As far as, you know, do I miss it? Absolutely. 
I miss the way things were, because I think I missed the way I was, and my family was, and other people were. That's what I missed. It's not that I miss blockbuster itself.  00:19:53:07 - 00:20:21:08 Steven: I miss the type of world that we lived in when we still had a blockbuster. When movies were still special. I didn't say earlier, but you know, as a, as a ninth-grade high school teacher, when we, when I was young and we had a special movie day that was like the best day ever. And so, as a teacher, I thought, hey, when they've really worked hard, I'm going to give them a special movie day occasionally, because I love that when I was young. And I tried that.  00:20:21:11 - 00:20:45:07 Steven: And I've learned that you can't get these kids to focus on a movie anymore. They're so desensitized. They're so overstimulated. They won't even watch a movie anymore. They don't care about movies anymore. I miss how much people cared about movies. So, yeah, I miss it. It's not that I miss VHS again. It's just I miss the way people were.  00:20:45:10 - 00:21:03:00 Steven: And I don't think we can ever get that back. I think we're too far away from that. I don't think we get back to that. So as far as the second part, you know, what could, I what would I change if I could change something? What would I want to change I don't think I have the power to change.  00:21:03:02 - 00:21:23:03 Steven: I want families to sit together on a couch on a Friday night, like I did with a couple pizzas and a show and watch it together, and laugh together, and have time together like family should. That's what I want to happen. but I can't make that happen for other people. I can try to make it happen in my home.  00:21:23:05 - 00:21:47:25 Steven: And, and I've been trying to do that more, you know? I've been consciously trying to do that more in my own home. But I can't do it for other peoples. 
And so, what I'm seeing in our culture is a shift away from, from loving one another, from spending time, quality time together, and for giving ourselves, as parents, a pass for spending time with our kids.  00:21:47:25 - 00:22:08:07 Steven: And sometimes, even for parenting our kids. Because it's easier just to put them in front of an iPad or a TV screen and just let them watch a movie than it is to discipline, or to ask them how their day was, or to troubleshoot things in their lives, or to help them with their math homework.  00:22:08:09 - 00:22:28:24 Steven: It’s easier just to let them stream something. So I don't know how we fix that, Colby. That's that's something that I've thought about a lot lately. How do we, as a society, as a culture, get back to at least some part of what we used to be when blockbuster still existed? I don't know, I don't know the answer to that.  00:22:28:24 - 00:22:52:17 Steven: I think it's a. It’s a question that people have to challenge themselves with personally. They have to know who they are, what they've become, what they want to be, and then find a way to, to find that middle ground between what's enough streaming and what's too much streaming for themselves as parents, as adults, and also for their children.  00:22:52:19 - 00:23:00:15 Steven: And I just don't have a good answer to that, even though I wish I could. Colby:  Sweet. That was a very good answer.
Paul Navis  Interviewed by Cole Kennedy Transcript

      Good job running the interviews as conversations rather than spitting the questions out without any follow-up questions! I also appreciate that the transcripts were cleaned up and made easier to navigate.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity)

      *This study examines the reorganization of the microtubule (MT) cytoskeleton during early neuronal development, specifically focusing on the establishment of axonal and dendritic polarity. Utilizing advanced microscopy techniques, the authors demonstrate that stable microtubules in early neurites initially exhibit a plus-end-out orientation, attributed to their connection with centrioles. Subsequently, these microtubules are released and undergo sliding, resulting in a mixed-polarity orientation in early neurites. Furthermore, the study elegantly illustrates the spatial segregation of microtubules in dendrites based on polarity and stability. The experiments are rigorously executed, and the microscopy data are presented with exceptional clarity. The following are my primary concerns that warrant further consideration by the authors. *

      1. Potential Bias in the MotorPAINT Assay: Kinesin-1 and kinesin-3 motors exhibit distinct preferences for post-translationally modified (PTM) microtubules. Given that kinesin-1 preferentially binds to acetylated microtubules over tyrosinated microtubules in the MotorPAINT assay, the potential for bias in the results arises. Have the authors explored the use of kinesin-3, which favors tyrosinated microtubules, to corroborate the observed microtubule polarity?

      We thank the reviewer for the careful assessment of our manuscript. As the reviewer noted, it has indeed been demonstrated that kinesin-1 prefers microtubules marked by acetylation (Cai et al., PLoS Biol 2009; Reed et al., Curr Biol 2006) and kinesin-3 prefers microtubules marked by tyrosination in cells (Guedes-Dias et al., Curr Biol 2019; Tas et al., Neuron 2017); however, these preferences are limited in vitro, as demonstrated for example in Sirajuddin et al. (Nat Cell Biol 2014). When motor-PAINT was introduced, it was verified that purified kinesin-1 moves over both acetylated and tyrosinated microtubules with no apparent preference in this assay (Tas et al., Neuron 2017). This could be due to the more in vitro-like nature of the motor-PAINT assay (e.g. some MAPs may be washed away) and/or because of the addition of Taxol during the gentle fixation step, which converts all microtubules into those preferred by kinesin-1. We will clarify this in the text.

      Planned revisions:

      • We will clarify the lack of kinesin-1 selectivity in motor-PAINT assays in the text by adding the following sentence in the main text when introducing motor-PAINT: Importantly, while kinesin-1 has been shown to selectively move on stable, highly-modified microtubules in cells (Cai et al., PLoS Biol 2009; Reed et al., Curr Biol 2006), this is not the case after motor-PAINT sample preparation (Tas et al., Neuron 2017).

      Axon-Like Neurites in Stage 2b Neurons: The observation of axon-like neurites in Stage 2b neurons, characterized by an (almost) uniformly plus-end-out microtubule organization, is noteworthy. Have the authors confirmed this polarity using end-binding (EB) protein tracking (e.g., EB1, EB3) in Stage 2b neurons? Do these neurites display distinct morphological features, such as variations in width? Furthermore, do they consistently differentiate into axons when tracked over time using live-cell EB imaging, rather than the MotorPAINT assay? Could stable microtubule anchoring impede free sliding in these neurites or restrict sliding into them? Investigating microtubule sliding dynamics in these axon-like neurites would provide valuable insights.

      We thank the reviewer for highlighting this finding. Early in development, cultured neurons are known to transiently polarize and have axon-like neurites that may or may not develop into the future axon (Burute et al., Sci Adv 2022; Schelski & Bradke, Sci Adv 2022; Jacobson et al., Neuron 2006). In the absence of certain molecular or physical factors (e.g. Burute et al., Sci Adv 2022; Randlett et al., Neuron 2011), this transient polarization is seemingly random and as such, we do not expect the axon-like neurites in stage 2b neurons to necessarily become the axon. Interestingly, anchoring stable microtubules in a specific neurite using cortically-anchored StableMARK (Burute et al., Sci Adv 2022) or stabilizing microtubules in a specific neurite using Taxol (Witte et al., JCB 2008) has been shown to promote axon formation, but these stable microtubules have slower turnover (perhaps necessitating the use of laser severing as in Yau et al., J Neurosci 2016) and may not always bear EB comets given that EB comets are less commonly seen at the ends of stable microtubules (Jansen et al., JCB 2023).

      Planned revision:

      • We will add additional details to the text to clarify the likely transient nature of this polarization in agreement with previous literature and specify that they are otherwise not morphologically distinct.
      • We will perform additional EB3 tracking experiments in Stage 2b neurons to examine potential differences between neurites.

      *Taxol and Microtubule Sliding: Taxol-induced microtubule stabilization is known to induce the formation of multiple axons. Does taxol treatment diminish microtubule sliding and prevent polarity reversal in minor neurites, thereby facilitating their development into axons? *

      We thank the reviewer for this interesting suggestion. Taxol converts all microtubules into stable microtubules. Given that the initial neurites tend to be of mixed polarity, having stable microtubules pointing the "wrong" way may impede sliding and polarity sorting. Alternatively, since it is precisely the stable microtubules that we see sliding between and within neurites using StableMARK, Taxol may also increase the fraction of microtubules undergoing sliding. Because of this, it is not straightforward to predict how Taxol affects microtubule (re-)orientation and sliding. Preliminary motor-PAINT experiments do suggest that the multiple axons induced by Taxol treatment all contain predominantly plus-end-out microtubules, as expected, and that this is the case from early in development. We will further develop these findings to include them in our manuscript.

      Planned revision:

      • We have already performed some experiments in which we treat neurons with 10 nM Taxol and verify that we observe the formation of multiple axons by motor-PAINT. We will perform additional experiments in which we add this low dose of Taxol to the cells and determine its effect on microtubule sliding dynamics.

      *Sorting of Minus-End-Out Microtubules (MTs) in Developing Axons: Traces of minus-end-out MTs are observed proximal to the soma in both Stage 2b axon-like neurites and Stage 3 developing axons (Figure S4). Does this indicate a clearance mechanism for misoriented MTs during development? If so, is this sorting mechanism specific to axons? Could dynein be involved? Pharmacological inhibition of dynein (e.g., ciliobrevin-D or dynarrestin) could assess whether blocking dynein disrupts uniform MT polarity and axon formation. *

      We indeed think that a clearance mechanism is involved in removing misoriented microtubules from the axon after axon specification. Many motor proteins have been implicated in the polarity sorting of microtubules in neurons, and in axons, dynein is believed to play a role (Rao et al., Cell Rep 2017; del Castillo et al., eLife 2015; Schelski & Bradke, Sci Adv 2022). A few of these studies already employed ciliobrevin, noting that it increases the fraction of minus-end-out microtubules in axons (Rao et al., Cell Rep 2017) and reduces the rate of retrograde flow of microtubules in immature neurites (Schelski & Bradke, Sci Adv 2022). These findings are in line with the suggestion of the reviewer. Interestingly, however, as we highlight in the discussion, the motility we observe for polarity reversal is extremely slow on average (~60 nm/minute) because the microtubule end undergoes bursts of motility and periods in which it appears to be tethered and rather immobile. Given that most neurites are non-axon-like, we assume these sliding events are mostly not taking place in axons or axon-like neurites. These events may thus be orchestrated by other motor proteins (e.g. kinesin-1, kinesin-2, kinesin-5, kinesin-6, and kinesin-12) that have been implicated in microtubule polarity sorting in neurons. We do observe retrograde sliding of stable microtubules in these neurites at a median speed of ~150 nm/minute, which is again much slower than typical motor speeds and occurs in almost all neurites and not specifically in one or two axon-like neurites. It is thus unclear which motors may be involved, and it is difficult to predict how any drug treatments would affect microtubule polarity.
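
      To put the two speeds quoted above in perspective, the unit conversion can be sketched as follows. This is an illustrative calculation only and not part of the authors' analysis: the reference motor speed of 800 nm/s is an assumed, order-of-magnitude value for a processive kinesin in vitro that does not come from the manuscript, and the function and constant names are hypothetical.

```python
# Illustrative unit conversion; not part of the authors' analysis.

def nm_per_min_to_nm_per_s(speed_nm_per_min: float) -> float:
    """Convert a speed from nm/minute to nm/second."""
    return speed_nm_per_min / 60.0


# Values quoted in the response above.
POLARITY_REVERSAL_NM_PER_MIN = 60.0    # average motility during polarity reversal
RETROGRADE_SLIDING_NM_PER_MIN = 150.0  # median retrograde sliding speed

# ASSUMPTION: order-of-magnitude in vitro speed of a processive kinesin,
# used here only as a yardstick; it is not taken from the manuscript.
ASSUMED_MOTOR_SPEED_NM_PER_S = 800.0

observations = [
    ("polarity reversal", POLARITY_REVERSAL_NM_PER_MIN),
    ("retrograde sliding", RETROGRADE_SLIDING_NM_PER_MIN),
]

for label, speed_nm_per_min in observations:
    speed_nm_per_s = nm_per_min_to_nm_per_s(speed_nm_per_min)
    fold_slower = ASSUMED_MOTOR_SPEED_NM_PER_S / speed_nm_per_s
    print(
        f"{label}: {speed_nm_per_s:.1f} nm/s, "
        f"roughly {fold_slower:.0f}-fold slower than the assumed motor speed"
    )
```

      Under this assumption, both observed speeds fall two to three orders of magnitude below the yardstick, which is consistent with the description of intermittent bursts of motility separated by tethered periods.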

      Dissecting the mechanisms of microtubule sliding will require many more experiments and will first require the recruitment and training of a new PhD student or postdoc. Therefore, we feel this falls outside the scope of the current work, which carefully maps the microtubule organization during neuronal development and demonstrates the active polarity reversal of stable microtubules during this process.

      Planned revision:

      • In the discussion, we will expand on the potential mechanisms facilitating polarity sorting in axons and axon-like neurites.

      Impact of Kinesin-1 Rigor Mutants on MT Polarity and Dynamics: Would the expression of kinesin-1 rigor mutants alter MT dynamics and polarity? Validation with alternative methods, such as microtubule photoconversion, would be beneficial.

      It is important to note that StableMARK and its effects on microtubule stability have been extensively verified in the paper in which it was introduced (Jansen et al., JCB 2023). At low expression levels (where StableMARK has a speckled distribution along microtubules), StableMARK does not alter the stability of microtubules (e.g., they are still disassembled in response to serum starvation), alter their post-translational modification status or their distribution in the cell, or impede the transport of cargoes along them. Given that we chose to image neurons with very low expression levels of StableMARK (as inferred by the speckled distribution along microtubules), we expect its effects on the microtubule cytoskeleton to be minimal.

      Planned revision:

      • We will clarify the potential effects of StableMARK in the manuscript. We will perform experiments with photoactivatable tubulin to examine whether we still see microtubules that live for over 2 hours. We will furthermore examine whether it allows us to see microtubule sliding between neurites similar to work performed in the Gelfand lab (Lu et al., Curr Biol 2013).

      *Molecular Motors Driving MT Sliding: Which specific motors drive MT sliding in the soma and neurites? If a motor drives minus-end-out MTs into neurites, it must be plus-end-directed. The discussion should clarify the polarity of the involved motors to strengthen the conclusions. *

      We thank the reviewer for highlighting this point and will improve our discussion to clarify the polarity of the involved motors.

      Planned revision:

      • We will expand our discussion of the motors potentially involved in sliding microtubules when revising the manuscript.

      Stability of Centriole-Derived Microtubules: Microtubules emanating from centrioles are typically young and dynamic. How do they acquire acetylation and stability at an early stage? Do centrioles exhibit active EB1/EB3 comets in Stage 1/2a neurons? If these microtubules are severed from centrioles, could knockdown of MT-severing proteins (e.g., Katanin, Spastin, Fidgetin) alter microtubule polarity during neuronal development? A brief discussion would be valuable.

      We thank the reviewer for raising these interesting questions and suggestions. As suggested, we will include a brief discussion of these issues. What is known about the properties of stable microtubules is limited, so it is currently unclear how they are made. For example, we do not know if they are converted from labile microtubules or nucleated by a distinct pathway. If they are nucleated by a distinct pathway, do these microtubules grow in a similar manner as labile microtubules and do they have EB comets at their plus-ends (given that EB compacts the lattice (Zhang et al., Cell 2015, PNAS 2018) and stable microtubules have an expanded lattice in cells (de Jager et al., JCB 2025))? If they are converted, does something first cap their plus-end to limit further growth (given that EB comets are rarely observed at the ends of stable microtubules (Jansen et al., JCB 2023))?

      We also do not know how the activity of the tubulin acetyltransferase αTAT1 is regulated. Is its access to the microtubule lumen regulated or is its enzymatic activity stimulated by some means (e.g., microtubule lattice conformation or a molecular factor)?

      We find the possibility that microtubule severing enzymes release these stable microtubules from the centrioles very exciting and hope to test the effects of their absence on microtubule polarity in the future. We will discuss this in the manuscript as suggested.

      Planned revision:

      • We will expand our discussion about the centriole-associated stable microtubules in the revised manuscript.

      Minor Points

      • In Movies 3 and 4, please use arrowheads or pseudo-coloring to highlight microtubules detaching from specific points. In Movie 5, please mark the stable microtubule that rotates within the neurite. These annotations would enhance clarity.

      Planned revision:

      • We will add arrowheads/traces to the movies to enhance clarity.

      The title states: 'Stable microtubules predominantly oriented minus-end-out in the minor neurites of Stage 2b and 3 neurons.' However, given that the minus-end-out percentage increases after nocodazole treatment but only reaches a median of 0.48, 'predominantly' may be an overstatement. Please consider rewording.

      We thank the reviewer for catching this mistake and will adjust the statement to better reflect the median value.

      Planned revision:

      • We will reword this statement in the revised text.

      *Please compare the StableMARK system with the K560Rigor-SunTag approach described by Tanenbaum et al. (2014). What are the advantages of StableMARK over the SunTag method? *

      While the SunTag is certainly a powerful tool to visualize molecules at low copy number, we believe that StableMARK is more appropriate than the K560Rigor-SunTag tool for our assays due to two main reasons. Firstly, K560Rigor-SunTag is based on the E236A kinesin-1 mutation, while StableMARK is based on the G234A mutation. These are both rigor mutations of kinesin-1 but behave differently; the E236A mutant is strongly bound to the microtubule in an ATP-like state (neck linker docked), while the G234A mutant is also strongly bound, but not in an ATP-like state (Rice et al., Nature 1999). This means that they may have different effects on or preferences of the microtubule lattice. Indeed, while StableMARK (G234A) has been shown to preferentially bind microtubules with an expanded lattice (Jansen et al., JCB 2023; de Jager et al., JCB 2025), this may not be the case for the E236A mutant. In support of this, it has been shown that, while nucleotide free kinesin-1 can expand the lattice of GDP-microtubules at high concentrations (>10% lattice occupancy) in vitro (Peet et al., Nat Nanotechnol 2018; Shima et al., JCB 2018), kinesin-1 in the ATP-bound state does not maintain this expanded lattice (Shima et al., JCB 2018). Thus, we expect the kinesin-1 rigor used by Tanenbaum et al. (Cell 2014) to not be specific for stable microtubules (with an expanded lattice) in cells. In addition, given the dense packing of microtubules in neurites (not well-established in developing neurites, but with an inter-microtubule distance of ~25 nm in axons and ~65 nm in dendrites (Chen et al., Nature 1992)), the very large size of the SunTag could be problematic. The K560Rigor-SunTag tool from Tanenbaum et al. (Cell 2014) is bound by up to 24 copies of GFP (each ~3 nm in size), meaning that it may obstruct or be obstructed by the dense microtubule network in neurites.

      Planned revision:

      • Given that, unlike the K560Rigor-SunTag construct, StableMARK has been carefully validated as a live-cell marker for stable microtubules, we believe that the above discussion goes beyond the scope of the manuscript.

      Microscopy data (Movies 2, 3, and 4) show microtubule bundling with StableMARK labeling, which is absent in tubulin immunostaining. Could this be an artifact of ectopic StableMARK expression? If so, a brief note addressing this potential effect would be beneficial.

      As with any overexpression, there is a risk of artifacts. We feel that in the cells presented, the risk of artifacts is limited because we have chosen neurons expressing StableMARK at very low levels. Prior work has demonstrated that in cells where StableMARK has a speckled appearance on microtubules, it has limited undesired effects on stable microtubules or the cargoes moving along them (Jansen et al., JCB 2023). Perhaps some of the apparent differences in the amount of bundling can be explained in that the expansion microscopy images shown may have less apparent bundling because of the improved z-resolution and thus optical sectioning. Any z-slice imaged using expansion microscopy will contain fewer microtubules, so bundling may be less obvious. If we compare the amount of bundling seen in StableMARK expressing cells with the amount of bundling of acetylated microtubules (a marker for stable microtubules) in DMSO/nocodazole treated (non-electroporated) cells imaged by confocal microscopy in Figure S7, we feel that the difference is not so large. Nonetheless, we can briefly address this potential effect in the text.

      Planned revision:

      • We will improve the transparency of the manuscript by briefly mentioning this in the text.

      Reviewer #1 (Significance)

      It is an important paper challenging established ideas of microtubule organization in neurons. It is important to the wide audience of cell and neurobiologists.

      Reviewer #2 (Evidence, reproducibility and clarity)

      *The manuscript uses state-of-the-art microscopy (e.g., expansion microscopy, motorPAINT) to observe microtubule organization during early events of differentiation of cultured rat hippocampal neurons. The authors confirm previous work showing that microtubules in neurites and dendrites are of mixed polarity whereas they are of uniform plus-end-out polarity in axons. They show that stable microtubules (labeled with antibody against acetylated tubulin) are located in the central region of neurite cross-section across all differentiation stages. They show that acetylated microtubules are associated with centrioles early in differentiation but less so at later stages. And they show that stable microtubules can move from one neurite to another, presumably by microtubule sliding.*

      Comments

      1. *I found the manuscript difficult to read. There are lots of "segregations" of microtubules occurring over these stages of neuronal differentiation: segregation between the center of a neurite and the outer edge with respect to neurite cross-section, segregation between the region proximal to the cell body and the region distal to the cell body, and segregation over time (stages). The authors don't do a good job of distinguishing these and reporting the major findings in a way that is clear and straightforward. *

      We thank the reviewer for their feedback and will go over the text to make it easier to read. Within neurites, we use the word 'segregated' in the manuscript to mean that the microtubules form two spatially separate populations across the width of the neurites (i.e., their cross-section if viewed in 3D). Because of variability seen in the neurites of this stage, this segregation does not always present as a peripheral vs. central enrichment of the different populations of microtubules as we sometimes observed two side-by-side populations instead. We will make sure that we properly define this in the manuscript to avoid any confusion.

      When discussing other types of segregation, we tried to use different wording such as when discussing the proximal-distal distribution of microtubules with different orientations in axon-like neurites in this excerpt:

      Sometimes these axons and axon-like neurites had a small bundle of minus-end-out microtubules proximal to the soma (Figure S4). This suggests that plus-end-out uniformity emerges distally first in these neurites, perhaps by retrograde sliding of these minus-end-out microtubules (see Discussion).

      When discussing changes related to a particular stage, we instead aimed to list which stage we were talking about, such as seen in the discussion:

      Emerging neurites of early stage 2 neurons already contain microtubules of both orientations and these are typically segregated. These emerging neurites also contain segregated networks of acetylated (stable) and tyrosinated (labile) microtubules. In later stage 2, stage 3, and stage 4 neurons, stable (nocodazole-resistant) microtubules are oriented more minus-end-out compared to the total (untreated) population of microtubules; however, in early stage 2 neurons, stable microtubules are preferentially oriented plus-end-out, likely because their minus-ends are still anchored at the centrioles at this stage. The fraction of anchored stable microtubules decreases during development, while the appearance of short stumps of microtubules attached to the centrioles suggests that these microtubules may be released by severing.

      We appreciate the reviewer's concerns and will review the text carefully for clarity.

      Planned revision:

      • We will carefully go through the text when revising the manuscript to ensure that these distinctions are clear and consider using synonyms or other descriptors where they would enhance clarity.

      *The major focus is on microtubule changes between stages 2a and 2b. This is introduced in the text and in the methods but not reflected in Figure 1A which should serve as an orientation of what is to come. It would be helpful to move the information about stages to the main text and/or Figure 1A. *

      We thank the reviewer for pointing this out and will be more explicit about the distinction between stages 2a and 2b in the main text and make the suggested change to Figure 1A.

      Planned revision:

      • We will incorporate the suggested changes in the revised manuscript.

      For Figure 1, the conclusions are generally supported by the data with the exception of the data for stage 2b in 1D and 1H. The images in D and the line scan in H suggest that for stage 2b, minus-end-out are on one edge whereas the plus-end-out are on the other edge of the neurite cross-section. But this is only true for one region along this example neurite. If the white line in D was moved proximal or distal along the neurite, the line scan for stage 2b would look like those of stages 2a and 3.

      We thank the reviewer for noting this in the figure. For these earlier stages in neuronal development, the distribution of different types of microtubules within the neurite is more variable and does not always adhere to the central-peripheral distribution described for more mature neurons (Tas et al., Neuron 2017). We did not intend to suggest that neurites of stage 2b neurons consistently have a different radial distribution of microtubules of opposite orientation, but rather that microtubules of the same orientation tend to bundle together. Sometimes this bundling produces a central or peripheral enrichment, as described for mature neurons (Tas et al., Neuron 2017) and as seen in Figure 1D-F at certain points along the length of the neurites, and sometimes the bundling simply produces two side-by-side populations. To reflect this diversity, we chose two different examples in the figure. The line scans presented in Figure 1H were taken approximately at the midpoint of the presented ROIs. In addition, as our imaging in this case is two-dimensional, we do not want to make explicit claims about the radial distribution of the different populations of microtubules.

      Planned revision:

      • We will adjust our description of this figure in the main text to be more explicit about how we interpret these results. We will ensure that it is apparent that we do not think there is a specific radial distribution of microtubules depending on the developmental stage.

      *For Figure 2, I found it difficult to relate panels A-F to panels G-J. I recommend combining 2G-J with 3A-B for a separate figure focused on the orientation of stable microtubules across different stages. *

      We thank the reviewer for this suggestion and will take it into consideration when preparing the revised manuscript, making sure that our figure organization is well justified.

      For Figure 3, it is difficult to reconcile the traces with the corresponding images - that is, there are many acetylated microtubules in the top view image that appear to contact centrioles but are not in the tracing. Perhaps the tracings would more accurately reflect the localization of the acetylated microtubules in the top view images if a stack of images was shown rather than the max projections. Or if the authors were to stain for CAMSAPs to identify non-centrosomal microtubules. I find the data unconvincing but I do believe their conclusion because it is consistent with published data in the field. The data need to be quantified.

      We thank the reviewer for noting this. Importantly, the tracing was done on a three-dimensional stack of images, whereas we present maximum projections of a few slices in Figure 3C for easy visualization. Projection artifacts indeed make it look as though some additional microtubules are attached to the centrioles, whereas in the three-dimensional stacks it is apparent that they are not. We can include the z-stacks as supplementary material so that readers can also verify this themselves. We will additionally clarify that this is the case in the text related to Figure 3C.

      Planned revision:

      • We will better explain how the tracing was done in the methods section and make a brief note of the projection artifacts in the main text.
      • We will also include the z-stacks as supplementary data.

      *I have a major concern with the conclusions of Figure 4. Here the authors use StableMARK to argue that microtubules do not depolymerize in one neurite and then repolymerize in another neurite but rather can be moved (presumably by sliding) from one neurite to another. The problem is that StableMARK-decorated microtubules do not depolymerize. So yes, StableMARK-decorated microtubules can move from one neurite to another but that does not say anything about what normally happens to microtubules during neuronal differentiation. In addition, the text says that the focus on Figure 4 is on how microtubules change between stages 2a and 2b but data is only shown for stage 2b. *

      As noted by the reviewer, StableMARK can indeed hyperstabilize microtubules when over-expressed; however, it is important to note that this strongly depends on the level of overexpression of the marker. This is discussed in detail in the paper introducing StableMARK, where it is described that at low expression levels, StableMARK does not alter the stability of microtubules (i.e., StableMARK decorated microtubules can still depolymerize/disassemble and they are disassembled in response to serum starvation), alter their post-translational modification status or their distribution in the cell, or impede the transport of cargoes along them (Jansen et al. JCB 2023). Despite this, we agree that it is important to validate these findings in our experimental system (primary rat hippocampal neurons) and so we plan to perform experiments with photoactivatable tubulin to verify the long lifetime of stable microtubules and aim to also observe microtubule sliding (similar to assays performed in the Gelfand lab (Lu et al., Curr Biol 2013)) in the absence of StableMARK.

      Planned revision:

      • We will confirm our findings using photoactivatable tubulin. We hope to demonstrate the long lifetime of the microtubules in this case and observe the sliding of microtubules by another means.
      • We will also revise the text to better explain the potential impacts of StableMARK and that we chose the lowest expressing cells we could find so early after electroporation.

      *The data are largely descriptive and it is of course important to first describe things before one can dive into mechanism. But most of the findings confirm previous work and new findings are limited to showing that e.g. microtubule segregation appears earlier than previously observed. *

      Our study is the first to use Motor-PAINT to carefully map changes in microtubule orientations during neuronal development. Furthermore, it is the first to use the recently introduced live-cell marker for stable microtubules to directly demonstrate the active polarity reversal of stable microtubules during this process.

      Optional: It would be nice if the authors could investigate some potential mechanisms. For example, does knockdown or knockout of severing enzymes prevent the loss of centriolar microtubules shown in Figure 3? Does knockdown or knockout of kinesin-2 or EB1 prevent the reorientation of microtubules (Chen et al 2014)?

      We agree with the reviewer that these are exciting experiments to perform, and we hope to unravel the mechanisms underlying microtubule reorganization in future work. However, this will require many more experiments, as well as the recruitment and training of a new PhD student or postdoc, given that the first author has left the lab. Therefore, we feel that this falls outside the scope of the current work, which carefully maps the microtubule organization during neuronal development and demonstrates the active polarity reversal of stable microtubules during this process.

      *Overall, the methods are presented in such a way that they can be reproduced. One exception is in the motor paint sample prep section: is it three washes for 1 min each or three washes over 1 min? *

      We thank the reviewer for pointing out this mistake and will adjust this step in the methods section accordingly.

      Planned revision:

      • We will revise the methods section to read 'washed three times for 1 minute each'.

      *No statistical analysis is provided. The spread of the data in the violin plots is very large and it is difficult to ascertain how strongly one should make conclusions based on different data spreads between different conditions. *

      We thank the reviewer for noting this and will add statistical tests to the graphs showing the fraction of minus-end-out microtubules in different stages/conditions.

      Planned revision:

      • We will include statistical tests in the specified graphs.

      For Figure S5, the excluded data (axons and axon-like neurites) should also be shown.

      We thank the reviewer for this suggestion and will include this data.

      Planned revision:

      • We will adjust this supplemental figure to also include the specified data.

      *For the movies, it would be helpful to have the microtubule moving from one neurite to another identified in some way as it is difficult to tell what is going on. *

      We thank the reviewer for pointing this out.

      Planned revision:

      • We will trace the microtubule in this movie to enhance clarity.

      Reviewer #2 (Significance)

      A strength of the study is the state-of-the-art microscopy (e.g., expansion microscopy, motorPAINT) and its application to a classic experimental model (rat hippocampal neurons). The information will be useful to those interested in the details of neuronal differentiation. A limitation of the study is that it appears to mostly confirm previous findings in the field (microtubule segregation, loss of centriolar anchoring, microtubule sliding). The advance to the field is that the manuscript shows that these events occur earlier in differentiation than previously known.

      Reviewer #3 (Evidence, reproducibility and clarity)

      *The study by Iwanski and colleagues explores the establishment of the specific organisation of the neuronal microtubule cytoskeleton during neuronal differentiation. They use cultures of dissociated primary hippocampal rat neurons as a model system, and apply the optimised motor-PAINT technology, expansion microscopy/immunofluorescence and live cell imaging to investigate the polarity establishment and the distribution of differentially modified microtubules during early development. *

      They show that in young neurons microtubules are of mixed polarity, but at this stage already the stable (acetylated) microtubules are preferentially oriented plus-end-out, and are connected to the centrioles. In later stages, the stable microtubules are released from the centrioles and reverse their orientation by moving around inside the cell body and the neurites.

      *Overall, the conclusions are well supported by the presented data. The experiments are conducted thoroughly, the figures are clearly presented (for minor comments, see below) and the manuscript is well and clearly written. *

      Major comments

      1. What is the proportion of neurons with different types of neurites (axon-like, non-axon-like) in stage 2b? (middle paragraph page 5 and Fig 1E). Please provide a quantification.

      How was the quantification in Fig 2B-D-F done? Why do the curves all start at 0? Please provide a scheme explaining these measurements. Furthermore, the data in Fig 2B do not reflect the statement "the segregation (...) was less evident" than in later stages (top of page 6): while it is less evident than in stage 2b, it is extremely similar to stage 3. Please revise accordingly.

      We thank the reviewer for pointing out these important details. We will make the suggested changes in the text, adding the proportion of neurons with different types of neurites and adjusting the statement mentioned.

      The radial intensity distributions were quantified as described in Katrukha et al. (eLife 2021). In the methods section, we describe the process in brief:

      To analyze the radial distribution of acetylated and tyrosinated microtubules in expanded neurites, deconvolved image stacks were processed using custom scripts in ImageJ (v1.54f) and MATLAB (R2024b) as described in detail elsewhere (Katrukha et al., 2021). Briefly, on maximum intensity projections (XY plane), we drew polylines of sufficient thickness (300 px) to segment out neurite portions 44 µm (10 µm when corrected for expansion factor) in length proximal to the cell soma. Using Selection > Straighten on the corresponding z-stacks generated straightened B-spline interpolated stacks of the neurite sections. These z-stacks were then resliced perpendicularly to the neurite axis (YZ-plane) to visualize the neurite cross-section. From this, we could semi-automatically find the boundary of the neurite in each slice using first a bounding rectangle that encompasses the neurite (per slice) and then a smooth closed spline (approximately oval). To build a radial intensity distribution from neurite border to center, closed spline contours were then shrunken pixel by pixel in each YZ-slice while measuring ROI area and integrated fluorescence intensity. From this, we could ascertain the average fluorescence intensity per contour iteration, allowing us to calculate a radial intensity distribution by calculating the radius corresponding to each area (assuming the neurite cross-section is circular).

      The curves thus all start at 0 because no intensity "fits" into a circle of radius 0 and then gradually increase because very few microtubules "fit" into circles with the smallest radii.
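
      The area-to-radius step described above can be illustrated with a minimal Python sketch. It is not the published analysis (which used ImageJ and MATLAB scripts, Katrukha et al., 2021); the function names and the ring-averaging step are assumptions made for illustration only. Each shrunken contour is assigned the radius of a circle with the same area, and the mean intensity of the ring between two successive contours is the difference in integrated intensity divided by the difference in area.

```python
import math


def equivalent_radius(area):
    """Radius of a circle with the given area (circular cross-section assumed)."""
    return math.sqrt(area / math.pi)


def ring_profile(areas, integrated):
    """Mean intensity of each ring between successive shrunken contours.

    areas, integrated: ROI area and integrated intensity per contour,
    ordered from the outermost (neurite border) to the innermost contour.
    Returns a list of (mid-ring radius, mean ring intensity) tuples.
    """
    if len(areas) != len(integrated):
        raise ValueError("areas and integrated must have the same length")
    profile = []
    for i in range(len(areas) - 1):
        d_area = areas[i] - areas[i + 1]            # area of the ring
        d_int = integrated[i] - integrated[i + 1]   # intensity inside the ring
        r_mid = 0.5 * (equivalent_radius(areas[i]) + equivalent_radius(areas[i + 1]))
        profile.append((r_mid, d_int / d_area))
    return profile


# Hypothetical contours with radii 3, 2 and 1 (arbitrary units) and a
# uniform intensity of 2 per unit area, used only to check the arithmetic.
toy_areas = [math.pi * r ** 2 for r in (3, 2, 1)]
toy_integrated = [2.0 * a for a in toy_areas]
toy_profile = ring_profile(toy_areas, toy_integrated)
```

      In this toy case every ring returns the same mean intensity, as expected for a uniformly filled cross-section.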

      Planned revision:

      • We will revise the text to include the suggested changes and add a brief statement to the methods section to explain why the curves start at 0.

      *It should be stressed in the text, that the modification-specific antibodies only detect modified microtubules. Thus, in figure 3, in the absence of total tubulin staining, it is possible that there are more microtubules than revealed with the anti-acetylated tubulin antibody. A possible explanation should be discussed. *

      We thank the reviewer for highlighting this point and will adjust the text accordingly.

      Planned revision:

      • We will clarify this in the revised text by adding the following sentence: In addition, given that we specifically stained for acetylated tubulin (a marker for stable microtubules), it is possible that other non-acetylated and thus perhaps dynamic microtubules are also associated with the centrioles.

      *OPTIONAL: As discussed in the manuscript's discussion, testing some of the proposed mechanisms regulating microtubule cytoskeleton architecture in development (motors, crosslinkers, severing enzymes) would significantly increase the impact of this study. Exploring these phenomena in a more complex system (3D culture, brain explants) closer to the intricate character of the brain than the 2D dissociated neurons would be a real game-changer. *

      We agree that sorting out the mechanisms driving microtubule reorganization would be very exciting. However, this will require many more experiments, as well as the recruitment and training of a new PhD student or postdoc, given that the first author has left the lab. Therefore, we feel this falls outside the scope of the current work, which carefully maps the microtubule organization during neuronal development and demonstrates the active polarity reversal of stable microtubules during this process.

      Minor comments

      1. *It could be useful to write on each panel whether the images were obtained with expansion or motor-PAINT technique: the rendering of the figures is very similar, and despite the different colour scheme can be confusing. *

      We thank the reviewer for pointing this out.

      Planned revision:

      • We will incorporate this suggestion when revising our manuscript.

      Reviewer #3 (Significance)

      This manuscript provides insights into the establishment of the microtubule cytoskeleton architecture specific to highly polarised neurons. The imaging techniques used, improved from the ones published before (motor-PAINT: Kapitein lab in 2017, U-ExM: Hamel/Guichard lab in 2019), yield beautiful and convincing data, marking an improvement compared to previous studies.

      *However, the novelty of some of the findings is relatively limited. Indeed, a mixed microtubule orientation in very young neurites has already been shown (Yau et al, 2016, co-authored by Kapitein), as has the separate distribution of acetylated and tyrosinated / stable and labile / plus-end-out and plus-end-in microtubules in dendrites (Tas, ..., Kapitein, 2017). *

      *On the other hand, observation of the live movement of microtubules with the resolution allowing to see single (stable) microtubules is new and important. It provides an exciting setup to explore the mechanisms of polarity reversal of microtubules in neuronal development and it is regrettable that these mechanisms have not been explored further. *

      *The association of (stable) microtubules with the centrioles is also a technically challenging analysis. Despite not being able to visualise all microtubules, but only acetylated ones, these data are novel and exciting. *

      *This work will be of interest for neuronal cell biologists, developmental neurobiologists. The impact would be larger if the mechanistic questions were addressed using these sophisticated methodologies. *

      *This reviewer's expertise is the regulation of the microtubule cytoskeleton and its impact on molecular, cellular and organism levels. *



    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to warmly thank all the reviewers for their helpful and fair comments which will increase the quality of our manuscript.

      We would like to inform the reviewers that the figure numbers have changed as follows:

      | Figure number in old version | Figure number in revised manuscript |
      | --- | --- |
      | 1B | S1C |
      | S1C | S1D |
      | 1C | S2A |
      | S1D | S2B |
      | S1E | S2C |
      | 1D | 1B |
      | S2 | S3 |
      | S3 | S4 |
      | S4 | S5 |

      1. Description of the planned revision

      Reviewer #1

      Major comments

      3) Upon food supplementation with 20E the authors could not measure a significant effect on systemic growth or midgut maturation (Fig. S3), whereas the dose of 20E they fed (20µg/ml) was already much higher than endogenous 20E level they measured in the midgut (Fig. 2B).

      We thank reviewer #1 for this comment.

      Fig. S3 is now Fig. S4

      First, 20 µg/mL is the final concentration in the fly food. It is not directly comparable to the 20HE levels we measured in the organs and in the haemolymph, because uptake by cells and degradation of the compound alter how much of the fed dose reaches the tissues.

      This concentration of 20 µg/mL corresponds to a molar concentration of approximately 0.04 mM, which is well below the concentration of 20HE commonly added to food in the literature (1 mM):
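
      The unit conversion behind the 0.04 mM figure can be checked with a short Python sketch. The molar mass used here (about 480.6 g/mol for 20-hydroxyecdysone, C27H44O7) is an assumption taken from the compound's formula and is not stated in the response; the function name is likewise hypothetical.

```python
# Molar mass of 20-hydroxyecdysone (C27H44O7); assumed value, not given in the response.
MOLAR_MASS_20E_G_PER_MOL = 480.6


def ug_per_ml_to_mM(conc_ug_per_ml, molar_mass_g_per_mol):
    """Convert a mass concentration in µg/mL to a molar concentration in mM."""
    # 1 µg/mL equals 1 mg/L, and (mg/L) / (g/mol) gives mmol/L, i.e. mM.
    return conc_ug_per_ml / molar_mass_g_per_mol


food_mM = ug_per_ml_to_mM(20.0, MOLAR_MASS_20E_G_PER_MOL)
fold_below_1mM = 1.0 / food_mM  # how far below the 1 mM used in the literature
print(f"{food_mM:.3f} mM, about {fold_below_1mM:.0f}-fold below 1 mM")
```

      Under this assumption the fed dose comes out at roughly 0.04 mM, consistent with the value quoted above.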

      Tiffany V. Roach, Kari F. Lenhart; Mating-induced Ecdysone in the testis disrupts soma-germline contacts and stem cell cytokinesis. Development 1 June 2024; 151 (11): dev202542. doi: https://doi.org/10.1242/dev.202542

      Ahmed, S.M.H., Maldera, J.A., Krunic, D. et al. Fitness trade-offs incurred by ovary-to-gut steroid signalling in Drosophila. Nature 584, 415-419 (2020). https://doi.org/10.1038/s41586-020-2462-y

      The authors should consider to feed larvae with RH5849 (Dr. Ehrenstorfer), which is an insecticide functioning as an ecdysone agonist and was designed for high stability (Wing et al, 1988). RH5849 was already successfully fed to adult Drosophila to investigate the impact of Ecdysone signalling on the adult midgut (Neophytou et al, 2023; Zipper et al, 2025; Zipper et al, 2020) and elicits 20E response. Furthermore, uptake of RH5849 is not limited by the levels of EcI.

      We thank reviewer #1 for this comment. We have ordered the compound; delivery is expected in late June, so the experiment should be performed in July.

      8) The authors should include a discussion of how Ecdysone signalling in postmitotic EC is regulating midgut size, which may include recent data from Edgar and Reiff labs (Ahmed et al, 2020; Zipper et al., 2025; Zipper et al., 2020).

      We thank reviewer #1 for this comment. We are targeting the journal's report format, which limits the word count. If the editor allows us to exceed that limit, we would be delighted to cite and discuss these papers.

      9) There are several recent publications showing a role for gut microbiota in regulating oestrogen metabolism in humans, and implications in oestrogen-related diseases such as endometriosis (Baker et al, 2017; Xholli et al, 2023). More precisely bacteria including Lactobacilli strains produce gut microbial β-glucuronidase enzymes, which reactivate oestrogens (Ervin et al, 2019; Hu et al, 2023). As Drosophila ecdysone is the functional equivalent of mammalian oestrogens (Aranda & Pascual, 2001; Martinez et al, 1991; Oberdörster et al, 2001) these publications should be discussed by the authors.

      We thank reviewer #1 for this comment. We are targeting the journal's report format, which limits the word count. In addition, these papers seem somewhat outside the scope of our manuscript, which focuses on the impact of the microbiota on midgut growth.

      Reviewer #2

      Minor Comments

      Figure S2: columns A and B are box plots, while columns C and D are columns with error bars. Presentation of quantitative data should be uniform and ideally as box plots throughout.

      The authors thank reviewer #2 for this advice; the figure will be revised further.

      Fig. S2 is now Fig. S3


      Reviewer #3

      Major comments:

      The study relies on loss-of-function experiments to manipulate ecdysone signaling; gain-of-function experiments would provide an informative complement. Does feeding ecdysone phenocopy Lp association in GF larvae? Would ecdysone feeding have an additive effect with Lp association? Given the pleiotropic effects of ecdysone on larval phenotypes, a more targeted approach could be used to overexpress transgenes to augment ecdysone signaling.

      We thank reviewer #3 for this comment. Reviewer #1 raised the same point, and this experiment will be repeated with RH5849. The results are expected in July.

      Minor comments:

      1. For gut and carcass length analysis, the EcR-RNAi and shd-RNAi conditions look slightly smaller in both GF and Lp conditions. Is there a genetic background effect on larval size? It would be helpful to calculate the interaction score between genotype and microbiome status via a 2-way ANOVA with post hoc tests.

      The authors thank reviewer #3 for this comment. We will analyse these differences statistically.


      6) In Fig. 3 the authors added the values for numbers of biological replica within the graphs. In Fig. 4 M-P they added the values for number of technical replicas. They should apply adding these two types of values to all graphs and I would suggest to make the difference between biological replica 'n' and technical replica 'N' obvious in the figure.

      The authors thank reviewer #3 for this comment. We will modify these numbers in the figures and/or clarify them in the legends to avoid overcrowding the figures.


      The bibliography seems limited in scope. As one example, Shin et al., 2011 seems quite relevant for this study.

      We thank the reviewer #1 for this comment. We are targeting the journal's report format, which constrains the word count. Of course, if the editor allows us to exceed that limit, we would be delighted to cite and discuss this paper.


      2. Description of the revisions that have already been incorporated in the transferred manuscript

      All changes are visible in red in the text of the revised manuscript.

      Reviewer #1

      Major remarks

      1) In Fig.2 E - G there is a remarkable difference between controls in D compared to F and E compared to G. The difference between the controls in E and G is stronger than the shown significant difference of EcRRNAi to the control in E. How do the authors explain such a difference of the two (basically equal) controls and the high variance in control values shown in G?

      We thank reviewer #1 for this comment. As mentioned in the Materials and Methods, the controls differ because the RNAi constructs differ. This can generate variability in this type of developmental experiment.

      Line 253: "UAS-EcRRNAi (BDSC 9327), UAS-dsmCherryRNAI (BDSC 35785), UAS-shadeRNAi (VDRC 108911), and respective RNAi control lines (KK60101)."

      Are the comparisons of control and EcRRNAi shown in D significantly different?

      As mentioned in the figure panel, the EcRRNAi GF and control GF are significantly different and this is discussed in the text as follows in Line 154: "This phenomenon could be explained by genetic background and/or by additional deleterious effect of germ-freeness, as well as a putative contribution of EcR to intestinal functions that are important for systemic growth independently of the contributions of microbiota to adaptive growth."


      4) Lines 167-169: the authors state that 'Size-matched Lp associated larvae, controlRNAi or EcRRNAi, show longer midguts than their relative GF condition (Fig. 3A, B)', but there are no significant statistics shown for this comparison in Fig. 3A, B.

      We thank reviewer #1 for this comment and agree that the sentence can be misleading. We have therefore reformulated it as: "Size-matched Lp-associated EcRRNAi larvae show longer midguts than their relative GF controls (Fig. 3A, B)."

      10) Fig. S4 is not mentioned at all in the manuscript.

      We thank reviewer #1 for this comment. We have added the reference to supplementary Figure 4, now Figure S5, on Line 202: "In the anterior part, the cells and nuclei are bigger in Lp-associated than GF animals (Fig. 4M-N, Fig.S5). For the posterior part, the cell area was significantly increased in Lp-monoassociated animals compared to GF cell while no change was shown for the nucleus area (Fig. 4O-P, Fig.S5)."

      Minor comments:

      • The authors are inconsistent in indicating their experimental groups. One example is Fig. S3: In A and B they write the GF groups non-italic, whereas the L.p. groups are written italic. In C - E they only partially write the L.p. groups italic. Furthermore, in A, C - E they write 'L.p.', whereas it is written 'Lp' and missing the 'WJL' in B.

      We thank the reviewer #1 for this comment and we corrected that mistake in Fig. S3.

      Fig. S3 is now Fig. S4

      • Line 52: The last 'i' in 'Lactobacilli' is not italic.

      We thank the reviewer #1 for this comment and we corrected that mistake.

      • Line 122: Spelling error in 'Surpringsinly'

      We thank the reviewer #1 for this comment and we corrected that mistake.

      • Line 151: Spelling error in 'progenies'. Needs to read 'progeny'.

      We thank the reviewer #1 for this comment and we corrected that mistake.

      • Lines 231-235: Last part of the sentence is repetitive

      We thank the reviewer #1 for this comment and we corrected that mistake as "Our work paves the way to deciphering the signals delivered by the bacteria that are sensed at the host cellular level and to understand how this microbe-mediated Ecd-dependent midgut growth contributes to the Drosophila larval growth upon malnutrition."

      Reviewer #2 Minor Comments

      1. Figure 1 is interesting but challenging to follow. The fonts are very small and challenging to read. Pink on blue background is particularly hard to read and doesn't seem necessary. As the entire manuscript follows from data in Figure 1, I would encourage the authors to revise it with a view to making the results more accessible.

      The authors thank the reviewer #2 for this advice and the Figure 1 has been revised.

      Figure 4 is impressive and important for the overall manuscript. The authors should provide representative images to show how they measured cell area and nucleus area.

      The authors thank the reviewer #2.

      How cell area and nucleus area were measured is described in Figure S4. The reference to this supplementary Figure was missing in the initial manuscript and we deeply apologize for that.

      Reviewer #1 also pointed out that the reference of Figure S4 covering that point was missing in the text and we corrected that point.

      I struggled to follow this sentence (line 215): "Also, it will be interesting to test, beyond their shared growth phenotype, whether they respond differently at the mechanistical level to the presence of bacteria in the anterior compartment." I would encourage the authors to consider alternative formulations.

      The authors thank the reviewer #2 and revised that sentence as follows :

      "Also, it will be interesting to investigate whether the midgut comprises sub-populations of enterocytes that differ in their physiological functions. Indeed, these sub-populations could be differently distributed along the midgut and be localized on anterior and/or posterior parts. Thus, they could present varied responses to the presence of the bacteria."

      Reviewer #3

      Major comments

      Figure 4 title is misleading. No manipulations of ecdysone signaling are performed to demonstrate whether scaling relationships across tissues differ depending on ecdysone. The same experiment should be performed using mex>EcR-RNAi larvae and/or mex>shd-RNAi larvae.

      We thank the reviewer #3 for this comment.

      We agree with the reviewer. The title has been changed as follows and is marked in red in the manuscript: Midgut-specific adaptive growth promoted by Lp in Drosophila larvae.


      Minor comments:

      It is notable that mex>EcR-RNAi in germ-free larvae exacerbates developmental delay. A possible interpretation is that ecdysone signaling in the germ-free context promotes increased growth rate. Could the authors comment?

      We thank reviewer #3 for this comment.

      Since we described a local effect of Ecd at the level of the intestine, it is unlikely, though not entirely excluded, that intestinal Ecd promotes systemic growth.

      Our comments on this point appear in the text:

      "This phenomenon could be explained by genetic background and/or by additional deleterious effect of germ-freeness, as well as a putative contribution of EcR to intestinal functions that are important for systemic growth independently of the contributions of microbiota to adaptive growth."

      Experimental variation is substantial between the control conditions of the EcR and Shd knockdown experiments; median control + Lp D50 in the EcR experiment is ~6 days whereas in the shade experiment it is ~9 days. Can the authors comment on this between-experiment variation, which seems substantial (similar to the effect size between control + Lp and control GF)?

      We thank reviewer #3 for this comment, which was also raised by reviewer #1. We answered as follows:

      As mentioned in the Materials and Methods, the controls differ because the RNAi constructs differ. This can generate variability in this type of developmental experiment.

      Line 253: "UAS-EcRRNAi (BDSC 9327), UAS-dsmCherryRNAI (BDSC 35785), UAS-shadeRNAi (VDRC 108911), and respective RNAi control lines (KK60101)."

      As mentioned in the figure panel, the EcRRNAi GF and control GF are significantly different and this is discussed in the text as follows in Line 154: "This phenomenon could be explained by genetic background and/or by additional deleterious effect of germ-freeness, as well as a putative contribution of EcR to intestinal functions that are important for systemic growth independently of the contributions of microbiota to adaptive growth."

      The methods detail an ecdysone feeding protocol that I could not find used in the experiments. Please clarify.

      We thank reviewer #3 for this comment.

      We would like to highlight that this protocol is related to an experiment described in Fig. S3 (now Fig.S4) and that supplementary Figure was cited here in the text of the manuscript Line 179 as follows "While the systemic growth of animals is not affected by addition of 20E, a slight trend to faster midgut maturation of GF larvae is observed through the measurements of longer guts (Fig. S4)."

      Also, in the supplementary data:

      Fig. S3: Feeding larvae with 20E does not impact the gut growth.

      (A-B) Addition of 20E has no impact on larval developmental timing (DT) and their D50. From size-matched animals (C), Lp promotes intestinal growth compared to GF (D) but no significant difference is shown in the gut/carcass ratio (E). Animals receiving 20E are represented with color filled circles +Lp (blue), GF (black) and controls without 20E supplementation with empty circles.

      The manuscript would benefit from additional proofreading. The text contains spelling errors throughout. The in-text reference formatting is inconsistent. Figure legends could be improved to better describe the data.

      We thank reviewer #3 for this comment. Following the reviewers' comments, we have improved the manuscript accordingly.

      3. Description of analyses that authors prefer not to carry out

      Reviewer #1

      Major remarks

      2) The authors should consider investigating an EcIRNAi in addition to EcRRNAi. EcR functions as activator, but also as suppressor in the absence of Ecdysone and a EcRRNAi suppresses both functions of EcR. By knocking down EcI the authors would prevent uptake of Ecdysone and thus interfere only with the ligand-induced activating function of EcR.

      We thank reviewer #1 for this comment.

      This experiment was performed using EcI RNAi but is not shown here because, in our hands, the genetic tool was not efficient (the RNA interference did not work effectively), so the experiment was not conclusive.

      No phenotype was observed in our study (see Figure attached). The other Oatp family members were also tested for expression in the midgut, and their expression was close to null.

      5) Why are the authors comparing the carcass length of GF shade RNAi with L.p. control in Fig. 3 D?

      We thank reviewer #1 for this comment. These statistics were added for transparency: in these conditions, GF larvae were difficult to raise to the same size as their Lp-monoassociated counterparts. Hence, the carcass length was used to normalize the data.

      7) In Fig. S3C the authors compared L.p. WJL 20E with the GF EtOH control, where is the comparison to the corresponding L.p. WJL EtOH control? The L.p. WJL EtOH control is compared to GF 20E instead.

      We thank reviewer #1 for this comment that will help to clarify our experiment.

      Fig. S3 is now Fig. S4

      Fig. S4C shows larval size, which allows sizes to be compared across all conditions independently. This is why statistics are shown between all conditions. To avoid overloading the figure, non-significant p-values are not shown.

      Reviewer #2 Minor Comments

      3. Figure S3 confuses me. It seems that addition of 20E to GF larvae leads to a significant reduction of larval size, and that mono-association with Lp also significantly shortens larval size. Data in Figure 4G suggest that Lp should not affect larval body length relative to GF larvae. Can the authors explain the apparent discrepancy?

      The authors thank the reviewer #2 for this question. Fig. S3 is now Fig. S4.

      This difference could be explained as follows:

      • The developmental experiment in Fig. S3B shows no difference between the two GF conditions. Thus, at the end of larval development, systemic growth is similar in both conditions.

      Because it was performed earlier during development, the larval size experiment shows higher variability in measurements of larval size. Moreover, fewer larvae are present in the GF 20E condition, which could explain that difference.

      • We have previously shown that Lp mono-associated larvae grow faster than GF. Thus, to collect size-matched larvae on the same day, GF or Lp animals come from a different initial day of experiment. Due to biological variability, some differences in timing could be observed between GF and Lp animals.

      Reviewer #3

      Major comments

      1. The authors conclude that intestinal ecdysone signals are not required for Lp-promoted systemic growth. However, their data shows that circulating 20E titer increases in an Lp-dependent manner, and this circulating 20E presumably affects multiple tissues throughout the organism. Since EcR is broadly expressed, can the authors examine how EcR knockdown in other tissues influences systemic growth in Lp-associated larvae? Fat body-specific EcR knockdown seems particularly of interest here given the established relationship between fat body ecdysone signaling and growth (Delanoue et al., 2010). This additional analysis would help clarify whether ecdysone signaling in non-intestinal tissues mediates the Lp-associated growth phenotype.

      We thank reviewer #3 for this comment that will help to clarify our manuscript.

      We would like to emphasize that we never state in this manuscript that intestinal ecdysone signals are not required for systemic growth. Rather, we highlight that intestinal Ecd signaling is required for Lp-related midgut growth and is not rate-limiting for Lp-promoted systemic growth:

      Line 179 : "While the systemic growth of animals is not affected by addition of 20E, a slight trend to faster midgut maturation of GF larvae is observed through the measurements of longer guts (Fig. S3). Thus, the intestinal Ecd signaling is required for the midgut growth effect mediated by Lp in a context of malnutrition."

      Line 227: "Specifically, intestinal Ecd signaling is not rate-limiting for Lp-mediated adaptive growth."

      While it would be very interesting to study the effects of Ecd modulation in the fat body, we feel this is outside the scope of our manuscript, which focuses on Lp-based intestinal growth.

      The experimental design compares larvae associated with live Lp versus germ-free larvae provided sterile PBS. Since Lp cells constitute a potential nutrient source for developing larvae, it's unclear whether gene expression differences arise from larvae digesting Lp cells as a nutrient source or from active, microbe-host signaling interactions. To conclusively address this ambiguity, the authors should perform RNA-seq on larvae inoculated with live versus heat-killed Lp. Alternatively, qPCR could be used to provide evidence for the extent to which changes in ecdysone-related gene expression specifically require live Lp.

      We thank reviewer #3 for this comment.

      We (the lab) previously showed that the systemic growth phenotype is supported by bacteria during development and that bacteria have to be alive to support optimal effects (Storelli et al 2018, PMID: 29290388; Consuegra et al 2020a, PMID: 32196485; Consuegra et al 2020b, PMID: 32563155). This topic of bacterial viability has also been directly addressed independently by colleagues and reported recently (da Silva Soares NF, PMID: 37488173). Hence, we did not design our RNAseq with inactivated bacteria. However, if the editor believes it is essential to provide qPCR results on Ecd-related gene expression in live vs inactivated bacteria associations, we shall provide them, but at this stage we believe this notion is not core to our message.

      Shade is expressed in the larval midgut, however the larval fat body is thought to be a major site of 20E to 20HE conversion. Can the authors test how Shd knockdown in the fat body affects systemic growth in the Lp-associated condition?

      We thank reviewer #3 for this comment. Nevertheless, we think this is outside the scope of our manuscript, which focuses on Lp-based intestinal growth.

      In the knockdown experiments, body size is not measured for larvae/pupae. Given that ecdysone signaling impacts pupal volume (Delanoue et al., 2010) and controls metamorphosis timing, D50 plots by pupal volume would be informative to give a rough estimate of growth rate. For example, do germ-free EcR-RNAi larvae, which develop slower, have an equivalent body size to germ-free control larvae?

      We thank the reviewer #3 for this comment.

      All experiments were done with size-matched larvae because the aim of this manuscript is to detail the impact of Lp on relative midgut versus systemic growth. Hence, we use animals of similar systemic size to study their midgut size and identify allometry changes (midgut/larval size ratios) at a similar developmental point, that is, at the same larval systemic growth (here L3). Thus, we feel that focusing on growth rates and systemic sizes in different genetic conditions, while interesting in general, is outside the scope of the study, which focuses on midgut/larval size allometry.


      Minor comments

      The number of pupae in the EcR-RNAi and shd-RNAi experiments (Fig 2D, F) differ. Were larval densities controlled during development?

      I could not find this mentioned in the methods, and it is an important control parameter as larval density impacts developmental growth. Presenting this data as % viability of a known number of larvae deposited in food would be preferable.

      We thank the reviewer #3 for this comment.

      As mentioned in the Materials and Methods, 40 eggs from axenic animals were deposited in each tube. It is true that the final number of pupae differs, which could result from differential viability of the genetic backgrounds used. It would be difficult to follow larval development within the same tube because of the manipulation of gnotobiotic animals. Nevertheless, in all experiments more than 25% of the eggs initially deposited in tubes emerged as adults.

    1. an unambiguous definition of what thinking is

      Not necessarily "thinking" but "playing": Turing abandons the former in his text ("The original question, 'Can machines think?' I believe to be too meaningless to deserve discussion.", p. 442):

      “We now ask the question, ‘What will happen when a machine takes the part of A in this game?’ Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, ‘Can machines think?’” (p. 434).

      What matters in practice (not just for Turing, but for us today as well) is whether the machine can do what we want it to do (play, speak, write without mistakes; in short, meet our expectations of intelligence).

      "May not machines carry out something which ought to be described as thinking but which is very different from what a man does? This objection is a very strong one, but at least we can say that if, nevertheless, a machine can be constructed to play the imitation game satisfactorily, we need not be troubled by this objection." (p. 435)

    1. Reviewer #3 (Public review):

      Summary:

      This is a timely article that focuses on the molecular machinery in charge of the proliferation of pallial neural stem cells in chicks, and aims to compare them to what is known in mammals. miR19b is related to controlling the expression of E2f8 and NeuroD1, and this leads to a proper balance of division/differentiation, required for the generation of the right number of neurons and their subtype proportions. In my opinion, many experiments do reflect an interaction between all these genes and transcription factors, which likely supports the role of miR19b in participating in the proliferation/differentiation balance.

      Strengths:

      Most of the methodologies employed are suitable for the research question, and present data to support their conclusions.

      The authors were creative in their experimental design, in order to assess several aspects of pallial development.

      Weaknesses:

      However, there are several important issues that I think need to be addressed or clarified in order to provide a clearer main message for the article, as well as to clarify the tools employed. I consider it utterly important to review and reinterpret most of the anatomical concepts presented here. The way they are currently used is confusing and may mislead readers towards an understanding of the bird pallium that is no longer accepted by the community.

      Major Concerns:

      (1) Inaccurate use of neuroanatomy throughout the entire article. There are several aspects to it, that I will try to explain in the following paragraphs:

      a) Figure 1 shows a dynamic and variable expression pattern of miR19b and its relation to NeuroD1. Regardless of the terms used in this figure, it shows that miR19b may be acting differently in various parts of the pallium and developmental stages. However, all the rest of the experiments in the article (except a few cases) abolish these anatomical differences. It is not clear, but it is very important, where in the pallium the experiments are performed. I refer here, at least, to Figures 2C, E, F, H, I; 3D, E; 4C, D, G, I. Regarding time, all experiments were done at HH22, and the article does not show the native expression at this stage. The sacrifice timing is variable, and this variability is not always justified. But more importantly, we don't know where those images were taken, or what part of the pallium is represented in the images. Is it always the same? Do results reflect differences between DVR and Wulst gene expression modifications? The authors should include low magnification images of the regions where experiments were performed. And they should consider the variable expression of all genes when interpreting results.

      b) SVZ is not a postmitotic zone (as stated in line 123, and wrongly assigned throughout the text and figures). On the contrary, the SVZ is a secondary proliferative zone, organized in a layer, located in a basal position to the VZ. Both (VZ and SVZ) are germinative zones, containing mostly progenitors. The only postmitotic neurons in VZ and SVZ occupy them transiently when moving to the mantle zone, which is closer to the meninges and is the postmitotic territory. Please refer to the original Boulder committee articles to revise the SVZ definition. The authors, however, misinterpret this concept, and label the whole mantle zone as if it were the SVZ. Indeed, the term "mantle zone" does not appear in the article. Please, revise and change the whole text and figures, as SVZ statements and photographs are nearly always misinterpreted. Indeed, SVZ is only labelled well in Figure 4F.

      The two articles mentioning the expression of NeuroD1 in the SVZ (line 118) are research in Xenopus. Is there a proliferative SVZ in Xenopus?

      For the actual existence of the SVZ in the chick pallium, please refer to the recent Rueda-Alaña et al., 2025 article that presents PH3 stainings at different timepoints and pallial areas.

      c) What is the Wulst, according to the authors of the article? In many figures, the Wulst includes the medial pallium and hippocampus, whereas sometimes it is used as a synonym of the hyperpallium (which excludes the medial pallium and hippocampus). Please make it clear, as the addition or not of the hippocampus definitely changes some interpretations.

      d) The authors compare the entirety of the chick pallium - including the hippocampus (see above), hyperpallium, mesopallium, nidopallium - to only the neocortex of mammals. This view - as shown in Suzuki et al., 2012 - forgets the specificity of pallial areas of the pallium and compares it to cortical cells. This is conceptually wrong, and leads to incorrect interpretations (please refer to Luis Puelles' commentaries on Suzuki et al results); there are incorrect conclusions about the existence of upper-layer-like and deep-layer-like neurons in the pallium of birds. The view is not only wrong according to the misinterpreted anatomical comparisons, but also according to novel scRNAseq data (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025). These articles show that many avian glutamatergic neurons of the pallium are highly diversified, and are not comparable to mammalian cortical cells. The authors should therefore avoid this incorrect use of terminology. There are no such upper-layer-like and deep-layer-like neurons in the pallium of birds.

      (2) From introduction to discussion, the article uses misleading terms and outdated concepts of cell type homology and similarity between chick and pallial territories and cells. The authors must avoid this confusing terminology, as non-expert readers will come to evolutionary conclusions which are not supported by the data in this article; indeed, the article does not deal with those concepts.

      a) Recent articles published in Science (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) directly contradict some views presented in this article. These articles should be presented in the introduction as they are utterly important for the subject of this article and their results should be discussed in the light of the new findings of this article. Accordingly, the authors should avoid claiming any homology that is not currently supported. The expression of a single gene is not enough anymore to claim the homology of neuronal populations.

      b) Auditory cortex is not an appropriate term, as there is no cortex in the pallium of birds. Cortical areas require the existence of neuronal arrangements in laminae that appear parallel to the ventricular surface. It is not the case of either hyperpallium or auditory DVR. The accepted term, according to the Avian Nomenclature forum, is Field L.

      c) Forebrain, a term overused in the article, is very unspecific. It includes vast areas of the brain, from the pretectum and thalamus to the olfactory bulb. However the authors are not researching most of the forebrain here. They should be more specific throughout the text and title.

      (3) In the last part of the results, the authors claim miR19b has a role in patterning the avian pallium. What they see is that modifying its expression induces changes in gene expression in certain neurons. Accordingly, the altered neurons would differentiate into other subtypes, not similar to the wild type example. In this sense, miR19b may have a role in cell specification or neuronal differentiation. However, patterning is a different developmental event, which refers to the determination of broad genetic areas and territories. I don't think miR19b has a role in patterning.

      (4) Please add a scheme of the molecules described in this article and the suggested interaction between them.

      (5) The methods section is way too brief to allow for repeatability of the procedures. This may be due to an editorial policy but if possible, please extend the details of the experimental procedures.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):  

      Summary:  

      This study provides new insights into the role of miR-19b, an oncogenic microRNA, in the developing chicken pallium. Dynamic expression pattern of miR-19b is associated with its role in regulating cell cycle progression in neural progenitor cells. Furthermore, miR-19b is involved in determining neuronal subtypes by regulating Fezf2 expression during pallial development. These findings suggest an important role for miR-19b in the coordinated spatio-temporal regulation of neural progenitor cell dynamics and its evolutionary conservation across vertebrate species.  

      Strengths:  

      The authors identified conserved roles of miR-19 in the regulation of neural progenitor maintenance between mouse and chick, and the latter is mediated by the repression of E2f8 and NeuroD1. Furthermore, the authors found that miR-19b-dependent cell cycle regulation is tightly associated with specification of Fezf1 or Mef2c-positive neurons, in spatio-temporal manners during chicken pallial development. These findings uncovered molecular mechanisms underlying microRNA-mediated neurogenic controls.  

      Weaknesses:  

      Although the authors in this study claimed striking similarities of miR-19a/b in neurogenesis between mouse and chick pallium, a previous study by Bian et al. revealed that miR-19a contributes to the expansion of radial glial cells by suppressing PTEN expression in the developing mouse neocortex, while miR-19b maintains apical progenitors via inhibiting E2f2 and NeuroD1 in chicken pallium. Thus, it is still unclear whether the orthologous microRNAs regulate common or species-specific target genes.

      In this study, we have proposed that miR-19b regulates similar phenomena in both species using different targets, such as regulation of proliferation through PTEN in mouse and through E2f8 in the chicken.

      The spatiotemporal expression patterns of miR-19b and several genes are not convincing. For example, the authors claim that NeuroD1 is initially expressed uniformly in the subventricular zone (SVZ) but disappears in the DVR region by HH29 and becomes detectable by HH35 (Figure 1). However, the in situ hybridization data revealed that NeuroD1 is highly expressed in the SVZ of the DVR at HH29 (Figure 4F). Thus, perhaps due to the problem of immunohistochemistry, the authors have not been able to detect NeuroD1 expression in Figure 1D, and the interpretation of the data may require significant modification.  

      While Fig. 1B may suggest that NeuroD1 expression has disappeared from the DVR region by HH29, this is not true in general because we have observed NeuroD1 to be expressed in the DVR at HH29 in images of other sections. In the revised version, we will include improved images for panels of Fig. 1B which accurately show the expression pattern of NeuroD1 and miR19b at stages HH29 and HH35.  

      It seems that miR-19b is also expressed in neurons (Figure 1), suggesting the role of miR-19b must be different in progenitors and differentiated neurons. The data on the gain- and loss-of-function analysis of miR-19b on the expression of Mef2c should be carefully considered, as it is possible that these experiments disturb the neuronal functions of miR-19b rather than its functions in the progenitors.

      As pointed out by the reviewer, it is quite possible that upon manipulation of miR19b its neuronal functions are also perturbed in addition to its function in progenitor cells. After introducing gain-of-function construct in progenitor cells, we have observed changes in the morphology of these cells. These data will be included in the revised version.

      The regions of chicken pallium were not consistent among figures: in Figure 1, they showed caudal parts of the pallium (HH29 and 35), while the data in Figure 4 corresponded to the rostral part of the pallium (Figure 4B).  

      We will address this by providing images from a similar region of the pallium showing Fezf2 and Mef2c expression patterns.

      The neurons expressing Fezf2 and Mef2c in the chicken pallium are not homologous neuronal subtypes to mammalian deep and superficial cortical neurons. The authors must understand that chicken pallial development proceeds in an outside-in manner. Thus, Mef2c-positive neurons in a superficial part are early-born neurons, while Fezf2-positive neurons residing in deep areas are later-born neurons. It should be noted that the expression of a single marker gene does not support cell type homology, and the authors' description "the possibility of primitive pallial lamina formation in common ancestors of birds and mammals" is misleading.

      We appreciate this clarification and will modify or remove this statement regarding the “primitive pallial lamina formation” to avoid any confusion and misinterpretation. 

      Overexpression of CDKN1A or Sponge-19b induced ectopic expression of Fezf2 in the ventricular zone (Figure 3C, E). Do these cells maintain a progenitor state or prematurely differentiate into neurons? In addition, the authors must explain why the induction of Fezf2 is also detected in GFP-negative cells.

      We propose to follow up on the fate of these cells by extending the observation period after overexpression of CDKN1A or Sponge-19b to assess whether they retain progenitor characteristics or differentiate. The presence of Fezf2 in GFP-negative cells could be due to non-cell-autonomous effects, and we will discuss this possibility in the revised manuscript.

      Reviewer #2 (Public review):  

      Summary:  

      This paper investigates the general concept that avian and mammalian pallium specifications share similar mechanisms. To explore that idea, the authors focus their attention on the role of miR-19b as a key controlling factor in the neuronal proliferation/differentiation balance. To do so, the authors checked the expression and protein level of several genes involved in neuronal differentiation, such as NeuroD1 or E2f8, genes also expressed in mammals after conducting their functional gene manipulation experiments. The work also shows a dysregulation in the number of neurons from lower and upper layers when miR-19b expression is altered.  

      To test it, the authors conducted a series of functional experiments of gain and loss of function (G&LoF) and enhancer-reporter assays. The enhancer-reporter assays demonstrate a direct relationship between miR-19b and NeuroD1 and E2f8, which is also validated by the G&LoF experiments. It is also noteworthy that miR-19b acts by maintaining the progenitor cells of the ventricular zone in an undifferentiated stage, thus promoting them into a stage of cellular division.

      Overall, the paper argues that the expression of miR-19b in the ventricular zone promotes the cells in a proliferative phase and inhibits the expression of differentiation genes such as E2f8 and NeuroD1. The authors claim that a decrease in the progenitor cell pool leads to an increase and decrease in neurons in the lower and upper layers, respectively.

      Strengths:  

      (1) Novelty Contribution  

      The paper offers strong arguments to prove that the neurodevelopmental basis between mammals and birds is quite the same. Moreover, this work contributes to a better understanding of brain evolution along the animal evolutionary tree and will give us a clearer idea about the roots of how our brain has been developed. This stands in contrast to the conventional framing of mammal brain development as an independent subject unlinked to the "less evolved species". The authors also nicely show a concept that was previously restricted to mammals - the role of microRNAs in development.  

      (2) Right experimental approach  

      The authors perform a set of functional experiments correctly adjusted to answer the role of miR-19b in the control of neuronal stem cell proliferation and differentiation. Their histological, functional, and genetic approach gives us a clear idea about the relations between several genes involved in the differentiation of the neurons in the avian pallium. In this idea, they maintain the role of miR-19b as a hub controller, keeping the ventricular zone cells in an undifferentiated stage to perpetuate the cellular pool.  

      (3) Future directions  

      The findings open a door to future experiments, particularly to a better comprehension of the role of microRNAs and pallial genetic connections. Furthermore, this work also proves the use of avians as a model to study cortical development due to the similarities with mammals.

      Weaknesses:  

      While there are questions answered, there are still several that remain unsolved. The experiments analyzed here lead us to speculate that the early differentiation of the progenitor cells from the ventricular zone entails a reduction in the cellular pool, affecting thereafter the number of later-born neurons (upper layers). The authors should explore that option by testing progenitor cell markers in the ventricular zone, such as Pax6. Even so, it remains possible that miR-19b is also changing the expression pattern of neurons that are going to populate the different layers, instead of their numbers, so the authors cannot rule that out or verify it. Since the paper focuses on the role of miR-19b in patterning, I think the authors should check the relationship and expression between progenitors (Pax6) and intermediate (Tbr2) cells when miR-19b is affected. Since neuronal expression markers change so fast within a few days (HH24-HH35), I don't understand why the authors stop the functional experiments at different time points.

      To address this, we will examine the expression of Pax6 and Tbr2 following both gain-of-function and loss-of-function manipulations of miR-19b. We agree with the reviewer that miR-19b may influence not only the number of neurons but also the expression pattern of neuronal markers.  Due to the limitations of our experimental design, we acknowledge that this possibility cannot be ruled out. 

      Regarding time points chosen for the functional experiments: We selected different stages based on the expression dynamics of specific markers. To detect possible ectopic induction, we analyzed developmental stages where the expression of a given marker is normally absent. Conversely, to detect loss of expression we examined stages in which the marker is typically expressed robustly. This approach allowed us to better interpret the functional consequences of miR-19b manipulation within relevant developmental windows. 

      Reviewer #3 (Public review):  

      Summary:  

      This is a timely article that focuses on the molecular machinery in charge of the proliferation of pallial neural stem cells in chicks, and aims to compare them to what is known in mammals. miR19b is related to controlling the expression of E2f8 and NeuroD1, and this leads to a proper balance of division/differentiation, required for the generation of the right number of neurons and their subtype proportions. In my opinion, many experiments do reflect an interaction between all these genes and transcription factors, which likely supports the role of miR19b in participating in the proliferation/differentiation balance.  

      Strengths:  

      Most of the methodologies employed are suitable for the research question, and present data to support their conclusions.  

      The authors were creative in their experimental design, in order to assess several aspects of pallial development.  

      Weaknesses:  

      However, there are several important issues that I think need to be addressed or clarified in order to provide a clearer main message for the article, as well as to clarify the tools employed. I consider it utterly important to review and reinterpret most of the anatomical concepts presented here. The way they are currently used is confusing and may mislead readers towards an understanding of the bird pallium that is no longer accepted by the community.

      Major Concerns:  

      (1) Inaccurate use of neuroanatomy throughout the entire article. There are several aspects to it, that I will try to explain in the following paragraphs:  

      a) Figure 1 shows a dynamic and variable expression pattern of miR19b and its relation to NeuroD1. Regardless of the terms used in this figure, it shows that miR19b may be acting differently in various parts of the pallium and developmental stages. However, all the rest of the experiments in the article (except a few cases) abolish these anatomical differences. It is not clear, but it is very important, where in the pallium the experiments are performed. I refer here, at least, to Figures 2C, E, F, H, I; 3D, E; 4C, D, G, I. Regarding time, all experiments were done at HH22, and the article does not show the native expression at this stage. The sacrifice timing is variable, and this variability is not always justified. But more importantly, we don't know where those images were taken, or what part of the pallium is represented in the images. Is it always the same? Do results reflect differences between DVR and Wulst gene expression modifications? The authors should include low magnification images of the regions where experiments were performed. And they should consider the variable expression of all genes when interpreting results.

      We agree that precise anatomical context is essential. In the revised version, we propose to: 

      a) Include schematics of the regions of interest where experimental manipulations were performed.

      b) Provide low-magnification panoramic images where appropriate, for anatomical reference.

      c) Show the expression patterns of relevant marker genes to better justify stages and region selection. 

      d) Provide the expression pattern of markers in panoramic view to show differential expression in the DVR and Wulst region and interpret our results accordingly.

      b) SVZ is not a postmitotic zone (as stated in line 123, and wrongly assigned throughout the text and figures). On the contrary, the SVZ is a secondary proliferative zone, organized in a layer, located in a basal position to the VZ. Both (VZ and SVZ) are germinative zones, containing mostly progenitors. The only postmitotic neurons in VZ and SVZ occupy them transiently when moving to the mantle zone, which is closer to the meninges and is the postmitotic territory. Please refer to the original Boulder committee articles to revise the SVZ definition. The authors, however, misinterpret this concept, and label the whole mantle zone as if it were the SVZ. Indeed, the term "mantle zone" does not appear in the article. Please, revise and change the whole text and figures, as SVZ statements and photographs are nearly always misinterpreted. Indeed, SVZ is only labelled well in Figure 4F.

      The two articles mentioning the expression of NeuroD1 in the SVZ (line 118) are research in Xenopus. Is there a proliferative SVZ in Xenopus?  

      For the actual existence of the SVZ in the chick pallium, please refer to the recent Rueda-Alaña et al., 2025 article that presents PH3 stainings at different timepoints and pallial areas.  

      We appreciate the correction suggested by the reviewer. In the revised manuscript: a) SVZ will be labeled correctly in all figures and descriptions b) The mantle zone terminology will be incorporated appropriately c) The two Xenopus-based references in line 118 will be removed as they are not directly relevant and d) We will refer to the Rueda-Alaña et al., (2025) to guide accurate anatomical labeling and interpretation of proliferative zones.

      We also acknowledge that while some proliferative cells exist in the SVZ of the chicken, they are relatively few and do not express typical basal progenitor markers such as Tbr2 (Nomura et al., 2016, Development). We will ensure that this nuance is clearly reflected in the text. 

      c) What is the Wulst, according to the authors of the article? In many figures, the Wulst includes the medial pallium and hippocampus, whereas sometimes it is used as a synonym of the hyperpallium (which excludes the medial pallium and hippocampus). Please make it clear, as the addition or not of the hippocampus definitely changes some interpretations.

      We propose to modify the text and figures to accurately represent the correct location of the Wulst in the chick pallium.

      d) The authors compare the entirety of the chick pallium - including the hippocampus (see above), hyperpallium, mesopallium, nidopallium - to only the neocortex of mammals. This view - as shown in Suzuki et al., 2012 - forgets the specificity of pallial areas of the pallium and compares it to cortical cells. This is conceptually wrong, and leads to incorrect interpretations (please refer to Luis Puelles' commentaries on Suzuki et al results); there are incorrect conclusions about the existence of upper-layer-like and deep-layer-like neurons in the pallium of birds. The view is not only wrong according to the misinterpreted anatomical comparisons, but also according to novel scRNAseq data (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025). These articles show that many avian glutamatergic neurons of the pallium are highly diversified, and are not comparable to mammalian cortical cells. The authors should therefore avoid this incorrect use of terminology. There are no such upper-layer-like and deep-layer-like neurons in the pallium of birds.

      We acknowledge this conceptual oversight. In the manuscript: a) We will avoid direct comparisons between the entire chick pallium and the mammalian neocortex b) Terms like “upper-layer-like” and “deep-layer-like” neurons will be removed or modified c) We will cite and integrate recent findings from Rueda-Alaña et al. (2025), Zaremba et al. (2025), and Hecker et al. (2025), which provide updated insights from scRNAseq analyses into the complexity of avian pallial neurons. Cell types will be described based on marker gene expression only, without unsupported evolutionary or homology claims.

      (2) From introduction to discussion, the article uses misleading terms and outdated concepts of cell type homology and similarity between chick and pallial territories and cells. The authors must avoid this confusing terminology, as non-expert readers will come to evolutionary conclusions which are not supported by the data in this article; indeed, the article does not deal with those concepts.  

      We agree with the reviewer. In the revised version, we will remove the misleading terms and outdated concepts and avoid speculative evolutionary conclusions.  

      a) Recent articles published in Science (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) directly contradict some views presented in this article. These articles should be presented in the introduction as they are utterly important for the subject of this article and their results should be discussed in the light of the new findings of this article. Accordingly, the authors should avoid claiming any homology that is not currently supported. The expression of a single gene is not enough anymore to claim the homology of neuronal populations.  

      In the revised version, these above-mentioned articles (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) will be included in the introduction and discussion.  Our interpretations will be updated to reflect these new insights into neuronal diversity and regionalization in the chick pallium. 

      b) Auditory cortex is not an appropriate term, as there is no cortex in the pallium of birds. Cortical areas require the existence of neuronal arrangements in laminae that appear parallel to the ventricular surface. It is not the case of either hyperpallium or auditory DVR. The accepted term, according to the Avian Nomenclature forum, is Field L.

      We will replace all instances of “auditory cortex” with “Field L”, as per the accepted terminology in the Avian Nomenclature Forum.

      c) Forebrain, a term overused in the article, is very unspecific. It includes vast areas of the brain, from the pretectum and thalamus to the olfactory bulb. However, the authors are not researching most of the forebrain here. They should be more specific throughout the text and title.  

      In the revised version, we will replace “forebrain” with “Pallium” throughout the manuscript to more accurately reflect the regions studied.

      (3) In the last part of the results, the authors claim miR19b has a role in patterning the avian pallium. What they see is that modifying its expression induces changes in gene expression in certain neurons. Accordingly, the altered neurons would differentiate into other subtypes, not similar to the wild type example. In this sense, miR19b may have a role in cell specification or neuronal differentiation. However, patterning is a different developmental event, which refers to the determination of broad genetic areas and territories. I don't think miR19b has a role in patterning.  

      We agree with the reviewers that an alteration in one marker for a particular cell type may not indicate a change in patterning. However, including the effect of miR-19b gain- and loss-of-function on Pax6 and Tbr2 may strengthen the idea that it affects patterning, as suggested by reviewer #2.

      (4) Please add a scheme of the molecules described in this article and the suggested interaction between them.  

      In the revised version, we propose to include a diagram to visually summarize the proposed interactions between miR-19b, E2f8, NeuroD1, and other key regulators.  

      (5) The methods section is way too brief to allow for repeatability of the procedures. This may be due to an editorial policy but if possible, please extend the details of the experimental procedures.  

      We will expand the Methods section to provide more detailed protocols and justifications for experimental design, in alignment with journal policy.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to understand the neural basis of implicit causal inference, specifically how people infer causes of illness. They use fMRI to explore whether these inferences rely on content-specific semantic networks or broader, domain-general neurocognitive mechanisms. The study explores two key hypotheses: first, that causal inferences about illness rely on semantic networks specific to living things, such as the 'animacy network,' given that illnesses affect only animate beings; and second, that there might be a common brain network supporting causal inferences across various domains, including illness, mental states, and mechanical failures. By examining these hypotheses, the authors aim to determine whether causal inferences are supported by specialized or generalized neural systems.

      The authors observed that inferring illness causes selectively engaged a portion of the precuneus (PC) associated with the semantic representation of animate entities, such as people and animals. They found no cortical areas that responded to causal inferences across different domains, including illness and mechanical failures. Based on these findings, the authors concluded that implicit causal inferences are supported by content-specific semantic networks, rather than a domain-general neural system, indicating that the neural basis of causal inference is closely tied to the semantic representation of the specific content involved.

      Strengths:

      (1) The inclusion of the four conditions in the design is well thought out, allowing for the examination of the unique contribution of causal inference of illness compared to either a different type of causal inference (mechanical) or non-causal conditions. This design also has the potential to identify regions involved in a shared representation of inference across general domains.

      (2) The presence of the three localizers for language, logic, and mentalizing, along with the selection of specific regions of interest (ROIs), such as the precuneus and anterior ventral occipitotemporal cortex (antVOTC), is a strong feature that supports a hypothesis-driven approach (although see below for a critical point related to the ROI selection).

      (3) The univariate analysis pipeline is solid and well-developed.

      (4) The statistical analyses are a particularly strong aspect of the paper.

      Weaknesses:

      Based on the current analyses, it is not yet possible to rule out the hypothesis that inferring illness causes relies on neurocognitive mechanisms that support causal inferences irrespective of their content, neither in the precuneus nor in other parts of the brain.

      (1) The authors, particularly in the multivariate analyses, do not thoroughly examine the similarity between the two conditions (illness-causal and mechanical-causal), as they are more focused on highlighting the differences between them. For instance, in the searchlight MVPA analysis, an interesting decoding analysis is conducted to identify brain regions that represent illness-causal and mechanical-causal conditions differently, yielding results consistent with the univariate analyses. However, to test for the presence of a shared network, the authors only perform the Causal vs. Non-causal analysis. This analysis is not very informative because it includes all conditions mixed together and does not clarify whether both the illness-causal and mechanical-causal conditions contribute to these results.

      (2) To address this limitation, a useful additional step would be to use as ROIs the different regions that emerged in the Causal vs. Non-causal decoding analysis and to conduct four separate decoding analyses within these specific clusters:

      (a) Illness-Causal vs. Non-causal - Illness First;

      (b) Illness-Causal vs. Non-causal - Mechanical First;

      (c) Mechanical-Causal vs. Non-causal - Illness First;

      (d) Mechanical-Causal vs. Non-causal - Mechanical First.

      This approach would allow the authors to determine whether any of these ROIs can decode both the illness-causal and mechanical-causal conditions against at least one non-causal condition.

      (3) Another possible analysis to investigate the existence of a shared network would be to run the searchlight analysis for the mechanical-causal condition versus the two non-causal conditions, as was done for the illness-causal versus non-causal conditions, and then examine the conjunction between the two. Specifically, the goal would be to identify ROIs that show significant decoding accuracy in both analyses.

      The hypothesis that a neural mechanism supports causal inference across domains predicts higher univariate responses when causal inferences occur than when they do not. This prediction was not generated by us ad hoc but rather has been made by almost all previous cognitive neuroscience papers on this topic (Ferstl & von Cramon, 2001; Satpute et al., 2005; Fugelsang & Dunbar, 2005; Kuperberg et al., 2006; Fenker et al., 2010; Kranjec et al., 2012; Pramod, Chomik-Morales, et al., 2023; Chow et al., 2008; Mason & Just, 2011; Prat et al., 2011). Contrary to this hypothesis, we find that the precuneus (PC) is most activated for illness inferences and most deactivated for mechanical inferences relative to rest, suggesting that the PC does not support domain-general causal inference. To further probe the selectivity of the PC for illness inferences, we created group overlap maps that compare PC responses to illness inferences and mechanical inferences across participants. The PC shows a strong preference for illness inferences and is therefore unlikely to support causal inferences irrespective of their content (Supplementary Figures 6 and 7). We also note that, in whole-cortex analysis, no shared regions responded more to causal inference than noncausal vignettes across domains. Therefore, the prediction made by the ‘domain-general causal engine’ proposal as it has been articulated in the literature is not supported in our data.

      Taking a multivariate approach, the hypothesis that a neural mechanism supports causal inference across domains also predicts that relevant regions can decode between all possible pairs of causal vs. noncausal conditions (e.g., Illness-Causal vs. Noncausal-Illness First, Mechanical-Causal vs. Noncausal-Illness First, etc.). The analysis described by the reviewer in (2), in which the regions that distinguish between causal vs. noncausal conditions in searchlight MVPA are used as ROIs to test various causal vs. noncausal contrasts, is non-independent. Therefore, we cannot perform this analysis. In accordance with the reviewer’s suggestions in (3), we now include searchlight MVPA results for the mechanical inference condition compared to the two noncausal conditions (Supplementary Figure 9). No regions are shared across the searchlight analyses comparing all possible pairs of causal and noncausal conditions, providing further evidence that there are no shared neural responses to causal inference in our dataset.
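
      The conjunction logic behind this comparison can be illustrated with a minimal Python sketch. It assumes each searchlight contrast has already been reduced to a boolean significance map over the same set of voxels; the function name, the map names, and the toy values are hypothetical and are not taken from the study's analysis code or data.

```python
def conjunction(*sig_maps):
    """Voxelwise logical AND across boolean significance maps.

    Each map is a sequence of booleans, one per voxel, marking whether
    that voxel survived thresholding in one causal vs. noncausal contrast.
    """
    if not sig_maps:
        raise ValueError("at least one significance map is required")
    n_voxels = len(sig_maps[0])
    if any(len(m) != n_voxels for m in sig_maps):
        raise ValueError("all maps must cover the same voxels")
    # A voxel counts as shared only if it is significant in every contrast.
    return [all(values) for values in zip(*sig_maps)]


# Toy maps over five voxels (made-up values, for illustration only).
illness_vs_noncausal = [True, True, False, False, True]
mechanical_vs_noncausal = [False, False, True, True, False]

shared = conjunction(illness_vs_noncausal, mechanical_vs_noncausal)
print(any(shared))  # False: no voxel is significant in both contrasts
```

      An empty conjunction of this kind is consistent with the absence of a shared region, although, as with any null result, it depends on the threshold applied to each individual map.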

      (4) Along the same lines, for the ROI MVPA analysis, it would be useful not only to include the illness-causal vs. mechanical-causal decoding but also to examine the illness-causal vs. non-causal conditions and the mechanical-causal vs. non-causal conditions. Additionally, it would be beneficial to report these data not just in a table (where only the mean accuracy is shown) but also using dot plots, allowing the readers to see not only the mean values but also the accuracy for each individual subject.

      We have performed these analyses and now include a table of the results as well as figures displaying the dispersion across participants (Supplementary Tables 2 and 3, Supplementary Figures 10 and 11). In the left PC, the illness inference condition was decoded from one of the noncausal conditions, and the mechanical inference condition was decoded from the same noncausal condition. The language network did not decode between any causal/noncausal pairs. In the logic network, the illness inference condition was decoded from one of the noncausal conditions, and the mechanical inference condition was decoded from the other noncausal condition. Thus, no regions showed the predicted ‘domain-general’ pattern, i.e., significant decoding between all causal/noncausal pairs. 

      Importantly, the decoding results must be interpreted in light of significant univariate differences across conditions (e.g., greater responses to illness inferences compared to noncausal vignettes in the PC). Linear classifiers are highly sensitive to univariate differences (Coutanche, 2013; Kragel et al., 2012; Hebart & Baker, 2018; Woolgar et al., 2014; Davis et al., 2014; Pakravan et al., 2022).
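
      A toy example may help show why this matters. The Python sketch below uses two made-up voxel patterns that differ only by a uniform offset in response amplitude. Removing each pattern's mean across voxels is one control sometimes used in the MVPA literature; it is shown here purely as an illustration under these assumptions and is not described as part of the analysis pipeline in this study. All names and values are hypothetical.

```python
def remove_pattern_mean(pattern):
    """Subtract the across-voxel mean so only the spatial profile remains."""
    mean = sum(pattern) / len(pattern)
    return [value - mean for value in pattern]


# Made-up responses of three voxels in two conditions. Condition B is
# condition A plus a uniform offset, i.e. a purely univariate difference.
condition_a = [1.0, 2.0, 3.0]
condition_b = [3.0, 4.0, 5.0]

# Before centering, the patterns differ at every voxel, so a linear
# classifier could separate them using overall amplitude alone.
print(condition_a == condition_b)  # False

# After centering, the patterns are identical: nothing is left to decode.
print(remove_pattern_mean(condition_a) == remove_pattern_mean(condition_b))  # True
```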

      (5) The selection of Regions of Interest (ROIs) is not entirely straightforward:

      In the introduction, the authors mention that recent literature identifies the precuneus (PC) as a region that responds preferentially to images and words related to living things across various tasks. While this may be accurate, we can all agree that other regions within the ventral occipital-temporal cortex also exhibit such preferences, particularly areas like the fusiform face area, the occipital face area, and the extrastriate body area. I believe that at least some parts of this network (e.g., the fusiform gyrus) should be included as ROIs in this study. This inclusion would make sense, especially because a complementary portion of the ventral stream known to prefer non-living items (i.e., anterior medial VOTC) has been selected as a control ROI to process information about the mechanical-causal condition. Given the main hypothesis of the study - that causal inferences about illness might depend on content-specific semantic representations in the 'animacy network' - it would be worthwhile to investigate these ROIs alongside the precuneus, as they may also yield interesting results.

      We thank the reviewer for their suggestion to test the FFA region. We think this provides an interesting comparison to the PC and hypothesized that, in contrast to the PC, the FFA does not encode abstract causal information about animacy-specific processes (i.e., illness). As we mention in the Introduction, although the fusiform face area (FFA) also exhibits a preference for animates, it does so primarily for images in sighted people (Kanwisher et al., 1997; Kanwisher et al., 1997; Grill-Spector et al., 2004; Noppeney et al., 2006; Konkle & Caramazza, 2013; Connolly et al., 2016; Bi et al., 2016).

      We did not select the FFA as a region of interest when preregistering the current study because we did not predict it would show sensitivity to causal knowledge. In accordance with the reviewer’s suggestions, we now include the FFA as an ROI in individual-subject univariate analysis (Supplementary Figure 8, Appendix 4). Because we did not run a separate FFA localizer task when collecting the data, we used FFA search spaces from a previous study investigating responses to face images (Julian et al., 2012). We followed the same analysis procedure that was used to investigate responses to illness inferences in the PC. Neither left nor right FFA exhibited a preference for illness inferences compared to mechanical inferences or to the noncausal conditions. This result is interesting and is now briefly discussed in the Discussion section.

      (6) Visual representation of results:

      In all the figures related to ROI analyses, only mean group values are reported (e.g., Figure 1A, Figure 3, Figure 4A, Supplementary Figure 6, Figure 7, Figure 8). To better capture the complexity of fMRI data and provide readers with a more comprehensive view of the results, it would be beneficial to include a dot plot for a specific time point in each graph. This could be a fixed time point (e.g., a certain number of seconds after stimulus presentation) or the time point showing the maximum difference between the conditions of interest. Adding this would allow for a clearer understanding of how the effect is distributed across the full sample, such as whether it is consistently present in every subject or if there is greater variability across individuals.

      We thank the reviewer for this suggestion. We now include scattered box plots displaying the dispersion in average percent signal change across participants in Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14.

      (7) Task selection:

      (a) To improve the clarity of the paper, it would be helpful to explain the rationale behind the choice of the selected task, specifically addressing: (i) why an implicit inference task was chosen instead of an explicit inference task, and (ii) why the "magic detection" task was used, as it might shift participants' attention more towards coherence, surprise, or unexpected elements rather than the inference process itself.

      (b) Additionally, the choice to include a large number of catch trials is unusual, especially since they are modeled as regressors of non-interest in the GLM. It would be beneficial to provide an explanation for this decision.

      We chose an orthogonal foil detection task, rather than an explicit causal judgment task, to investigate automatic causal inferences during reading and to unconfound such processing as much as possible from explicit decision-making processes (see Kuperberg et al., 2006 for discussion). Analogous foil detection paradigms have been used to study sentence processing and word recognition (Pallier et al., 2011; Dehaene-Lambertz et al., 2018). We now clarify this in the Introduction. The “magical” element occurred both within and across sentences so that participants could not use coherence as a cue to complete the task. Approximately 1/5 (19%) of the trials were magical catch trials to ensure that participants remained attentive throughout the experiment.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors hypothesize that "causal inferences about illness depend on content-specific semantic representations in the animacy network". They test this hypothesis in an fMRI task, by comparing brain activity elicited by participants' exposure to written situations suggesting a plausible cause of illness with brain activity in linguistically equivalent situations suggesting a plausible cause of mechanical failure or damage and non-causal situations. These contrasts identify PC as the main "culprit" in a whole-brain univariate analysis. Then the question arises of whether the content-specificity has to do with inferences about animates in general, or if there are some distinctions between reasoning about people's bodies versus mental states. To answer this question, the authors localize the mentalizing network and study the relation between brain activity elicited by Illness-Causal > Mech-Causal and Mentalizing > Physical stories. They conclude that inferring about the causes of illness partially differentiates from reasoning about people's states of mind. The authors finally test the alternative yet non-mutually exclusive hypothesis that both types of causal inferences (illness and mechanical) depend on shared neural machinery. Good candidates are language and logic, which justifies the use of a language/logic localizer. No evidence of commonalities across causal inferences versus non-causal situations is found.

      Strengths:

      (1) This study introduces a useful paradigm and well-designed set of stimuli to test for implicit causal inferences.

      (2) Another important methodological advance is the addition of physical stories to the original mentalizing protocol.

      (3) With these tools, or a variant of these tools, this study has the potential to pave the way for further investigation of naïve biology and causal inference.

      Weaknesses:

      (1) This study is missing a big-picture question. It is not clear whether the authors investigate the neural correlates of causal reasoning or of naïve biology. If the former, the choice of an orthogonal task, making causal reasoning implicit, is questionable. If the latter, the choice of mechanical and physical controls can be seen as reductive and problematic.

      We have modified the Introduction to clarify that the primary goal of the current study is to test the claim that semantic networks encode causal knowledge – in this case, causal intuitive theories of biology. Most conceptions of intuitive biology, intuitive psychology, and intuitive physics describe them as causal frameworks (e.g., Wellman & Gelman, 1992; Simons & Keil, 1995; Keil et al., 1999; Tenenbaum, Griffiths, & Niyogi, 2007; Gopnik & Wellman, 2012; Gerstenberg & Tenenbaum, 2017). As noted above, we chose an implicit task to investigate automatic causal inferences during reading and to unconfound such processing as much as possible from explicit decision-making processes. We are not sure what the reviewer means when they say that mechanical and physical controls are reductive. This is the standard control condition in neural and behavioral paradigms that investigate intuitive psychology and intuitive biology (e.g., Saxe & Kanwisher, 2003; Gelman & Wellman, 1991).

      (2) The rationale for focusing mostly on the precuneus is not clear and this choice could almost be seen as a post-hoc hypothesis.

      This study is preregistered (https://osf.io/6pnqg). The preregistration states that the precuneus is a hypothesized area of interest, so this is not a post-hoc hypothesis. Our hypothesis was informed by multiple prior studies implicating the precuneus in the semantic representation of animates (e.g., people, animals) (Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2025). We also conducted a pilot experiment with separate participants prior to pre-registering the study. We now clarify our rationale for focusing on the precuneus in the Introduction:

      “Illness affects living things (e.g., people and animals) rather than inanimate objects (e.g., rocks, machines, houses). Thinking about living things (animates) as opposed to non-living things (inanimate objects/places) recruits partially distinct neural systems (e.g., Warrington & Shallice, 1984; Hillis & Caramazza, 1991; Caramazza & Shelton, 1998; Farah & Rabinowitz, 2003). The precuneus (PC) is part of the ‘animacy’ semantic network and responds preferentially to living things (i.e., people and animals), whether presented as images or words (Devlin et al., 2002; Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2025). By contrast, parts of the visual system (e.g., fusiform face area) that respond preferentially to animates do so primarily for images (Kanwisher et al., 1997; Grill-Spector et al., 2004; Noppeney et al., 2006; Mahon et al., 2009; Konkle & Caramazza, 2013; Connolly et al., 2016; see Bi et al., 2016 for a review). We hypothesized that the PC represents causal knowledge relevant to animates and tested the prediction that it would be activated during implicit causal inferences about illness, which rely on such knowledge (preregistration: https://osf.io/6pnqg).”

      (3) The choice of an orthogonal 'magic detection' task has three problematic consequences in this study:

      (a) It differs in nature from the 'mentalizing' task that consists of evaluating a character's beliefs explicitly from the corresponding story, which complicates the study of the relation between both tasks. While the authors do not compare both tasks directly, it is unclear to what extent this intrinsic difference between implicit versus explicit judgments of people's body versus mental states could influence the results.

      (b) The extent to which the failure to find shared neural machinery between both types of inferences (illness and mechanical) can be attributed to the implicit character of the task is not clear.

      (c) The introduction of a category of non-interest that contains only 36 trials compared to 38 trials for all four categories of interest creates a design imbalance.

      We disagree with the reviewer’s argument that our use of an implicit “magic detection” task is problematic. Indeed, we think it is one of the advances of the current study over prior work.

      a) Prior work has shown that implicit mentalizing tasks (e.g., naturalistic movie watching) engage the theory of mind network, suggesting that the implicit/explicit nature of the task does not drive the activation of this network (Jacoby et al., 2016; Richardson et al., 2018). With these data in mind, it is unlikely that the implicit/explicit nature of the causal inference and theory of mind tasks in the present experiment can explain observed differences between them.

      b) Explicit causal inferences introduce a collection of executive processes that potentially confound the results and make it difficult to know whether neural signatures are related to causal inference per se. The current study focuses on the neural basis of implicit causal inference, a type of inference that is made routinely during language comprehension. We do not claim to find neural signatures of all causal inferences, and we do not think any study could claim to do so, because causal inferences are a highly varied class.

      c) Our findings do not exclude the possibility that content-invariant responses are elicited during explicit causality judgments. We clarify this point in the Results (e.g., “These results leave open the possibility that domain-general systems support the explicit search for causal connections”) and Discussion (e.g., “The discovery of novel causal relationships (e.g., ‘blicket detectors’; Gopnik et al., 2001) and the identification of complex causes, even in the case of illness, may depend in part on domain-general neural mechanisms”).

      d) Because the magic trials are excluded from our analyses, it is unclear how the imbalance in the number of magic trials could influence the results and our interpretation of them. We note that the number of catch trials in standard target detection paradigms is sometimes much lower than the number of target trials in each condition (e.g., Pallier et al., 2011).

      (4) Another imbalance is present in the design of this study: the number of trials per category is not the same in each run of the main task. This imbalance does not seem to be accounted for in the 1st-level GLM and renders a bit problematic the subsequent use of MVPA.

      Each condition is shown either 6 or 7 times per run (maximum difference of 1 trial between conditions), and the number of trials per condition is equal across the whole experiment: each condition is shown 7 times in two of the runs and 6 times in four of the runs. This minor design imbalance is typical of fMRI experiments and should not impact our interpretations of the data, particularly because we average responses from each condition within a run before submitting them to MVPA.
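
      As a quick arithmetic check of the counts above, the following minimal Python sketch recomputes the number of trials per condition and the proportion of catch trials. The variable names are illustrative; the figure of 36 catch trials is taken from the reviewer's comment, and four conditions of interest are assumed, as stated in that comment.

```python
# Trial counts as described in the response: 6 runs in total; each condition
# appears 7 times in two runs and 6 times in four runs.
RUNS_WITH_7 = 2
RUNS_WITH_6 = 4
N_CONDITIONS = 4        # conditions of interest (assumed from the reviewer's comment)
N_CATCH_TRIALS = 36     # "magic" catch trials, per the reviewer's comment

trials_per_condition = 7 * RUNS_WITH_7 + 6 * RUNS_WITH_6
total_trials = trials_per_condition * N_CONDITIONS + N_CATCH_TRIALS
catch_fraction = N_CATCH_TRIALS / total_trials

print(trials_per_condition)       # trials per condition across the experiment
print(round(catch_fraction, 2))   # proportion of trials that are catch trials
```

      Under these assumptions, the sketch yields 38 trials per condition and a catch-trial proportion of about 0.19, in line with the figures quoted in the reviews and responses.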

      (5) The main claim of the authors, encapsulated by the title of the present manuscript, is not tested directly. While the authors included in their protocol independent localizers for mentalizing, language, and logic, they did not include an independent localizer for "animacy". As such, they cannot provide a within-subject evaluation of their claim, which is entirely based on the presence of a partial overlap in PC (which is also involved in a wide range of tasks) with previous results on animacy.

      We respectfully disagree with this assertion. Our primary analysis uses a within-subject leave-one-run-out approach. This approach allows us to use part of the data itself to localize animacy-relevant causal responses in the PC without engaging in ‘double-dipping’ or statistical non-independence (Vul & Kanwisher, 2011). We also use the mentalizing network localizer as a partial localizer for animacy. This is because the control condition (physical reasoning) does not include references to people or any animate agents (Supplementary Figures 1 and 15). We now clarify this point in the Methods section of the paper (see below).

      From the Methods: “To test the relationship between neural responses to inferences about the body and the mind, and to localize animacy regions, we used a localizer task to identify the mentalizing network in each participant (Saxe & Kanwisher, 2003; Dodell-Feder et al., 2011; http://saxelab.mit.edu/use-our-efficient-false-belief-localizer)...Our physical stories incorporated more vivid descriptions of physical interactions and did not make any references to human agents, enabling us to use the mentalizing localizer as a localizer for animacy.”
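
      The leave-one-run-out logic mentioned in the response above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the data layout, the function name `leave_one_run_out_froi`, and the top-k voxel selection rule are all assumptions made for illustration. The point it shows is that voxels are selected using every run except one, and responses are then read out only from the held-out run, so that selection and measurement stay independent.

```python
def leave_one_run_out_froi(betas, cond_a, cond_b, top_k):
    """Hypothetical leave-one-run-out fROI readout.

    betas[run][condition] is a list of per-voxel responses inside a search
    space (assumed layout). For each held-out run, the top_k voxels with the
    largest cond_a - cond_b difference in the remaining runs are selected,
    and the held-out run's responses in those voxels are averaged.
    Returns the mean response per condition, averaged over folds.
    """
    n_runs = len(betas)
    n_voxels = len(betas[0][cond_a])
    conditions = list(betas[0].keys())
    totals = {c: 0.0 for c in conditions}

    for held_out in range(n_runs):
        train_runs = [r for r in range(n_runs) if r != held_out]
        # Contrast is estimated from the training runs only.
        contrast = [
            sum(betas[r][cond_a][v] - betas[r][cond_b][v] for r in train_runs)
            / len(train_runs)
            for v in range(n_voxels)
        ]
        # Assumed selection rule: keep the top_k voxels by contrast value.
        selected = sorted(range(n_voxels), key=lambda v: contrast[v], reverse=True)[:top_k]
        # Read out responses from the held-out run in the selected voxels.
        for c in conditions:
            totals[c] += sum(betas[held_out][c][v] for v in selected) / top_k

    return {c: totals[c] / n_runs for c in conditions}


# Toy data: 3 runs, 4 voxels; only the first two voxels respond.
toy_betas = [
    {"illness": [2.0, 2.0, 0.0, 0.0], "mechanical": [1.0, 1.0, 0.0, 0.0]}
    for _ in range(3)
]
result = leave_one_run_out_froi(toy_betas, "illness", "mechanical", top_k=2)
print(result)
```

      Because the held-out run never contributes to voxel selection, a difference between conditions in the readout cannot be produced by the selection step itself.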

      Reviewer #3 (Public review):

      Summary:

      This study employed an implicit task, showing vignettes to participants while a BOLD signal was acquired. The aim was to capture automatic causal inferences that emerge during language processing and comprehension. In particular, the authors compared causal inferences about illness with two control conditions, causal inferences about mechanical failures and non-causal phrases related to illnesses. All phrases that were employed described contexts with people, to avoid an animate/inanimate confound in the results. The authors had a specific hypothesis concerning the role of the precuneus (PC) in being sensitive to causal inferences about illnesses.

      These findings indicate that implicit causal inferences are facilitated by semantic networks specialized for encoding causal knowledge.

      Strengths:

      The major strength of the study is the clever design of the stimuli (which are nicely matched for a number of features) which can tease apart the role of the type of causal inference (illness-causal or mechanical-causal) and the use of two localizers (logic/language and mentalizing) to investigate the hypothesis that the language and/or logical reasoning networks preferentially respond to causal inference regardless of the content domain being tested (illnesses or mechanical).

      Weaknesses:

      I have identified the following main weaknesses:

      (1) Precuneus (PC) and Temporo-Parietal junction (TPJ) show very similar patterns of results, and the manuscript is mostly focused on PC (also the abstract). To what extent does the fact that PC and TPJ show similar trends affect the inferences we can derive from the results of the paper? I wonder whether additional analyses (connectivity?) would help provide information about this network.

      We thank the reviewer for this suggestion. While the PC shows the most robust univariate preference for illness inferences compared to both mechanical inferences and noncausal vignettes, the TPJ also shows a preference for illness inferences compared to mechanical inferences in individual-subject fROI analysis. However, as we mention in the Results section, the TPJ does not show a preference for illness inferences compared to noncausal vignettes, suggesting that the TPJ is selective for animacy but may not be as sensitive to causal knowledge about animacy-specific processes. When describing our results, we refer to the ‘animacy network’ (i.e., PC and TPJ) but also highlight that the PC exhibited the most robust responses to illness inferences (from the Results: “Inferring illness causes preferentially recruited the animacy semantic network, particularly the PC”; from the Discussion: “We find that a semantic network previously implicated in thinking about animates, particularly the precuneus (PC), is preferentially engaged when people infer causes of illness…”). We did not collect resting state data that would enable a connectivity analysis, as the reviewer suggests. This is an interesting direction for future work.

      (2) Results are mainly supported by a univariate ROI approach, and the MVPA ROI approach is performed on a subregion of one of the ROI regions (left precuneus). Results could then have a limited impact on our understanding of brain functioning.

      The original and current versions of the paper include results from multiple multivariate analyses, including whole-cortex searchlight MVPA and individual-subject fROI MVPA performed in multiple search spaces (see Supplementary Figures 10 and 11, Supplementary Tables 2 and 3).

      We note that our preregistered predictions focused primarily on univariate differences. This is because the current study investigates neural responses to inferences, and univariate increases in activity are thought to reflect the processing of such inferences. We use multivariate analyses to complement our primary univariate analyses. However, given that we observe significant univariate effects and that multivariate analyses are heavily influenced by significant univariate effects (Coutanche, 2013; Kragel et al., 2012; Hebart & Baker, 2018; Woolgar et al., 2014; Davis et al., 2014; Pakravan et al., 2022), our univariate results constitute the main findings of the paper.

      (3) In all figures: there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. E.g., percentage signal changes (PSC) do not report measures of dispersion of the data, nor do we have BOLD maps showing the overlap of the response across participants. Only in Figure 2, we see the data of 6 selected participants out of 20.

      We thank the reviewer for this suggestion. We now include graphs depicting the dispersion of the data across participants in the following figures: Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14. We have also created 2 figures that display the overlap of univariate responses across participants (Supplementary Figures 6 and 7). These figures show that there is high overlap across participants in PC responses to illness inferences but not mechanical inferences. In addition, all participants’ results from the analysis depicted in Figure 2 are included in Supplementary Figure 3. 

      (4) Sometimes acronyms are defined in the text after they appear for the first time.

      We thank the reviewer for pointing this out. We now define all acronyms before using them.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I was unable to access the pre-registration on OSF because special permission is required.

      We apologize for this technical error. The preregistration is now publicly available: https://osf.io/6pnqg.

      (2) The length of the MRI session is quite long (around 2 hours). It is generally discouraged to have such extended data acquisition periods, as this can affect the stability and cleanliness of the data. Did you observe any effects of fatigue or attention decline in your data?

      The session was 2 hours long including 1-2 10-minute breaks. Without breaks, the scan would be approximately 1.5 hours. This is a standard length for MRI experiments. The main experiment (causal inference task) was always conducted first and lasted approximately 1 hour. Accuracy did not decrease across the 6 runs of this experiment (repeated measures ANOVA, F<sub>(5,114)</sub> = 1.35, p = .25).

      (3) The last sentence of the results states: "Although MVPA searchlight analysis identified several areas where patterns of activity distinguished between causal and non-causal vignettes, all of these regions showed a preference for non-causal vignettes in univariate analysis (Supplementary Figure 5)." This statement is not entirely accurate. As I previously pointed out, the MVPA searchlight analysis is not very informative and is difficult to interpret. However, as previously suggested, there are additional steps that could be taken to better understand and interpret these results. It is incorrect to conclude that because the brain regions identified in the MVPA analyses show a preference for non-causal vignettes in univariate analyses, the multivariate results lack value. While univariate analyses may show a preference for a specific condition, multivariate analyses can reveal more fine-grained representations of multiple conditions. For a notable example, consider the fusiform face area (FFA) that shows a clear preference for faces at the univariate level but can significantly decode other categories at the multivariate level, even when faces are not included in the analysis.

      The decoding analysis that the reviewer is suggesting for the current study would be analogous to identifying univariate differences between faces and places in the FFA and then decoding between faces and places and claiming that the FFA represents places because the decoding is significant. The decoding analyses enabled by our design are not equivalent to decoding within a condition (e.g., among face identities, among types of illness inferences), as the reviewer suggests above. It is not that such multivariate analyses “lack value” but that they recapitulate established univariate differences. Multivariate analyses are useful for revealing more fine-grained representations i) when significant univariate differences are not observed, or ii) when it is possible to decode among categories within a condition (e.g., among face identities, among types of illness inferences). We are currently collecting data that will enable us to perform within-condition decoding analyses in future work, but the design of the current study does not allow for such a comparison.

      We note that the original quotation from the manuscript has been removed because it is no longer accurate. When including participant response time as a covariate of no interest in the GLM, no regions are shared across the 4 searchlight analyses comparing causal and noncausal conditions, suggesting that there are no shared neural responses to causal inference in our dataset.

      Reviewer #2 (Recommendations for the authors):

      (1) Moderating the strength of some claims made to justify the main hypothesis (e.g., "people but not machines transmit diseases to each other through physical contact").

      We changed this wording so that it now reads: “Illness affects living things (e.g., people and animals) rather than inanimate objects (e.g., rocks, machines, houses).” (Introduction)

      (2) Expanding the paragraph introducing the sub-question about inferring people's "body states" vs "mental states". In addition, given the order in which the hypotheses are introduced, and the results are presented, I would suggest switching the order of presentation of both localizers in the methods section and adding a quick reminder of the hypotheses that justify using these localizers.

      We thank the reviewer for these suggestions. In accordance with their suggestions, we have expanded the paragraph in the Introduction that introduces the “body states” vs. “mental states” question (see below). We have also switched the order of the localizer descriptions in the Methods section and added a sentence at the start of each section describing the relevant hypotheses (see below).

      From the Introduction: “We also compared neural responses to causal inferences about the body (i.e., illness) and inferences about the mind (i.e., mental states). Both types of inferences are about animate entities, and some developmental work suggests that children use the same set of causal principles to think about bodies and minds (Carey, 1985, 1988). Other evidence suggests that by early childhood, young children have distinct causal knowledge about the body and the mind (Springer & Keil, 1991; Callanan & Oakes, 1992; Wellman & Gelman, 1992; Inagaki & Hatano, 1993; 2004; Keil, 1994; Hickling & Wellman, 2001; Medin et al., 2010). For instance, preschoolers are more likely to view illness as a consequence of biological causes, such as contagion, rather than psychological causes, such as malicious intent (Springer & Ruckel, 1992; Raman & Winer, 2004; see also Legare & Gelman, 2008). The neural relationship between inferences about bodies and minds has not been fully described. The ‘mentalizing network’, including the PC, is engaged when people reason about agents’ beliefs (Saxe & Kanwisher, 2003; Saxe et al., 2006; Saxe & Powell, 2006; Dodell-Feder et al., 2011; Dufour et al., 2013). We localized this network in individual participants and measured its neuroanatomical relationship to the network activated by illness inferences.”

      From the Methods, localizer descriptions: “To test the relationship between neural responses to inferences about the body and the mind, and to localize animacy regions, we used a localizer task to identify the mentalizing network in each participant… To test for the presence of domain-general responses to causal inference in the language and logic networks (e.g., Kuperberg et al., 2006; Operskalski & Barbey, 2017), we used an additional localizer task to identify both networks in each participant.”

      (3) Adding a quick analysis of lateralization to support the corresponding claim of left lateralization of responses to causal inferences.

      In accordance with the reviewer’s suggestion, we now include hemisphere as a factor in all ANOVAs comparing univariate responses across conditions.

      From the Results: “In individual-subject fROI analysis (leave-one-run-out), we similarly found that inferring illness causes activated the PC more than inferring causes of mechanical breakdown (repeated measures ANOVA, condition (Illness-Causal, Mechanical-Causal) x hemisphere (left, right): main effect of condition, F<sub>(1,19)</sub> = 19.18, p < .001, main effect of hemisphere, F<sub>(1,19)</sub> = 0.3, p = .59, condition x hemisphere interaction, F<sub>(1,19)</sub> = 27.48, p < .001; Figure 1A). This effect was larger in the left than in the right PC (paired samples t-tests; left PC: t<sub>(19)</sub> = 5.36, p < .001, right PC: t<sub>(19)</sub> = 2.27, p = .04)…In contrast to the animacy-responsive PC, the anterior PPA showed the opposite pattern, responding more to mechanical inferences than illness inferences (leave-one-run-out individual-subject fROI analysis; repeated measures ANOVA, condition (Mechanical-Causal, Illness-Causal) x hemisphere (left, right): main effect of condition, F<sub>(1,19)</sub> = 17.93, p < .001, main effect of hemisphere, F<sub>(1,19)</sub> = 1.33, p = .26, condition x hemisphere interaction, F<sub>(1,19)</sub> = 7.8, p = .01; Figure 4A). This effect was significant only in the left anterior PPA (paired samples t-tests; left anterior PPA: t<sub>(19)</sub> = 4, p < .001, right anterior PPA: t<sub>(19)</sub> = 1.88, p = .08).”

      (4) Making public and accessible the pre-registration OSF link.

      We apologize for this technical error. The preregistration is now publicly available: https://osf.io/6pnqg.

      Reviewer #3 (Recommendations for the authors):

      In all figures: there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. E.g., percentage signal changes (PSC) do not report measures of dispersion of the data, nor do we have BOLD maps showing the overlap of the response across participants. Only in Figure 2, we see the data of 6 selected participants out of 20.

      We thank the reviewer for this suggestion. We now include graphs depicting the dispersion of the data across participants in the following figures: Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14. We have also created 2 figures that display the overlap of univariate responses across participants (Supplementary Figures 6 and 7). In addition, all participants’ results from the analysis depicted in Figure 2 are included in Supplementary Figure 3.

      Minor

      (1) Figure 2: Spatial dissociation between responses to illness inferences and mental state inferences in the precuneus (PC). If the analysis is the result of the MVPA, the figure should report the fact that only the left precuneus was analyzed.

      Figure 2 depicts the spatial dissociation in univariate responses to illness inferences and mental state inferences. We now clarify this in the figure legend.

      (2) VOTC and PSC acronyms are defined in the text after they appear for the first time. TPJ is never defined.

      We thank the reviewer for pointing this out. We now define all acronyms before using them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper addresses the knowledge gap between the representation of goal direction in the central complex and how motor systems stabilize movement toward that goal. The authors focused on two descending neurons, DNa01 and 02, and showed that they play different roles in steering the fly toward a goal. They also explored the connectome data to propose a model to explain how these DNs could mediate response to lateralized sensory inputs. They finally used lateralized optogenetic activation/inactivation experiments to test the roles of these neurons in mediating turnings in freely walking flies.

      Strengths:

      The experiments are well-designed and controlled. The experiment in Figure 4 is elegant, and the authors put a lot of effort into ensuring that ATP puffs do not accidentally activate the DNs. They also have explained complex experiments well. I only have minor comments for the authors.

      We are grateful for this positive feedback.

      Weaknesses:

      (1) I do not fully understand how the authors extracted the correlation functions from the population data in Figure 1. Since the ipsilateral DNs are anti-correlated with the contralateral ones, I expected that the average will drop to zero when they are pooled together (e.g., 1E-G). Of course, this will not be the case if all the data in Figure 1 are collected from the same brain hemisphere. It would be helpful if the authors could explain this.

      We regret that this information was not easy to find in our initial submission. As noted in the Figure 1D legend, “Here and elsewhere, ipsi and contra are defined relative to the recorded DN(s).” We have now added a sentence to the Results (right after we introduce Figure 1D) that also makes this point.

      (2) What constitutes the goal directions in Figures 1-3 and 8, as the authors could not use EPG activity as a proxy for goal directions? If these experiments were done in the dark, without landmarks, one would expect the fly's heading to drift randomly at times, and they would not engage the DNa01/02 for turning. Do the walking trajectories in these experiments qualify as menotactic bouts?

      Published work (Green et al., 2019) has shown that, even in the dark, flies will often walk for extended periods while holding the bump of EPG activity at a fixed location. During these epochs, the brain is essentially estimating that the fly is walking in a straight line in a fixed direction. (The fact that the fly is actually rotating a bit on the spherical treadmill is not something the fly can know, in the dark.) Thus, epochs where the EPG bump is held fixed are treated as menotactic bouts, even in darkness.

      Our results provide additional support for this interpretation. We find that, when flies are walking in darkness and holding the bump of EPG activity at a fixed location, they will make a corrective behavioral turning maneuver in response to an imposed bump-jump. This result argues that the flies are actually engaging in goal-directed straight-line walking, i.e. menotaxis, and it reproduces the findings of Green et al. (2019).

      To clarify this point, we have adjusted the wording of the Results pertaining to Figure 4.

      (3) In Figure 2B, the authors mentioned that DNa02 overpredicts and 01 underpredicts rapid turning and provided single examples. It would be nice to see more population-level quantification to support this claim.

      In this revision, we have reorganized Figures 1 and 2 (and associated text) to improve clarity. As part of this reorganization, we have removed this passage from the text, as it was a minor point in any event.

      Reviewer #2 (Public review):

      The data is largely electrophysiological recordings coupled with behavioral measurements (technically impressive) and some gain-of-function experiments in freely walking flies. Loss-of-function was tested but had minimal effect, which is not surprising in a system with partially redundant control mechanisms. The data is also consistent with/complementary to subsequent manuscripts (Yang 2023, Feng 2024, and Ros 2024) showing additional descending neurons with contributions to steering in walking and flying.

      The experiments are well executed, the results interesting, and the description clear. Some hypotheses based on connectome anatomy are tested: the insights on the pre-synaptic side - how sensory and central complex heading circuits converge onto these DNs are stronger than the suggestions about biomechanical mechanisms for how turning happens on the motor side.

      Of particular interest is the idea that different sensory cues can converge on a common motor program. The turn-toward or turn-away mechanism is initiated by valence rather than whether the stimulus was odor or temperature or memory of heading. The idea that animals choose a direction based on external sensory information and then maintain that direction as a heading through a more internal, goal-based memory mechanism, is interesting but it is hard to separate conclusively.

      To clarify, we mention the role of memory in connection with two places in the manuscript. First, we note that the EPG/head direction system relies on learning and memory to construct a map of directional cues in the environment. These cues are, in principle, inherently neutral, i.e. without valence. Second, we note that specific mushroom body output neurons rely on learning and memory to store the valence associated with an odor. This information is not necessarily associated with an allocentric direction: it is simply the association of odor with value. Both of these ideas are well-attested by previous work.

      The reviewer may be suggesting a sequential scheme whereby the brain initializes an allocentric goal direction based on valence, and then maintains that goal direction in memory, based on that initialization. In other words, memory is used to associate valence with some allocentric direction. This seems plausible, but it is not a claim we make in our manuscript.

      The "see-saw", where left-right symmetry is broken to allow a turn, presumably by excitation on one side and inhibition of the other leg motor modules, is interesting but not well explained here. How hyperpolarization affects motor outputs is not clear.

      We have added several sentences to the Discussion to clarify this point. According to this see-saw model, steering can emerge from right/left asymmetries in excitation, or inhibition, or both. It may be nonintuitive to think that inhibitory input to a DN can produce an action. However, this becomes more plausible given our finding that DNa02 has a relatively high basal firing rate (Fig. 1D), and DNa02 hyperpolarization is associated with contraversive turning (Fig. 5A). It is also relevant to note that there are many inhibitory cell types that form strong unilateral connections onto DNa02 (e.g., AOTU019).

      The statement near Figure 5B that "DNa02 activity was higher on the side ipsilateral to the attractive stimulus, but contralateral to the aversive stimulus" is really important - and only possible to see because of the dual recordings.

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary:

      Rayshubskiy et al. performed whole-cell recordings from descending neurons (DNs) of fruit flies to characterize their role in steering. Two DNs implicated in "walking control" and "steering control" by previous studies (Namiki et al., 2018, Cande et al., 2018, Chen et al., 2018) were chosen by the authors for further characterization. In-vivo whole-cell recordings from DNa01 and DNa02 showed that their activity predicts spontaneous ipsilateral turning events. The recordings also showed that while DNa02 predicts transient turns DNa01 predicts slow sustained turns. However, optogenetic activation or inactivation showed relatively subtle phenotypes for both neurons (consistent with data in other recent preprints, Yang et al 2023 and Feng et al 2024). The authors also further characterized DNa02 with respect to its inputs and showed a functional connection with olfactory and thermosensory inputs as well as with the head-direction system. DNa01 is not characterized to this extent.

      Strengths:

      (1) In-vivo recordings and especially dual recordings are extremely challenging in Drosophila and provide a much higher resolution DN characterization than other recent studies that have relied on behavior or calcium imaging. Especially impressive are the simultaneous recordings from bilateral DNs (Figure 3). These bilateral recordings show clearly that DNa02 cells not only fire more during ipsilateral turning events but that they get inhibited during contralateral turns. In line with this observation, the difference between left and right DNa02 neuronal activity is a much better predictor of turning events compared to individual DNa02 activity.

      (2) Another technical feat in this work is driving local excitation in the head-direction neuronal ensemble (PEN-1 neurons), while simultaneously imaging its activity and performing whole-cell recordings from DNa02 (Figure 4). This impressive approach provided a way to causally relate changes in the head-direction system to DNa02 activity. Indeed, DNa02 activity could predict the rate at which an artificially triggered bump in the PEN-1 ring attractor returns to its previous stable point.

      (3) The authors also support the above observations with connectomics analysis and provide circuit motifs that can explain how the head direction system (as well as external olfactory/thermal stimuli) communicated with DNa02. All these results unequivocally put DNa02 as an essential DN in steering control, both during exploratory navigation as well as stimulus-directed turns.

      We are grateful for this detailed positive feedback.

      Weaknesses:

      (1) I understand that the first version of this preprint was already on biorxiv in 2020, and some of the "weaknesses" I list are likely a reflection of the fact that I'm tasked to review this manuscript in late 2024 (more than 4 years later). But given this is a 2024 updated version it suffers from laying out the results in contemporary terms. For instance, the manuscript lacks any reference to the DNp09 circuit implicated in object-directed turning and upstream to DNa02 even though the authors cite one of the papers where this was analyzed (Braun et al, 2024). More importantly, these studies (both Braun et al 2024 and Sapkal et al 2024) along with recent work from the authors' lab (Yang et al 2023) and other labs (Feng et al 2024) provide a view that the entire suite of leg kinematics changes required for turning are orchestrated by populations of heterogeneous interconnected DNs. Moreover, these studies also show that this DN-DN network has some degree of hierarchy with some DNs being upstream to other DNs. In this contemporary view of steering control, DNa02 (like DNg13 from Yang et al 2023) is a downstream DN that is recruited by hierarchically upstream DNs like DNa03, DNp09, etc. In this view, DNa02 is likely to be involved in most turning events, but by itself unable to drive all the motor outputs required for the said events. This reasoning could be used while discussing the lack of major phenotypes with DNa02 activation or inactivation observed in the current study, which is in stark contrast to strong phenotypes observed in the case of hierarchically upstream DNs like DNp09 or DNa03. In the section, "Contributions of single descending neuron types to steering behavior": the authors start off by asking if individual DNs can make measurable contributions to steering behavior. Once more, any citations to DNp09 or DNa03 - two DNs that are clearly shown to drive strong turning-on activation (Bidaye et al, 2020, Feng et al 2024) - are lacking. 
      Besides misleading the reader, such statements also digress the results away from contemporary knowledge in the field. I appreciate that the brief discussion in the section titled "Ensemble codes for steering" tries to cover these recent updates. However, I think this would serve a better purpose in the introduction and help guide the results.

      We apologize for these omissions of relevant citations, which we have now fixed. Specifically, in our revised Discussion, we now point out that:

      - Braun et al. (2024) reported that bilateral optogenetic activation of either DNa02 or DNa01 can drive turning (in either direction). 

      - Braun et al. (2024) also identified DNb02 as a steering-related DN.

      - Bidaye et al. (2020), Sapkal et al. (2024), and Braun et al. (2024) all contributed to the identification of DNp09 as a broadcaster DN with the capacity to promote ipsiversive turning.

      We have also revised the beginning of the Results section titled “Contributions of single descending neuron types to steering behavior”, as suggested by the Reviewer.

      Finally, we agree with the Reviewer’s overall point that steering is influenced by multiple DNs. We have not claimed that any DN is solely responsible for steering. As we note in the Discussion: “We found that optogenetically inhibiting DNa01 produced only small defects in steering, and inhibiting DNa02 did not produce statistically significant effects on steering; these results make sense if DNa02 is just one of many steering DNs.”

      (2) The second major weakness is the lack of any immunohistochemistry (IHC) images quantifying the expression of the genetic tools used in these studies. Even though the main split-Gal4 tools for DNa01 and DNa02 were previously reported by Namiki et al, 2018, it is important to document the expression with the effectors used in this work and explicitly mention the expression in any ectopic neurons. Similarly, for any experiments where drivers were combined together (double recordings, functional connectivity) or modified for stochastic expression (Figure 8), IHC images are absolutely necessary. Without this evidence, it is difficult to trust many of the results (especially in the case of behavioral experiments in Figure 8). For example, the DNa01 genetic driver used by the authors is also expressed in some neurons in the nerve cord (as shown on the Flylight webpage of Janelia Research Campus). One wonders if all or part of the results described in Figure 8 are due to DNa01 manipulation or manipulation of the nerve cord neurons. The same applies for optic lobe neurons in the DNa02 driver.

      This is a reasonable request. We used DN split-Gal4 lines to express three types of UAS-linked transgenes:

      (1) GFP

      In these flies, we know that expression in DNs is restricted to the DN types in question, based on published work (Namiki et al., 2018), as well as the fact that we see one labeled DN soma per hemisphere. When we label both cells with GFP, we use the spike waveform to identify DNa02 and DNa01, as described in Figure S1.

      (2) ReaChR

      In these flies, expression patterns were different in different flies because ReaChR expression was stochastically sparsened using hs-FLP. Expression was validated in each fly after the experiment, as described in the Methods (“Stochastic ReaChR expression”). hs-FLP-mediated sparsening will necessarily produce stochastic patterns of expression in both DNa02 and off-target cells, and this is true of all the flies in this experiment. What makes the “unilateral” flies distinct from the “bilateral” flies is that unilateral flies express ReaChR in one copy of DNa02, whereas bilateral flies express ReaChR in both copies of DNa02. On average, off-target expression will be the same in both groups.

      (3) GtACR1

      In these flies, we initially assumed that GtACR1 expression was the same as GFP expression under control of the same driver. However, we agree with the reviewer’s point that these two expression patterns are not necessarily identical. Therefore, to address the reviewer’s question, we performed immunofluorescence microscopy to characterize GtACR1 patterns in the brain and VNC of both genotypes. These expression patterns are now shown in a new supplemental figure (Figure S8). This figure shows that, as it happens, expression of GtACR1 is indeed indistinguishable from the GFP expression patterns for the same lines (archived on the FlyLight website). Both DN split-Gal4 lines are largely selective for the DNs in question, with limited off-target labeling. We have now drawn attention to this off-target labeling in the last paragraph of the Results, where the GtACR1 results are discussed.

      (3) The paper starts off with a comparative analysis of the roles of DNa01 and DNa02 during steering. Unfortunately, after this initial analysis, DNa01 is largely ignored for further characterization (e.g. with respect to inputs, connectomics, etc.), only to return in the final figure for behavioral characterization where DNa01 seems to have a stronger silencing phenotype compared to DNa02. I couldn't find an explanation for this imbalance in the characterization of DNa01 versus DNa02. Is this due to technical reasons? Or was it an informed decision due to some results? In addition to being a biased characterization, this also results in the manuscript lacking a coherent thread, which in turn makes it a bit inaccessible to the non-specialist.

      Yes, the first portion of the manuscript focuses on DNa01 and DNa02. The latter part of the manuscript transitions to focus mainly on DNa02. 

      Our rationale is noted at the point in the manuscript where we make this transition, with the section titled “Steering toward internal goals”: “Having identified steering-related DNs, we proceeded to investigate the brain circuits that provide input to these DNs. Here we decided to focus on DNa02, as this cell’s activity is predictive of larger steering maneuvers.” When we say that DNa02 is predictive of larger steering maneuvers, we are referring to several specific results:

      - We obtain larger filter amplitudes for DNa02 versus DNa01 (Fig. 2A-C). This means that, just after a unit change in DN firing rate, we see on average a larger change in steering velocity for DNa02 versus DNa01.

      - The linear filter for DNa02 has a higher variance explained, as compared to DNa01 (Fig. 2D). This means that DNa02 is more predictive of steering.

      - The relationship between firing rate and rotational velocity (150 ms later) is steeper for DNa02 than for DNa01 (Fig. 2G). This means that, if we ignore dynamics and we just regress firing rate against subsequent rotational velocity, we see a higher-gain relationship for DNa02.
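      The third comparison above (regressing firing rate against rotational velocity 150 ms later) can be sketched in a few lines. This is only an illustration under stated assumptions, not the analysis code used in the study: the function name `lagged_gain`, the 100 Hz sampling rate, and the synthetic "high-gain" and "low-gain" signals are all hypothetical. Only the 150 ms lag is taken from the text.

```python
import numpy as np

def lagged_gain(firing_rate, rot_velocity, fs_hz, lag_s=0.150):
    """Regress velocity at time t + lag on firing rate at time t.

    Returns (slope, r_squared). Names and defaults are illustrative.
    """
    lag = int(round(lag_s * fs_hz))
    x = np.asarray(firing_rate, dtype=float)
    y = np.asarray(rot_velocity, dtype=float)
    if lag > 0:
        # Pair each firing-rate sample with the velocity `lag` samples later.
        x, y = x[:-lag], y[lag:]
    slope, intercept = np.polyfit(x, y, 1)
    pred = slope * x + intercept
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, 1.0 - ss_res / ss_tot

# Synthetic data (assumed values, not measurements from the study).
rng = np.random.default_rng(0)
fs_hz = 100.0
n_lag = 15                                   # 150 ms at 100 Hz
rate = rng.uniform(0.0, 60.0, size=5000)     # spikes/s

def make_velocity(gain):
    vel = np.zeros_like(rate)
    vel[n_lag:] = gain * rate[:-n_lag]       # velocity follows rate by 150 ms
    return vel + rng.normal(0.0, 20.0, size=rate.size)

slope_high, r2_high = lagged_gain(rate, make_velocity(4.0), fs_hz)
slope_low, r2_low = lagged_gain(rate, make_velocity(1.0), fs_hz)
print(slope_high, r2_high, slope_low, r2_low)
```

      In this toy setting, the cell with the steeper slope also has the higher variance explained, because the noise level is held fixed while the gain changes. In real recordings the two quantities need not move together, which is why they are reported separately above.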

      Our focus on DNa02 was also driven by connectivity considerations. In the same paragraph (the first paragraph in the section titled “Steering toward internal goals”), we note that “there are strong anatomical pathways from the central complex to DNa02”; the same is not true of DNa01. This point has also been noted by other investigators (Hulse et al. 2021).

      We don’t think this focus on DNa02 makes our work biased or inaccessible. Any study must balance breadth with depth. A useful general way to balance these constraints is to begin a study with a somewhat broader scope, and then narrow the study’s focus to obtain more in-depth information. Here, we began with comparative study of two cell types, and we progressed to the cell type that we found more compelling.

      (4) There seems to be a discrepancy with regard to what is emphasized in the main text and what is shown in Figures S3/S4 in relation to the role of these DNs in backward walking. There are only two sentences in the main text where these figures are cited.

      a) "DNa01 and DNa02 firing rate increases were not consistently followed by large changes in forward velocity (Figs. 1G and S3)."

      b) "We found that rotational velocity was consistently related to the difference in right-left firing rates (Fig. 3B). This relationship was essentially linear through its entire dynamic range, and was consistent across paired recordings (Fig. 3C). It was also consistent during backward walking, as well as forward walking (Fig. S4)." These main text sentences imply the role of the difference between left and right DNa02 in turning. However, the actual plots in the Figures S3 and S4 and their respective legends seem to imply a role in "backward walking". For instance, see this sentence from the legend of Figure S3 "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward. When (firing rateDNa02>>firing rateDNa01), the fly is also often moving backward, but forward movement is still more common overall, and so the net effect is that forward velocity is small but still positive when (firing rateDNa02>>firing rateDNa01). Note that when we condition our analysis on behavior rather than neural activity, we do see that backward walking is associated with a large firing rate differential (Fig. S4)." This sort of discrepancy in what is emphasized in the text, versus what is emphasized in the figures, ends up confusing the reader. More importantly, I do not agree with any of these conclusions regarding the implication of backward walking. Both Figures S3 and S4 are riddled with caveats, misinterpretations, and small sample sizes. As a result, I actually support the authors' decision to not infer too much from these figures in the "main text". In fact, I would recommend going one step further and removing/modifying these figures to focus on the role of "rotational velocity". Please find my concerns about these two figures below:

      a) In Figures S3 and S4, every heat map has a different scale for the same parameter: forward velocity. S3A is -10 to +10 mm/s, S3B is -6 to +6, S4B (left) is -12 to +12, and S4B (right) is -4 to +4. Since the authors are trying to depict results based on the color-coding, this is highly problematic.

      b) Figure S3A legend "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward." There are also several instances when ΔvoltageDNa02= ΔvoltageDNa01 and both are low (lower left quadrant) when the fly is typically moving backwards. So in my opinion, this figure in fact suggests DNa02 has no role in backward velocity control.

      c) Based on the example traces in S4A, every time the fly walks backwards it is also turning. Based on this it is important to show absolute rotational velocity in Figure S4C. It could be that the fly is turning around the backward peak which would change the interpretation from Figure S4C. Also, it is important to note that the backward velocities in S4A are unprecedentedly high. No previous reports show flies walking backwards at such high velocities (for example see Chen et al 2018, Nat Comm. for backward walking velocities on a similar setup).

      d) In my opinion, Figure S4D showing that right-left DNa02 correlates with rotational velocity, regardless of whether the fly is in a forward or backward walking state, is the only important and conclusive result in Figures S3/S4. These figures should be rearranged to only emphasize this panel.

      We agree that it is difficult to interpret some of the correlations between DN activity and forward velocity, given that forward velocity and rotational velocity are themselves correlated to some degree. This is why we did not make claims based on these results in the main text. In response to these comments, we have taken the Reviewer’s suggestion to preserve Figure S4D (now Figure S3). The other components of these supplemental figures have been removed.

      (5) Figure 3 shows a really nice analysis of the bilateral DNa02 recordings data. While Figure S5 [now Figure S4] shows that authors have a similar dataset for DNa01, a similar level analysis (Figures 3D, E) is not done for DNa01 data. Is there a reason why this is not done?

      The reason we did not do the same analysis for DNa01 is that we only have two paired DNa01-DNa01 recordings. It turned out to be substantially more difficult to perform DNa01-DNa01 recordings, as compared to DNa02-DNa02 recordings. For this reason, we were not able to get more than two of these recordings.

      (6) In Figure 4 since the authors have trials where bump-jump led to turning in the opposite direction to the DNa02 being recorded, I wonder if the authors could quantify hyperpolarization in DNa02 as is predicted from connectomics data in Figure 7.

      We agree this is an interesting question. However, DNa02 firing rate and membrane potential are variable, and stimulus-evoked hyperpolarizations in these DNs tend to be relatively small (on the order of 1 mV, in the case of a contralateral fictive olfactory stimulus, Figure 5A). In the case of our fictive olfactory stimuli, we could look carefully for these hyperpolarizations because we had a very large number of trials, and we could align these trials precisely to stimulus onset. By contrast, for the bump-jump experiments, we have a more limited number of trials, and turning onset is not so tightly time-locked to the chemogenetic stimuli; for these reasons, we are hesitant to make claims about any bump-jump-related hyperpolarization in these trials.
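      The reasoning here (a response of about 1 mV becomes visible only after averaging many precisely aligned trials) can be illustrated with a stimulus-triggered average. The sketch below is a hypothetical example on synthetic data: the function name `triggered_average`, the noise level, the trial count, and the response duration are assumptions chosen for illustration and are not taken from the recordings.

```python
import numpy as np

def triggered_average(trace, onset_idx, n_pre, n_post):
    """Average snippets of `trace` aligned to each onset index."""
    trace = np.asarray(trace, dtype=float)
    snippets = [
        trace[i - n_pre : i + n_post]
        for i in onset_idx
        if i - n_pre >= 0 and i + n_post <= trace.size
    ]
    return np.mean(snippets, axis=0)

# Synthetic membrane potential: 3 mV noise plus a 1 mV hyperpolarization
# lasting 100 samples after each stimulus onset (all values assumed).
rng = np.random.default_rng(1)
n_samples = 200_000
onsets = np.arange(500, n_samples - 500, 1000)
trace = rng.normal(0.0, 3.0, n_samples)
for i in onsets:
    trace[i : i + 100] -= 1.0

avg = triggered_average(trace, onsets, n_pre=100, n_post=200)
baseline = avg[:100].mean()      # mean before stimulus onset
response = avg[100:200].mean()   # mean during the hyperpolarization
print(len(onsets), response - baseline)
```

      With only a handful of trials, or with onsets that are not tightly time-locked, the same 1 mV deflection would be much harder to separate from the noise, which is the situation described for the bump-jump experiments.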

      (7) Figure 6 suggests that DNa02 contains information about latent steering drives. This is really interesting. However, in order to unequivocally claim this, a higher-resolution postural analysis might be needed. Especially given that DNa02 activation does not reliably evoke ipsilateral turning, these "latent" steering events could actually contain significant postural changes driven by DNa02 (making them "not latent"). Without this information, at least the authors need to explicitly mention this caveat.

      This is a good point. We cannot exclude the possibility that DNa02 is driving postural changes when the fly is stopped, and these postural changes are so small we cannot detect them. In this case, however, there would still be an interesting mismatch between the stimulus-evoked change in DNa02 firing rate (which is large) and the stimulus-evoked postural response (which would be very small). We have added language to the relevant Results section in order to make this explicit.

      (8) Figure 7 would really benefit from connectome data with synapse numbers (or weighted arrows) and a corresponding analysis of DNa01.

      In response to this comment, we have added synapse-number information (represented by weighted arrows) to Figures 7C, E, and F. We also added information to the Methods to explain how cells were chosen for inclusion in this diagram. (In brief: we thresholded these connections so as to discard connections with small numbers of synapses.)
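      As a rough illustration of this kind of thresholding, the sketch below discards weak edges from a toy edge list and then lists the mono- and disynaptic paths that remain. The cell names, synapse counts, and the threshold of 10 are invented for the example and do not correspond to the values used in the Methods.

```python
def threshold_edges(edges, min_synapses):
    """Keep (pre, post, n_synapses) edges with at least min_synapses."""
    return [(pre, post, n) for pre, post, n in edges if n >= min_synapses]

def paths_to_target(edges, source, target):
    """List monosynaptic and disynaptic paths from source to target."""
    out = {}
    for pre, post, _ in edges:
        out.setdefault(pre, set()).add(post)
    paths = []
    if target in out.get(source, set()):
        paths.append((source, target))
    for mid in sorted(out.get(source, set())):
        if mid != target and target in out.get(mid, set()):
            paths.append((source, mid, target))
    return paths

# Toy connectivity (hypothetical names and counts, not connectome data).
toy_edges = [
    ("upstream_A", "DN_x", 40),
    ("upstream_A", "inter_B", 25),
    ("inter_B", "DN_x", 3),       # weak edge, removed by the threshold
    ("upstream_A", "inter_C", 30),
    ("inter_C", "DN_x", 50),
]

kept = threshold_edges(toy_edges, min_synapses=10)
paths = paths_to_target(kept, "upstream_A", "DN_x")
print(paths)
```

      Applying the same threshold to two target cells makes their graphs directly comparable: a cell with fewer or weaker inputs simply retains fewer edges after filtering.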

      We did perform an analogous connectome circuit analysis for DNa01, but if we use the same thresholds as we do for DNa02, we obtain a much sparser connectivity graph. We now show this in a new supplemental figure (Figure S9). MBON32 makes no monosynaptic connections onto DNa01, and it only forms one disynaptic connection, via LAL018, which is relatively weak. PFL3 and PFL2 make no mono- or disynaptic connections onto DNa01 comparable in strength to what we find for DNa02. 

      The sparser connectivity graph for DNa01 is partly due to the fact that fewer cell types converge onto DNa01 as compared to DNa02 (110 cell types, versus 287 cell types). Also, it seems that DNa01 is simply less closely connected to the central complex and mushroom body, as compared to DNa02.

      (9) In Figure 8E, the most obvious neuronal silencing phenotype is decreased sideways velocity in the case of DNa01 optogenetic silencing. In Figure S2, the inverse filter for sideways velocity for DNa01 had a higher amplitude than the rotational velocity filter. Taken together, does this point at some role for DNa01 in sideways velocity specifically?

      No. The forward filters describe the average velocity impulse response, given a brief step change in firing rate.

      Figure 1 and Figure S2 show that the sideways velocity forward filter is actually smaller for DNa01 than for DNa02. This means that a brief step change in DNa01 firing rate is followed by only a very small sideways velocity response. Conversely, the reverse filters describe the average firing rate impulse response, given a brief step change in sideways velocity. Figure S2 shows that the sideways velocity reverse filter is larger for DNa01 than for DNa02, but this means that the relationship between DNa01 activity and sideways velocity is so weak that we would need to see a very large neural response in order to get a brief step change in sideways velocity. In other words, the reverse filter says that DNa01 likely has very little role in determining sideways velocity.

      (10) In Figure 8G, the effect on inner hind leg stance prolongation is very weak, and given the huge sample size, hard to interpret. Also, it is not clear how this fits with the role of DNa01 in slow sustained turning based on recordings.

      Yes, this effect is small in magnitude, which is not too surprising, given that many DNs seem to be involved in the control of steering in walking. To clarify the interpretation of these phenotypes, we have added a paragraph to the end of the Results:

      “All these effects are weak, and so they should be interpreted with caution. Also, both DN split-Gal4 lines drive expression in a few off-target cell types, which is another reason for caution (Fig. S8). However, they suggest that both DNs can lengthen the stance phase of the ipsilateral back leg, which would cause ipsiversive turning. These results are also compatible with a scenario where both DNs decrease the step length in the ipsilateral legs, which would also cause ipsiversive turning. Step frequency does not normally change asymmetrically during turning, so the observed decrease in step frequency during optogenetic inhibition may just be a by-product of increasing step length when these DNs are inhibited.” We have also added caveats and clarifications in a new Discussion paragraph:

      “Our study does not fully answer the question of how these DNs affect leg kinematics, because we were not able to simultaneously measure DN activity and leg movement. However, our optogenetic experiments suggest that both DNs can lengthen the stance phase of the ipsilateral back leg (Fig. 8G), and/or decrease the step length in the ipsilateral legs (Fig. 8H), either of which would cause ipsiversive turning. If these DNs have similar qualitative effects on leg kinematics, then why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I found the sign conventions for rotational velocity particularly confusing. Figure 3 represents clockwise rotations as +ve values, but Figure 4H represents anticlockwise rotations as positive values. But for EPG bumps, anticlockwise rotations are given negative values. Please make them consistent unless I am missing something obvious.

      Different fields use different conventions for yaw velocity. In aeronautics, a clockwise turn is generally positive. In robotics and engineering of terrestrial vehicles, a counterclockwise turn is generally positive. Historically, most Drosophila studies that quantified rotational (yaw) velocity were focused on the behavior of flying flies, and these studies generally used the convention from aeronautics, where a clockwise turn is defined as a positive turn. When we began working in the field, we adopted this convention, in order to conform to previous literature. It might be argued that walking flies are more like robots than airplanes, but it seemed to us that it was confusing to have different conventions for different behaviors of the same animal. Thus, all of the published studies from our lab define clockwise rotation as having positive rotational velocity.

      Figure 4 focuses on the role of the central complex in steering. As the fly turns clockwise (rightward), the bump of activity in EPG neurons normally moves counterclockwise around the ellipsoid body, as viewed from the posterior side (Turner-Evans et al., 2017). The posterior view is the conventional way to represent these dynamics, because (1) we and others typically image the brain from the posterior side, not the anterior side, and (2) in a posterior view, the animal’s left is on the left side of the image, and vice versa. We have added a sentence to the Figure 4A legend to clarify these points.

      Previous work has shown that, when an experimenter artificially “jumps” the EPG bump, this causes the fly to make a compensatory turn that returns the bump to (approximately) its original location (Green et al., 2019). Our work supports this observation. Specifically, we find that clockwise bump jumps are generally followed by rightward turns (which drive the bump to return to its approximate original location via a counterclockwise path), and vice versa. This is noted in the Figure 4D legend. Note that Figure 4D plots the fly’s rotational velocity during the bump return, plotted against the initial bump jump. 

      Figure 4H shows that clockwise (blue) bump returns were typically preceded by leftward turning, and counterclockwise (green) bump returns were typically preceded by rightward turning, as expected. This is detailed in the Figure 4H legend, and it is consistent with the coordinate frame described above.

      (2) It would be helpful to have images of the DNa01 and DNa02 split lines used in this paper, considering this paper would most likely be used widely to describe the functions of these neurons. Similarly, images of their reconstructions would be a useful addition.

      High-quality three-dimensional confocal stacks of all the driver lines used in our study are publicly available. We have added this information to the Methods (under “Fly husbandry and genotypes”). Confocal images of the full morphologies of DNa01 and DNa02 have been previously published (Namiki et al., 2018). Figure 1A is a schematic that is intended to provide a quick visual summary of this information.

      EM reconstructions of DNa01 and DNa02 are publicly accessible in a whole-brain dataset (https://codex.flywire.ai/) and a whole-VNC dataset (https://neuprint.janelia.org/). Both datasets are referenced in our study. As these datasets are easy to search and browse via user-friendly web-based tools, we expect that interested readers will have no difficulty accessing the underlying datasets directly.

      Reviewer #2 (Recommendations for the authors):

      (1) The description of the activity of the DNs that they "PREDICT steering during walking". This is an interesting word choice. Not causes, not correlates with, not encodes... does that mean the activity always precedes the action? Does that mean when you see activity, you will get behavior? This is important for assessing whether the DN activity is a cause or an effect. It is good to be cautious but it might be worth expanding on exactly what kind of connection is implied to justify the use of the word 'predict'.

      Conventionally, “predict” means “to indicate in advance”. We write that DNs “predict” certain features of behavior. We use this term because (1) these DNs correlate with certain features of behavior, and (2) changes in DN activity precede changes in behavior.

      The notion that neurons can “predict” behavior is not original to our study. Whenever neuroscientists summarize the relationship between neural activity and behavior by fitting a mathematical model (which may be as simple as a linear regression), the fitted model can be said to represent a “prediction” of behavior. These models are evaluated by comparing their predictions with measured behaviors. A good model is predictive, and this implies that the underlying neural signal is also predictive (Levenstein et al., 2023 Journal of Neuroscience 43: 1074-1088; DOI: 10.1523/JNEUROSCI.1179-22.2022). Here, prediction simply means correlation, without necessarily implying causation. We also use “prediction” to imply correlation.

      We do not think the term “prediction” implies determinism. Meteorologists are said to predict the weather, but it is understood that their predictions are probabilistic, not deterministic. Certainly, we would not claim that there is a deterministic relationship between DN activity and behavior. Figure 2D shows that neither DN type can explain all the variance in the fly’s rotational or sideways velocity. At the same time, both DNs have significant predictive power.

      We might equally say that these DNs “encode” behavior. We have chosen to use the word “predict” rather than “encode” because we do not think it is necessary to use the framework of symbolic communication in connection with these DNs.

      We agree with the Reviewer that it is helpful to test whether any neuron that “predicts” a behavior might also “cause” this behavior. In Figure 8, we show that directly perturbing these DNs can indeed alter locomotor behavior, which suggests a causal role. Connectome analyses also suggest a causal role for these DNs in locomotor behavior (Figure 1B; see especially Cheong et al., 2024).

      At the same time, it is clear from our results that these DNs are not “command neurons” for turning: they do not deterministically cause turning. Therefore, to avoid misunderstanding, we have generally been careful to summarize the results of our perturbation experiments by avoiding the statement that “this DN causes this behavior”. Rather, we have generally tried to say that “this DN influences this behavior”, or “this DN promotes this behavior”.

      (2) There is some concern about how the linear filter models were developed and then used to predict the relationship between firing rate and steering behavior: how exactly were the build and test data separated to avoid re-extracting the input? It reads like a self-fulfilling prophecy/tautology.

      We used conventional cross-validation for model fitting and evaluation. We apologize that this was not made explicit in our original submission; this was due to an oversight on our part. To be clear: linear filters were computed using the data from the first 20% of a given experiment. We then convolved each cell’s firing rate estimate with the computed Neuron→Behavior filter (the “forward filter”) using the data from the final 80% of the experiment, in order to generate behavioral predictions. Thus, when a model has high variance explained, this is not attributable to overfitting: rather, it quantifies the bona fide predictive power of the model. We have added this information to the Methods (under “Data analysis - Linear filter analysis”).
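
      The split described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the filter is estimated by time-domain least squares on lagged firing rates (the response does not say how the filters were computed), and the function names, the number of lags, and the synthetic data in the usage note are all hypothetical.

```python
import numpy as np

def lagged_design(rate, n_lags):
    """Design matrix whose column k holds the firing rate delayed by k samples."""
    n = len(rate)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = rate[:n - k]
    return X

def fit_filter(rate, behavior, n_lags):
    """Least-squares estimate of a causal Neuron->Behavior filter (assumed method)."""
    coef, *_ = np.linalg.lstsq(lagged_design(rate, n_lags), behavior, rcond=None)
    return coef

def variance_explained(rate, behavior, filt):
    """Convolve the firing rate with the filter and score the prediction."""
    pred = np.convolve(rate, filt)[:len(rate)]
    resid = behavior - pred
    return 1.0 - resid.var() / behavior.var()

def split_fit_evaluate(rate, behavior, n_lags=20, train_frac=0.2):
    """Fit on the first `train_frac` of the record, evaluate on the remainder."""
    n_train = int(len(rate) * train_frac)
    filt = fit_filter(rate[:n_train], behavior[:n_train], n_lags)
    r2 = variance_explained(rate[n_train:], behavior[n_train:], filt)
    return filt, r2
```

      Because the filter never sees the final 80% of the record, the variance explained on that segment measures out-of-sample prediction rather than goodness of fit. Calling `split_fit_evaluate(rate, behavior)` on two equal-length arrays returns the fitted filter and the held-out variance explained.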

      (3) Typo right above Figure 2 [now Figure 1E]: I assume spike rate fluctuations in DNa02 precede DNa01?

      Fixed. Thank you for reading the manuscript carefully.

      (4) The description of the other manuscripts about neural control of the steering as "follow-up" papers is a bit diminishing. They were likely independent works on a similar theme that happened afterwards, rather than deliberate extensions of this paper, so "subsequent" might be a more accurate description.

      We apologize, as we did not intend this to be diminishing. Given this request, we have revised “follow-up” to “subsequent”.

      (5) The idea that DNa02 is high-gain because it is more directly connected to motor neurons is a hypothesis and this should be made clear. We really don't know the functional consequences of the directness of a path or the number of synapses, and which circuits you compare to would change this. DNa02 may be a higher gain than DNa01, but what about relative to the other DNs that enter pre-motor regions? How do you handle a few synapses and several neurons in a common class? All of these connectivity-based deductions await functional tests - like yours! I think it is better to make this clear so readers don't assume a higher level of certainty than we have.

      The Reviewer asks how we handled few-synapse connections, and how we combined neurons in the same class. We apologize for not making this explicit in our original submission. We have now added this information to the Methods. Briefly, to select cell types for inclusion in Figure 7C, we identified all individual cells postsynaptic to PFL3 and presynaptic to DNa02, discarding any unitary connections with <5 synapses. We then grouped unitary connections by cell type, and then summed all synapse numbers within each connection group (e.g., summing all synapses in all PFL3→LAL126 connections). We then discarded connection groups having <200 synapses or <1% of a cell type’s pre- or postsynaptic total. Reported connection weights are per hemisphere, i.e. half of the total within each connection group. For Figure 7F we did the same, but now discarding connection groups having <70 synapses or <0.4% of a cell type’s pre- or postsynaptic total. In Figure S9, we used the same procedures for analyzing connections onto DNa01.
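
      The thresholding steps above can be sketched in a few lines. This is an illustrative reading only, not the authors' code: the function names and input layout are hypothetical, and it assumes that a connection group is dropped when it falls below the synapse-count threshold or below the fractional threshold relative to either the presynaptic type's total output or the postsynaptic type's total input.

```python
from collections import defaultdict

def group_connections(unitary, min_unitary=5):
    """Sum synapses by (pre_type, post_type), skipping weak unitary connections.

    `unitary` is an iterable of (pre_type, post_type, n_synapses) tuples,
    one per cell-to-cell connection (hypothetical input layout).
    """
    groups = defaultdict(int)
    for pre_type, post_type, n_syn in unitary:
        if n_syn >= min_unitary:
            groups[(pre_type, post_type)] += n_syn
    return dict(groups)

def filter_groups(groups, out_totals, in_totals,
                  min_synapses=200, min_fraction=0.01):
    """Drop weak connection groups and report per-hemisphere weights.

    `out_totals` maps a cell type to its total output synapses and
    `in_totals` maps a cell type to its total input synapses (assumed inputs).
    """
    kept = {}
    for (pre, post), total in groups.items():
        too_few = total < min_synapses
        too_small = (total / out_totals[pre] < min_fraction
                     or total / in_totals[post] < min_fraction)
        if too_few or too_small:
            continue
        kept[(pre, post)] = total / 2  # half of the bilateral total
    return kept
```

      With these defaults the sketch mirrors the Figure 7C thresholds; passing `min_synapses=70` and `min_fraction=0.004` would correspond to the looser thresholds quoted for Figure 7F.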

      We agree that it is tricky to infer function from connectome data, and this applies to motor neuron connectivity. We bring up DN connectivity onto motor neurons in two places. First, in the Results, we note that “steering filters (i.e., rotational and sideways velocity filters) were larger for DNa02 (Fig. 2A,B). This means that an impulse change in firing rate predicts a larger change in steering for this neuron. In other words, this result suggests that DNa02 operates with higher gain. This may be related to the fact that DNa02 makes more direct output synapses onto motor neurons (Fig. 1B) [emphasis added].” We feel this is a relatively conservative statement.

      Subsequently, in the Discussion, we ask, “why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B) [emphasis added].” Again, we feel this is a relatively conservative statement.

      To be sure, none of the motor neurons postsynaptic to DNa02 actually receive most of their synaptic input from DNa02 (or indeed any DN), and this is typical of motor neurons controlling leg muscles. Instead, leg motor neurons tend to get most of their input from interneurons rather than from DNs (Cheong et al. 2024). Available data suggest that the walking rhythm originates with intrinsic VNC central pattern generators, and the DNs that influence walking do so, in large part, by acting on VNC interneurons. These points have been detailed in recent connectome analyses (see especially Cheong et al. 2024).

      We are reluctant to broaden the scope of our connectome analyses to include other DNs for comparison, because we think these analyses are most appropriate to full central nervous system (CNS) connectomes (brain and VNC together), which are currently under construction. Without a full-CNS connectome, many of the DN axons in the VNC cannot be identified. In the future, we expect that full-CNS connectomes will allow a systematic comparison of the input and output connectivity of all DN types, and probably also the tentative identification of new steering DNs. Those future analyses should generate new hypotheses about the specializations of DNa02, DNa01, and other DNs. Our study aims to help lay a conceptual foundation for that future work.

      (6) Given the emphasis on the DNa02 to Motor Neuron connectivity shown (Figure 1B) and multiple text mentions, could you include more analyses of which motor neurons are downstream and how these might be expected to affect leg movements? I would like to see the synapse numbers (Figure 1B) as well as the fraction of total output synapses. These additions would help understand the evidence for the "see-saw" model.

      We agree this is interesting. In follow-up work from our lab (Yang et al., 2023), we describe the detailed VNC connectivity linking DNa02 to motor neurons. We refer the Reviewer specifically to Figure 7 of that study (https://www.cell.com/cell/fulltext/S0092-8674(24)00962-0).

      We regret that the see-saw model was perhaps not clear in our original submission. Briefly, this model proposes that an increase in excitatory synaptic input to one DN (and/or a disinhibition of that DN) is often accompanied by an increase in inhibitory synaptic input to the contralateral DN. This model is motivated by connectome data on the brain inputs to DNa02 (Figure 7), along with our observation that excitation of one DN is often accompanied by inhibition of the contralateral DN (Figure 5). We have now added text to the Results in several places in order to clarify these points. 

      This model specifically pertains to the brain inputs to DNs; comparing the downstream targets of these DNs in the VNC would not be a test of this hypothesis. The Reviewer may be asking to see whether there is any connectivity in the brain from one DN to its contralateral partner. We do not find connections of this sort, aside from multisynaptic connections that rely on very weak links (~10 synapses per connection). Figure 7 depicts a much stronger basis for this hypothesis, involving feedforward see-saw connections from PFL3 and MBON32.

      (7) The conclusions from the data in Figure 8 could be explained more clearly. These seem like small effect sizes on subtle differences in leg movements - maybe like what was seen in granular control by Moonwalker's circuits? Measuring joint angles or step parameters might help clarify, but a summary description would help the reader.

      We agree that these results were not explained very well in our original submission. 

      In our revised manuscript, we have added a new paragraph to the end of this Results section providing some summary and interpretation:

      “All these effects are weak, and so they should be interpreted with caution. However, they suggest that both DNs can lengthen the stance phase of the ipsilateral back leg, which would promote ipsiversive turning. These results are also compatible with a scenario where both DNs decrease the step length in the ipsilateral legs, which would also promote ipsiversive turning. Step frequency does not normally change asymmetrically during turning, so the observed decrease in step frequency during optogenetic inhibition may just be a by-product of increasing step length when these DNs are inhibited.”

      Moreover, in the Discussion, we have also added a new paragraph that synthesizes these results with other results in our study, while also noting the limitations of our study:

      “Our study does not fully answer the question of how these DNs affect leg kinematics, because we were not able to simultaneously measure DN activity and leg movement. However, our optogenetic experiments suggest that both DNs can lengthen the stance phase of the ipsilateral back leg (Fig. 8G), and/or decrease the step length in the ipsilateral legs (Fig. 8H), either of which would promote ipsiversive turning. If these DNs have similar qualitative effects on leg kinematics, then why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B).”

      In Figure 8D-H, we measure step parameters in freely walking flies during acute optogenetic inhibition of DNa01 and DNa02. In experiments measuring neural activity in flies walking on a spherical treadmill, we did not have a way to measure step parameters. Subsequently, this methodology was developed by Yang et al. (2023) and results for DNa02 are described in that study. 

      Reviewer #3 (Recommendations for the authors):

      Minor Points:

      (1) If space allows, actual membrane potential should be mentioned when raw recordings are shown (for example Figure 1D).

      We have now added absolute membrane potential information to Figure 1D.

      (2) Typo in the sentence "To address this issue directly, we looked closely at the timing of each cell's recruitment in our dual recordings, and found that spike rate fluctuations in DNa02 typically preceded the spike rate fluctuations in DNa02 (Fig. 2A)." The final word should be "DNa01".

      Fixed. Thank you for reading the manuscript carefully.

      (3) Figure 2A - although there aren't direct connections between a01 and a02 in the connectome, the authors never rule out functional connectivity between these two. Given a02 precedes a01, shouldn't this be addressed?

      In the full brain FAFB data set, there are two disynaptic connections from DNa02 onto the ipsilateral copy of DNa01. One connection is via CB0556 (which is GABAergic), and the other is via LAL018 (which is cholinergic). The relevant DNa02 output connections are very weak: each DNa02→CB0556 connection consists of 11 synapses, whereas each DNa02→LAL018 connection consists of 10 synapses (on average). Conversely, each CB0556→DNa01 connection consists of 29 synapses, whereas each LAL018→DNa01 connection consists of 64 synapses. In short, LAL018 is a nontrivial source of excitatory input to DNa01, but DNa02 is not positioned to exert much influence over LAL018, and the two disynaptic connections from DNa02 onto DNa01 have opposite signs. Thus, it seems unlikely that DNa02 is a major driver of DNa01 activity. At the same time, it is difficult to completely exclude this possibility, because we do not understand the logic of the very complicated premotor inputs to these DNs in the brain. Thus, we are hesitant to make a strong statement on this point.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cognitive and brain development during the first two years of life is vast and determinant for later development. However, longitudinal infant studies are complicated and restricted to occidental high-income countries. This study uses fNIRS to investigate the developmental trajectories of functional connectivity networks in infants from a rural community in Gambia. In addition to resting-state data collected from 5 to 24 months, the authors collected growth measures from birth until 24 months and administered an executive functioning task at 3 or 5 years old.

      The results show left and right frontal-middle and right frontal-posterior negative connections at 5 months that increase with age (i.e., become less negative). Interestingly, contrary to previous findings in high-income countries, there was a decrease in frontal interhemispheric connectivity. Restricted growth during the first months of life was associated with stronger frontal interhemispheric connectivity and weaker right frontal-posterior connectivity at 24 months. Additionally, the study describes that some connectivity patterns related to better cognitive flexibility at pre-school age.

      Strengths:

      - The authors analyze data from 204 infants from a rural area of Gambia, already a big sample for most infant studies. The study might encourage more research on different underrepresented infant populations (i.e., infants not living in occidental high-income countries).

      - The study shows that fNIRS is a feasible instrument to investigate cognitive development when access to fMRI is not possible or outside a lab setting.

      - The fNIRS data preprocessing and analysis are well-planned, implemented, and carefully described. For example, the authors report how the choices in the parameters for the motion artifacts detection algorithm affect data rejection and show how connectivity stability varies with the length of the data segment to justify the threshold of at least 250 seconds free of artifacts for inclusion.

      - The authors use proper statistical methods for analysis, considering the complexity of the dataset.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      - No co-registration of the optodes is implemented. The authors checked for correct placement by looking at pictures taken during the testing session. However, head shape and size differences might affect the results, especially considering that the study involves infants from 5 months to 24 months and that the same fNIRS array was used at all ages.

      The fNIRS array used in this work was co-registered onto age-appropriate MNI templates at every time point in a previously published work (L. H. Collins-Jones et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021)). This is reference No. 68 in the manuscript.

      As we mentioned in the section fNIRS preprocessing and data-analysis: ‘The sections were established via the 17 channels of each hemisphere which were grouped into front, middle and back (for a total of six regions) based on a previous co-registration of the BRIGHT fNIRS arrays onto age-appropriate templates’. The procedure mentioned by the reviewer, involving the examination of pictures showing the placement of headbands on participants, aimed to exclude infants with excessive cap displacement from further analysis.

      - The authors regress the global signal to remove systemic physiological noise. While the authors also report the changes in connectivity without global signal regression, there are some critical differences. In particular, the apparent decrease in frontal inter-hemispheric connections is not present when global signal regression is omitted, even though it is present for deoxy-Hb. The authors use connectivity results obtained after applying global signal regression for further analysis. The choice of regressing the global signal is questionable since it has been shown to introduce anti-correlations in fMRI data (Murphy et al., 2009), and fNIRS in young infants does not seem to be highly affected by physiological noise (Emberson et al., 2016). Systemic physiological noise might change at different ages, which makes its removal critical to investigate functional network development. However, global signal regression might also affect the data differently. The study would have benefited from having short separation channels to measure the systemic physiological component in the data.

      The work of Emberson et al. (2016) mentioned by the reviewer indeed highlights the challenges of removing systemic changes from the infants’ haemodynamic signal with short-separation channels (SSC). In fact, even an SSC of 1 cm detected changes in the blood in the brain; therefore, by regressing this signal from the recorded one, the authors removed both systemic changes AND haemodynamic signal. This paper from Emberson et al. (2016) is taken as a reference in the field to suggest that SSC might not be an ideal tool to remove systemic changes when collecting fNIRS data on young infants, as we did in this work.

      We agree with the reviewer's observation that systemic physiological noise may vary with age and among infants. Therefore, for each infant at each age, we regressed the mean value calculated across all channels. This ensures that the regressed signal is not biased by averaged calculations at group levels.
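
      The per-infant regression described above can be illustrated with a short sketch. It is not the authors' pipeline: it assumes ordinary least squares with an intercept term, and the function names and array layout are hypothetical.

```python
import numpy as np

def regress_global_signal(data):
    """Remove the across-channel mean time course from every channel.

    `data` has shape (n_samples, n_channels) and holds one infant's
    recording at one visit, so the regressor is never a group average.
    """
    global_signal = data.mean(axis=1)
    # Regressors: an intercept plus this recording's own global signal.
    X = np.column_stack([np.ones_like(global_signal), global_signal])
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    return data - X @ beta

def connectivity(data):
    """Channel-by-channel Pearson correlation matrix."""
    return np.corrcoef(data, rowvar=False)
```

      Comparing `connectivity(data)` with `connectivity(regress_global_signal(data))` on the same recording shows how much of the apparent connectivity is carried by the shared signal. The residuals are orthogonal to the global signal by construction, which is also why this step can shift correlations toward negative values.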

      We are aware of the criticisms directed towards global signal regression in the fMRI literature, although some other works showed anticorrelations in functional connectivity networks both with and without global signal regression (Chai et al., 2012). Furthermore, Murphy himself revised his criticism of the use of global signal regression in functional connectivity analysis in one of his more recent works (Murphy & Fox, 2017). The fact that the decreased FC is significant in results from data pre-processed without global signal regression gives us confidence that this finding is statistically robust and not solely driven by this preprocessing choice in our pipeline.

      An interesting study by Abdalmalak et al. (2022) demonstrated that failing to correct for systemic changes using any method is inappropriate when estimating FC with fNIRS, as it can lead to a high risk of elevated connectivity across the whole brain (see Figure 4 of the mentioned paper). Consequently, we strongly advocate for the implementation of global signal regression in our analysis pipeline as a fundamental step for accurate functional connectivity estimations.

      References:

      Emberson, L. L., Crosswhite, S. L., Goodwin, J. R., Berger, A. J., & Aslin, R. N. (2016). Isolating the effects of surface vasculature in infant neuroimaging using short-distance optical channels: a combination of local and global effects. Neurophotonics, 3(3), 031406.

      Chai, X. J., Castañón, A. N., Öngür, D., & Whitfield-Gabrieli, S. (2012). Anticorrelations in resting state networks without global signal regression. NeuroImage, 59(2), 1420–1428. https://doi.org/10.1515/9783050076010-014

      Murphy, K., & Fox, M. D. (2017). Towards a consensus regarding global signal regression for resting state functional connectivity MRI. NeuroImage, 154, 169–173. https://doi.org/10.1016/j.neuroimage.2016.11.052

      Abdalmalak, A., Novi, S. L., Kazazian, K., Norton, L., Benaglia, T., Slessarev, M., ... & Owen, A. M. (2022). Effects of systemic physiology on mapping resting-state networks using functional near-infrared spectroscopy. Frontiers in Neuroscience, 16, 803297.

      - I believe the authors bypass a fundamental point in their framing. When discussing the results, the authors compare the developmental trajectories of the infants tested in a rural area of Gambia with the trajectories reported in previous studies on infants growing in occidental high-income countries (likely in urban contexts) and attribute the differences to adverse effects (i.e., nutritional deficits). Differences in developmental trajectories might also derive from other environmental and cultural differences that do not necessarily lead to poor cognitive development.

      We agree with the reviewer that other factors differing between high- and low-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to investigate this further” (line 238).

      - While the study provides a solid description of the functional connectivity changes in the first two years of life at the group level, the evidence regarding the links between adverse situations, developmental trajectories, and later cognitive capacities is weaker. The authors find that early restricted growth predicts specific connectivity patterns at 24 months and that certain connectivity patterns at specific ages predict cognitive flexibility. However, the link between development trajectories (individual changes in connectivity) with growth and later cognitive capacities is missing. To address this question adequately, the study should have compared infants with different growing profiles or those who suffered or did not from undernutrition. However, as the authors discussed, they lacked statistical power.

      We agree with the reviewer, and indeed we highlighted this as one of the main limitations of our work: “Even given the large sample in our study, we were underpowered to test for group comparisons between sets of infants with distinct undernutrition growth profiles, e.g., infants with early poor growth that later resolved and infants with standard growth early that had a poor growth later. We were also underpowered to test the associations between early growth and FC on clinically undernourished infants (defined as having DWLZ two standard deviations below the mean)” (line 311, discussion section).

      We believe this is an important point to consider for the field, as it addresses the sample size required for studies investigating brain development in clinically malnourished infants. We hope this will serve as a valuable reference for future studies in the field. For example, a new study led by Prof. Sophie Moore and other members of the BRIGHT team (INDiGO) is currently recruiting six hundred pregnant women with the aim of obtaining a broader distribution of infants’ growth measures (https://www.kcl.ac.uk/research/sophie-moore-research-group).

      Reviewer #2 (Public Review):

      Summary and strengths:

      The article pertains to a topic of importance, specifically early life growth faltering, a marker of undernutrition, and how it influences brain functional connectivity and cognitive development. In addition, the data collection was laborious, and data preprocessing was quite rigorous to ensure data quality, utilizing cutting-edge preprocessing methods.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      However, the subsequent analysis and explanations were not very thorough, which made some results and conclusions less convincing. For example, corrections for multiple tests need to be consistently maintained; if the results do not survive multiple corrections, they should not be discussed as significant results. Additionally, alternative plans for analysis strategies could be worth exploring, e.g., using ΔFC in addition to FC at a certain age. Lastly, some analysis plans lacked a strong theoretical foundation, such as the relationship between functional connectivity (FC) between certain ROIs and the development of cognitive flexibility.

      Thus, as much as I admire the advanced analysis of connectivity that was conducted and the uniqueness of longitudinal fNIRS data from these samples (even the sheer effort to collect fNIRS longitudinally in a low-income country at such a scale!), I have reservations about the importance of this paper's contribution to the field in its present form. Major revisions are needed, in my opinion, to enhance the paper's quality. 

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings as well as hypothesis-generating findings that may not pass stringent significance thresholds. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      The relationship between FC and cognitive flexibility (as well as the relationship between growth and FC) has been explored focusing on those FC that showed a significant change with age, as specified in the results sections: ‘To investigate the impact of early nutritional status on FC at 24 months, we used multiple regression with the infant growth trajectory [...] and FC at 24 months [...]. To maximise power, we considered only those FC that showed a statistically significant change with age’ (line 183) and ‘To investigate whether FC early in life predicted cognitive flexibility at preschool age, we used multiple regression of FC across the first two years of life against later cognitive flexibility in preschoolers at three and five years. As per the analysis above, we focused on only those FC that showed a statistically significant change with age’ (line 198).

      We explored the possibility of investigating the relationship between changes in FC and changes in growth. However, the degrees of freedom in these analyses dropped dramatically (~25/30), thereby putting the significance and the meaning of the results at risk. We look forward to future longitudinal studies with less attrition across these time points to maintain the statistical power necessary to run such analyses.

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the development of functional connectivity (FC) is modulated by early physical growth and whether these might impact cognitive development in childhood. This question was investigated by studying a large group of infants (N=204) assessed in Gambia with fNIRS at 5 visits between 5 and 24 months of age. Given the complexity of data acquisition at these ages and following data processing, data could be analyzed for 53 to 97 infants per age group. FC was analyzed considering 6 ensembles of brain regions and thus 21 types of connections. Results suggested that: i) compared to previously studied groups, this group of Gambian infants have a different FC trajectory, in particular with a change in frontal inter-hemispheric FC with age from positive to null values; ii) early physical growth, measured through weight-for-length z-scores from birth on, is associated with FC at 24 months. Some relationships were further observed between FC during the first two years and cognitive flexibility at 4-5 years of age, but results did not survive corrections for multiple comparisons.

      Strengths:

      The question investigated in this article is important for understanding the role of early growth and undernutrition on brain and behavioral development in infants and children. The longitudinal approach considered is highly relevant to investigate neurodevelopmental trajectories. Furthermore, this study targets a little-studied population from a low-/middle-income country, which was made possible by the use of fNIRS outside the lab environment. The collected dataset is thus impressive and it opens up a wide range of analytical possibilities.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      - Analyzing such a huge amount of collected data at several ages is not an easy task to test developmental relationships between growth, FC, and behavioral capacities. In its present form, this study and the performed analyses lack clarity, unity and perhaps modeling, as it suggests that all possible associations were tested in an exploratory way without clear mechanistic hypotheses. Would it be possible to specify some hypotheses to reduce the number of tests performed? In particular, considering metrics at specific ages or changes in the metrics with age might allow us to test different hypotheses: the authors might clarify what they expect specifically for growth-FC-behaviour associations. Since some FC measures and changes might be related to one another, would it be reasonable to consider a dimensionality reduction approach (e.g., ICA) to select a few components for further correlation analyses?

      We confirm that this work was motivated by a compelling theoretical question: whether neural mechanisms, specifically FC, can be influenced by early adversity, such as growth, and subsequently impact cognitive outcomes, such as cognitive flexibility. This aligns with the overarching goal of the BRIGHT project, established in 2015 (Lloyd-Fox, 2023). We believe this was evident throughout the manuscript in several instances, for example:

      - “The goal of the study was to investigate early physical growth in infancy, developmental trajectories of brain FC across the first two years of life, and cognitive outcome at school age in a longitudinal cohort of infants and children from rural Gambia, an environment with high rates of maternal and child undernutrition. Specifically, we aimed to: (i) investigate whether differences in physical growth through the first two years of life are related to FC at 24 months, and (ii) investigate if trajectories of early FC have an impact on cognitive outcome at pre-school age in these children.” (page 4, introduction)

      - “This study investigated how early adversity via undernutrition drives longitudinal changes in brain functional connectivity at five time points throughout the first two years of life and how these developmental trajectories are associated with cognitive flexibility at preschool age.” (page 6, discussion)

      - We had a clear hypothesis regarding short-range connectivity decreasing with age and long-range connectivity increasing with age, as stated at the end of the introduction: “We hypothesized that (i) long-range FC would increase and short-range FC would decrease throughout the first two years of life” (page 4, line 147). However, we were not able to formulate clear hypotheses about the localization of these connections due to the scarcity of previous studies conducted within this age range, particularly in low-resource settings. The ROI approach for analysis was chosen to mitigate this challenge by reducing the number of comparisons while still enabling us to estimate the developmental trajectories of all the connections from which we acquired data.

      Regarding the use of a dimensionality reduction approach, we have not considered the use of ICA in our analysis. These methods require selecting a fixed number of components to remove from all participants. However, due to the high variability of infant fNIRS data across the five timepoints, we considered it untenable to precisely determine the number of components to remove at the group level. Such a procedure carries the risk of over-cleaning the data for some participants while leaving noise in for others (Di Lorenzo, 2019). We also felt that using PCA in this initial study would be beyond the scope of the brain-region-specific hypotheses and would be more appropriate in a follow-up analysis of these important data.

      References:

      Lloyd-Fox, S., McCann, S., Milosavljevic, B., Katus, L., Blasi, A., Bulgarelli, C., Crespo-Llado, M., Ghillia, G., Fadera, T., Mbye, E., Mason, L., Njai, F., Njie, O., Perapoch-Amado, M., Rozhko, M., Sosseh, F., Saidykhan, M., Touray, E., Moore, S. E., … Team, and the B. S. (2023). The Brain Imaging for Global Health (BRIGHT) Study: Cohort Study Protocol. Gates Open Research, 7(126).

      Di Lorenzo, R., Pirazzoli, L., Blasi, A., Bulgarelli, C., Hakuno, Y., Minagawa, Y., & Brigadoi, S. (2019). Recommendations for motion correction of infant fNIRS data applicable to multiple data sets and acquisition systems. NeuroImage, 200(April), 511–527.

      - It seems that neurodevelopmental trajectories over the whole period (5-24 months) are little investigated, and considering more robust statistical analyses would be an important aspect to strengthen the results. The discussion mentions the potential use of structural equation modelling analyses, which would be a relevant way to better describe such complex data.

      We appreciate the complexity of the dataset we are working with, which includes multiple measures and time points. Currently, our focus within the outputs from the BRIGHT project is on examining the relationship between selected measures. While this may not involve statistically advanced modelling at the moment, it is worth noting that most of the results presented in this work have survived correction for multiple comparisons, indicating their statistical robustness. We believe that more advanced statistical analyses are beyond the scope of this rich initial study. In the next phase of the project, known as BRIGHT IMPACT, our team is collaborating with statisticians and experts in statistical modelling to apply more sophisticated and advanced statistical techniques to the data.

      - Given the number of analyses performed, only describing results that survive correction for multiple comparisons is required. Unifying the correction approach (FDR / Bonferroni) is also recommended. For the association between cognitive flexibility and FC, results are not significant, and one might wonder why FC at specific ages was considered rather than the change in FC with age. One of the relevant questions of such a study would be whether early growth and later cognitive flexibility are related through FC development, but testing this would require a mediation analysis that was not performed.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      We did not perform a mediation analysis because i) ΔWLZ between birth and the subsequent time points positively predicted frontal interhemispheric FC at 24 months, whereas ii) frontal interhemispheric FC at 18 months (and right fronto-posterior connectivity at 24 months) predicted cognitive flexibility at preschool age. Because the frontal interhemispheric FC at 24 months, which was positively predicted by growth, did not significantly predict cognitive outcome at preschool age, we did not fit mediation models.

      The reviewer raised concerns about using different methods to correct for multiple comparisons throughout the work. Results showing changes in FC with age were Bonferroni corrected, while we used FDR correction for the regression analyses investigating the relationship between growth and FC, as well as FC and cognitive flexibility. Both methods have good control over Type I errors (false positives), but Bonferroni is very conservative, increasing the likelihood of Type II errors (false negatives). We considered Bonferroni an appropriate method for correcting results showing changes in FC with age, where we had a large sample with strong statistical power (i.e. linear mixed models with 132 participants who had at least 250 seconds of good data for 2 out of 5 visits). However, Bonferroni was too conservative for the regression analyses, which had N between 57 and 78 (Acharya, 2014; Félix & Menezes, 2018; Narkevich et al., 2020; Narum, 2006; Olejnik et al., 1997).
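
      The difference in conservativeness between the two corrections can be illustrated with a minimal sketch. The p-values below are hypothetical and chosen only for illustration; they are not results from the study, and the function names are our own rather than those of any particular statistics package.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each p by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for illustration only (not taken from the study).
example_p = [0.001, 0.012, 0.020, 0.030, 0.200]
bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)
n_sig_bonf = sum(p < 0.05 for p in bonf)  # tests surviving Bonferroni
n_sig_bh = sum(p < 0.05 for p in bh)      # tests surviving FDR
```

      With these illustrative values, fewer tests remain below 0.05 after Bonferroni than after FDR adjustment, which is the trade-off between Type I and Type II error described above.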

      References:

      Acharya, A. (2014). A Complete Review of Controlling the FDR in a Multiple Comparison Problem Framework--The Benjamini-Hochberg Algorithm. ArXiv Preprint ArXiv:1406.7117.

      Félix, V. B., & Menezes, A. F. B. (2018). Comparisons of ten corrections methods for t-test in multiple comparisons via Monte Carlo study. Electronic Journal of Applied Statistical Analysis, 11(1), 74–91.

      Narkevich, A. N., Vinogradov, K. A., & Grjibovski, A. M. (2020). Multiple comparisons in biomedical research: the problem and its solutions. Ekologiya Cheloveka (Human Ecology), 27(10), 55–64.

      Narum, S. R. (2006). Beyond Bonferroni: less conservative analyses for conservation genetics. Conservation Genetics, 7, 783–787.

      Olejnik, S., Li, J., Supattathum, S., & Huberty, C. J. (1997). Multiple testing and statistical power with modified Bonferroni procedures. Journal of Educational and Behavioral Statistics, 22(4), 389–406.

      - Growth is measured at different ages through different metrics. Justifying the use of weight-for-length z-scores would be welcome since weight-for-age z-scores might be a better marker of growth and possible undernutrition (this impacting potentially both weight and length). Showing the distributions of these z-scores at different ages would allow the reader to estimate the growth variability across infants.

      We consistently used WLZ as the metric to measure growth throughout. Our analysis investigating the relationship between WLZ and FC included HCZ at 7/14 days to correct for head size at birth. When selecting the best growth measure for this paper, we opted for WLZ over WAZ, given extant evidence that infants in our sample are smaller and shorter compared to the reference WHO standard for the same age group (Nabwera et al., 2017). Therefore, using WLZ allows us to adjust each infant's weight for its own length.

      References:

      Nabwera, H. M., Fulford, A. J., Moore, S. E., & Prentice, A. M. (2017). Growth faltering in rural Gambian children after four decades of interventions: a retrospective cohort study. The Lancet Global Health, 5(2), e208–e216.

      - Regarding FC, clarifications about the long-range vs short-range connections would be welcome, as well as drawing a summary of what is expected in terms of FC "typical" trajectory, for the different brain regions and connections, as a marker of typical development. For instance, the authors suggest that an increase in long-range connectivity vs a decrease in short-range is expected based on previous fNIRS studies. However anatomical studies of white matter growth and maturation would suggest the reverse pattern (short-range connections developing mostly after birth, contrarily to long-range connections prenatally).

      We expected an increase in long-range functional connectivity with age, as discussed in the introduction:

      - “Based on data from fMRI, current models hypothesize that FC patterns mature throughout early development (23–27), where in typically developing brains, adult-like networks emerge over the first years of life as long-range functional connections between pre-frontal, parietal, temporal, and occipital regions become stronger and more selective (28–31). This maturation in FC has been shown to be related to the cascading maturation of myelination and synaptogenesis (32, 33) - fundamental processes for healthy brain development (34)” (line 93, page 3, introduction);

      - “Importantly, normative developmental patterns may be disrupted and even reversed in clinical conditions that impact development; e.g., increased short-range and reduced long-range FC have been observed in preterm infants (36) and in children with autism spectrum disorder (37, 38)” (line 103, page 3, introduction);

      - “We hypothesized that (i) long-range FC would increase and short-range FC would decrease throughout the first two years of life” (line 147, page 4, introduction).

      Since inferences about FC patterns recorded with fNIRS are highly limited by the number and locations of the optodes, it is challenging to make strong inferences about specific brain regions. Moreover, infant FC fNIRS studies are still limited, which is why we focused our inferences on long-range versus short-range connectivity, without specifically pinpointing particular brain regions.

      Additionally, we were unable to locate the works mentioned by the reviewer regarding an increase in short-range white matter connectivity immediately after birth. On the contrary, we found several studies documenting an increase in white-matter long-range connectivity after birth, which is consistent with the hypothesised increase in FC long-range connectivity, such as:

      Yap, P. T., Fan, Y., Chen, Y., Gilmore, J. H., Lin, W., & Shen, D. (2011). Development trends of white matter connectivity in the first years of life. PloS one, 6(9), e24678.

      Dubois, J., Dehaene-Lambertz, G., Kulikova, S., Poupon, C., Hüppi, P. S., & Hertz-Pannier, L. (2014). The early development of brain white matter: a review of imaging studies in fetuses, newborns and infants. Neuroscience, 276, 48-71.

      Stephens, R. L., Langworthy, B. W., Short, S. J., Girault, J. B., Styner, M. A., & Gilmore, J. H. (2020). White matter development from birth to 6 years of age: a longitudinal study. Cerebral Cortex, 30(12), 6152-6168.

      Hagmann, P., Sporns, O., Madan, N., Cammoun, L., Pienaar, R., Wedeen, V. J., ... & Grant, P. E. (2010). White matter maturation reshapes structural connectivity in the late developing human brain. Proceedings of the National Academy of Sciences, 107(44), 19067-19072.

      Collin, G., & van den Heuvel, M. P. (2013). The ontogeny of the human connectome: development and dynamic changes of brain connectivity across the life span. Neuroscientist, 19(6), 616-628. doi: 10.1177/1073858413503712.

      The authors test associations between FC and growth, but making sense of such modulation results is difficult without a clearer view of developmental changes per se (e.g., what does an early negative FC mean? Is it an increase in FC when the value gets close to 0? In particular, at 24m, it seems that most FC values are not significantly different from 0, Figure 2B). Observing positive vs negative association effects depending on age is quite puzzling. It is also questionable, for some correlation analyses with cognitive flexibility, to focus on FC that changes with age but to consider FC at a given age.

      We thank the reviewer for bringing up this important point and understand that it requires some additional consideration. The negative FC values decreasing with age indicate that these regions go from being anti-correlated to becoming increasingly correlated. Hence, FC of these ROIs increased with age. The trajectory seems to suggest that this will keep increasing with age but of course further data need to be collected to assess this.

      Unfortunately, when considering ΔFC to predict cognitive flexibility, the numbers of participants dropped significantly, with N=~15/20 infants per group of preschoolers, making it very challenging to interpret the results with meaningful statistical power.

      - The manuscript uses inappropriate terms "to predict", "prediction" whereas the conducted analyses are not prediction analyses but correlational.

      We thank the reviewer for giving us the opportunity to thoroughly revise the manuscript on this matter. In this work, we had clear hypotheses regarding which variables predicted which measures (such as growth predicting FC and FC predicting cognitive outcomes). Therefore, we performed regression analyses rather than correlational analyses to investigate these associations. Hence, we believe that using the terms ‘predict’ and ‘prediction’ is appropriate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In the introduction and discussion, the authors talk about the link between developmental trajectories and cognitive capacities, and undernutrition. However, they did not compare developmental trajectories but connectivity patterns at different ages with ΔWLZ and cognitive flexibility. I recommend that the authors rephrase the introduction and discussion.

      We thank the reviewer for pointing out places requiring better clarity in the text. We made edits throughout the introduction and discussion to better match our investigations. In particular, we changed:

      - ‘our understanding of the relationships between early undernutrition, developmental trajectories of brain connectivity, and later cognitive outcomes is still very limited,’ to, ‘our understanding of the relationships between early undernutrition, brain connectivity, and later cognitive outcomes is still very limited’ (line 89, introduction);

      - ‘(ii) investigate if trajectories of early FC have an impact on cognitive outcome at pre-school age in these children,’ to, ‘(ii) investigate if early FC has an impact on cognitive outcome at pre-school age in these children’ (line 137, introduction);

      - ‘This study investigated how early adversity via undernutrition drives longitudinal changes in brain functional connectivity at five time points throughout the first two years of life and how these developmental trajectories are associated with cognitive flexibility at preschool age,’ to, ‘This study investigated how early adversity via undernutrition drives brain functional connectivity throughout the first two years of life and how these early functional connections are associated with cognitive flexibility at preschool age’ (line 215, discussion).

      (2) Considering most research is done in occidental high-income countries, and this work is one of the few presenting research in another context, I think the authors should discuss in the manuscript that differences with previous studies might also be due to environmental and cultural differences. Since the study lacks the statistical power to perform a statistical analysis that directly establishes a link between developmental trajectories and restricted growth and cognitive flexibility, the authors cannot disentangle which differences are related to undernutrition and which might result from growing up in a different environment. I recommend that the authors avoid phrases like (lines 57-58): "We observed that early physical growth before the fifth month of life drove optimal developmental trajectories of FC..." or (lines 223-224) "...our cohort of Gambian infants exhibit atypical developmental trajectories of functional connectivity...".

      We thank the reviewer for this observation, and we agree that other factors differing between high- and low-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to explore this further” (line 238). We revised the whole manuscript to reflect similar statements.

      (3) To better interpret the results, it would be interesting to know if poor early growth predicts late cognitive flexibility in the tested sample and if the ΔWLZ distributions differ compared to a population in a high-income country where undernutrition is less frequent.

      We explored the relationship between changes in growth and cognitive flexibility in the two preschooler groups, but there were no significant associations.

      Mean and SD values of WLZ are reported in Table 3. The values at every age are negative, indicating that the infants' weight-for-length is below the expected norm at all ages. To our knowledge, no other studies have assessed changes in growth in an infant sample with similar closely spaced age time points in high-income countries, making comparisons on growth changes challenging.

      (4) It is unclear why WLZ at birth and HCZ at 7-14 days are included in the models. I imagine this is to ensure that differences are not due to growing restrictions before birth. It would be nice if the authors could explain this.

      As the reviewer pointed out, HCZ at 7-14 days was included to ensure associations between growth and FC are not due to physical differences at birth. This can be considered a 'baseline' measure for cerebral development, in the same way that WLZ at birth was used as a baseline for physical development. Therefore, we can more confidently assume that the associations between growth and FC were specific to the impact of change in WLZ postnatally and not confounded by the size or maturity of the infant at birth. We specified this in the manuscript as follows: “These analyses were adjusted by WLZ at birth and HCZ at 7/14 days, to more confidently assume that the associations between growth and FC were specific to the impact of change in WLZ postnatally and not confounded by the size or maturity of the infant at birth” (line 520, statistical analysis section in the method section).
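
      As a minimal sketch of what adjusting for baseline covariates means in a regression, the example below fits FC on a growth change together with two baseline terms. Everything in it is an assumption made for illustration: the data are hypothetical and noise-free, the variable names are ours, and the model specification and software actually used in the manuscript may differ.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical, noise-free data for illustration only (not from the study).
delta_wlz = [0.3, -0.2, 0.5, 0.1, -0.4, 0.0]    # change in WLZ after birth
wlz_birth = [-0.8, -1.2, -0.3, -0.9, -1.5, -0.6]  # baseline covariate
hcz_birth = [-0.5, 0.2, -0.1, -0.7, 0.4, 0.1]   # baseline covariate
fc_24m = [0.1 + 0.5 * d + 0.2 * w - 0.3 * h
          for d, w, h in zip(delta_wlz, wlz_birth, hcz_birth)]

# Design matrix: intercept, predictor of interest, then the two baseline terms.
X = [[1.0, d, w, h] for d, w, h in zip(delta_wlz, wlz_birth, hcz_birth)]
intercept, b_delta, b_wlz_birth, b_hcz = ols(X, fc_24m)
```

      Here `b_delta` is the association between the growth change and FC with both baseline measures held constant, which is the sense in which the estimate is "adjusted".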

      (5) Right frontal-posterior connections at 24 months negatively correlate with ΔWLZ. Thus, restricted growth results in stronger frontal-posterior connections at 24 months. However, the same connections at 24 months positively correlate with cognitive flexibility (stronger connections predict better cognitive flexibility). Do the authors have any interpretation of this? How could this relate to previous findings of the authors (Bulgarelli et al. 2020), showing first an increase and then a decrease in functional connectivity between frontal and parietal regions?

      We acknowledge that interpreting the negative relationship between changes in growth and fronto-posterior FC at 24 months, alongside the positive association between the same connection and later cognitive flexibility, is challenging. We refrain from relating these findings to those published by Bulgarelli in 2020 due to differences in optode locations and because in that work the decrease in fronto-posterior FC was observed after 24 months (up to 36 months), whereas the endpoint in this study is right at 24 months.

      (6) With the growth of the head, the frontal channels move to more temporal areas, right? Could this determine the decrease in frontal inter-hemisphere connections?

      As shown in Nabwera (2017), head size does not increase that much in Gambian infants, or at least not as much as expected from the WHO standard measures. We have added HCZ mean and SD values per age in Table 3.

      Minor points

      - HCZ is used in line 184 but not defined.

      We thank the reviewer for spotting this, we have now specified HCZ at line 184 as follows: ‘head-circumference z-score (HCZ)’.

      - Table SI2: NIRS not undertaken = the participant was assessed but did want or could not perform... I imagine there is a missing "not".

      We thank the reviewer for spotting this, we have now modified the legend of Table SI2 as follows: ‘the participant was assessed but did not want or could not perform the NIRS assessments.’

      - The authors should explain what weight-for-length is for those who are not familiar with it.

      We have added an explanation of weight-for-length in the experimental design section, line 339 as follows: ‘We then tested for relationships between brain FC at age 24 months with measures of early growth, as indexed by changes in weight-for-length z-scores (reflecting body weight in proportion to attained growth in length) at one month of age, and at each of the four subsequent visits (details provided below).’

      Reviewer #2 (Recommendations For The Authors):

      (1) I am confused about the authors' interpretation that left and right front-middle and right front-back FC increased with age. It appears in Figure 2 that the negative FC among these ROIs should actually decrease with age. This means that as individuals grow older, the FC values between these regions and zero diminished, albeit starting with negative FC (anticorrelation values) in younger age groups.

      Yes, the reviewer is correct. The negative values of the left and right front-middle and right front-back FC decreasing with age indicate that these regions go from being anti-correlated to becoming increasingly correlated. Hence, FC of these ROIs increased with age.

      (2) Are these negative values mentioned above at 24 months still negative? Have t-tests been run to examine the differences from zero?

      As suggested, we performed t-tests against zero for the mentioned FC at 24 months, and only the left and right fronto-middle FC are significantly different from zero (left fronto-middle FC: t(94) = 1.8, p = 0.036; right fronto-middle FC: t(94) = 2.7, p = 0.003).
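
      For readers unfamiliar with the test, a one-sample t statistic against zero can be sketched as follows. The FC values are hypothetical and used only for illustration, and the helper name is our own. Because the degrees of freedom equal n - 1, a reported t(94) would correspond to 95 infants.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t, df) for a one-sample t-test of the mean of `values` against mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical FC values for six infants, for illustration only.
fc_values = [0.10, -0.05, 0.20, 0.15, 0.00, 0.05]
t_stat, df = one_sample_t(fc_values)
```

      The p-value is then read from a t distribution with `df` degrees of freedom; that step is left out here to keep the sketch within the standard library.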

      (3) With so many correlation analyses, have multiple comparisons been consistently controlled for? While I assume this was done according to the Methods section, could the authors clarify whether FDR adjustment was applied to all the p-values at once or to a group of p-values each time? I found the following way of reporting FDR-adjusted p-values quite informative, such as PFDR, 24 pairs of ROIs < 0.05.

      We thank the reviewer for this insightful comment. P-values of regression analyses were FDR corrected per connection investigated, i.e. 21 possible ΔWLZ values per connection. We have specified this in the method section as follows: “To ensure statistical reliability, results from the regression analyses on each FC were corrected for multiple comparisons using false discovery rate (FDR)(Benjamini & Hochberg, 1995) per each connection investigated, i.e. 21 possible ΔWLZ values per each connection,” (page 12, Statistical Analyses section).

      (4) Can early growth trajectories predict changes in FC? Why not use ΔWLZ to predict ΔFC?

      Unfortunately, when considering ΔWLZ to predict ΔFC, the numbers of participants dropped significantly, with N=~30 infants, making it very challenging to interpret the results. We believe this emphasizes the importance of recruiting large samples when conducting longitudinal studies involving infants and employing multiple measures.

      (5) I might have missed the rationale, but why weren't the growth changes after 5 months studied?

      ΔWLZ between all time points were assessed as predictors of FC at 24 months. We have specified this at line 183 as follows: ‘we used multiple regression with the infant growth trajectory (delta weight for length z-score between all time points, DWLZ) and FC at 24 months’. As indicated in Table 2 and 3, the associations between ΔWLZ at all time points and FC at 24 months were tested, but only those with ΔWLZ calculated between birth and 1 month and the subsequent time points were significant. ΔWLZ between 5 months and the subsequent time points, ΔWLZ between 8 months and the subsequent time points, ΔWLZ between 12 months and the subsequent time points, and ΔWLZ between 18 months and the subsequent time points did not significantly predict FC at 24 months. These are highlighted in Table 2 and Figure 3 in blue and marked as NS (non-significant).
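
      The figure of 21 possible ΔWLZ values per connection given in response (3) is consistent with taking every pair among seven measurement ages. The sketch below assumes those ages were birth, 1, 5, 8, 12, 18 and 24 months; this list is inferred from the responses rather than stated explicitly, and the WLZ values are hypothetical.

```python
from itertools import combinations

# Assumed measurement ages in months (0 = birth); inferred from the text, not confirmed.
ages_months = [0, 1, 5, 8, 12, 18, 24]

# Each delta is defined between an earlier and a later time point.
delta_pairs = list(combinations(ages_months, 2))


def delta_wlz(wlz_by_age, start, end):
    """Change in WLZ from `start` to `end` (later value minus earlier value)."""
    return wlz_by_age[end] - wlz_by_age[start]


# Hypothetical WLZ values for one infant, for illustration only.
wlz = {0: -0.8, 1: -0.5, 5: -0.6, 8: -0.9, 12: -1.0, 18: -0.7, 24: -0.6}
deltas = {(a, b): delta_wlz(wlz, a, b) for a, b in delta_pairs}

# Pairs whose starting point is birth or 1 month.
early_pairs = [(a, b) for a, b in delta_pairs if a in (0, 1)]
```

      Under this assumption, `delta_pairs` contains 21 pairs, of which 11 start at birth or 1 month.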

      (6) Once more, the advantage of longitudinal data is that it allows us to tap into developmental changes. Analyzing and predicting cognitive development based solely on FC values at a single age stage (i.e., 24 months) would overlook the benefits of a longitudinal design, which is regrettable. I suggest that the authors attempt to use ΔFC for predictions and observe the outcomes.

      As mentioned in our response to point (4) raised by the reviewer, unfortunately, when considering ΔWLZ to predict ΔFC, the numbers of participants dropped significantly, with N=~30 infants, making it very challenging to interpret the results. We believe this emphasizes the importance of recruiting large samples when conducting longitudinal studies involving infants and employing various measures.

      (7) In the section "Early FC predicts cognitive flexibility at preschool age", the authors pointed out that "...,none of these survived FDR correction for multiple comparisons." However, the paper discussed the association between FC at 24 months of age and cognitive flexibility, as it was supported by the statistical analysis in the following sections. If FDR correction cannot be satisfied, I would rephrase the implication/conclusion of the results to suggest that early FC does not predict cognitive flexibility at preschool age.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings, even those not passing multiple comparisons corrections, as they may motivate hypothesis-generation for future studies. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further support these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      Following the reviewer’s suggestion, we specified in the discussion that results from the regression analysis are significant but did not survive correction for multiple comparisons, as follows: ‘While our results are consistent with previous studies, we acknowledge that the significant association between early FC and later cognitive flexibility does not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample.’ (line 290, discussion section).

      (8) Have the authors assessed the impact of growth trajectories on cognitive flexibility?

      We explored the relationship between changes in growth and cognitive flexibility in the two preschooler groups, but there were no significant associations.

      (9) Are there no other cognitive or behavioural measures available? Cognitive flexibility is just one domain of cognitive development, and would the impact of undernutrition on cognitive development be domain-specific? There is a lack of theoretical support here. Why choose cognitive flexibility, and should the impact of undernutrition be domain-specific or domain-general?

      We agree with the reviewer that in this work, we chose to focus on one specific cognitive outcome. While this does not imply that the impact of undernutrition is domain-specific, cognitive flexibility, being a core executive function, has been extensively studied in terms of its neural underpinnings using other neuroimaging modalities, especially fMRI (for example see Dajani, 2015; Uddin, 2021).

      Moreover, other studies looking at the effect of adversity on cognitive outcomes focus on specific cognitive skills, such as working memory (Roberts, 2017), reading and arithmetic skills (Soni, 2021).

      We also assessed infants with the Mullen Scales of Early Learning (MSEL), although the cognitive flexibility task within the Early Years Toolbox has been specifically designed for preschoolers (Howard, 2015), and this set of tasks has recently been validated by our team in The Gambia (Milosavljevic, 2023). Future work from the BRIGHT team will investigate performance on the MSEL in relation to other variables of the project.

      References:

      D. R. Dajani, L. Q. Uddin, Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).

      L. Q. Uddin, Cognitive and behavioural flexibility: neural mechanisms and clinical considerations. Nat. Rev. Neurosci. 22, 167–179 (2021).

      Roberts, S. B., Franceschini, M. A., Krauss, A., Lin, P. Y., de Sa, A. B., Có, R., ... & Muentener, P. (2017). A pilot randomized controlled trial of a new supplementary food designed to enhance cognitive performance during prevention and treatment of malnutrition in childhood. Current developments in nutrition, 1(11), e000885.

      Soni, A., Fahey, N., Bhutta, Z. A., Li, W., Frazier, J. A., Moore Simas, T., ... & Allison, J. J. (2021). Early childhood undernutrition, preadolescent physical growth, and cognitive achievement in India: A population-based cohort study. PLoS Medicine, 18(10), e1003838.

      Howard, S. J., & Melhuish, E. (2015). An Early Years Toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: Validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 35(3), 255-275.

      Milosavljevic, B., Cook, C. J., Fadera, T., Ghillia, G., Howard, S. J., Makaula, H., ... & Lloyd‐Fox, S. (2023). Executive functioning skills and their environmental predictors among pre‐school aged children in South Africa and The Gambia. Developmental Science, e13407.

      (10) I would review more previous fNIRS studies on infants if they exist (e.g., the work by S Lloyd-Fox, L Emberson, and others). These studies can help identify brain ROIs likely linked to undernutrition and cognitive flexibility. The current analysis methods lean towards exploratory research. This makes the paper more of a proof-of-concept report rather than a strongly theoretically-driven study.

      We thank the reviewer for this important point. While we have reviewed existing fNIRS infant studies, no extant work has shown whether specific brain regions are related to undernutrition. However, several fMRI studies assessed regions that do support cognitive flexibility, and we mentioned these in the manuscript (for example see Dajani, 2015; Uddin, 2021).

      Other than the BRIGHT project, we are aware of two other projects that assessed the effect of undernutrition on brain development and cognitive outcomes in low-resource settings:

      - the BEAN project in Bangladesh in which fNIRS data were recorded from the bilateral temporal cortex (i.e. Pirazzoli, 2022);

      - a project in India in which fNIRS data were recorded from frontal, temporal and parietal cortex bilaterally (i.e. Delgado Reyes, 2020)

      The brain regions recorded in these studies largely overlap with the brain regions we recorded from in this study.

      Another aspect to consider is that infants underwent several fNIRS tasks as part of the BRIGHT project, focusing on social processing, deferred imitation, and habituation responses. Therefore, brain regions for data acquisition were chosen to maximize the likelihood of recording meaningful data for all tasks (Lloyd-Fox, 2023). To clarify the text, we specified this information in the methods section (line 383).

      References:

      D. R. Dajani, L. Q. Uddin, Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).

      Pirazzoli, L., Sullivan, E., Xie, W., Richards, J. E., Bulgarelli, C., Lloyd-Fox, S., ... & Nelson III, C. A. (2022). Association of psychosocial adversity and social information processing in children raised in a low-resource setting: an fNIRS study. Developmental Cognitive Neuroscience, 56, 101125.

      Delgado Reyes, L., Wijeakumar, S., Magnotta, V. A., Forbes, S. H., & Spencer, J. P. (2020). The functional brain networks that underlie visual working memory in the first two years of life. NeuroImage, 219, Article 116971.

      Lloyd-Fox, S., McCann, S., Milosavljevic, B., Katus, L., Blasi, A., Bulgarelli, C., Crespo-Llado, M., Ghillia, G., Fadera, T., Mbye, E., Mason, L., Njai, F., Njie, O., Perapoch-Amado, M., Rozhko, M., Sosseh, F., Saidykhan, M., Touray, E., Moore, S. E., … Team, and the B. S. (2023). The Brain Imaging for Global Health (BRIGHT) Study: Cohort Study Protocol. Gates Open Research, 7(126).

      (11) Last but not least, in the paper, the authors mentioned that fNIRS offers better spatial resolution and anatomical specificity compared to EEG, thereby providing more precise and reliable localization of brain networks. While I partially agree with this perspective, it remains to be explored whether the current fNIRS analysis strategies indeed yield higher spatial resolution. It is hoped that the authors will delve deeper into this discussion in the paper.

      The brain regions of focus were selected based on coregistration work previously conducted at each time point on the array used in this project (Collins-Jones, 2021). We deliberately avoided making claims about small brain regions, considering that head size might increase slightly less with age in The Gambia compared to Western countries (Nabwera, 2017). However, we maintain that the conclusions drawn in this study offer higher brain-region specificity than could have been identified with current common EEG methods alone.

      References:

      L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021).

      Nabwera, H. M., Fulford, A. J., Moore, S. E., & Prentice, A. M. (2017). Growth faltering in rural Gambian children after four decades of interventions: a retrospective cohort study. The Lancet Global Health, 5(2), e208–e216.

      Reviewer #3 (Recommendations For The Authors):

      Introduction

      - Among important developmental mechanisms to mention are the development of exuberant connections and the further selection/stabilization of the relevant ones according to environmental stimulation, vs the pruning of others.

      We agree with the reviewer that the development of exuberant connections and subsequent pruning is a universal process of paramount importance during the first years of life. However, after revising our introduction, given the word limit of the journal, we maintained focus on neurodevelopment and early adversity.

      Results

      - Adding a few more information on the 6 sections and 21 connections would be welcome. In particular for within-section FC: how was this computed?

      The 6 sections were created based on the co-registration of the array used in this study at each age in previously published work (L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021)). This is reference No. 68 in the manuscript.

      As we mentioned in the section fNIRS preprocessing and data-analysis: ‘The sections were established via the 17 channels of each hemisphere which were grouped into front, middle and back (for a total of six regions) based on a previous co-registration of the BRIGHT fNIRS arrays onto age-appropriate templates’.

      The 21 connections were defined as all the possible links between the 6 regions, specifically: the interhemispheric homotopic connections (in orange in Figure SI1), which connect the same regions between hemispheres (i.e., front left with front right); the intrahemispheric connections (in green in Figure SI1), which correlate channels belonging to the same region; the fronto-posterior connections (in blue in Figure SI1), which link front and middle, middle and back, and front and back regions of the same hemisphere; and the crossing interhemispheric connections (non-homotopic interhemispheric, in yellow in Figure SI1), which link the front, middle, and back areas between left and right hemispheres. We added these specifications also in the legend of Figure SI1 for clarity.
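
      The count of 21 follows from taking every unordered pair among the 6 regions, including each region paired with itself. The sketch below enumerates those pairs and labels them with the four connection types named above; the region names and the `classify` helper are illustrative assumptions, not the authors' code.

```python
from itertools import combinations_with_replacement

# Six regions: front, middle and back in each hemisphere (names are illustrative).
REGIONS = [(hemi, pos) for hemi in ("left", "right")
           for pos in ("front", "middle", "back")]


def classify(a, b):
    """Label a region pair with one of the four connection types in the response."""
    (hemi_a, pos_a), (hemi_b, pos_b) = a, b
    if a == b:
        return "intrahemispheric"            # channels within the same region
    if hemi_a == hemi_b:
        return "fronto-posterior"            # different regions, same hemisphere
    if pos_a == pos_b:
        return "interhemispheric homotopic"  # same region across hemispheres
    return "crossed interhemispheric"        # different regions across hemispheres


# Every unordered pair of regions, self-pairs included.
connections = [(a, b, classify(a, b))
               for a, b in combinations_with_replacement(REGIONS, 2)]

counts = {}
for _, _, kind in connections:
    counts[kind] = counts.get(kind, 0) + 1

print(len(connections), counts)
```

      Under this reading, the 21 connections split into 6 intrahemispheric, 6 fronto-posterior, 3 interhemispheric homotopic and 6 crossed interhemispheric connections.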

      - The denomination intrahemispheric vs fronto-posterior vs crossed connections is not clear. Maybe prefer intra-hemispheric vs inter-hemispheric homotopic vs inter-hemispheric non-homotopic (also in Figure SI1).

      We appreciate the reviewer's suggestion regarding terminology. However, we believe that the term 'inter-hemispheric non-homotopic' could potentially refer to both connections within the same brain hemisphere from front to back and connections crossing between hemispheres, leading to increased confusion. Therefore, we have chosen not to include the term 'non-homotopic' and instead added 'homotopic' to 'interhemispheric' throughout the manuscript to emphasize that these functional connections occur between corresponding regions of the two hemispheres.

      - with time -> with age.

      We replaced “with time” with “with age” throughout the manuscript, as suggested.

      - The description of both HbO2 and HHb results overloads the main text: would it be relevant to present one of the two in Supplementary Information if the results are coherent?

      We understand the reviewer’s concern regarding overloading the results section with reporting both chromophores. However, reporting results for both HbO and HHb is considered a crucial step for publications in the fNIRS field, as emphasized in recent formal guidance (Yücel et al., 2020). One of the strengths of fNIRS compared to fMRI is its ability to record from both chromophores, enabling a more precise characterization of brain activations and oscillations. Moreover, in FC studies like this one, ensuring that HbO and HHb results overlap is an important check that increases confidence in interpreting the findings.

      References:

      Yücel, M. A., von Lühmann, A., Scholkmann, F., Gervain, J., Dan, I., Ayaz, H., Boas, D., Cooper, R. J., Culver, J., Elwell, C. E., Eggebrecht, A., Franceschini, M. A., Grova, C., Homae, F., Lesage, F., Obrig, H., Tachtsidis, I., Tak, S., Tong, Y., … Wolf, M. (2020). Best Practices for fNIRS publications. Neurophotonics, 1–34. https://doi.org/10.1117/1.NPh.8.1.012101

      - HCZ is not defined when first used.

      We thank the reviewer for spotting this; we have now specified HCZ at line 184 as follows: ‘head-circumference z-score (HCZ)’.

      - Choosing the analyzed measures to "maximize power" could be criticised.

      We appreciate the reviewer’s concern. However, correlating all the FC values with all changes in growth would have raised an important multiple-comparisons issue. We therefore made an a priori decision to focus on investigating the relationship between changes in growth and those FC that showed a significant change with age, considering these as the most interesting ones from a developmental perspective in our sample.

      Discussion

      - I would recommend using the same order to synthesize results and further discuss them.

      We agree with the reviewer that the suggested structure is optimal for a clear discussion section. We have indeed followed it, with each paragraph covering specific aspects:

      - Recap of the study aims

      - Results summary and discussion of developmental changes

      - Results summary and discussion of the relationship between changes in growth and FC

      - Results summary and discussion of the relationship between FC and cognitive flexibility

      - Limitations

      - Conclusion

      Given the numerous results presented in this paper, we believe that readers will better digest them by first reading a summary of the results followed by their interpretations, rather than condensing all the interpretations together.

      - Highlighting how "atypical" developmental trajectories are in Gambian infants would be welcome in the Results section. Other interpretations can be found than "The observed decrease in frontal inter-hemispheric FC with increasing age may be due to the exposure to early life undernutrition adversity".

      We agree with the reviewer that other factors that differ between low- and high-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to further investigate cultural, environmental, and genetic effects on brain FC” (line 238).

      - Focusing on FC at 24m for the relationship with growth is questionable.

      Correlating the FC values at 5 time points with all changes in growth would have raised an important multiple-comparisons issue. We therefore made an a priori decision to focus on investigating the relationship between changes in growth and FC at 24 months as our final time point of data collection. We added this information in the methods section as follows: “To investigate the impact of undernutrition on FC development, we used DWLZ as independent variables in regression analyses on HbO2 (as the chromophore with the highest signal-to-noise ratio) FC at 24 months, our final time point of data collection” (line 517, method section).

      - There is too much emphasis on the correlation between FC and cognitive flexibility, whereas results are not significant after correction for multiple comparisons.

      Following the reviewer’s suggestion, we specified in the discussion that the results from the regression analysis are significant but did not survive correction for multiple comparisons, as follows: “While our results are consistent with previous studies, we acknowledge that the significant association between early FC and later cognitive flexibility does not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample.” (line 290, discussion section).

      Methods

      - I would recommend detailing how z-scores were computed in the paragraph "Anthropometric measures".

      We specified how z-scores were computed in the statistical analysis section as follows: “Anthropometric measures were converted to age and sex adjusted z‐scores that are based on World Health Organization Child Growth Standards (93). Weight‐for‐Length (WLZ) and Head Circumference (HCZ) z-scores were computed” (line 509, method section). As transforming data is the first step of statistical analysis and is not directly related to data collection, we believe it is more appropriate to retain this description in the statistical analysis section.
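
      For readers unfamiliar with growth-standard z-scores, the sketch below shows the LMS transformation in which the WHO Child Growth Standards are tabulated. It assumes the LMS approach was applied here, since the response does not name the software used. The L, M and S values in the example are invented for illustration and are not WHO reference values, and the additional handling of extreme values in the published WHO procedure is omitted.

```python
import math


def lms_zscore(value, L, M, S):
    """Z-score from LMS parameters for the child's age and sex.

    L is the Box-Cox power, M the median, S the coefficient of variation.
    """
    if value <= 0 or M <= 0 or S <= 0:
        raise ValueError("value, M and S must be positive")
    if abs(L) < 1e-12:
        # Limiting case of the formula as L approaches zero.
        return math.log(value / M) / S
    return ((value / M) ** L - 1.0) / (L * S)


# Hypothetical parameters for one age/sex cell (NOT real WHO values).
example_L, example_M, example_S = 1.0, 47.0, 0.03

z_at_median = lms_zscore(47.0, example_L, example_M, example_S)
z_below_median = lms_zscore(45.59, example_L, example_M, example_S)
print(z_at_median, round(z_below_median, 3))
```

      A measurement equal to the median M maps to a z-score of 0, and negative values indicate measurements below the reference median for that age and sex.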

      - FC computation: the mention of "correlating the first and the last 250s" is not clear.

      We specified this more clearly in the text as follows: “We found that correlating the first and the last 250 seconds of valid data after pre-processing provided the highest percentage of infants with strong correlation between the first and the last portion of data” (line 467).
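
      One possible reading of this check is a split-half reliability measure: compute the channel-by-channel FC values separately on the first and the last 250 seconds, then correlate the two sets of FC values. The sketch below illustrates that reading only. The sampling rate, the synthetic signals and all function names are assumptions, and the authors' actual pipeline may differ.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fc_values(channels):
    """Pairwise Pearson correlations between channels (upper triangle only)."""
    n = len(channels)
    return [pearson(channels[i], channels[j])
            for i in range(n) for j in range(i + 1, n)]


def split_half_reliability(channels, fs_hz, window_s=250):
    """Correlate FC from the first window_s seconds with FC from the last."""
    n = int(round(fs_hz * window_s))
    if any(len(ch) < 2 * n for ch in channels):
        raise ValueError("recording too short for two non-overlapping windows")
    first = fc_values([ch[:n] for ch in channels])
    last = fc_values([ch[-n:] for ch in channels])
    return pearson(first, last)


# Synthetic demonstration: four channels sharing a slow oscillation plus noise.
random.seed(0)
fs_hz = 10  # assumed sampling rate, not taken from the manuscript
n_samples = fs_hz * 600
shared = [math.sin(2 * math.pi * 0.05 * t / fs_hz) for t in range(n_samples)]
demo_channels = [[w * s + random.gauss(0, 1) for s in shared]
                 for w in (0.5, 1.0, 1.5, 2.0)]
print(round(split_half_reliability(demo_channels, fs_hz), 3))
```

      A value close to 1 would indicate that the FC pattern is stable between the start and the end of the recording.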

      - The manuscript mentions "age 3 years" for the younger preschoolers but ~48months rather corresponds to 4 years.

      We checked the entire manuscript and the supplementary materials, but we could not find any instance in which preschoolers' ages are given in months rather than in years.

      - Specify the number of children evaluated at 4 and 5 years. Is the test of cognitive flexibility normalized for age? If not, how were the 2 groups considered in the analyses? (age as a confounding factor).

      We have added the number of children in the two preschooler groups as follows: younger preschoolers (age mean ± SD=47.96 ± 2.77 months, N=77) and older preschoolers (age mean ± SD=57.58 ± 2.11 months, N=84). (line 484).

      The cognitive flexibility test was not normalized for age, as this task was specifically developed for preschoolers (Howard, 2015). As mentioned in ‘Cognitive flexibility at preschool age’ of the methods section, “data were collected in two ranges of preschool ages”, which guided our decision to perform regression analysis on the impact of FC on cognitive flexibility separately within these two age groups, rather than treating them as a single group of preschoolers.

      References:

      Howard, S. J., & Melhuish, E. (2015). An Early Years Toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: Validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 35(3), 255-275.

      Figures and Tables

      - Table 1 could highlight the significant results. It is not clear what the "baseline" results correspond to.

      We have marked in bold the results that are statistically significant in Table 1. In the linear mixed model we performed, the first time point (i.e. 5 months) is chosen as ‘baseline’, i.e. the reference against which the other time points are compared, and its statistical values refer to its significance against 0 (as was done in Bulgarelli, 2020).

      - Figures 2 B and C seem redundant? What is SE vs SD?

      We believe that both Figures 2B and 2C are useful for readers. While the first shows the mean FC values at the group level, the second highlights the individual variability of FC values (typical of infant neuroimaging data), which is also why it is interesting to relate these measures to other variables in our dataset (i.e. growth and cognitive flexibility). Figure 2C also reports mean FC values per age, but these might be less visible because one dot per infant is also plotted.

      SE stands for standard error, and in the legend of the figure we specified this as follows: ‘Mean and standard error of the mean (SE)’. SD stands for standard deviation, and we have now specified this as follows: ‘mean ± standard deviation (SD)’.

      - Table 2: I would recommend removing results that don't survive corrections for multiple comparisons.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      - Figure 3: the top is redundant with Table 2: to be merged? B: the statistical results might be shown in a Table.

      We agree with the reviewer that the top part of Figure 3 and Table 2 report the same results. However, given the richness of these findings, we believe that the top part of Figure 3 serves as a useful summary for readers. Additionally, examining both the top and bottom parts of Figure 3 provides a comprehensive overview of the regression analysis conducted in this study.

      - Figure SI6: Is it really a % in x-axis?

      We thank the reviewer for spotting this typo; the percentage is relevant for the y-axis only. We removed the % symbol from the ticks of the x-axis.

      - Table SI1: the presented p-values don't seem to survive Bonferroni correction, contrary to what is written.

      We thank the reviewer for spotting this mistake; we removed the reference to the Bonferroni correction for the p-values.
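
      As a reminder of what surviving a Bonferroni correction means, the sketch below compares each p-value with alpha divided by the number of tests. The p-values are hypothetical and are not those reported in Table SI1.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (raw p, adjusted p, survives) for each test in a family."""
    m = len(p_values)
    threshold = alpha / m  # per-test significance threshold
    return [(p, min(1.0, p * m), p <= threshold) for p in p_values]


# Hypothetical p-values for a family of three tests.
example_p = [0.004, 0.020, 0.049]
results = bonferroni(example_p)
for raw, adjusted, survives in results:
    print(f"p={raw:.3f}  adjusted={adjusted:.3f}  survives={survives}")
```

      With three tests the per-test threshold is 0.05 / 3, so only the smallest of these hypothetical p-values would remain significant after correction.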

      - Table SI2: For the proportion of children included in the analysis, maybe be precise that the proportion was computed based on the ones with acquired data. Maybe also add the proportion according to all children, to better show the high drop-out rate at certain ages?

      We thank the reviewer for these useful suggestions. We have specified in the legend of the table how we calculated the proportion of infants included as follows: ‘The proportion of children included in the analysis was computed based on the infants with FC data’. We have also added a column in the table called ‘Inclusion rate (from the 204 infants recruited)’, following the reviewer’s suggestion. This will be a useful reference for future studies.

      - A few typos should be corrected throughout the manuscript.

      We thoroughly revised the main manuscript and the supplementary materials for typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Building on previous in vitro synaptic circuit work (Yamawaki et al., eLife 10, 2021), Piña Novo et al. utilize an in vivo optogenetic-electrophysiological approach to characterize sensory-evoked spiking activity in the mouse's forelimb primary somatosensory (S1) and motor (M1) areas. Using a combination of a novel "phototactile" somatosensory stimulus to the mouse's hand and simultaneous high-density linear array recordings in both S1 and M1, the authors report in awake mice that evoked cortical responses follow a triphasic peak-suppression-rebound pattern. They also find that M1 responses are delayed and attenuated relative to S1. Further analysis revealed a 20-fold difference in subcortical versus corticocortical propagation speeds.

      They also report that PV interneurons in S1 are strongly recruited by hand stimulation. Furthermore, they report that selective activation of PV cells can produce a suppression and rebound response similar to "phototactile" stimuli. Lastly, the authors demonstrate that silencing S1 through local PV cell activation reduces M1 response to hand stimulation, suggesting S1 may directly drive M1 responses.

      Strengths:

      The study was technically well done, with convincing results. The data presented are appropriately analyzed. The authors' findings build on a growing body of both in vitro and in vivo work examining the synaptic circuits underlying the interactions between S1 and M1. The paper is well-written and illustrated. Overall, the study will be useful to those interested in forelimb S1-M1 interactions.

      Weaknesses:

      Although the results are clear and convincing, one weakness is that many results are consistent with previous studies in other sensorimotor systems, and thus not all that surprising. For example, the findings that sensory stimulation results in delayed and attenuated responses in M1 relative to S1 and that PV inhibitory cells in S1 are strongly recruited by sensory stimulation are not novel (e.g., Bruno et al., J Neurosci 22, 10966-10975, 2002; Swadlow, Philos Trans R Soc Lond B Biol Sci 357, 1717-1727, 2002; Gabernet et al., Neuron 48, 315-327, 2005; Cruikshank et al., Nat Neurosci 10, 462-468, 2007; Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016; Yu et al., Neuron 104, 412-427 e414, 2019). Furthermore, the observation that sensory processing in M1 depends upon activity in S1 is also not novel (e.g., Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016). The authors do a good job highlighting how their results are consistent with these previous studies.

      We thank the reviewer for the close reading of the manuscript and the many constructive comments and critiques. As the reviewer notes, there have been many prior studies of related circuits in other sensorimotor systems forming an important context for our study and findings, as we have tried to highlight. We appreciate the suggestions for additional relevant articles to cite.

      Perhaps a more significant weakness, in my opinion, was the missing analyses given the rich dataset collected. For example, why lump all responsive units and not break them down based on their depth? Given superficial and deep layers respond at different latencies and have different response magnitudes and durations to sensory stimuli (e.g., L2/3 is much more sparse) (e.g., Constantinople et al., Science 340, 1591-1594, 2013; Manita et al., Neuron 86, 1304-1316, 2015; Petersen, Nat Rev Neurosci 20, 533-546, 2019; Yu et al., Neuron 104, 412-427 e414, 2019), their conclusions could be biased toward more active layers (e.g., L4 and L5). These additional analyses could reveal interesting similarities or important differences, increasing the manuscript's impact. Given the authors use high-density linear arrays, they should have this data.

      We have analyzed the activity patterns as a function of cortical depth, and now include these results in the manuscript as suggested. The key new finding is that the M1 responses are strongest in upper layers, consistent with expectations based on the excitatory corticocortical synaptic connectivity characterized previously. Changes to the manuscript include new figures (Figure 5; Figure 5 - figure supplement 1), which we explain (Methods: page 14, lines 618-621), describe (new Results section: pages 4-5, lines 183-189), comment on (Discussion: page 9, lines 378-391), and summarize the significance of (Abstract: page 1, lines 22-24). In addition, we incorporated the new laminar analysis into a summary schematic (Figure 9). We thank the reviewer for suggesting this analysis.

      Similarly, why not isolate and compare PV versus non-PV units in M1? They did the photostimulation experiments and presumably have the data. Recent in vitro work suggests PV neurons in the upper layers (L2/3) of M1 are strongly recruited by S1 (e.g., Okoro et al., J Neurosci 42, 8095-8112, 2022; Martinetti et al., Cerebral cortex 32, 1932-1949, 2022). Do the authors' data support these in vitro observations?

      These experiments were relatively complex and M1 optotagging was not routinely included in the stimulus and acquisition protocol. Therefore, we don’t have sufficient data for this analysis. We plan to address this in future studies.

      It would have also been interesting to suppress M1 while stimulating the hand to determine if any part of the S1 triphasic response depends on M1 feedback.

      We agree that this is of interest but consider this to be outside the scope of the current study.

      I appreciate the control experiment showing that optical hand stimulation did not evoke forelimb movement. However, this appears to be an N=1. How consistent was this result across animals, and how was this monitored in those animals? Can the authors say anything about digit movement?

      We have performed additional experiments to address this point. A constraint with EMG is that it is limited to the muscle(s) one chooses to record from, and it is difficult to implant tiny muscles of the hand. Therefore, for this analysis, we used kilohertz videography as a high-sensitivity method for movement surveillance across the hand. Hand stimulation did not evoke any detectable movements. Changes in the manuscript include: revised Figure 1 - figure supplement 1; supplementary Figure 1 - video 1; and associated text edits in the Methods (page 13, line 557; page 14, lines 626-639) and Results sections (page 2, lines 84-85).

      A light intensity of 5 mW was used to stimulate the hand, but it is unclear how or why the authors chose this intensity. Did S1 and M1 responses (e.g., amplitude and latency) change with lower or higher intensities? Was the triphasic response dependent on the intensity of the "phototactile" stimuli?

      As we now say in the Methods > Optogenetic photostimulation of the hand section (page 13, lines 562-565), “This intensity was chosen based on pilot experiments in which we varied the LED power, which showed that this intensity was reliably above the threshold for evoking robust responses in both S1 and M1 without evoking any visually detectable movements (as subsequently confirmed by videography)”.

      Reviewer #2 (Public review):

      Summary:

      Communication between sensory and motor cortices is likely to be important for many aspects of behavior, and in this study, the authors carefully analyse neuronal spiking activity in S1 and M1 evoked by peripheral paw stimulation, finding clear evidence for sensory responses in both cortical regions.

      Strengths:

      The experiments and data analyses appear to have been carefully carried out and clearly represented.

      Weaknesses:

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      Thank you for pointing this out. The prior studies suggest it is mainly a subset of layer 5B excitatory neurons that may express PV. We checked this in two ways. Anatomically, we did not find double-labeling. An electrophysiology assay showed that, although some evoked excitatory synaptic input could be detected in some neurons, these inputs were very weak. Results from these assays are shown in new Figure 6 - figure supplement 1, with associated text edits in the Methods (page 11, lines 469-471; page 15, lines 657-668) and Results (page 5, lines 198-199) sections.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      As noted above, we have performed additional experiments to address this.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

      Thank you for pointing this out; we now cite this article (page 1, line 46; page 10, line 415).

      Reviewer #3 (Public review):

      Summary:

      This is a solid study of stimulus-evoked neural activity dynamics in the feedforward pathway from mouse hand/forelimb mechanoreceptor afferents to S1 and M1 cortex. The conclusions are generally well supported, and match expectations from previous studies of hand/forelimb circuits by this same group (Yamawaki et al., 2021), from the well-studied whisker tactile pathway to whisker S1 and M1, and from the corresponding pathway in primates. The study uses the novel approach of optogenetic stimulation of PV afferents in the periphery, which provides an impulse-like volley of peripheral spikes, which is useful for studying feedforward circuit dynamics. These are primarily proprioceptors, so results could differ for specific mechanoreceptor populations, but this is a reasonable tool to probe basic circuit activation. Mice are awake but not engaged in a somatosensory task, which is sufficient for the study goals.

      The main results are:

      (1) brief peripheral activation drives brief sensory-evoked responses at ~ 15 ms latency in S1 and ~25 ms latency in M1, which is consistent with classical fast propagation on the subcortical pathway to S1, followed by slow propagation on the polysynaptic, non-myelinated pathway from S1 to M1;

      (2) each peripheral impulse evokes a triphasic activation-suppression-rebound response in both S1 and M1;

      (3) PV interneurons carry the major component of spike modulation for each of these phases;

      (4) activation of PV neurons in each area (M1 or S1) drives suppression and rebound both in the local area and in the other downstream area;

      (5) peripheral-evoked neural activity in M1 is at least partially dependent on transmission through S1.

      All conclusions are well-supported and reasonably interpreted. There are no major new findings that were not expected from standard models of somatosensory pathways or from prior work in the whisker system.

      Strengths:

      This is a well-conducted and analyzed study in which the findings are clearly presented. This will provide important baseline knowledge from which studies of more complex sensorimotor processing can build.

      Weaknesses:

      A few minor issues should be addressed to improve clarity of presentation and interpretation:

      (1) It is critical for interpretation that the stimulus does not evoke a motor response, which could induce reafference-based activity that could drive, or mask, some of the triphasic response. Figure S1 shows that no motor response is evoked for one example session, but this would be stronger if results were analyzed over several mice.

      As noted above, we have performed additional experiments to address this point.

      (2) The recordings combine single and multi-units, which is fine for measures of response modulation, but not for absolute evoked firing rate, which is only interpretable for single units. For example, evoked firing rate in S1 could be higher than M1, if spike sorting were more difficult in S1, resulting in a higher fraction of multi-units relative to M1. Because of this, if reporting of absolute firing rates is an essential component of the paper, Figs 3D and 4E should be recalculated just for single units.

      Thank you for noting this. Although the absolute firing rates are not essential for the main findings or conclusions (which as noted focus on response modulations and relative differences) we agree that analyzing the single-unit response amplitudes is useful. Therefore, changes in the manuscript now include: revised Figure 3, and associated text edits in the Methods (page 12, lines 543-545), Results (page 3, lines 115-119), and Discussion (page 7, lines 305-311) sections.

      (3) In Figure 5B, the average light-evoked firing rate of PV neurons seems to come up before time 0, unlike the single-trial rasters above it. Presumably, this reflects binning for firing rate calculation. This should be corrected to avoid confusion.

      Yes, this reflects the binning. We agree that this is potentially confusing and have removed these average plots below the raster plots, as the rasters alone suffice to demonstrate the result (i.e., that PV units are strongly activated and thus tagged by optogenetic stimulation). Changes are now reflected in revised Figure 6.

      (4) In Figure 6A bottom, please clarify what legends "W. suppression" and "W. rebound" mean.

      In the figure plot legends, the “W.” has been removed. Changes are now reflected in revised Figure 7 and Figure 7 – figure supplement 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Did you filter the neural signals during acquisition? If so, please include these details in the results.

      Signals were bandpass-filtered (2.5 Hz to 7.6 kHz) at the hardware level at acquisition (with no additional software filtering applied), as now clarified in the Methods Electrophysiological recordings section as requested (page 12, lines 525-526).

      Reviewer #2 (Recommendations for the authors):

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      Please see above for our response to this issue.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      Please see above for our response to this issue.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

      As noted above, we now cite this study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable work explores how synaptic activity encodes information during memory tasks. All reviewers agree that the quality of the work is high. Although experimental data do support the possibility that phospholipase diacylglycerol signaling and synaptotagmin 7 (Syt7) dynamically regulate the vesicle pool required for presynaptic release, concerns remain that the central finding of paired pulse depression at very short intervals was more likely caused by Ca<sup>2+</sup> channel inactivation than pool depletion. Overall, this is a solid study with valuable findings, but the results warrant consideration of alternative interpretations.

      We greatly appreciate the invaluable and constructive comments from the Editors and Reviewers, and we thank them for their time and patience. We are pleased that our manuscript has been assessed as valuable and solid.

      One of the most critical concerns was the possible involvement of Ca<sup>2+</sup> channel inactivation in the strong paired-pulse depression (PPD). To address it, we measured the total (free plus buffered) calcium increments induced by each of the first four APs in 40 Hz trains at axonal boutons of prelimbic layer 2/3 pyramidal cells. The first four Ca<sup>2+</sup> increments did not differ from one another, arguing against a contribution of Ca<sup>2+</sup> channel inactivation to PPD. Please see our reply to the second issue in the Weaknesses section of Reviewer #3.

      The second critical issue concerned the definition of ‘vesicular probability’. Previously, vesicular probability (p<sub>v</sub>) has been used with reference to the releasable vesicle pool, which includes not only tightly docked vesicles but also reluctant vesicles. In the present study, by contrast, p<sub>v</sub> means the release probability of tightly docked vesicles. We clarified this point in our replies to the first issues in the Weaknesses sections of Reviewer #2 and Reviewer #3.

      Below we provide our point-by-point replies to the Reviewers’ comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Shin et al. conduct extensive electrophysiological and behavioral experiments to study the mechanisms of short-term synaptic plasticity at excitatory synapses in layer 2/3 of the rat medial prefrontal cortex. The authors interestingly find that short-term facilitation is driven by progressive overfilling of the readily releasable pool, and that this process is mediated by phospholipase C/diacylglycerol signaling and synaptotagmin-7 (Syt7). Specifically, knockdown of Syt7 not only abolishes the refilling rate of vesicles with high fusion probability, but it also impairs the acquisition of trace fear memory. Overall, the authors offer novel insight to the field of synaptic plasticity through well-designed experiments that incorporate a range of techniques.

      Reviewer #2 (Public review):

      Summary:

      Shin et al. aim, in a very extensive piece of work, to identify a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex at the second time scale. This mechanism is related to a new, powerful model that is well suited to test whether the pool of SVs ready for fusion is dynamically scaled to adjust supply and demand. The methods applied are state-of-the-art and address quantitative aspects with high signal-to-noise. In addition, the authors examine excitatory output onto both glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that a regulation of release probability, the alternative contender for regulation, is unlikely to be involved in the observed short-term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that loss of Synaptotagmin7 function and the role of the Ca-dependent phospholipase activity seem critical for the short-term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles and the experimental execution is superb. The impact of the work is high, as it applies recent models of short term plasticity behavior to in vivo circuits further providing insights how synapses provide dynamic control to enable working memory related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      (1) While this work is carefully examined and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release probability is not a putative contributor to the observed behavior. No additional work is needed, but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Quantal content (m) depends on n * p<sub>v</sub>, where n = RRP size and p<sub>v</sub> = vesicular release probability. The value for p<sub>v</sub> critically depends on the definition of RRP size. Recent studies revealed that docked vesicles have differential priming states: a loosely or tightly docked state (LS or TS, respectively). Because the RRP size estimated by hypertonic solution or long presynaptic depolarization is larger than that estimated by back extrapolation of a cumulative EPSC plot (Moulder & Mennerick, 2005; Sakaba, 2006) in glutamatergic synapses, the former RRP (denoted as RRP<sub>hyper</sub>) may encompass not only AP-evoked fast-releasing vesicles (TS vesicles) but also reluctant vesicles (LS vesicles). Because we measured p<sub>v</sub> based on AP-evoked EPSCs, using strong paired-pulse depression (PPD) and the associated failure rates, p<sub>v</sub> in the present study denotes the vesicular fusion probability of TS vesicles, not that of LS plus TS vesicles.

      Recent studies suggest that release sites are not fully occupied by TS vesicles in the baseline (Miki et al., 2016; Pulido and Marty, 2018; Malagon et al., 2020; Lin et al., 2022). Instead, the occupancy (p<sub>occ</sub>) by TS vesicles is subject to dynamic regulation by the forward and backward rate constants of a reversible step (denoted by k<sub>1</sub> and b<sub>1</sub>, respectively). The number of TS vesicles (n) can be factored into the number of release sites (N) and p<sub>occ</sub>, of which N is a fixed parameter but p<sub>occ</sub> depends on k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) under the framework of the simple refilling model (see Methods). Because these refilling rate constants are regulated by Ca<sup>2+</sup> (Hosoi et al., 2008), p<sub>occ</sub> is not a fixed parameter. Therefore, release probability should be re-defined as p<sub>occ</sub> * p<sub>v</sub>. Given that N is fixed, the increase in release probability is a major player in STF. Our study asserts that STF by 2.3 times can be attributed to an increase in p<sub>occ</sub> rather than p<sub>v</sub>, because p<sub>v</sub> is close to unity (Fig. S8). Moreover, strong PPD was observed not only in the baseline but also in the early and middle phases of a train (Fig. 2 and 7) and during the recovery phase (Fig. 3), arguing against a gradual increase in p<sub>v</sub> of reluctant vesicles.
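      The factorization above can be illustrated with a few lines of arithmetic. The following is a minimal sketch, not the model code used in the study: the function names and the rate constants in the demo are hypothetical, and only the 2.3-fold facilitation and the baseline p<sub>v</sub> of at least 0.8 are taken from the text. It shows why a parameter that is already close to 1 leaves little room for facilitation, whereas a low resting occupancy leaves plenty.

```python
"""Sketch: quantal content factored as m = N * p_occ * p_v (illustrative values only)."""


def steady_state_occupancy(k1: float, b1: float) -> float:
    """Resting occupancy of release sites by TS vesicles: p_occ = k1 / (k1 + b1)."""
    if k1 < 0 or b1 < 0 or (k1 + b1) == 0:
        raise ValueError("rate constants must be non-negative and not both zero")
    return k1 / (k1 + b1)


def quantal_content(n_sites: int, p_occ: float, p_v: float) -> float:
    """Mean number of vesicles released per AP: m = N * p_occ * p_v."""
    return n_sites * p_occ * p_v


def max_gain(p_baseline: float) -> float:
    """Largest fold-increase available by raising a probability from p_baseline to 1."""
    if not 0 < p_baseline <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return 1.0 / p_baseline


if __name__ == "__main__":
    observed_stf = 2.3            # fold facilitation quoted in the text
    p_v_baseline = 0.8            # lower bound on baseline p_v quoted in the text
    k1, b1 = 3.0, 7.0             # hypothetical rate constants (1/s), chosen for illustration
    p_occ = steady_state_occupancy(k1, b1)

    print(f"resting p_occ            = {p_occ:.2f}")
    print(f"m at rest (N = 5)        = {quantal_content(5, p_occ, p_v_baseline):.2f}")
    print(f"max gain via p_v alone   = {max_gain(p_v_baseline):.2f}x")
    print(f"max gain via p_occ alone = {max_gain(p_occ):.2f}x")
    print(f"p_v alone can explain {observed_stf}x: {max_gain(p_v_baseline) >= observed_stf}")
```

      With N held fixed, the facilitation ratio equals the ratio of the p<sub>occ</sub> * p<sub>v</sub> products, so a baseline p<sub>v</sub> of 0.8 caps any p<sub>v</sub>-driven gain at 1.25-fold.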

      We imagine that by ‘release probability’ the Reviewer meant vesicular release or fusion probability (p<sub>v</sub>). If so, p<sub>v</sub> (of TS vesicles) cannot be a major player in STF, because the baseline p<sub>v</sub> is already higher than 0.8 even under the most parsimonious estimate (Fig. 2). Moreover, considering the very high refilling rate (23/s), the high double failure rate cannot be explained without assuming that p<sub>v</sub> is close to unity (Fig. S8).

      Conventional models for facilitation assume a post-AP residual Ca<sup>2+</sup>-dependent step increase in p<sub>v</sub> of RRP (Dittman et al., 2000) or reluctant vesicles (Turecek et al., 2016). Given that p<sub>v</sub> of TS vesicles is close to one, an increase in p<sub>v</sub> of TS vesicles cannot account for facilitation. The possibility of an activity-dependent increase in the fusion probability of LS vesicles (denoted as p<sub>v,LS</sub>) should be considered in two ways, depending on whether LS and TS vesicles reside in distinct pools or in the same pool. Notably, strong PPD at short ISI implies that p<sub>v,LS</sub> is near zero at the resting state. Whereas LS vesicles do not contribute to baseline transmission, short-term facilitation (STF) may be mediated by a cumulative increase in p<sub>v,LS</sub> of vesicles that reside in a distinct pool. Because the increase in p<sub>v,LS</sub> during facilitation recruits new release sites (increase in N), the variance of EPSCs should become larger as stimulation frequency increases, resulting in upward deviation from a parabola in the V-M plane, as shown in recent studies (Valera et al., 2012; Kobbersmed et al., 2020). This prediction is not compatible with our results of V-M analysis (Fig. 3), showing that EPSCs during STF fell on the same parabola regardless of stimulation frequencies. Therefore, it is unlikely that an increase in fusion probability of reluctant vesicles residing in a distinct release pool mediates STF in the present study.

      For the latter case, in which LS and TS vesicles occupy the same release sites, it is hard to distinguish a step increase in fusion probability of LS vesicles from a conversion of LS vesicles to TS. Nevertheless, our results do not support the possibility of a gradual increase in p<sub>v,LS</sub> that occurs in parallel with STF. Strong PPD, indicative of high p<sub>v</sub>, was consistently found not only in the baseline (Fig. 2 and Fig. S6) but also during the post-tetanic augmentation phase (Fig. 3D) and even during the early development of facilitation (Fig. 2D-E and Fig. 7), arguing against a gradual increase in p<sub>v,LS</sub>. One may argue that STF may be mediated by a drastic step increase of p<sub>v,LS</sub> from zero to one, but this is not distinguishable from conversion of LS to TS vesicles.

      To address the reviewer’s concern, we incorporated these perspectives into the Discussion and further clarified the reasoning behind our conclusions.

      References

      Moulder KL, Mennerick S (2005) Reluctant vesicles contribute to the total readily releasable pool in glutamatergic hippocampal neurons. J Neurosci 25:3842–3850.

      Sakaba, T (2006) Roles of the fast-releasing and the slowly releasing vesicles in synaptic transmission at the calyx of Held. J Neurosci 26(22): 5863-5871.

      Please note that papers cited in the manuscript are not repeated here.

      (2) Fig 3 I am confused about the interpretation of the Mean Variance analysis outcome. Since the data points follow the curve during induction of short term plasticity, aren't these suggesting that release probability and not the pool size increases? Related, to measure the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial as the experimental paradigm bias the experiment to a given output strength, and therefore a change in release probability cannot be excluded.

      Under the recent definition, release probability can be factored into p<sub>v</sub> and p<sub>occ</sub>, which are the fusion probability of TS vesicles and the occupancy of release sites by TS vesicles, respectively. In this regard, our interpretation of the Variance-Mean results is consistent with the conventional one: different data points along a parabola represent a change in release probability (= p<sub>occ</sub> x p<sub>v</sub>). Our novel finding is that the increase in release probability should be attributed to an increase in p<sub>occ</sub>, not to an increase in p<sub>v</sub>.
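      The parabola argument can be made concrete with the textbook binomial release model. The sketch below assumes N independent release sites, a single per-site release probability p (= p<sub>occ</sub> x p<sub>v</sub>), and a uniform quantal size q with no quantal variability; these are simplifying assumptions for illustration, and the function names and numbers are hypothetical rather than taken from the manuscript. Under these assumptions the variance follows Var = q*M - M^2/N, so changing p moves a point along one parabola, whereas recruiting additional sites lifts it above that parabola.

```python
"""Sketch: variance-mean relation of a simple binomial release model (illustrative)."""


def binomial_mean_variance(n_sites: int, p_release: float, q: float) -> tuple[float, float]:
    """Mean and variance of the EPSC amplitude for N sites, release probability p, quantal size q."""
    mean = n_sites * p_release * q
    variance = n_sites * p_release * (1.0 - p_release) * q ** 2
    return mean, variance


def parabola_variance(mean: float, n_sites: int, q: float) -> float:
    """Variance predicted by the parabola Var = q*M - M^2/N for a fixed number of sites N."""
    return q * mean - mean ** 2 / n_sites


if __name__ == "__main__":
    n_sites, q = 5, 1.0  # hypothetical values

    # Changing the release probability with N fixed: every point lies on the same parabola.
    for p in (0.2, 0.5, 0.9):
        mean, var = binomial_mean_variance(n_sites, p, q)
        print(f"p = {p:.1f}: mean = {mean:.2f}, var = {var:.3f}, "
              f"parabola = {parabola_variance(mean, n_sites, q):.3f}")

    # Doubling the number of sites instead: the point sits above the original parabola.
    mean, var = binomial_mean_variance(2 * n_sites, 0.3, q)
    print(f"N doubled: mean = {mean:.2f}, var = {var:.3f}, "
          f"original parabola = {parabola_variance(mean, n_sites, q):.3f}")
```

      Note that this analysis alone cannot separate p<sub>occ</sub> from p<sub>v</sub>, since only their product enters the binomial formula; separating them requires the independent evidence on p<sub>v</sub> discussed in the reply.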

      (3) Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      In our experience with calyx of Held synapses, OAG, a DAG analogue, increased the size of the fast-releasing vesicle pool (FRP) (Lee JS et al., 2013), consistent with our interpretation (pool overfilling). Once the release sites are overfilled in the presence of OAG, the maximal STF (ratio of facilitated to baseline EPSCs) is expected to become lower as long as the number of release sites (N) is limited. As mentioned above, the baseline p<sub>v</sub> is already close to one, and thus it cannot be further increased by OAG. Instead, the baseline p<sub>occ</sub> seems to be increased by OAG.

      Reference

      Lee JS, et al., Superpriming of synaptic vesicles after their recruitment to the readily releasable pool. Proc Natl Acad Sci U S A, 2013. 110(37): 15079-84.

      (4) The literature on Syt7 function is still quite controversial. One observation in the literature is that loss of Syt7 function in the fly synapse leads to an increase of release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as lack of pool refilling can unlikely explain the strong drop in synaptic output.

      The reviewer raises an interesting point regarding the potential link between Syt7 KD and increased initial p<sub>v</sub>, particularly in light of observations in Drosophila synapses (Guan et al., 2020; Fujii et al., 2021), in which Syt7 mutants exhibited elevated initial p<sub>v</sub>. However, it is important to note that these findings markedly differ from those in mammalian systems, where the role of Syt7 in regulating initial p<sub>v</sub> has been extensively studied. In rodents, consistent evidence indicates that Syt7 does not significantly affect initial p<sub>v</sub>, as demonstrated in several studies (Jackman et al., 2016; Chen et al., 2017; Turecek and Regehr, 2018). Furthermore, in our study of excitatory synapses in the mPFC layer 2/3, we observed an initial p<sub>v</sub> already near its maximal level, approaching a value of 1. Consequently, it is unlikely that the loss of Syt7 could further elevate the initial p<sub>v</sub>. Instead, such effects are more plausibly explained by alternative mechanisms, such as alterations in vesicle replenishment dynamics, rather than a direct influence on p<sub>v</sub>.

      References

      Chen, C., et al., Triple Function of Synaptotagmin 7 Ensures Efficiency of High-Frequency Transmission at Central GABAergic Synapses. Cell Rep, 2017. 21(8): 2082-2089.

      Fujii, T., et al., Synaptotagmin 7 switches short-term synaptic plasticity from depression to facilitation by suppressing synaptic transmission. Scientific reports, 2021. 11(1): 4059.

      Guan, Z., et al., Drosophila Synaptotagmin 7 negatively regulates synaptic vesicle release and replenishment in a dosage-dependent manner. Elife, 2020. 9: e55443.

      Jackman, S.L., et al., The calcium sensor synaptotagmin 7 is required for synaptic facilitation. Nature, 2016. 529(7584): 88-91.

      Turecek, J. and W.G. Regehr, Synaptotagmin 7 mediates both facilitation and asynchronous release at granule cell synapses. Journal of Neuroscience, 2018. 38(13): 3240-3251.

      Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      (1) The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      The apparent discrepancy in the interpretation of post-tetanic augmentation between the present and previous papers [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)] is an important issue that should be clarified. We noted that the different meanings of ‘vesicular release probability’ in these papers are responsible for the discrepancy, and we added an explanation of this difference to the Discussion. In summary, p<sub>v</sub> in the present study denotes the vesicular release probability of TS vesicles, whereas the previous studies used it for the vesicular release probability of vesicles in the RRP, which includes both LS and TS vesicles. Accordingly, p<sub>occ</sub> in the present study is the occupancy of release sites by TS vesicles.

      Not only double failure rate but also other failure rates upon paired pulse stimulation were best fitted at p<sub>v</sub> close to 1 (Fig. S8 and associated text). Moreover, strong PPD, indicating release of vesicles with high p<sub>v</sub>, was observed not only at the beginning of a train but also in the middle of a 5 Hz train (Fig. 2D), during the augmentation phase after a 40 Hz train (Fig 3D), and in the recovery phase after three pulse bursts (Fig. 7). Given that p<sub>v</sub> is close to 1 throughout the EPSC trains and that N does not increase during a train (Fig. 3), synaptic facilitation can be attained only by the increase in p<sub>occ</sub> (occupancy of release sites by TS vesicles). In addition, it should be noted that Fig. 7 demonstrates strong PPD during the recovery phase after depletion of TS vesicles by three pulse bursts, indicating that recovered vesicles after depletion display high p<sub>v</sub> too. Knock-down of Syt7 slowed the recovery of TS vesicles after depletion of TS vesicles, highlighting that Syt7 accelerates the recovery of TS vesicles following their depletion.
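      To make the failure-rate reasoning easier to follow, here is a minimal sketch of how paired-pulse failure probabilities could be computed under a simple two-state refilling scheme. It is not the fitting procedure behind Fig. S8: it assumes independent release sites, rate constants that stay at their resting values during the inter-stimulus interval, and first-order relaxation of site occupancy toward k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>). All function names and parameter values are hypothetical.

```python
"""Sketch: single and double failure probabilities for paired pulses (illustrative)."""
import math


def first_pulse_failure(n_sites: int, p_v: float, k1: float, b1: float) -> float:
    """Probability that no site releases on the first AP, starting from resting occupancy."""
    p_occ = k1 / (k1 + b1)
    return (1.0 - p_occ * p_v) ** n_sites


def double_failure_probability(n_sites: int, p_v: float, k1: float, b1: float,
                               isi_s: float) -> float:
    """Probability that no site releases on either AP of a pair separated by isi_s seconds."""
    p_inf = k1 / (k1 + b1)                  # resting (steady-state) occupancy
    relax = math.exp(-(k1 + b1) * isi_s)    # fraction of the deviation remaining after the ISI

    # Occupancy at the second AP, conditioned on the site state right after the first AP.
    occ_if_empty = p_inf * (1.0 - relax)             # site was empty and refills toward p_inf
    occ_if_occupied = p_inf + (1.0 - p_inf) * relax  # site kept its vesicle and may lose it

    # A site contributes to a double failure only if it released on neither AP.
    empty_path = (1.0 - p_inf) * (1.0 - occ_if_empty * p_v)
    spared_path = p_inf * (1.0 - p_v) * (1.0 - occ_if_occupied * p_v)
    per_site = empty_path + spared_path
    return per_site ** n_sites


if __name__ == "__main__":
    n_sites, k1, b1, isi_s = 3, 3.0, 7.0, 0.02  # hypothetical values; ISI of 20 ms
    for p_v in (0.3, 0.6, 0.95):
        single = first_pulse_failure(n_sites, p_v, k1, b1)
        double = double_failure_probability(n_sites, p_v, k1, b1, isi_s)
        print(f"p_v = {p_v:.2f}: first-pulse failure = {single:.3f}, "
              f"double failure = {double:.3f}, ratio = {double / single:.3f}")
```

      In this scheme the share of first-pulse failures that are followed by a second failure grows as p<sub>v</sub> approaches 1, because a first failure then almost always means the sites were empty rather than occupied but unreleased.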

      As addressed in our reply to the first issue raised by Reviewer #2 and the third issue raised by Reviewer #3, our results do not support possibilities for recruitment of new release sites (increase in N) having low p<sub>v</sub> or for a gradual increase in p<sub>v</sub> of reluctant vesicles during short-term facilitation.  

      The following statement was added to the Discussion in the revised manuscript:

      “Previous studies suggested that an increase in p<sub>v</sub> is responsible for post-tetanic augmentation (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) by observing invariance of the RRP size after tetanic stimulation. In these studies, the RRP size was estimated by hypertonic sucrose solution or as the sum of EPSCs evoked by a 20 Hz/60-pulse train (denoted as ‘RRP<sub>hyper</sub>’). Because reluctant vesicles (called LS vesicles) can be quickly converted to TS vesicles (16/s) and are released during a train (Lee et al., 2012), it is likely that the RRP size measured by these methods encompasses both LS and TS vesicles. In contrast, we assert high p<sub>v</sub> based on the observation of strong PPD and failure rates upon paired stimulations at ISI of 20 ms (Fig. 2 and Fig. S8). Given that single AP-induced vesicular release occurs from TS vesicles but not from LS vesicles, p<sub>v</sub> in the present study indicates the fusion probability of TS vesicles. For the same reasons, p<sub>occ</sub> denotes the occupancy of release sites by TS vesicles. Note that our study does not provide a direct clue as to whether release sites are occupied by LS vesicles that are not tapped by a single AP, although an increase in the LS vesicle number may accelerate the recovery of TS vesicles. As suggested in Neher (2024), even if the number of LS plus TS vesicles is kept constant, an increase in p<sub>occ</sub> (occupancy by TS vesicles) would be interpreted as an increase in ‘vesicular release probability’ as in the previous studies (Stevens and Wesseling (1999); Garcia-Perez and Wesseling (2008)) as long as it was measured based on RRP<sub>hyper</sub>.”

      (2) Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca<sup>2+</sup> channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca<sup>2+</sup> channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca<sup>2+</sup> channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca<sup>2+</sup>-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).

      The Ca<sup>2+</sup>-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca<sup>2+</sup> to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca<sup>2+</sup> (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      We appreciate the reviewer's thoughtful comment regarding the potential role of Ca<sup>2+</sup> channel inactivation in the observed paired-pulse depression (PPD). As noted by the Reviewer, Dobrunz and Stevens (1997) suggested that the high double failure rate at short ISIs in synapses exhibiting PPD can be attributed to Ca<sup>2+</sup> channel inactivation. This interpretation seems to rest on the premise that the number of RRP vesicles does not vary from trial to trial. The number of TS vesicles, however, can be dynamically regulated depending on the parameters k<sub>1</sub> and b<sub>1</sub>, as shown in Fig. S8, implying that the high double failure rate at short ISIs cannot be solely attributed to Ca<sup>2+</sup> channel inactivation. Nevertheless, we acknowledge the possibility that Ca<sup>2+</sup> channel inactivation may contribute to PPD, and we have therefore investigated it further. Specifically, we measured action potential (AP)-evoked Ca<sup>2+</sup> transients at individual axonal boutons of layer 2/3 pyramidal cells in the mPFC using two-dye ratiometry techniques. Our analysis revealed no evidence for Ca<sup>2+</sup> channel inactivation during a 40 Hz train of APs. This finding indicates that voltage-gated Ca<sup>2+</sup> channel inactivation is unlikely to contribute to the pronounced PPD.

      Figure 2—figure supplement 2 shows how we measured the total Ca<sup>2+</sup> increments at axonal boutons. First, we estimated the endogenous Ca<sup>2+</sup>-binding ratio from analyses of single AP-induced Ca<sup>2+</sup> transients at different concentrations of Ca<sup>2+</sup> indicator dye (panels A to E). Then, using the Ca<sup>2+</sup> buffer properties, we converted free [Ca<sup>2+</sup>] amplitudes to total calcium increments for the first four AP-evoked Ca<sup>2+</sup> transients in a 40 Hz train (panels G-I). We incorporated these results into the revised version of our manuscript to provide evidence against Ca<sup>2+</sup> channel inactivation.
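      For readers unfamiliar with this kind of conversion, the sketch below illustrates the standard single-compartment buffer model on which such estimates are commonly based. Whether the exact same equations were used in the supplement is an assumption here, and the function names and synthetic numbers are hypothetical. In that model the free Ca<sup>2+</sup> amplitude is A = Δ[Ca]<sub>T</sub> / (1 + κ<sub>S</sub> + κ<sub>B</sub>), so plotting 1/A against the dye binding ratio κ<sub>B</sub> gives a straight line from which the endogenous binding ratio κ<sub>S</sub> and the total increment Δ[Ca]<sub>T</sub> can be read off.

```python
"""Sketch: endogenous Ca2+-binding ratio from dye-loading data (single-compartment model)."""


def incremental_binding_ratio(buffer_total: float, kd: float,
                              ca_before: float, ca_after: float) -> float:
    """Incremental binding ratio of a buffer between two free Ca2+ levels (same units as kd)."""
    return buffer_total * kd / ((ca_before + kd) * (ca_after + kd))


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_endogenous_binding_ratio(kappa_b: list[float],
                                      amplitudes: list[float]) -> tuple[float, float]:
    """Return (kappa_S, total Ca2+ increment) from free-Ca2+ amplitudes at several dye loads.

    Uses 1/A = (1 + kappa_S) / dCa_T + kappa_B / dCa_T.
    """
    slope, intercept = fit_line(kappa_b, [1.0 / a for a in amplitudes])
    return intercept / slope - 1.0, 1.0 / slope


def total_calcium_increment(free_amplitude: float, kappa_s: float, kappa_b: float) -> float:
    """Convert a free Ca2+ amplitude to the total (free plus buffered) increment."""
    return free_amplitude * (1.0 + kappa_s + kappa_b)


if __name__ == "__main__":
    # Synthetic data generated with kappa_S = 100 and a total increment of 50 (arbitrary units).
    kappa_b = [20.0, 50.0, 100.0, 200.0]
    amplitudes = [50.0 / (1.0 + 100.0 + kb) for kb in kappa_b]
    kappa_s, delta_total = estimate_endogenous_binding_ratio(kappa_b, amplitudes)
    print(f"estimated kappa_S = {kappa_s:.1f}, total increment = {delta_total:.1f}")
```

      Applying `total_calcium_increment` to each AP in a train, with κ<sub>B</sub> recomputed from the free Ca<sup>2+</sup> levels before and after that AP, gives per-AP totals that can be compared directly; equal totals across APs would argue against progressive loss of Ca<sup>2+</sup> influx.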

      (3) On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca<sup>2+</sup>-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      We appreciate the reviewer’s insightful comments regarding the potential increase in p<sub>fusion</sub> of reluctant vesicles. It should be noted, however, that Maschi and Klyachko (2020) showed a distribution of release probability (p<sub>r</sub>) within a single active zone rather than a heterogeneity in p<sub>fusion</sub> of individual docked vesicles. Therefore both p<sub>occ</sub> and p<sub>v</sub> of TS vesicles would contribute to the p<sub>r</sub> distribution shown in Maschi and Klyachko (2020). 

      The Reviewer’s concern aligns closely with the first issue raised by Reviewer #2, which we addressed in detail. Briefly, new release sites are unlikely to be recruited during facilitation or post-tetanic augmentation, because the variance of EPSCs during and after a train fell on the same parabola (Fig. 3). In addition, strong PPD was observed not only at baseline but also during the early and late phases of facilitation, indicating that vesicles with very high p<sub>v</sub> contribute to EPSCs throughout train stimulation (Fig. 2, 3, and 7). These findings argue against both the recruitment of new release sites harboring low p<sub>v</sub> vesicles and a gradual increase in the fusion probability of reluctant vesicles.

      To address the reviewers’ concern, we incorporated these perspectives into the Discussion and further clarified the reasoning behind our conclusions.

      (4) In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca<sup>2+</sup> below 0.25 mM at a broad range of synapses and neuro-muscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90 % of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within 10s of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).

      As the Reviewer suggested, low external Ca<sup>2+</sup> concentration can lower release probability (p<sub>r</sub>). Given that both p<sub>v</sub> and p<sub>occ</sub> are regulated by [Ca<sup>2+</sup>]<sub>i</sub>, low external [Ca<sup>2+</sup>] may affect not only p<sub>v</sub> but also p<sub>occ</sub>, both of which would contribute to low p<sub>r</sub>. Under such conditions, it is plausible that the baseline p<sub>r</sub> becomes much lower than 0.1 because both p<sub>v</sub> and p<sub>occ</sub> are low (for instance, if p<sub>v</sub> decreases from 1 to 0.5 and p<sub>occ</sub> from 0.3 to 0.1, then p<sub>r</sub> = 0.05). In that case p<sub>r</sub> (= p<sub>v</sub> x p<sub>occ</sub>) has room to increase by a factor of ten (to 0.5, for example) through short-term facilitation as cytosolic [Ca<sup>2+</sup>] accumulates during a train.
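
      The arithmetic in this reply can be checked with a minimal sketch. It assumes only the relation p<sub>r</sub> = p<sub>v</sub> x p<sub>occ</sub> stated above; the function name is hypothetical, and the numbers are the illustrative values from the text, not measurements.

```python
import math


def release_probability(p_v: float, p_occ: float) -> float:
    """Per-site release probability, p_r = p_v * p_occ."""
    return p_v * p_occ


# Illustrative low-Ca2+ values quoted in the reply (not measured data).
p_r_low_ca = release_probability(p_v=0.5, p_occ=0.1)

# p_r cannot exceed 1, so this is the largest possible fold increase.
max_fold_increase = 1.0 / p_r_low_ca

# A ten-fold facilitation from the low-Ca2+ baseline.
p_r_facilitated = 10 * p_r_low_ca

print(f"baseline p_r in low Ca2+: {p_r_low_ca:.2f}")
print(f"p_r after 10-fold facilitation: {p_r_facilitated:.2f}")
print(f"maximum possible fold increase: {max_fold_increase:.0f}")
```

      Under these assumed values, a ten-fold enhancement remains below the ceiling of p<sub>r</sub> = 1, which is the point of the argument.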

      If p<sub>v</sub> is close to one, p<sub>r</sub> depends on p<sub>occ</sub>, and thus facilitation depends on the number of TS vesicles just before the arrival of each AP of a train. Post-train recovery from facilitation would therefore depend on restoration of the equilibrium between TS and LS vesicles to the baseline. Even if the transition between LS and TS vesicles is very fast (tens of ms), the equilibrium involved in de novo priming (reversible transitions between the recycling vesicle pool and partially docked LS vesicles) seems to be much slower (13 s in Fig. 5A of Wu and Borst 1999). Thus, we can consider a two-step priming model (recycling pool -> LS -> TS), which comprises a slow 1st step (-> LS) and a fast 2nd step (-> TS). Under the framework of the two-step model, the slow 1st step (de novo priming step) is the rate-limiting step regulating the development and recovery kinetics of facilitation. Given that the on and off rates for Ca<sup>2+</sup> binding to Syt7 are slow, it is plausible that Syt7 may contribute to short-term facilitation (STF) by Ca<sup>2+</sup>-dependent acceleration of the 1st step (as shown in Fig. 9). During train stimulation, the number of LS vesicles would slowly accumulate in a Syt7- and Ca<sup>2+</sup>-dependent manner, and this increase in LS vesicles would shift the LS/TS equilibrium towards TS, resulting in STF. After tetanic stimulation, the recovery kinetics from facilitation would be limited by the slow recovery of LS vesicles.

      Reference

      Wu, L.-G. and Borst J.G.G. (1999) The reduced release probability of releasable vesicles during recovery from short-term synaptic depression. Neuron, 23(4): 821-832.

      Please note that papers cited in the manuscript are not repeated here.

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      Slower recovery from depression observed in the Syt7 knockdown (KD) synapses (Fig. 7) may result from a deficiency in activity-dependent acceleration of TS vesicle recovery. Although basal occupancy was higher in the Syt7 KD synapses, this does not indicate a faster activity-dependent recovery.

      Higher baseline occupancy does not necessarily imply faster recovery of PPR either. In fact, PPR recovery was slower in Syt7 KD synapses than in WT synapses (18.5 vs. 23/s). Under the framework of the simple refilling model (Fig. S8Aa), the baseline occupancy and PPR recovery rate are calculated as k<sub>1</sub> / (k<sub>1</sub> + b<sub>1</sub>) and (k<sub>1</sub> + b<sub>1</sub>), respectively. The baseline occupancy depends on the ratio k<sub>1</sub>/b<sub>1</sub>, whereas the PPR recovery rate depends on the absolute values of k<sub>1</sub> and b<sub>1</sub>. Based on the p<sub>occ</sub> and PPR recovery time constants of WT and KD synapses, we expect a higher k<sub>1</sub>/b<sub>1</sub> but a lower (k<sub>1</sub> + b<sub>1</sub>) in Syt7 KD synapses compared to WT ones.
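
      To make this reasoning concrete, the two relations of the simple refilling model can be inverted to recover k<sub>1</sub> and b<sub>1</sub> from an occupancy and a recovery rate. The sketch below is illustrative only: the function name is hypothetical, the recovery rates are the values quoted above, and the p<sub>occ</sub> values are placeholder assumptions chosen solely so that KD occupancy exceeds WT occupancy.

```python
import math


def refilling_rates(p_occ: float, recovery_rate: float) -> tuple[float, float]:
    """Invert p_occ = k1 / (k1 + b1) and recovery_rate = k1 + b1.

    Returns (k1, b1) in the same units as recovery_rate (here 1/s).
    """
    k1 = p_occ * recovery_rate
    b1 = (1.0 - p_occ) * recovery_rate
    return k1, b1


# Recovery rates quoted in the reply (1/s); p_occ values are ASSUMED placeholders.
wt_k1, wt_b1 = refilling_rates(p_occ=0.3, recovery_rate=23.0)
kd_k1, kd_b1 = refilling_rates(p_occ=0.5, recovery_rate=18.5)

for label, k1, b1 in (("WT", wt_k1, wt_b1), ("KD", kd_k1, kd_b1)):
    print(f"{label}: k1 = {k1:.2f}/s, b1 = {b1:.2f}/s, "
          f"k1/b1 = {k1 / b1:.2f}, k1 + b1 = {k1 + b1:.2f}/s")
```

      With these assumed occupancies, the KD synapse has the larger k<sub>1</sub>/b<sub>1</sub> ratio and the smaller sum (k<sub>1</sub> + b<sub>1</sub>), so higher baseline occupancy and slower recovery can coexist within the same model.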

      The lower number of release sites (N) in Syt7 KD synapses was not anticipated. As you suggested, such a low N might be ascribed to little recruitment of release sites during a train in KD synapses. But our results do not support this model. If silent release sites were recruited during a train, the variance should deviate upward from the parabola predicted under a fixed N (Valera et al., 2012; Kobbersmed et al. 2020). This was not the case in our results (Fig. 3). In the first version of the manuscript, we argued against this possibility in lines 203-208.
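
      The upward deviation mentioned here follows from the standard binomial variance-mean relation, variance = q x mean - mean<sup>2</sup>/N. The sketch below is a simplified illustration, assuming a uniform quantal size q with no quantal variability; the function name and all numerical values are hypothetical and are not taken from our data.

```python
def binomial_variance(mean_epsc: float, q: float, n_sites: int) -> float:
    """Variance predicted by a simple binomial model with N release sites.

    variance = q * mean - mean**2 / N, assuming identical sites and a
    fixed quantal size q (no quantal variability).
    """
    if not 0.0 <= mean_epsc <= q * n_sites:
        raise ValueError("mean EPSC must lie between 0 and N * q")
    return q * mean_epsc - mean_epsc ** 2 / n_sites


# Hypothetical numbers for illustration only.
q = 10.0          # quantal size, pA
mean_epsc = 30.0  # mean EPSC amplitude during facilitation, pA

var_fixed_n = binomial_variance(mean_epsc, q, n_sites=5)
var_recruited = binomial_variance(mean_epsc, q, n_sites=8)

print(f"variance with fixed N = 5: {var_fixed_n:.1f} pA^2")
print(f"variance with N recruited to 8: {var_recruited:.1f} pA^2")
```

      At the same mean amplitude, a larger N yields a larger variance, so recruitment of silent sites during a train would place the data above the fixed-N parabola.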

      As discussed in both the Results and Discussion sections, the baseline EPSC was unchanged by KD (Fig. S3) because of complementary changes in the number of docking sites and their baseline occupancy (Fig. 6). These findings suggest that Syt7 may be involved in maintaining additional vacant docking sites, which could be overfilled during facilitation. It remains to be determined whether the decrease in docking sites in Syt7 KD synapses is related to the specific localization of Syt7 at the plasma membrane of active zones, as proposed in previous studies (Sugita et al., 2001; Vevea et al., 2021).

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      We used 4-AP in the presence of TTX to increase the length constant of axon fibers and thereby facilitate the conduction of local depolarization from the illumination area to axon terminals. The lack of EPSCs in the presence of 4-AP and TTX indicates that the illumination area is sufficiently distant from axon terminals that local depolarization induced by optical stimulation does not evoke synaptic transmission. This methodology has been employed in previous studies, including the work of Little and Carter (2013).

      Reference

      Little JP and Carter AG (2013) Synaptic mechanisms underlying strong reciprocal connectivity between the medial prefrontal cortex and basolateral amygdala. J Neurosci, 33(39): 15333-15342.

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

      (Reply to #3 and #4) We selected the target molecules as candidates based on their well-characterized roles in vesicle dynamics, and aimed to investigate which aspects of STP are affected by these molecules in our experimental context. For example, we found that the baseline p<sub>occ</sub> and short-term facilitation (STF) are enhanced by the baseline DAG level and train stimulation-induced PLC activation, respectively. Notably, the effect of dynasore informed us that slow site clearing is responsible for the late depression of 40 Hz train EPSCs. The knock-down experiments also provided us with information on the critical role of Syt7 in the replenishment of TS vesicles. These approaches do not deviate from standard scientific reasoning but rather build upon prior knowledge to formulate and test hypotheses.

      Importantly, our conclusions do not rely solely on the assumption that altering the target molecule impacts synaptic transmission. Instead, our conclusions are derived from a comprehensive analysis of diverse outcomes obtained through both pharmacological and genetic manipulations. These interpretations align closely with prior literature, further validating our conclusions.

      Therefore, the use of established studies to guide candidate selection and the consistency of our findings with existing knowledge do not represent a logical circularity but rather a reinforcement of the proposed mechanism through converging lines of evidence.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comments:

      (1) While the authors claim that Syt7-mediated facilitation is connected to the behavioral deficits they observed, this link is still somewhat speculative. This manuscript could benefit from further discussions of other alternative mechanisms to consider.

      We added the following statement to the Discussion of the revised manuscript:

      “The acquisition of trace fear memory was impaired by inhibition of persistent activity in mPFC during trace period (Gilmartin et al., 2013). The similar deficit observed in Syt7 KD animals is consistent with the hypothesis that STF provides bi-stable ensemble activity in a recurrent network (Mongillo et al., 2012). Nevertheless, alternative mechanisms may be responsible for the behavioral deficit. Not only recurrent network but also long-range loop between the mPFC and the mediodorsal (MD) thalamus play a critical role in maintaining persistent activity within the mPFC especially for a delay period longer than 10 s (Bolkan et al., 2017). Prefrontal L2/3 is heavily innervated by MD thalamus, and L2/3-PCs subsequently relay signals to L5 cortico-thalamic (CT) neurons (Collins et al., 2018). Given that L2/3 is an essential component of the PFC-thalamic loop, loss of STF at recurrent synapses between L2/3 PCs may lead to insufficient L2/3 inputs to L5 CT neurons and failure in the reverberant PFC-MD thalamic feedback loop. Therefore, not only L2/3 recurrent network but also its output to downstream network should be considered as a possible network mechanism underlying behavioral deficit caused by Syt7 KD L2/3.”

      (2) The authors mention that Syt7 contributes to persistent activity during working memory tasks but focus on using only a trace fear conditioning task. However, it would be interesting to see if their results are generalizable to other working memory tasks (i.e. a delayed alternation task).

      We thank the Reviewer for the insightful suggestion. Trace fear conditioning (tFC) shares behavioral properties with working memory (WM) tasks in that tFC is vulnerable to attentional distraction and to the load of a WM task. In general, WM tasks, including delayed alternation tasks such as a T-maze task, require persistent activity of ensemble neurons representing target-specific information among multiple choices. Unlike such WM tasks, tFC is not appropriate for examining target-specific ensemble activity. Because in vivo recording in KD animals during delayed alternation tasks is not trivial, it will be appropriate to study the effect of Syt7 KD in a separate study.

      (3) The figure legend in Figure 6A and 6B mentions dotted lines and broken lines in the figure. However, this is confusing, and it is unclear as to what these lines are referring to in the figure.

      To avoid confusion in the figure legend for Figure 6A and 6B, we corrected “dotted line” to “vertical broken line”, and “broken lines” to “dashed parabolas”.

      (4) The manuscript can benefit from close reading and editing to catch typos and improve general readability (i.e. line 173: the word "are" is repeated twice).

      We corrected typographical errors throughout the manuscript and carefully read the manuscript to improve readability. A revised version reflecting these corrections has been prepared and will be resubmitted for your consideration.

      Reviewer #3 (Recommendations for the authors):

      The points in this section are all minor.

      (1) Line 44: Define release probability (p_r) more clearly. Authors use it to mean p<sub>v</sub>*p<sub>occ</sub>, but others routinely use it to mean p<sub>v</sub>*p<sub>occ</sub>*N.

      We understand that the Reviewer meant “others routinely use it to mean p<sub>v</sub>”. In this statement, we meant the conventional definition of release probability, which is the release probability among vesicles of the RRP. We think that it is not appropriate to redefine release probability as p<sub>v</sub> * p<sub>occ</sub> in this first paragraph of the Introduction. Therefore, we clarified this issue in the Discussion, as mentioned in our reply to the 1st weakness raised by Reviewer #3.

      (2) Line 82: For clarity, define better what recurrent excitatory synapses are. It seems that synapses between L2/3 PCs and local targets may all be recurrent?

      Each of L2/3 and L5 of the prefrontal cortex harbors intralaminar recurrent excitatory synapses between pyramidal cells, called a recurrent network. Previous theoretical studies have proposed that a single-layer recurrent network model can have bi-stable E/I balanced states (up- and down-states) if recurrent excitatory synapses display short-term facilitation (STF), and thus is able to temporarily hold information once an external input shifts the network to the up-state. In this theory, synapses onto local targets across layers are not considered, and the specific roles of L2/3 and L5 in working memory tasks are still elusive. For clarity, we added a statement at the beginning of the paragraph (line 82): “Each of layer 2/3 (L2/3) and layer 5 (L5) of neocortex displays intralaminar excitatory synapses between pyramidal cells comprising a recurrent network (Holmgren et al., 2003; Thomson and Lamy, 2007)”.

      (3) Cite earlier studies of short-term synaptic plasticity at synapses between L2/3 pyramidal neurons and local targets in mPFC. If there are none, take more explicit credit for being first.

      As we mentioned in the Introduction, previous studies on short-term plasticity (STP) at neocortical excitatory recurrent synapses have focused on synapses between L5 pyramidal cells (PCs) (Hempel et al., 2000; Wang et al., 2006; Morishima et al., 2011; Yoon et al., 2020). The local connectivity between L2/3 PCs in the somatosensory cortex has been elucidated by Holmgren et al. (2003) and Ko et al. (2011). Although these studies showed STP of EPSPs, it was at a fixed frequency or stimulus pattern at high external [Ca<sup>2+</sup>] (2 mM). There is a study on the frequency-dependence of STP of EPSPs between L2/3 PCs (Feldmeyer et al., 2006). Different from our study, Feldmeyer et al. (2006) observed monotonous STD at all frequencies less than 50 Hz, but this study was done in the somatosensory cortex and at high external [Ca<sup>2+</sup>] (2 mM). To our knowledge, no previous study has investigated STP at recurrent excitatory synapses of L2/3 pyramidal cells of the mPFC, especially at physiological external [Ca<sup>2+</sup>]. The present study, therefore, represents the first extensive investigation of STP at recurrent excitatory synapses in L2/3 of the mPFC under physiologically relevant external [Ca<sup>2+</sup>].

      References

      Feldmeyer D, Lubke J, Silver RA, Sakmann B (2002) Synaptic connections between layer 4 spiny neurone-layer 2/3 pyramidal cell pairs in juvenile rat barrel cortex: physiology and anatomy of interlaminar signalling within a cortical column. J Physiol 538:803-822.

      Holmgren C, Harkany T, Svennenfors B, Zilberter Y (2003) Pyramidal cell communication within local networks in layer 2/3 of rat neocortex. J Physiol 551:139-153.

      Ko H, Hofer SB, Pichler B, Buchanan KA, Sjöström PJ, Mrsic-Flogel TD (2011) Functional specificity of local synaptic connections in neocortical networks. Nature 473:87-91.

      Morishima M, Morita K, Kubota Y, Kawaguchi Y (2011) Highly differentiated projection-specific cortical subnetworks. Journal of Neuroscience 31:10380-10391.

      Wang Y, Markram H, Goodman PH, Berger TK, Ma J, Goldman-Rakic PS (2006) Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nat Neurosci 9:534-542.

      (4) I couldn't figure out the significance of Figure S3. Perhaps this could be explained better.

      Optical minimal stimulation methods have not been previously documented in detail. This figure illustrates which parameters should be carefully examined in order to attain optical minimal stimulation, which ideally stimulates a single afferent fiber. Single-fiber stimulation by optical minimal stimulation is supported by the similarity of our estimate for the number of release sites (N) to the previous morphological estimate (Holler et al., 2021). For minimal stimulation, we used a collimated DMD-coupled LED to restrict 470 nm illumination to a small and well-defined region within layer 2/3 of the prelimbic mPFC, and carefully adjusted the illumination radius such that illumination one step smaller (by 1 μm) failed to evoke EPSCs. Our typical illumination area ranged between 3–4 μm, as shown in Figure S3A. Under this minimal illumination area, we confirmed unimodal distributions for the EPSC parameters (amplitude, rise time, decay time and time to peak; Figure 3B-E). Otherwise, we excluded the recordings from analysis. We hope this explanation provides a clearer understanding of the figure's significance.

      (5) Note that CTZ seems to alter p_r at some synapses.

      We acknowledge that CTZ can increase release probability by blocking presynaptic K<sup>+</sup> currents. Indeed, Ishikawa and Takahashi (2001) reported that CTZ slowed the repolarizing phase of presynaptic action potentials and increased the frequency of miniature EPSCs at calyx synapses. Consistently, we observed a slight increase in the baseline EPSC amplitude, from 33.3 pA to 41.9 pA (p=0.045), following the application of 50 µM CTZ. However, given that vesicular release probability (p<sub>v</sub>) is already close to 1 at the synapse of interest, we believe that the observed effect is more likely attributable to an increase in release site occupancy (p<sub>occ</sub>), which would be reflected as the increase in miniature EPSC frequency in Ishikawa and Takahashi (2001). Given that PPR depends on p<sub>v</sub> rather than p<sub>occ</sub>, this increase in p<sub>occ</sub> would not critically change our conclusion that AMPA receptor desensitization is not responsible for the strong PPD.

      Reference

      Ishikawa, T., & Takahashi, T. (2001). Mechanisms underlying presynaptic facilitatory effect of cyclothiazide at the calyx of Held of juvenile rats. The Journal of Physiology, 533(2), 423-431.

      (6) Figure 8B. The result in Figure 8C seems important, but I couldn't figure out why behaviour was not altered during the acquisition phase summarized in Figure 8B. Perhaps this could be explained more clearly for non-experts.

      Little difference in freezing behavior during acquisition has also been observed when prelimbic persistent firing was optogenetically inhibited (Gilmartin et al., 2013). Not only the CS (tone) but also other sensory inputs (visual, olfactory, etc.) and the spatial context could serve as cues predicting the US (shock). Moreover, during the acquisition phase, the electric shock itself induces a freezing response as a natural defensive behavior, which may obscure specific behavioral changes related to the associative learning process. Therefore, freezing behavior during acquisition cannot be regarded as a sign of a specific association between CS and US. Instead, on the next day, we specifically evaluated the CS-US association of the conditioned animals by measuring freezing behavior in response to the CS in a distinct context. We explicitly documented the small difference between WT and KD animals during the acquisition phase in the relevant paragraph (line 397).

    1. Reviewer #1 (Public review):

      This paper presents a set of tools that will pave the way for a comprehensive understanding of the circuits that control wing motion in flies during flight or courtship. These tools are mainly focused on wing motor neurons and interneurons, as well as a few motor neurons of the haltere. This paper and the library of driver lines described within it will serve as a crucial resource in the pursuit of understanding how neural circuits give rise to behavior. Overall, I found the paper well-written, the figures are quite nice, and the data from the functional experiments convincing. I do not have many major concerns, but a few suggestions that I think will make the paper easier to understand.

      I think the introduction could use some reorganization, as right now I found it quite difficult to follow. For example, lines 85-88 seem to fit more naturally at the end of the next paragraph, compared to the current location of those sentences, which feels rather disjointed. I would suggest introducing the organization of the wing motor system (paragraphs 3 and 4) and then discussing the VNC (paragraph 2) before moving on to describe the neurons within the VNC that may control wing motion. Additionally, lines 141-144, which describe the broad subdivisions of the VNC, can be moved up to where the VNC is first introduced.

      One of my major takeaways from the paper is the call to examine the premotor circuits that govern wing motion. For that reason, I was surprised that there was little mention of the role of sensory input to these circuits. As the authors point out in the discussion, the haltere, for example, provides important input to the wing steering system. I recognize that creating driver lines for the sensory neurons that innervate the VNC is well beyond the scope of this project. I would just like some clarification in the text of the role these inputs play in structuring wing motion, especially as some act at rapid timescales that possibly forgo processing by the very circuits detailed here. This brings up a related issue: if the roles of the interneurons that are presynaptic to the wing motor neurons are "largely unexplored," with how much confidence can we say that they are the key for controlling behavior? To be sure, this has been demonstrated quite nicely in the case of courtship, but in flight, I think the evidence supporting this argument is less clear. I suggest the authors rephrase their language here.

  4. Apr 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      It seems as if the main point of the paper is about the new data related to ratfish, although your title describes it as extant cartilaginous fishes and you bounce around between the little skate and ratfish. So here's an opportunity for you to adjust the title to emphasize ratfish, given that you later describe how this is your significant new data contribution. Either way, the organization of the paper can be adjusted so that the reader can follow along in the same order for all sections, so that it's very clear for comparative purposes what the new data are and what they mean. My opinion is that I want to read, for each subheading in the results, about the ratfish first because this is your most interesting novel data. Then I want to know any confirmation about morphology in little skate. And then I want to know about any gaps you fill with the catshark. (It is ok if you keep the order of "skate, ratfish, then shark", but I think it undersells the new data.)

      The main points of the paper are 1) to define terms for chondrichthyan skeletal features in order to unify research questions in the field, and 2) add novel data on how these features might be distributed among chondrichthyan clades. However, we agree with the reviewer that many readers might be more interested in the ratfish data, so we have adjusted the order of presentation to emphasize ratfish throughout the manuscript.

      Strengths:

      The imagery and new data availability for ratfish are valuable and may help to determine new phylogenetically informative characters for understanding the evolution of cartilaginous fishes. You also allude to the fossil record.

      Thank you for the nice feedback.

      Opportunities:

      I am concerned about the statement of ratfish paedomorphism because stage 32 and 33 were not statistically significantly different from one another (figure and prior sentences). So, these ratfish TMDs overlap the range of both 32 and 33. I think you need more specimens and stages to state this definitely based on TMD. What else leads you to think these are paedomorphic? Right now they are different, but it's unclear why. You need more outgroups.

      Sorry, but we had reported that the TMD of centra from little skate did significantly increase between stage 32 and 33. Supporting our argument that ratfish had features of little skate embryos, TMD of adult ratfish centra was significantly lower than TMD of adult skate centra (Fig1). Also, it was significantly higher than that of stage 32 skate centra, but it was statistically indistinguishable from that of stage 33 and juvenile stages of skate centra. While we do agree that more samples from these and additional groups would bolster these data, we feel they are sufficiently powered to support our conclusions for this current paper.

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth.

      We have included more of the data summarized in the Results sub-headings in the abstract, as suggested (lines 32-37).

      Historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology and development of these fishes.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies, and I don't think your list is exhaustive. You need to expand this list and history, which will help with your ultimate comparative analysis without you needing to sample too many new data yourself.

      We have added additional recent and older references: Kölliker, 1860; Daniel, 1934; Wurmbach, 1932; Liem, 2001; Arratia et al., 2001.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text.

      We address a similar comment from this reviewer in more detail below, hoping that any concerns about continuity have been addressed with inclusion of a summary of proposed characters in a new Table 1, re-writing of the Discussion, and modified Fig7 and re-written Fig7 legend.

      Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      While it was a little unclear exactly what was requested, we restructured the branches to indicate that holocephalans diverged earlier from the ancestors that led to elasmobranchs. Also in response to this comment, we added catshark (S. canicula) and little skate (L. erinacea) specifically to the character matrix.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      Reviewer #1 (Recommendations For The Authors):

      Further Strengths and Opportunities:

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth. It's a little unusual to try and state an interpretation of results as the heading title in a results section and the figures so it feels out of place. You could also use the headings as the last statement of each section, after you've presented the results. In order I would change these results subheadings to:

      Tissue Mineral Density (TMD)

      Tissue Properties of Neural Arches

      Trabecular mineralization

      Cap zone and Body zone Mineralization Patterns

      Areolar mineralization

      Developmental Variation

      Sorry, but we feel that summary Results sub-headings are the best way to effectively communicate to readers the story that the data tell, and this style has been consistently used in our previous publications. No changes were made.

      You allude to the fossil record and that is great. That said, historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology of these fishes. You even have one sentence citing Coates et al. 2018, Frey et al., 2019 and Ørvig 1951 to talk about the potential that fossils displayed trabecular mineralization. That feels like you are burying the lead and may have actually been part of the story for where you came up with your hypothesis in the beginning... or the next step in future research. I feel like this is really worth spending some more time on in the intro and/or the discussion.

      We’ve added older REFs as pointed out above. Regarding fossil evidence for trabecular mineralization, no, those studies did not lead to our research question. But after we discovered how widespread trabecular mineralization was in extant samples, we consulted these papers, which did not focus on the mineralization patterns per se, but certainly led us to emphasize how those patterns fit in the context of chondrichthyan evolution, which is how we discussed them.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies. That said, there's a lot more work by Mason Dean's lab starting in 2010 that you should take a look at related to tesserae structure... they're looking at more taxa than you did as well. It will be valuable for you to be able to make any sort of phylogenetic inference as part of your discussion and enhance the info you present in figure 7. Go further back in time... For example:

      de Beer, G. R. 1932. On the skeleton of the hyoid arch in rays and skates. Quarterly Journal of Microscopical Science. 75: 307-319, pls. 19-21.

      de Beer, G. R. 1937. The Development of the Vertebrate Skull. The University Press, Oxford.

      Indeed, we have read all of Mason’s work, citing 9 of his papers, and where possible, we have incorporated their data on different species into our Discussion and Fig7. Thanks for the de Beer REFs. While they contain histology of developing chondrichthyan elements, they appear to refer principally to gross anatomical features, so were not included in our Intro/Discussion.

      Most sections within the results read more like a discussion than a presentation of the new data, and you jump directly into using an argument of those data too early. Go back in and remove the references or save those paragraphs for the discussion section. Particularly because this journal has you skip the method section until the end, I think it's important to set up this section with a little bit more brevity and conciseness. For instance, in the first section about tissue mineral density, change that subheading to just say tissue mineral density. Then you can go into the presentation of what you see in the ratfish, and then what you see in the little skate, and then that's it. You save the discussion about which other elasmobranchs are mineralizing their neural arches, etc. for another section.

      We dramatically reduced background-style writing and citations in each Results section (other than the first section of minor points about general features of the ratfish, compared to catshark and little skate), keeping only a few to briefly remind the general reader of the context of these skeletal features.

      I like that your first sentence in the paragraph describes why you are using a particular method and comparison, because it shows me (the reader) where you're sampling from. Also, as part of the first figure, rather than having just the graphs, consider adding a small sketch for little skate and catshark to show where you sampled from for comparative purposes. That would then also help clarify other figures.

      Done (also adding a phylogenetic tree).

      Second instance is your section on trabecular mineralization. This has so many references in it. It does not read like results at all. It looks like a discussion. However, the trabecular mineralization is one of the most interesting aspects of this paper, and how you are describing it as a unique feature. I really just want a very clear description of what the definition of this trabecular mineralization is going to be.

      In addition to adding Table 1 to define each proposed endoskeletal character state, we have changed the structure of this section and hope it better communicates our novel trabecular mineralization results. We also moved the topic of trabecular mineralization to the first detailed Discussion point (lines 347-363) to better emphasize this specific topic.

      Carry this reformatting through for all subsections of the results.

      As mentioned above, we significantly reduced background-style writing and citations in each Results section.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters illustrated in figure 7 and the body of the text. I think you can give the characters a number so that you can actually refer to them in each subsection of the results. They can even be numbered sequentially so that they are presented in a standard character matrix format that future researchers can add directly to their own character matrices. You could actually turn it into a separate table so it doesn't take up that entire space of the figure, because there need to be additional taxa referred to on the diagram. Namely, you don't have any outgroups in figure 7, so it's hard to describe any state specifically as ancestral and/or derived. Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      The character matrix is a fantastic idea, and we should have included it in the first place! We created Table 1 summarizing the traits and terminology at the end of the Introduction, also adding the character matrix in Fig7 as suggested, including specific fossil and extant species. For the Fig7 branching and catshark inclusion, please see above.

      You can repurpose the figure captions as narrative body text. Use less narrative in the figure captions. These are your results actually, so move that text to the results section as a way to truncate and get to the point faster.

      By figure captions, we assume the reviewer refers to figure legends. We like to explain figures to some degree of sufficiency in the legends, since some people do not read the main text and simply skim a manuscript’s abstract, figures, and figure legends. That said, we did reduce the wording, as requested.

      More specific comments about semantics are listed here:

      The abstract starts negative and doesn't state a question although one is referenced. Potential revision - "Comprehensive examination of mineralized endoskeletal tissues warranted further exploration to understand the diversity of chondrichthyans... Evidence suggests for instance that trabecular structures are not common, however, this may be due to sampling (bring up fossil record.) We expand our understanding by characterizing the skate, cat shark, and ratfish... (Then add your current headings of the results section to the abstract, because those are the relevant takeaways.)"

      We re-wrote much of the abstract, hoping that the points come across more effectively. For example, we started with “Specific character traits of mineralized endoskeletal tissues need to be clearly defined and comprehensively examined among extant chondrichthyans (elasmobranchs, such as sharks and skates, and holocephalans, such as chimaeras) to understand their evolution”. We also stated an objective for the experiments presented in the paper: “To clarify the distribution of specific endoskeletal features among extant chondrichthyans”.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      In the second paragraph of the TMD section, you mention the synarcual comparison. I'm not sure I follow. These are results, not methods. Tell me what you are comparing directly. The non-centrum part of the synarcual separate from the centrum? They both have both parts... did you mean the comparison of those both to the catshark? Just be specific about which taxon, which region, and which density. No need to go into reasons why you chose those regions here. Put into methods and discussion for interpretation.

      We hope that we have now clarified wording of that section.

      Label the spokes somehow either in caption or on figure direction. I think I see it as part of figure 4E, I, and J, but maybe I'm misinterpreting.

      Based upon histological features (e.g., regions of very low cellularity with Trichrome unstained matrix) and hypermineralization, spokes in Fig4 are labelled with * and segmented in blue. We detailed how spokes were identified in main text (lines 241-243; 252-254) and figure legend (lines 597-603).

      Reviewer #2 (Public Review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      Many thanks for the kind words

      I have, however, some comments. Some information is lacking and should be added to the manuscript text. I also suggest changes in the result and the discussion section of the manuscript.

      Introduction:

      The reader gets the impression that almost no research on chondrichthyan skeletal tissues was done before 2010 ("last 15 years", L45). I suggest correcting that and also citing previous studies on chondrichthyan skeletal tissues, including studies from before 1900.

      We have added additional older references, as detailed above.

      Material and Methods:

      Please complete L473-492: Three different micro-CT scanners were used for three different species? SkyScan 117 for the skate samples. Catshark different scanner, please provide full details. Chimera synchrotron scan? Please provide full details for all scanning protocols.

      We clarified exact scanners and settings for each micro-CT experiment in the Methods (lines 476-497).

      TMD is established in the same way in all three scanners? Actually not possible. Or, all specimens were scanned with the same scanner to establish TMD? If so please provide the protocol.

      Indeed, the same scanner was used for TMD comparisons, and we included exact details on how TMD was established and compared with internal controls in the Methods (lines 486-488).

      Please complete L494 ff: Tissue embedding medium and embedding protocol is missing. Specimens have been decalcified, if yes how? Have specimens been sectioned non-decalcified or decalcified?

      Please complete L506 ff: Tissue embedding medium and embedding protocol is missing. Description of controls are missing.

      Methods were updated to include these details (lines 500-503).

      Results:

      L147: It is valuable and interesting to compare the degree of mineralisation in individuals from the three different species. It appears, however, not possible to provide numerical data for Tissue Mineral Density (TMD). First requirement, all specimens must be scanned with the same scanner and the same calibration values. This is not stated in the M&M section. But even if this was the case, all specimens derive from different sample locations and have been preserved differently. Type of fixation, extension of fixation time in formalin, frozen, unfrozen, conditions of sample storage, age of the samples, and many more parameters, all influence TMD values. Likewise the relative age of the animals (adult is not the same as adult) influences TMD. One must assume different sampling and storage conditions and different types of progression into adulthood. Thus, the observation of different degrees of mineralisation is very interesting but I suggest not to link this observation to numerical values.

      These are very good points, but for the following reasons we feel that they were not sufficiently relevant to our study, so the quantitative data for TMD remain scientifically valid and critical for the field moving forward. Critically, 1) all of the samples used for TMD calculations underwent the same fixation protocols, and 2) most importantly, all samples for TMD were scanned on the same micro-CT scanner using the same calibration phantoms for each scanning session. Finally, while the exact age of each adult was not specified, we note for Fig1 that clear statistically significant differences in TMD were observed among various skeletal elements from ratfish, shark, and skate. Indeed, ratfish TMD was considerably lower than TMD reported for a variety of fishes and tetrapods (summarized in our paper about icefish skeletons, which actually have similar TMD to ratfish: https://doi.org/10.1111/joa.13537).

      In response, however, we added a caveat to the paper’s Methods (lines 466-469), stating that adult ratfish were frozen within 1 or 2 hours of collection from the wild, staying frozen for several years prior to thawing and immediate fixation.

      Parts of the results are mixed with discussion. Sometimes, a result chapter also needs a few references but this result chapter is full of references.

      As mentioned above, we reduced background-style writing and citations in each Results section.

      Based on different protocols, the staining characteristics of the tissue are analysed. This is very good and provides valuable additional data. The authors should inform the reader not only about the staining (positive or negative) but also about the histochemical characters of the staining. L218: "fast green positive" means what? L234: "marked by Trichrome acid fuchsin" means what? And so on, see also L237, L289, L291.

      We included more details throughout the Results upon each dye’s first description on what is generally reflected by the specific dyes of the staining protocols. (lines 178, 180, 184, 223, 227, and 243-244)

      Discussion

      Please completely remove figure 7, please adjust and severely downsize the discussion related to figure 7. It is very interesting and valuable to compare three species from three different groups of elasmobranchs. Results of this comparison also validate an interesting discussion about possible phylogenetic aspects. This is, however, not the basis for claims about the skeletal tissue organisation of all extinct and extant members of the groups to which the three species belong. The discussion refers to "selected representatives" (L364), but how representative are the selected species? Can there be an extant species that represents the entire large group, all sharks, rays or chimeras? Are the three selected species basal representatives with a generalist lifestyle?

      These are good points, and yes, we certainly appreciate that the limited sampling in our data might lead to faulty general conclusions about these clades. In fact, we stated this limitation clearly in the Introduction (lines 126-128), and we removed “representative” from this revision. We also replaced general reference to chondrichthyans in the Title by listing the specific species sampled. However, in the Discussion, we also compare our data with previously published additional species evaluated with similar assays, which confirms the trend that we are concluding. We look forward to future papers specifically testing the hypotheses generated by our conclusions in this paper, which serves as a benchmark for identifying shared and derived features of the chondrichthyan endoskeleton.

      Please completely remove the discussion about paedomorphosis in chimeras (already in the result section). This discussion is based on a wrong idea about the definition of paedomorphosis. Paedomorphosis can occur in members of the same group. Humans have paedomorphic characters within the primates, Ambystoma mexicanum is paedomorphic within the urodeles. Paedomorphosis does not extend to members of different vertebrate branches. That elasmobranchs have a developmental stage that resembles chimera vertebra mineralisation does not define chimera vertebra centra as paedomorphic. Teleosts have a heterocercal caudal fin anlage during development; that does not mean the heterocercal fins in sturgeons or elasmobranchs are paedomorphic characters.

      We agree with the reviewer that discussion of paedomorphosis should apply to members of the same group. In our paper, we are examining paedomorphosis in a holocephalan, relative to elasmobranch fishes in the same group (Chondrichthyes), so this is an appropriate application of paedomorphosis. In response to this comment, we clarified that our statement of paedomorphosis in ratfish was made with respect to elasmobranchs (lines 37-39; 418-420).

      L432-435: In the times of Gadow & Abbott (1895), science had completely wrong ideas about the phylogenetic position of chondrichthyans within the gnathostomes. It is curious that Gadow & Abbott (1895) are being cited in support of the paedomorphosis claim.

      If paedomorphosis is being examined within Chondrichthyes, such as in our paper and in the Gadow and Abbott paper, then it is an appropriate reference, even if Gadow and Abbott (and many others) got the relative position of Chondrichthyes among other vertebrates incorrect.

      The SCPP part of the discussion is unrelated to the data obtained by this study. Kawasaki & Weiss (2003) describe a gene family (called SCPP) that controls Ca-binding extracellular phosphoproteins in enamel, in bone and dentine, in saliva and in milk. It evolved by gene duplication and differentiation. They date it back to a first enamel matrix protein in conodonts (Reif 2006). Conodonts, a group of enigmatic invertebrates, have mineralised structures, but these structures are neither bone nor mineralised cartilage. Catfish (6 % of all vertebrate species), on the other hand, have bone but do not have SCPP genes (Liu et al. 2016). Other calcium binding proteins, such as osteocalcin, were initially believed to be required for mineralisation. It turned out that osteocalcin is rather a mineralisation inhibitor; at best it regulates the arrangement of collagen fiber bundles. The osteocalcin -/- mouse has fully mineralised bone. As the function of the SCPP gene product for bone formation is unknown, there is no need to discuss SCPP genes. It would perhaps be better to finish the manuscript with a summary that focuses on the subject and the methodology of this nice study.

      We completely agree with the reviewer that many papers claim to associate the functions of SCPP genes with bone formation, or even mineralization generally. The Science paper with the elephant shark genome made it very popular to associate SCPP genes with bone formation, but we feel that this was a false comparison (for many reasons)! In response to the reviewer’s comments, however, we removed the SCPP discussion points, moving the previous general sentence about the genetic basis for reduced skeletal mineralization to the end of the previous paragraph (lines 435-439). We also added another brief Discussion paragraph afterwards, ending as suggested with a summary of our proposed shared and derived chondrichthyan endoskeletal traits (lines 440-453).

      Reviewer #2 (Recommendations For The Authors):

      Other comments

      L40: remove paedomorphism

      No change; see above

      L53: down tune language, remove "severely" and "major"

      Done (lines 57-59)

      L86: provide species and endoskeletal elements that are mineralized

      No change; this paragraph was written generally, because the papers cited looked at cap zones of many different skeletal elements and neural arches in many different species

      L130: remove TMD, replace by relative, descriptive, values

      No change; see above

      L135: What are "segmented vertebral neural arches and centra" ?

      Changed to “neural arches and centra of segmented vertebrae” (lines 140-141)

      L166: L168 "compact" vs. "irregular". Partial mineralisation is not necessarily irregular.

      Thanks for pointing out this issue; we changed wording, instead contrasting “non-continuous” and “continuous” mineralization patterns (lines 171-174)

      L192: "several endoskeletal regions". Provide all regions

      All regions provided (lines 198-199)

      L269: "has never been carefully characterized in chimeras". Carefully means what? Here, also only one chimera is analysed, not several species.

      Sentence removed

      302: Can't believe there is no better citation for elasmobranch vertebral centra development than Gadow and Abbott (1895)

      Added Arriata and Kolliker REFs here (lines 293-295)

      L318 ff: remove discussion from result chapter

      References to paedomorphism were removed from this Results section

      L342: refer to the species studied, not to the entire group.

      Sorry, the line numbering for the reviewer and our original manuscript have been a little off for some reason, and we were unclear exactly to which line of text this comment referred. Generally in this revision, however, we have tried to restrict our direct analyses to the species analyzed, but in the Discussion we do extrapolate a bit from our data when considering relevant published papers of other species.

      346: "selected representative". Selection criteria are missing

      “selected representative” removed

      L348: down tune, remove "critical"

      Done

      L351: down tune, remove "critical"

      Done

      L 364: "Since stem chondrichthyans did not typically mineralize their centra". Means there are fossil stem chondrichthyans with fully mineralised centra?

      Re-worded to “Stem chondrichthyans did not appear to mineralize their centra” (line 379)

      L379: down tune and change to: "we propose the term 'non-tesseral trabecular mineralization'. Possibly a plesiomorphic (ancestral) character of chondrichthyans"

      No change; sorry, but we feel this character state needs to be emphasized as we wrote in this paper, so that its evolutionary relationship to other chondrichthyan endoskeletal features, such as tesserae, can be clarified.

      L407: suggests so far palaeontologists have not been "careful" enough?

      Apologies; sentence re-worded, emphasizing that synchrotron imaging might increase details of these descriptions (lines 406-408)

      414: down tune, remove "we propose". Replace by "possibly" or "it can be discussed if"

      Sentence re-worded and “we propose” removed (lines 412-415)

      L420: remove paragraph

      No action; see above

      L436: remove paragraph

      No action; see above

      L450: perhaps add a summary of the discussion. A summary that focuses on the subject and the methodology of this nice study.

      Yes, in response to the reviewer’s comment, we finished the discussion with a summary of the current study (lines 440-453).

    1. These people are exceeding courteous, gentle of disposition, and well conditioned, excelling all others that we have seen. I think they excel all the people of America; of stature much higher than we. Some of them are black thin bearded. They make beards of the hair of beasts and one of them offered a beard of their making to one of our sailors, for his that grew on his face, which because it was of a red color they judged to be none of his own. They are quick eyed and steadfast in their looks, fearless of others’ harms, as intending none themselves. Some of the meaner sort given to filching, which the very name of Savages (not weighing their ignorance in good or evil) may easily excuse

      The explorers describe the Native Americans they encountered in notably positive terms, contrasting with later harsher colonial attitudes; it hints at initial possibilities for peaceful relations

    1. Author response:

      The following is the authors’ response to the original reviews

      Joint Public Review:

      Idiopathic scoliosis (IS) is a common spinal deformity. Various studies have linked genes to IS, but underlying mechanisms are unclear such that we still lack understanding of the causes of IS. The current manuscript analyzes IS patient populations and identifies EPHA4 as a novel associated gene, finding three rare variants in EPHA4 from three patients (one disrupting splicing and two missense variants) as well as a large deletion (encompassing EPHA4) in a Waardenburg syndrome patient with scoliosis. EPHA4 is a member of the Eph receptor family. Drawing on data from zebrafish experiments, the authors argue that EPHA4 loss of function disrupts the central pattern generator (CPG) function necessary for motor coordination.

      The main strength of this manuscript is the human genetic data, which provides convincing evidence linking EPHA4 variants to IS. The loss of function experiments in zebrafish strongly support the conclusion that EPHA4 variants that reduce function lead to IS.

      The conclusion that disruption of CPG function causes spinal curves in the zebrafish model is not well supported. The authors' final model is that a disrupted CPG leads to asymmetric mechanical loading on the spine and, over time, the development of curves. This is a reasonable idea, but currently not strongly backed up by data in the manuscript. Potentially, the impaired larval movements simply coincide with, but do not cause, juvenile-onset scoliosis. Support for the authors' conclusion would require independent methods of disrupting CPG function and determining if this is accompanied by spine curvature. At a minimum, the language of the manuscript could be toned down, with the CPG defects put forward as a potential explanation for scoliosis in the discussion rather than as something this manuscript has "shown". An additional weakness of the manuscript is that the zebrafish genetic tools are not sufficiently validated to provide full confidence in the data and conclusions.

      We highly appreciate the reviewer’s insightful comments and the acknowledgment of the main values of our study. We agree with the reviewer that further experiments are needed to fully establish the relationship between CPG and scoliosis. In response, we have revised the conclusion in the manuscript to better reflect this. Additionally, we conducted further analyses on the mutants to provide additional evidence supporting this concept.

      Reviewer #1 (Recommendations for the authors):

      Epha4a mutant zebrafish exhibited mild spinal curves, mostly laterally and in the tail. This was 75% of homozygous mutants but also, surprisingly, about 20% of heterozygotes. epha4b mutants also developed some mild scoliosis. If the two zebrafish paralogs can compensate for each other (partial redundancy), we might expect more severe scoliosis in double mutants. Did the authors generate and analyze double mutants? I believe it would be very useful for this study to report the zebrafish phenotype of loss of both paralogs together.

      We appreciate the reviewer’s insightful comment regarding the potential value of reporting the phenotype of epha4a/epha4b double mutants. While we fully agree that this analysis would be valuable, our attempts to generate double mutants have been unsuccessful. These two genes are closely linked on the chromosome, with less than 100 kb separating them, which makes it challenging to generate double mutants through standard genetic crossing. Establishing a double mutant line would require more than a year due to the technical constraints of the process. Although we are unable to address this question directly at this time, we hypothesize that epha4a/epha4b double mutants may exhibit a higher likelihood of body axis abnormalities based on the phenotypes observed in single mutants and the known functions of these genes.

      We hope this perspective will provide some useful context despite the limitations.

      In Figure 1F, a pCDK5 western blot is performed as a readout of EPHA4 signaling after either WT or C849Y mutant EPHA4 is transfected into HEK 293T cells. It would be useful to mention in the text, or at least the figure legend, how this experiment was performed/where the protein samples came from. It is included in the methods, but in the main text, it simply says "we conducted western blotting" without mentioning whether the protein samples were from cell lines, patients, or another source.

      We apologize for this omission. A detailed description of the western blotting procedure has been added to both the Results (page 8, lines 187-190) and the Figure 1 legend.

      Was the relative turn angle biased to the left or right side of the fish? (i.e. is a positive angle a rightward or leftward turn?)

      We apologize for the unclear description. In Figure 3D, a positive angle means turning left, while a negative angle means turning right. In wild-type larvae, the average turning angle over a 4-minute period is approximately 0, whereas in mutants this value deviates from 0, indicating a directional preference (positive for leftward and negative for rightward turns) in swimming behavior during the recording period. We have added this explanation to the text and figure legend.
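
      The sign convention above can be illustrated with a minimal sketch. It assumes per-turn relative angles in degrees are already available for one larva; the function name, the example values, and the tolerance threshold are hypothetical and are not taken from the manuscript's analysis.

```python
from statistics import mean


def turning_bias(relative_angles_deg, tolerance_deg=1.0):
    """Summarise directional preference from signed relative turn angles.

    Sign convention as stated in the response: positive = left turn,
    negative = right turn. tolerance_deg is an illustrative threshold
    (an assumption, not a value from the study) below which the mean
    angle is treated as showing no preference.
    """
    avg = mean(relative_angles_deg)
    if abs(avg) <= tolerance_deg:
        return avg, "none"
    return avg, "left" if avg > 0 else "right"


# Made-up example values: balanced turns versus a leftward-biased larva.
balanced = turning_bias([10, -10, 5, -5])
left_biased = turning_bias([12, 8, -2])
```

      Under these assumptions, a mean near 0 corresponds to the wild-type pattern described above, and a mean that deviates from 0 corresponds to a leftward (positive) or rightward (negative) preference.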

      In Figure 4, morpholinos rather than mutants are used, but it is not clear why. Has it been established that the MO used disrupts gene function specifically? Can the effect of the MO be rescued by expressing a wild-type mRNA of Epha4a? Does MO knockdown induce spinal curves if fish are raised? Indeed, this could be a way to determine whether the spinal curves are caused by early events in development (when MOs are active).

      Thanks for the comments. The efficacy of the relevant MOs has been well documented in numerous previous studies (Addison et al., 2018; Cavodeassi et al., 2013; Letelier et al., 2018; Royet et al., 2017). Following the reviewer’s suggestion, we raised the epha4a morphants to adulthood, but no scoliosis was observed, suggesting that spinal curvature formation may be induced by long-term defects in the absence of Epha4a. Additionally, we reconfirmed the abnormal motor neuron activation frequency phenotype in the mutant background. The corresponding data have replaced the original Figure 4 in the manuscript.

      References

      (1) Addison, M., Xu, Q., Cayuso, J., and Wilkinson, D.G. (2018). Cell Identity Switching Regulated by Retinoic Acid Signaling Maintains Homogeneous Segments in the Hindbrain. Dev Cell 45, 606-620 e603.

      (2) Cavodeassi, F., Ivanovitch, K., and Wilson, S.W. (2013). Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193-4202.

      (3) Letelier, J., Terriente, J., Belzunce, I., Voltes, A., Undurraga, C.A., Polvillo, R., Devos, L., Tena, J.J., Maeso, I., Retaux, S., et al. (2018). Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster refined mechanisms for hindbrain boundaries formation. Proc Natl Acad Sci U S A 115, E3731-E3740.

      (4) Royet, A., Broutier, L., Coissieux, M.M., Malleval, C., Gadot, N., Maillet, D., Gratadou-Hupon, L., Bernet, A., Nony, P., Treilleux, I., et al. (2017). Ephrin-B3 supports glioblastoma growth by inhibiting apoptosis induced by the dependence receptor EphA4. Oncotarget 8, 23750-23759.

      Reviewer #2 (Recommendations for the authors):

      Supplementary Table 3 is missing.

      Sorry for any inconvenience caused to the reviewers. Due to the size of the supplementary Table 3, we have separately uploaded an Excel file as supplementary materials. We have also double-checked during the resubmission process of the revised manuscript. Thanks for your thorough review.

      The authors report only a single mutant allele for zebrafish epha4a and epha4b. Additionally, they provide no information about how many generations each allele has been outcrossed. The authors should provide some type of validation that the phenotypes they describe result from loss of function of the targeted gene and not from an off-targeting event.

      Thanks for the comments. For epha4a and epha4b mutants, each homozygous mutant was initially derived from the self-crossing of first filial generation heterozygotes, and subsequent homozygous generations were maintained for fewer than three rounds of in-crossing. Interestingly, we observed a reduction in the incidence of scoliosis across successive generations. This trend may be attributed to potential genetic compensation mechanisms, which could mitigate the phenotypic severity over time. To address concerns about possible off-target effects, we synthesized and injected epha4a mRNA to test for phenotypic rescue. Our data show that epha4a mRNA injection partially restored swimming coordination in the mutants (Fig. S5). Moreover, similar motor coordination defects have been reported in Epha4-deficient mice, as documented in previous studies (Kullander et al., 2003; Borgius et al., 2014). These findings collectively strengthen the hypothesis that Epha4a plays a critical role in regulating motor coordination.

      References

      (1) Borgius, L., Nishimaru, H., Caldeira, V., Kunugise, Y., Low, P., Reig, R., Itohara, S., Iwasato, T., and Kiehn, O. (2014). Spinal glutamatergic neurons defined by EphA4 signaling are essential components of normal locomotor circuits. J Neurosci 34, 3841-3853.

      (2) Kullander, K., Butt, S.J., Lebret, J.M., Lundfald, L., Restrepo, C.E., Rydstrom, A., Klein, R., and Kiehn, O. (2003). Role of EphA4 and EphrinB3 in local neuronal circuits that control walking. Science 299, 1889-1892.

      The authors need to provide allele designations for the mutant alleles following accepted nomenclature guidelines.

      Thank you for your careful review! We have reviewed and made revisions to the genes and mutation symbols throughout the entire text.

      The three antisense morpholino oligonucleotides need to be validated for efficacy and specificity.

      Thanks for the comments. These morpholinos have been extensively used in previous studies, and their efficacy has been thoroughly validated (Addison et al., 2018; Cavodeassi et al., 2013; Letelier et al., 2018; Royet et al., 2017). Furthermore, we performed swimming behavior analysis in the mutant background, which gave results similar to those of the morphants. We also performed rescue experiments to confirm the specificity of the mutants (Fig. S5). Finally, we reconfirmed the abnormal calcium signaling in the mutants (Fig. 4), which further supports our previous knockdown results.

      References

      (1) Addison, M., Xu, Q., Cayuso, J., and Wilkinson, D.G. (2018). Cell Identity Switching Regulated by Retinoic Acid Signaling Maintains Homogeneous Segments in the Hindbrain. Dev Cell 45, 606-620 e603.

      (2) Cavodeassi, F., Ivanovitch, K., and Wilson, S.W. (2013). Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193-4202.

      (3) Letelier, J., Terriente, J., Belzunce, I., Voltes, A., Undurraga, C.A., Polvillo, R., Devos, L., Tena, J.J., Maeso, I., Retaux, S., et al. (2018). Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster refined mechanisms for hindbrain boundaries formation. Proc Natl Acad Sci U S A 115, E3731-E3740.

      (4) Royet, A., Broutier, L., Coissieux, M.M., Malleval, C., Gadot, N., Maillet, D., Gratadou-Hupon, L., Bernet, A., Nony, P., Treilleux, I., et al. (2017). Ephrin-B3 supports glioblastoma growth by inhibiting apoptosis induced by the dependence receptor EphA4. Oncotarget 8, 23750-23759.

      Line 229. "While in consistent with previous reports, the hindbrain rhombomeric boundaries were found to be defective....". This sentence is not clear. Please describe how it is "inconsistent".

      Thanks for the comments, and sorry for the unclear description. We have described this more clearly in our revised manuscript (page 9, lines 229-230).

      Animals frequently are described as "heterozygous mutants" or "mutants". Please make clear that the latter are homozygous mutant animals.

      Thanks for the comments. In the manuscript, all references to mutants specifically indicate homozygous mutants. Heterozygous mutants are explicitly identified as such.

      The chromatin interaction portion of the Methods does not include any information on how these experiments were conducted or where the data were obtained. This information needs to be provided.

      Thanks for your advice. The detailed information of chromatin interaction mapping has been provided in “Methods and Materials” (page 18-19, line 450-455). Information about the interacting regions was derived from Hi-C datasets of 21 tissues and cell types provided by GSE87112. The significance of interactions for Hi-C datasets was computed by Fit-Hi-C, with an FDR ≤ 10<sup>-6</sup> considered significant.
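
      To illustrate what an FDR cutoff of this kind means in practice, the following is a minimal sketch that applies a generic Benjamini-Hochberg correction to a list of interaction p-values and keeps those at or below 10<sup>-6</sup>. It is not Fit-Hi-C's own code; the function names and the example p-values are hypothetical, and we only assume that the significance values are FDR-adjusted in a Benjamini-Hochberg style.

```python
# Illustrative only: a generic Benjamini-Hochberg correction,
# not the Fit-Hi-C implementation. All names and data are hypothetical.
from typing import List

FDR_CUTOFF = 1e-6  # threshold stated in the response


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return BH-adjusted q-values in the original input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotonic.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        q_values[idx] = running_min
    return q_values


def significant_interactions(p_values: List[float],
                             cutoff: float = FDR_CUTOFF) -> List[int]:
    """Indices of interactions whose q-value is at or below the cutoff."""
    q_values = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(q_values) if q <= cutoff]


if __name__ == "__main__":
    # Hypothetical p-values for four bin pairs.
    example_p = [1e-9, 2e-7, 0.03, 5e-8]
    print(significant_interactions(example_p))
```

      With such a strict cutoff, only interactions with very small p-values survive the correction, which is the intent of a stringent threshold when pooling many tissues and cell types.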

      The authors present single-cell RNA-seq data in Supplementary Figure 5 for which they cite Cavone et al, 2021. This seems like an odd database to use. Can the authors provide an explanation for choosing it? In any case, the citation should also be made in the Supplementary Figure 5 legend.

      Thank you for your rigorous comment. We have cited this literature in the proper place in the revised manuscript. Cavone et al. used the her4.3:GFP line to label ependymo-radial glia (ERG) progenitor cells and performed single-cell RNA-seq on FACS-isolated fluorescent cells. The isolated cells included not only ERG progenitors but also undifferentiated and differentiated neurons and oligodendrocytes. The authors attributed this to the relative stability of the GFP protein, which remained in the progeny of GFP-expressing her4.3+ ERG progenitor cells, thus effectively acting as a short-term cell lineage tracer. Indeed, clustering analysis of these data successfully identifies neural progenitors and other neural clusters. Therefore, we consider that this scRNA-seq dataset encompasses a comprehensive range of neural cell types and is suitable for analyzing the expression of genes of interest. Furthermore, we downloaded and analyzed the scRNA-seq data of the zebrafish nervous system reported by Scott et al. in 2021 (Fig. S7B) (Scott et al., 2021). Despite differences in the developmental stages of the larvae analyzed (Cavone et al. examined larvae at 4 dpf, whereas Scott et al. analyzed larvae at 24, 36, and 48 hpf), our findings are consistent. Specifically, epha4a and epha4b are expressed in interneurons, whereas efnb3a and efnb3b are enriched in floor plate cells.

      References

      (1) Scott, K., O'Rourke, R., Winkler, C.C., Kearns, C.A., and Appel, B. (2021). Temporal single-cell transcriptomes of zebrafish spinal cord pMN progenitors reveal distinct neuronal and glial progenitor populations. Dev Biol 479, 37-50.

      In Figure Legend 1, "expressed from the EPHA4-mutant plasmid" is not an accurate description of the experiment.

      Sorry for the previous inaccurate description. The description has been revised to accurately reflect the experiment. “Western blot analysis of EPHA4-c.2546G>A variant showing the protein expression levels of EPHA4 and CDK5 and the amount of phosphorylated CDK5 (pCDK5) in HEK293T cells transfected with EPHA4-mutant or EPHA4-WT plasmid”.

      Figure 3 panels J and K need more explanation. I don't understand what the different colors represent nor do I understand what are wild type and what are mutant data.

      Thank you for your valuable feedback. We apologize for the lack of clarity in the original figure legend. To address this, we have revised the legend of Figure 3 to provide a more detailed explanation. In panels J and K, each color-coded curve represents the response of an individual larva from an independent experimental trial to the stimulus. Specifically, panel J depicts the response data for the wild-type larvae, whereas panel K presents the response data for the homozygous epha4a mutants.

      Please provide the genotypes for the images in Figure 5A.

      Thanks for the comments, and we are sorry for the unclear description. We have described this more clearly in Figure 5.

      Figure legend 6B should also note the heterozygote data with the wild type and homozygous mutant data.

      Thanks for the comments, the data are now included in Figure 6B.

      Epha4 and Efnb3 have well-established roles in axon guidance. Although this is noted in the Discussion, I think a more extensive description of prior findings would be helpful.

      Thanks for your valuable feedback. A more detailed description of the roles of Epha4 and Efnb3 in axon guidance was provided in the “Discussion” (page 16, line 388-396).

      The main conclusion of this manuscript is that EPHA4 variants cause IS by disrupting central pattern generator function. I think this is misleading. I think that the more valid conclusion is that EPHA4 loss of function causes axon pathfinding defects that impair locomotion by disrupting CPG activity, thereby leading to IS. I urge the authors to consider this more nuanced interpretation.

      Thank you for your insightful comments. We appreciate your suggestion to refine our main conclusion. We agree that the proposed revision more accurately reflects our findings and will revise the manuscript accordingly to state that “EPHA4 loss of function causes axon pathfinding defects, which impair locomotion by disrupting central pattern generator activity, potentially leading to IS.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Seidenthal et al. investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions, albeit in an unexpected manner. The authors observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes, which differs from earlier studies in flies that suggested the Flower protein promotes the formation of bulk endosomes. This is a valuable finding. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons restored the aldicarb-resistant phenotype seen in flwr-1 mutants to wild-type levels. In contrast, FLWR-1 expression in cholinergic neurons in flwr-1 mutants did not restore aldicarb sensitivity, yet muscle expression of FLWR-1 partially but significantly recovered the aldicarb-resistant defects. The study also revealed that removing FLWR-1 leads to increased Ca<sup>2+</sup> signaling in motor neurons upon photo-stimulation. Further, the authors conclude that FLWR-1 contributes to the maintenance of the excitation/inhibition (E/I) balance by preferentially regulating the excitability of GABAergic neurons. Finally, SNG-1::pHluorin data imply that FLWR-1 removal enhances synaptic transmission, however, the electrophysiological recordings do not corroborate this finding.

      Strengths:

      This study by Seidenthal et al. offers valuable insights into the role of the Flower protein, FLWR-1, in C. elegans. Their findings suggest that FLWR-1 facilitates the breakdown of endocytic endosomes, which marks a departure from its previously suggested role in forming endosomes through bulk endocytosis. This observation could be important for understanding how Flower proteins function across species. In addition, the study proposes that FLWR-1 plays a role in maintaining the excitation/inhibition balance, which has potential impacts on neuronal activity.

      Weaknesses:

      One issue is the lack of follow-up tests regarding the relative contributions of muscle and GABAergic FLWR-1 to aldicarb sensitivity. The findings that muscle expression of FLWR-1 can significantly rescue aldicarb sensitivity are intriguing and may influence both experimental design and data interpretation. Have the authors examined aldicarb sensitivity when FLWR-1 is expressed in both muscles and GABAergic neurons, or possibly in muscles and cholinergic neurons? Given that muscles could influence neuronal activity through retrograde signaling, a thorough examination of FLWR-1's role in muscle is necessary, in my opinion.

      We thank the reviewer for this suggestion. Indeed, the retrograde inhibition of cholinergic transmission by signals from muscle has been demonstrated by the Kaplan lab in a number of publications. We have now done the experiments that were suggested, see the new Fig. S3B: rescuing FLWR-1 in cholinergic neurons and in muscle did not perform any better in the aldicarb assay, while co-rescue in GABAergic neurons and muscle, like rescue in GABA neurons, led to a complete rescue to wild type levels. Thus, retrograde signaling from muscle to neurons does not contribute to effects on the E/I imbalance caused by the absence of FLWR-1. The fact that muscle rescue can partially rescue the flwr-1 phenotype is likely due to a cell-autonomous effect of FLWR-1 on muscle excitability, facilitating muscle contraction.

      Would the results from electrophysiological recordings and GCaMP measurements be altered with muscle expression of FLWR-1? Most experiments presented in the manuscript compare wild-type and flwr-1 mutant animals. However, without tissue-specific knockout, knockdown, or rescue experiments, it is difficult to separate cell-autonomous roles from non-cell-autonomous effects, in particular in the context of aldicarb assay results. Also, relying solely on levamisole paralysis experiments is not sufficient to rule out changes in muscle AChRs, particularly due to the presence of levamisole-resistant receptors.

      We repeated the Ca<sup>2+</sup> imaging in cholinergic neurons, in response to optogenetic activation, with expression of FLWR-1 in muscle, see Fig. 4E. This did not significantly alter the increased excitability of the flwr-1 mutant. Thus, we conclude that, along with the findings in aldicarb assays, the function of FLWR-1 in muscle is cell-autonomous, and does not indirectly affect its roles in the motor neurons. Also, cholinergic expression of FLWR-1 by itself reduced Ca<sup>2+</sup> levels to those in wild type (Fig. 4E). In addition, we now also assessed the contribution of the N-AChR (ACR-16) to aldicarb-induced paralysis (Fig. S3C), showing that flwr-1 and acr-16 mutations independently mediate aldicarb resistance, and that these effects are additive. Thus, FLWR-1 does not affect the expression level or function of the N-AChR, as otherwise, the flwr-1; acr-16 double mutation would not exacerbate the phenotype of the single mutants.

      This issue regarding the muscle role of FLWR-1 also complicates the interpretation of results from coelomocyte uptake experiments, where GFP secreted from muscles and coelomocyte fluorescence were used to estimate endocytosis levels. A decrease in coelomocyte GFP could result from either reduced endocytosis in coelomocytes or decreased secretion from muscles. Therefore, coelomocyte-specific rescue experiments seem necessary to distinguish between these possibilities.

      We have performed a rescue of FLWR-1 in coelomocytes to address this, and found that this fully recovered the CC GFP signals to wild type levels. Therefore, the absence of FLWR-1 in muscles does not affect exocytosis of GFP. The data can be found in Fig. 5A, B.

      The manuscript states that GCaMP was used to estimate Ca<sup>2+</sup> levels at presynaptic sites. However, due to the rapid diffusion of both Ca<sup>2+</sup> and GCaMP, it is unclear how this assay distinguishes Ca<sup>2+</sup> levels specifically at presynaptic sites versus those in axons. What are the relative contributions of VGCCs and ER calcium stores here? This raises a question about whether the authors are measuring the local impact of FLWR-1 specifically at presynaptic sites or more general changes in cytoplasmic calcium levels.

      We compared Ca<sup>2+</sup> signals in synaptic puncta versus axon shafts, and did not find any differences. The data previously shown have been replaced by data where the ROIs were restricted to synaptic puncta. The outcome is the same as before. These data are provided in Fig. 4A, B, E, F. We thus conclude that the impact of FLWR-1 is local, in synaptic boutons.

      The experiments showing FLWR-1's presynaptic localization need clarification/improvement. For example, in the data shown in Fig. 3B, GFP::FLWR-1 is expressed under its own promoter, and TagRFP::ELKS-1 is expressed exclusively in GABAergic neurons. Given that pflwr-1 drives expression in both cholinergic and GABAergic neurons, and that cholinergic synapses outnumber GABAergic ones in the nerve cord, it would be expected that many green FLWR-1 puncta do not associate with TagRFP::ELKS-1. However, several images in Figure 3B suggest an almost perfect correlation between FLWR-1 and ELKS-1 puncta. It would be helpful for the readers to understand the exact location in the nerve cord where these images were collected to avoid confusion.

      Thank you for making us aware that the provided images may be misleading. We have now extended this Figure (Fig. 3A-C) and provided more intensity profiles along the nerve cords in Fig. S4A-C. The quantitative analysis of average R<sup>2</sup> for the two fluorescent signals in each neuron type did not show any significant difference between the two, also after choosing slightly smaller ROIs for line scan analysis. We also highlighted the puncta corresponding to FLWR-1 in both neuron types, as well as to ELKS-1 in each specific neuron type, to identify FLWR-1 puncta without co-localized ELKS-1 signal. Also, we indicated the region that was imaged, i.e. the DNC posterior of the vulva, halfway to the posterior end of the nerve cord.
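
      As an illustration of how an R<sup>2</sup> value between two line-scan intensity profiles can be obtained, the sketch below computes the squared Pearson correlation of two equally sampled profiles. It is a generic example under stated assumptions: the function name and the sample intensities are hypothetical, and it does not reproduce the exact analysis pipeline used for Fig. S4.

```python
# Illustrative only: squared Pearson correlation between two line-scan
# intensity profiles. Names and data are hypothetical.
from typing import Sequence


def pearson_r_squared(profile_a: Sequence[float],
                      profile_b: Sequence[float]) -> float:
    """Return R^2 for two intensity profiles sampled at the same positions."""
    if len(profile_a) != len(profile_b) or len(profile_a) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    # Sums of squared deviations and of cross-products around the means.
    ss_a = sum((a - mean_a) ** 2 for a in profile_a)
    ss_b = sum((b - mean_b) ** 2 for b in profile_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("a flat profile has no defined correlation")
    cross = sum((a - mean_a) * (b - mean_b)
                for a, b in zip(profile_a, profile_b))
    return (cross * cross) / (ss_a * ss_b)


if __name__ == "__main__":
    # Hypothetical green and red channel intensities along one line scan.
    green = [10.0, 50.0, 12.0, 48.0, 11.0]
    red = [8.0, 40.0, 9.0, 15.0, 10.0]
    print(round(pearson_r_squared(green, red), 3))
```

      Puncta present in one channel but absent in the other lower R<sup>2</sup>, so comparing the average value between neuron types indicates whether co-localization differs between them.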

      The SNG-1::pHluorin data in Figure 5C is significant, as they suggest increased synaptic transmission at flwr-1 mutant synapses. However, to draw conclusions, it is necessary to verify whether the total amount of SNG-1::pHluorin present on synaptic vesicles remains the same between flwr-1 mutant and wild-type synapses. Without this comparison, a conclusion on levels of synaptic vesicle release based on changes in fluorescence might be premature, in particular given the results of electrophysiological recordings.

      We appreciate the comment. We have now added data and experiments that verify that the basal SNG-1::pHluorin signal in the plasma membrane, measured at synaptic puncta and in adjacent axonal areas, is not different in flwr-1 mutants compared to wild type in the absence of stimulation. These data can be found in Fig. S5A. In addition, we cultured primary neurons from transgenic animals to compare total SNG-1::pHluorin to the vesicular fraction, by adding buffers of defined pH externally, or buffers that penetrate the cell and fix the intracellular pH. These experiments (Fig. S5B, C) showed no difference in the vesicle fraction of the pHluorin signal in wild type vs. flwr-1 mutant cells, demonstrating that flwr-1 mutants do not per se have altered SNG-1::pHluorin in their SV or plasma membranes.

      Finally, the interpretation of the E74Q mutation results needs reconsideration. Figure 8B indicates that the E74Q variant of FLWR-1 partially loses its rescuing ability, which suggests that the E74Q mutation adversely affects the function of FLWR-1. Why did the authors expect that the role of FLWR-1 should have been completely abolished by E74Q? Given that FLWR-1 appears to work in multiple tissues, might FLWR-1's function in neurons require its calcium channel activity, whereas its role in muscles might be independent of this feature? While I understand there is ongoing debate about whether FLWR-1 is a calcium channel, the experiments in this study do not definitively resolve local Ca<sup>2+</sup> dynamics at synapses. Thus, in my opinion, it may be premature to draw firm conclusions about calcium influx through FLWR-1.

      Thank you for bringing this up. We did not expect E74Q to necessarily abolish FLWR-1 function, unless it were a Ca<sup>2+</sup> channel. Of course the reviewer is right, FLWR-1 might have functions as an ion channel as well as channel-independent functions. Yet, we are quite confident that FLWR-1 is not an ion channel. Instead, we think that E74Q alters stability of the protein (however, in the absence of biochemical data, we removed this conclusion), and that this impairs the function of FLWR-1 as a modulator, or possibly even an accessory subunit, of the PMCA MCA-3. This interaction was indicated by a new experiment we added, where we found that FLWR-1 and MCA-3 must be physically very close to each other in the plasma membrane, using bimolecular fluorescence complementation (see new Fig. 9A, B). This provides a reasonable explanation for the findings we obtained, i.e. increased Ca<sup>2+</sup> levels in stimulated neurons of the flwr-1 mutant. If FLWR-1 acts as a stimulatory subunit of MCA-3, then its absence may cause reduced MCA-3 function and thus an accumulation of Ca<sup>2+</sup> in the synaptic terminals. In Drosophila, hyperstimulation of neurons led to reduced Ca<sup>2+</sup> levels (Yao et al., 2017, PLoS Biol 15: e2000931), suggesting that Flower is a Ca<sup>2+</sup> channel. Based on our findings, we suggest an alternative explanation. Based on proteomics, the PMCA is a component of SVs (Takamori et al., 2006, Cell 127: 831-846). Increased insertion of PMCA into the plasma membrane during high stimulation, along with impaired endocytosis in flower mutants, would increase the steady-state levels of PMCA in the PM. This could lead to reduced steady-state levels of Ca<sup>2+</sup>. This ‘g.o.f.’ in Flower may also impact on Ca<sup>2+</sup> microdomains of the P/Q type VGCC required for SV fusion, which could contribute to the rundown of EPSCs we find during synaptic hyperstimulation (Fig. 5G-J). We acknowledge, though, that Yao et al. (2009, Cell 138: 947-960) showed increased uptake of Ca<sup>2+</sup> into liposomes reconstituted with purified Flower protein. However, it cannot be ruled out that a protein contaminant could be responsible, as the controls were empty liposomes, not liposomes reconstituted with a mutated Flower protein purified the same way.

      We also tested the E74Q mutant in its ability to rescue the reduced PI(4,5)P<sub>2</sub> levels in coelomocytes (CCs), where we observed no positive effect. While we have not measured Ca<sup>2+</sup> in CCs, we would assume that here a function of FLWR-1 affecting increased PI(4,5)P<sub>2</sub> levels is not linked to a channel function. It was, nevertheless, compromised by E74Q (Fig. 8D).

      Also, the aldicarb data presented in Figures 8B and 8D show notable inconsistencies that require clarification. While Figure 8B indicates that the 50% paralysis time for flwr-1 mutant worms occurs at 3.5-4 hours, Figure 8D shows that 50% paralysis takes approximately 2.5 hours for the same flwr-1 mutants. This discrepancy should be addressed. In addition, the manuscript mentions that the E74Q mutation impairs FLWR-1 folding, which could significantly affect its function. Can the authors show empirical data supporting this claim?

      We performed the aldicarb assays in a consistent manner, but nonetheless note that some variability from day to day can affect such outcomes. Importantly, we always measured each control (wild type, flwr-1) along with each test strain (FLWR-1 point mutants), to ensure the relevant estimate of a point-mutant’s effect. These assays have been repeated, now including the FLWR-1 wild type rescue strain as a comparison. The data are now combined in Fig. 8B. Regarding the assumed instability of the E74Q mutant, as we, indeed, do not have any experimental data supporting this, we removed this sentence.
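
      For readers unfamiliar with how a 50% paralysis time is read from an aldicarb time course, the sketch below estimates it by linear interpolation between the two scoring time points that bracket 50% paralysis. This is only an illustration: the function name and the example fractions are hypothetical and are not data from the study, and other estimators (for example curve fitting) could equally be used.

```python
# Illustrative only: time to 50% paralysis by linear interpolation.
# Names and data are hypothetical, not taken from the manuscript.
from typing import Optional, Sequence


def time_to_fraction(times_h: Sequence[float],
                     fraction_paralyzed: Sequence[float],
                     level: float = 0.5) -> Optional[float]:
    """Return the first time (hours) at which the paralyzed fraction
    reaches `level`, or None if it is never reached."""
    if len(times_h) != len(fraction_paralyzed) or not times_h:
        raise ValueError("need matching, non-empty time and fraction lists")
    if fraction_paralyzed[0] >= level:
        return times_h[0]
    points = list(zip(times_h, fraction_paralyzed))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 < level <= f1:
            # Interpolate linearly between the bracketing observations.
            return t0 + (level - f0) * (t1 - t0) / (f1 - f0)
    return None


if __name__ == "__main__":
    # Hypothetical scoring every hour for one strain on one assay day.
    hours = [0.0, 1.0, 2.0, 3.0, 4.0]
    paralyzed = [0.0, 0.1, 0.3, 0.7, 1.0]
    print(time_to_fraction(hours, paralyzed))
```

      Because the estimate depends on the shape of each day's curve, comparing a test strain only against controls scored in the same assay, as described above, keeps day-to-day variability from being mistaken for a genotype effect.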

      Reviewer #2 (Public review):

      Summary:

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca<sup>2+</sup> channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions.

      Strengths:

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca<sup>2+</sup> dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca<sup>2+</sup> levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca<sup>2+</sup>-ATPase and PIP2 binding in FLWR-1's function.

      Weaknesses:

      (1) The observation that flwr-1 knockout increases Ca<sup>2+</sup> levels in motor neurons is notable, especially as it contrasts with prior findings in flies. The authors propose that elevated Ca<sup>2+</sup> levels in flwr-1 knockout motor neurons may stem from "deregulation of MCA-3" (a Ca<sup>2+</sup> ATPase in the plasma membrane) due to FLWR-1 loss. However, this conclusion relies on limited and somewhat inconclusive data (Figure 7). Additional experiments could clarify FLWR-1's role in MCA-3 regulation. For instance, it would be informative to investigate whether mutations in other genes that cause elevated cytosolic Ca<sup>2+</sup> produce similar effects, whether MCA-3 physically interacts with FLWR-1, and whether MCA-3 expression is reduced in the flwr-1 knockout.

      We thank the reviewer for bringing up these critical points. As to other mutations that produce elevated cytosolic Ca<sup>2+</sup>: Possible mutations could be g.o.f. mutations of the ryanodine receptor UNC-68, the sarco-endoplasmic Ca<sup>2+</sup> ATPase, or mutants affecting VGCCs, like the L-type channel EGL-19 or the P/Q-type channel UNC-2. However, any such mutant would affect muscle contractions (as we have shown for r.o.f. mutations in unc-68, egl-19 and unc-2 in Nagel et al. 2005 Curr Biol 15: 2279-84) and thus would affect aldicarb assays (see aldicarb resistance induced by RNAi of these genes in Sieburth et al., 2005, Nature 436: 510). The same should be expected for g.o.f. mutations of any such gene. In neurons, we would expect increased or decreased Ca<sup>2+</sup> levels in response to stimulation.

      Regarding the physical interaction of MCA-3 and FLWR-1, we performed bimolecular fluorescence complementation, with two fragments of mVenus fused to the two proteins. This assay shows mVenus reconstitution (i.e., fluorescence) if the two proteins are found in close vicinity to each other. Testing MCA-3 and FLWR-1 in muscle indeed showed a robust signal, evenly distributed on the plasma membrane. As a control, FLWR-1 did not interact with another plasma membrane protein, the stomatin UNC-1, which interacts with gap junction proteins (Chen et al., 2007, Curr Biol 17: 1334-9). FLWR-1 also interacted with the ER chaperone Nicalin (NRA-2 in C. elegans), which helps assemble the TM domains of integral membrane proteins in association with the SEC translocon. However, this signal only occurred in the ER membrane, demonstrating the specificity of the BiFC assay. These data are presented in Fig. 9A, B. Additionally, we show that FLWR-1 expression has a function in stabilizing MCA-3 localization at synapses, which is also in line with the idea of a direct interaction (Fig. 9C, D).

      (2) In silico analysis identified residues R27 and K31 as potential PIP2 binding sites in FLWR-1. The authors observed that FLWR-1(R27A/K31A) was less effective than wild-type FLWR-1 in rescuing the aldicarb sensitivity phenotype of the flwr-1 knockout, suggesting that FLWR-1 function may depend on PIP2 binding at these two residues. Given that mutations in various residues can impair protein function non-specifically, additional studies may be needed to confirm the significance of these residues for PIP2 binding and FLWR-1 function. In addition, the authors might consider explicitly discussing how this finding aligns or contrasts with the results of a previous study in flies, where alanine substitutions at K29 and R33 impaired a Flower-related function (Li et al., eLife 2020).

      We further investigated the role of these two residues in an in vivo assay for PIP2 binding and membrane association of a reporter. We used the coelomocytes (CCs), in which a previous publication demonstrated that a GFP variant tagged with a PH domain would be recruited to the CC membrane (Bednarek et al., 2007, Traffic 8: 543-53). This assay was performed in wild type, flwr-1 mutants, and flwr-1 mutants rescued with wild type FLWR-1, the FLWR-1(E74Q) mutant, or the FLWR-1(K27A; R31A) double mutant. The data are shown in Fig. 8C, D. While the wild type FLWR-1 rescued PH-GFP levels at the CC membrane to the wild type control, the FLWR-1(K27A; R31A) double mutant did not rescue the reporter binding, indicating that, at least in CCs, reduced PIP2 levels are associated with non-functional FLWR-1. Mechanistically, this is not clear at present, though we noted a possible mechanism as found for synaptotagmin, which recruits the PIP2 kinase to the plasma membrane via a lysine- and arginine-containing motif (Bolz et al., 2023, Neuron 111: 3765-3774.e3767). We mention this now in the discussion. We also discussed our data with respect to the findings of Li et al. about the analogous residues K27, R31 (K29, R33) in the discussion section, i.e. lines 667-670, and the differences of our findings in electron microscopy compared to the Drosophila work (more rather than fewer bulk endosomes) were discussed in lines 713-720.

      (3) A primary conclusion from the EM data was that FLWR-1 participates in the breakdown, rather than the formation, of bulk endosomes (lines 20-22). However, the reasoning behind this conclusion is somewhat unclear. Adding more explicit explanations in the Results section would help clarify and strengthen this interpretation.

      We added a sentence trying to better explain our reasoning. Mainly, the argument is that accumulation of such endosomes of unusually large size is seen in mutants affecting formation of SVs from the endosome (in endophilin and synaptojanin mutants), while mutants affecting mainly endocytosis (dynamin) cause formation of many smaller endocytic structures that stay attached to the plasma membrane (Kittelmann et al., 2013, PNAS 110: E3007-3016). We changed our data analysis in that we collated the data for what we previously termed endosomes and large vesicles. According to the paper by Watanabe, 2013, eLife 2: e00723, endosomes are defined by their location in the synapse, and their size. However, this work used a much shorter stimulus and froze the preparations within a few dozen to hundreds of msec after the stimulus, while we used the protocol of Kittelmann 2013, which uses 30 sec stimulation and freezing after 5 sec. There, endosomes were defined as structures larger than SVs or DCVs, but no larger than 80 nm, with an electron dense lumen, and were very rarely observed. In contrast, large vesicles or ‘100 nm vesicles’, which ranged from 50-200 nm in diameter and had a clear lumen, were morphologically similar to the bulk endosomes observed by Li et al., 2021. We thus reordered our data and jointly analyzed these structures as large vesicles / bulk endosomes. The outcome is still the same, i.e. photostimulated flwr-1 mutants showed more LVs than wild type synapses.

      (4) The aldicarb assay results in Figure 3 are intriguing, indicating that reduced GABAergic neuron activity alone accounts for the flwr-1 mutant's hyposensitivity to aldicarb. Given that cholinergic motor neurons also showed increased activity in the flwr-1 mutant, one might expect the flwr-1 mutant to display hypersensitivity to aldicarb in the unc-47 knockout background. However, this was not observed. The authors might consider validating their conclusion with an alternative approach or, at the minimum, providing a plausible explanation for the unexpected result. Since aldicarb-induced paralysis can be influenced by factors beyond acetylcholine release from cholinergic motor neurons, interpreting aldicarb assay results with caution may be advisable. This is especially relevant here, as FLWR-1 function in muscle cells also impacts aldicarb sensitivity (Figure S3B). Previous electrophysiological studies have suggested that aldicarb sensitivity assays may sometimes yield misleading conclusions regarding protein roles in acetylcholine release.

      We tested the unc-47; flwr-1 animals again at a lower concentration of aldicarb, to see if the high concentration may have leveled the differences between unc-47 animals and the double mutant. This experiment is shown in Fig. S3D, demonstrating that the double mutant is significantly less resistant to aldicarb. This verifies that FLWR-1 acts not only in GABAergic neurons, but also in cholinergic neurons (as we saw by electron microscopy and electrophysiology), and that the increased excitability of cholinergic cells leads to more acetylcholine being released. In the double mutant, where GABA release is defective, this conveys hypersensitivity to aldicarb.

      (5) Previous studies have suggested that the Flower protein functions as a Ca<sup>2+</sup> channel, with a conserved glutamate residue at the putative selectivity filter being essential for this role. However, mutating this conserved residue (E74Q) in C. elegans FLWR-1 altered aldicarb sensitivity in a direction opposite to what would be expected for a Ca<sup>2+</sup> channel function. Moreover, the authors observed that E74 of FLWR-1 is not located near a potential conduction pathway in the FLWR-1 tetramer, as predicted by AlphaFold3. These findings raise the possibility that Flower may not function as a Ca<sup>2+</sup> channel. While this is a potentially significant discovery, further experiments are needed to confirm and expand upon these results.

      As above, we do not exclude that FLWR-1 may constitute a channel; however, based on our findings, AF3 structure predictions and data in the literature, we are considering alternative explanations for the observed effect on Ca<sup>2+</sup> levels of Flower mutants in worms and flies. The observations of increased Ca<sup>2+</sup> levels in stimulated flwr-1 mutant neurons could result from a reduced stimulation of the PMCA, and this was also observed with low stimulation in Drosophila (Yao et al., 2017). This idea is supported by the indications of a direct physical interaction, or proximity, of the two proteins. The reduced Ca<sup>2+</sup> levels after hyperstimulation of Drosophila Flower mutants may have to do with increased levels of non-recycling PMCA in the plasma membrane, indicating that PMCA requires Flower for recycling. This could underlie the rundown of evoked PSCs we find in worm flwr-1 mutants, and would also be in line with a function of FLWR-1 and MCA-3 in coelomocytes, cells that constantly endocytose, and in which both proteins are required for proper function (our data, Figs. 5A, B; 8D, E) and Bednarek et al., 2007 (Traffic 8: 543-553). CCs need to recycle / endocytose membranes and membrane proteins, and such proteins, likely including FLWR-1 and MCA-3, need to be returned to the PM effectively.

      We thus refrained from testing a putative FLWR-1 channel function in Xenopus oocytes, in part also because we would not be able to acutely trigger possible FLWR-1 gating. A constitutive Ca<sup>2+</sup> current, if it were present, would induce a large Cl<sup>-</sup> conductance in oocytes, which would likely be problematic or kill the cells. The demonstration that FLWR-1(E74Q) does not rescue the PI(4,5)P<sub>2</sub> levels in coelomocytes is also more in line with a non-channel function of FLWR-1.

      (6) Phrases like "increased excitability" and "increased Ca<sup>2+</sup> influx" are used throughout the manuscript. However, there is no direct evidence that motor neurons exhibit increased excitability or Ca<sup>2+</sup> influx. The authors appear to interpret the elevated Ca<sup>2+</sup> signal in motor neurons as indicative of both increased excitability and Ca<sup>2+</sup> influx. However, this elevated Ca<sup>2+</sup> signal in the flwr-1 mutant could occur independently of changes in excitability or Ca<sup>2+</sup> influx, such as in cases of reduced MCA-3 activity. The authors may wish to consider alternative terminology that more accurately reflects their findings.

      Thank you, we rephrased the imprecise wording. Ca<sup>2+</sup> influx was meant with respect to the cytosol.

      Reviewer #3 (Public review):

      Summary:

      Seidenthal et al. investigated the role of the Flower protein, FLWR-1, in C. elegans and confirmed its involvement in endocytosis within both synaptic and non-neuronal cells, possibly by contributing to the fission of bulk endosomes. They also uncovered that FLWR-1 has a novel inhibitory effect on neuronal excitability at GABAergic and cholinergic synapses in neuromuscular junctions.

      Strengths:

      This study not only reinforces the conserved role of the Flower protein in endocytosis across species but also provides valuable ultrastructural data to support its function in the bulk endosome fission process. Additionally, the discovery of FLWR-1's role in modulating neuronal excitability broadens our understanding of its functions and opens new avenues for research into synaptic regulation.

      Weaknesses:

      The study does not address the ongoing debate about the Flower protein's proposed Ca<sup>2+</sup> channel activity, leaving an important aspect of its function unexplored. Furthermore, the evidence supporting the mechanism by which FLWR-1 inhibits neuronal excitability is limited. The suggested involvement of MCA-3 as a mediator of this inhibition lacks conclusive evidence, and a more detailed exploration of this pathway would strengthen the findings.

      We added new data showing the likely direct interaction of FLWR-1 with the PMCA, possibly upregulating / stimulating its function. These data are now shown in Fig. 9A, B. We also now show that FLWR-1 is required to stabilize MCA-3 expression / localization in the pre-synaptic plasma membrane (Fig. 9C, D). These findings do not support the putative function of FLWR-1 as an ion channel, but suggest that increased Ca<sup>2+</sup> levels following neuron stimulation in flwr-1 mutants are due to an impairment of MCA-3 and thus reduced Ca<sup>2+</sup> extrusion.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors might consider focusing on one or two key findings from this study and providing robust evidence to substantiate their conclusions.

      We did substantiate the interactions of FLWR-1 and the PMCA, as well as assessing the function of FLWR-1 in the coelomocytes and the function of FLWR-1 in regulating PIP2 levels in the plasma membrane.

      Reviewer #3 (Recommendations for the authors):

      (1) Behavioral Analysis of Locomotion

      In Figure 1, the authors are encouraged to examine whether flwr-1 mutants show altered locomotion behaviors, such as velocity, in a solid medium.

      We performed such an analysis for wild type, comparing to flwr-1 mutants and flwr-1 mutants rescued with FLWR-1 expressed from the endogenous promoter. The data are shown in Fig. S1C. There was no difference. We note that we observed differences in swimming assays also only when we strongly stimulated the cholinergic neurons by optogenetic depolarization, but not during unstimulated, normal swimming.

      (2) Validation of FLWR-1 Tagging

      In Figure 2A, it is recommended that the authors confirm the functionality of the C-terminal-tagged FLWR-1.

      We performed such rescue assays during swimming. The data is shown in Fig. S2S, E. While the GFP::FLWR-1 animals were slightly affected right after the photostimulation, they quickly caught up with the wild type controls, while flwr-1 mutants remained affected even after several minutes.

      (3) Explanation of Differential Rescue in GABAergic Neurons and Muscle

      The authors should provide a rationale for why restoring FLWR-1 in GABAergic neurons fully rescues the aldicarb resistance phenotype, while its restoration in muscle also partially rescues it.

      We think that these effects are independent of each other, i.e. loss of FLWR-1 in muscles increases muscular excitability, which becomes apparent in the behavioral assay that depends on locomotion and muscle contraction. To assess this further, we performed combined GABAergic neuron and muscle rescue assays, as shown in Fig. S3B. The double rescue was not different from wild type, and performed better than the muscle rescue alone.

      (4) Rescue Experiments for Swimming Defect in GABAergic Neurons

      Consider adding rescue experiments to determine whether expressing FLWR-1 specifically in GABAergic neurons can restore the swimming defect phenotype.

      We did not perform this assay as swimming is driven by cholinergic neurons, meaning that we would only indirectly probe GABAergic neuron function and a GABAergic FLWR-1 rescue would likely not improve swimming much. Also, given the importance of the correct E/I balance in the motor neurons, it would likely require achieving expression levels that are very precisely matching endogenous expression levels, which is not possible in a cell-specific manner.

      (5) Further Data on GCaMP Assay for mca-3; flwr-1 Additive Effect

      The additive effect of the mca-3 and flwr-1 mutations on GCaMP signals requires further data for substantiation. Additional GCaMP recordings or statistical analysis would provide stronger support for the proposed interaction between MCA-3 and FLWR-1 in calcium signaling.

      Thank you. We increased the number of observations, which made the outcome of the assay more conclusive: the double mutation did not exacerbate the effect of either single mutation, demonstrating that FLWR-1 and MCA-3 act in the same pathway. The data are in Fig. 7B, C.

      (6) Inclusion of Wild-Type FLWR-1 Rescue in Figures 8B and 8D

      Figures 8B and 8D would benefit from the inclusion of wild-type FLWR-1 as a rescue control.

      We included the FLWR-1 wild type rescue as suggested and summarized the data in Fig. 8B.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Responses to final minor critiques following initial revision

      Reviewer #1 (Recommendations for the authors): 

      The authors have generally done an excellent job of addressing my and the other reviewers' concerns. I have a few additional concerns that the authors could consider addressing through changes to the text: 

      We thank the Reviewer for this assessment and are glad to have addressed the major points.

      - Regarding the gRNA used for NMR studies, I thank the authors for adding additional rationale for their design of the RNA used. However, I still believe that it is misleading to term this RNA as a "gRNA", given that it is mainly composed of a sequence that is arbitrary (the spacer) and the sections of the gRNA that are constant between all gRNAs are truncated in a way that removes secondary structure that is likely essential for specific contacts with the Rec domains. I do not believe the authors need to make alterations to any of their experiments. However, I do think their description of the "gRNA" should be updated to properly reflect that this RNA lacks any of the secondary structure present in a typical gRNA, much of which is necessary to confer specificity of binding between GeoCas9 and the gRNA. As mentioned in my previous review, this may be best achieved by adding a cartoon of the secondary structure of the full-length gRNA and highlighting the region that was used in the truncated "gRNA". 

      We understand the Reviewer’s point. For any experiment in which the gRNA was truncated (i.e. NMR or some MST studies), we have clarified the text and no longer call it a “gRNA.” We state initially that it is a portion of the gRNA and then call it simply an “RNA.” 

      For experiments using the full-length constructs, we have kept the term “gRNA,” as it remains appropriate.

      We have also added a final Supplementary figure (S12) showing the structures of the truncated and full-length RNAs used, based on the _Geo_Cas9 cryo-EM structure and predicted with RNAfold.

      - Lines 256-257: "The ~3-fold decrease in Kd...". I believe the authors are discussing the Kd's of the mutants relative to WT, in which case the Kd increased. Also, the fold-change appears closer to 2-fold than to 3-fold. 

      Yes, the Reviewer makes a good catch. We have corrected this.

      - Lines 407-408: "The mutations also diminished the stability of the full-length GeoCas9 RNP complex." This statement seems at odds with the authors' conclusions in the Results section that the full-length GeoCas9 variants had comparable affinities for the gRNAs (lines 376-382) 

      We agree that this seems contradictory. In the absence of full-length structures for all variants, we can’t definitively state what causes this. It could be that the mutation has an interesting allosteric effect on structure that does not affect RNA binding but induces the Cas9 protein to simply fall apart at lower temperatures, rendering the binding interaction moot. We have added a statement to this section.

      - The authors chose to keep "SpCas9" for consistency with their prior work and the work of many others, including Doudna et al and Zhang et al. However, I will note that in their publications on GeoCas9, the Doudna lab did use SpyCas9 to ensure consistent nomenclature within the publications. 

      We have made the change to “_Spy_Cas9”

      Reviewer #3 (Recommendations for the authors): 

      The authors clearly answered most of my concerns. I still have some technical questions about the analysis of CPMG-RD data but the numbers provided now seem to make sense. While I still think that crystal structures of the point mutant would make the conclusions more "bullet proof", I do appreciate the work associated with this and consider that the manuscript can be published as is. 

      We agree that additional magnetic fields could allow for additional models of CPMG data fitting and that additional crystal structures of the mutants could add to the conclusions. We appreciate the Reviewer recognizing the balance of the current results and potential future studies in signing off on publication.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for their thoughtful comments and suggestions. Our plans for revisions are first summarized. Below you can find the original reviews and our responses and detailed plans (indicated by "Response").

      Revision plan summary:

      1. Many of the concerns can be addressed by changes in the text and better explanations of how the experiments were done. These changes are detailed in the point-by-point responses.
      2. The reviewers suggested experiments such as ChIP-seq and immunoprecipitation which require collection of a large number of mutants. Since our mutants are sterile, the line needs to be maintained as heterozygotes, from which we can pick out individual mutant worms. Therefore, with the current reagents it is impossible to collect mutants in sufficient quantities for ChIP-seq or IP. We understand that it limits the conclusions that can be drawn.
      3. For some figures, additional quantification of fluorescence signal will be done to show differences between mutant and wild type.
      4. A few experiments will be repeated:
         - We will repeat the ATPase assays shown in Fig 1 with additional independently prepared and purified protein samples.
         - Additional replicates will be performed for the few immunofluorescence experiments that were only performed once.

      Point-by-point responses:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Dosage compensation (DC) in C. elegans involves halving the gene expression from the two hermaphrodite X chromosomes to match the output of the single X in male worms. The key regulator of this repression is a specialized condensin complex, which is defined by a dedicated SMC-4 paralog, termed DPY-27. SMC-4 in other animals is an ATPase that functions as a motor of loop extrusion in condensin complexes. In their current manuscript, Chawla et al. assessed whether DPY-27 has ATPase function and whether this activity is required for dosage compensation. It had previously been shown that an ATPase-deficient 'EQ' mutant DPY-27 protein interacts with other DC complex members, yet fails to localize to the X. This observation was made with an extra copy of DPY-GFP expressed in addition to the endogenous wildtype protein [Ref 77]. No dominant negative effect was observed. The authors have now engineered the 'EQ' mutation into the endogenous gene locus and genetically generated hetero- and homozygous ATPase mutant worms. Their data suggest that the ATPase activity is required for X-chromosome localization, complex assembly, chromosome compaction as well as enrichment of H4K20me1 on the dampened X chromosome.

      Major comments:

      1. ATPase assays, Figure 1. Preparations of individual recombinant proteins may vary significantly and may occasionally show much reduced enzymatic activity. A conclusion about the failure of an ATPase activity should not be drawn from a single preparation; several protein preps need to be tested, which then serve as 'biological replicates' for the in vitro reaction. Apparently, the ATPase assays shown only involved technical replicates, which is not sufficient.

      Response: We will express and purify additional protein samples and will repeat the assay.

      CRISPR-mediated engineering may lead to unwanted reactions, exemplified by the 'indel' mutation that was recovered in one clone. As a good practice and important control, the sequences of the mutated alleles in the worms should be determined by sequencing of PCR products. Restriction enzyme cleavage or gel electrophoresis of the PCR products is not sufficient to document the nature of the mutation.

      Response: The sequence of the edit was confirmed by Sanger sequencing. We will make it clear in the text.

      All IF data need to be collected from at least 2 biological replicates, i.e. the experiment must have been carried out independently on two different days. The replicates should deliver consistent results. The number of independent replicates should be mentioned in each figure legend.

      Response: Most of our experiments were performed multiple times. We will indicate the number of replicates in the figure legends. The one or two experiments that were only performed once, will be repeated an additional time.

      The expression levels of wildtype and mutant proteins are concluded from IFM. This is very qualitative; quantitative measurements would strengthen the paper.

      Response: We will quantify fluorescence intensity on our existing images to show differences between mutant and wild type.

      Figure 4B: What are the criteria for classification of the three classes of mutant nuclei? To the uninitiated eye they look very similar. I am a bit worried about human bias if such diffuse staining is to be categorized. The two categories of localization need to be documented better.

      Response: We will provide more images to show the range of phenotypes and provide a better explanation of how they were classified. We will also try a few ways to quantify “diffuseness” to provide a numerical readout.

      Figure 5: volume of the X chromosome. Related to (5): Apparently, the mask that contains the X chromosome was drawn by hand on each individual nucleus? I find it very difficult to see how the X chromosomal territory would be assessed in the examples shown. It would be good to see a panel of nuclei, in which the masks are visible. I think the analysis should be blinded, in which a researcher not involved in the analysis draws masks on coded nuclei and their classes are only revealed later. The same concern holds for the FISH/IP overlaps or DPY-27/SDC-2 overlaps.

      Response: The masks used were not drawn by hand but were based on fluorescence intensity thresholds. We will make a supplementary figure that shows the masks used for quantification to help clarify how the experiment and quantification were performed.

      For figure 5, age-matched hermaphrodites were analyzed. How was the age determined and what would be the consequence of age-variations? What is the effect of the mutations on development?

      Response: For our staining experiments, we routinely use young adults, which we define as 24 hr past the L4 larval stage. At this stage, young adults have started laying eggs. We have unpublished data showing that dosage compensation and chromosome compaction deteriorate with age. To avoid using old worms in our assays, we pick L4 larvae, and then use them for experiments the following day.

      Minor comments:

      8. The labeling of p-values as a-f in the figures with the values listed in a supplemental table is not comfortable. The p-values corresponding to the letters should be listed in the corresponding legends.

      Response: p values can be added to the figure or the figure legend (they are currently in supplementary tables).

      How were the concentrations of the ATPase preparations determined? It would help to see a protein gel in the supplement to assess their purity.

      Response: Concentrations were determined using a spectrometer. We can show protein gels of the preparations as a supplementary figure.

      In figure 1, heterodimers are assumed, but not shown. Do they dimerize under these conditions?

      Response: We can cite papers from others that show heterodimerization in these conditions (for example, Hassler et al, 2019).

      Reviewer #1 (Significance (Required)):

      Significance: The involvement of the ATPase function of DPY-27 was somewhat expected, in light of the earlier findings published in reference 77 using a transgene. The current study confirms and extends these earlier findings. In principle, the genetic experiment presented here is stronger, if documented better.

      Strengths: The study investigates endogenous proteins and measures different phenomena known to be correlated from previous work. The data are internally consistent.

      Limitations: The lack of biological replicates, and unclear procedures of how to draw the IF masks that underlie the conclusions about X chromosome (co)localization and nuclear volume determination render the argument less convincing. For this reviewer, who is not in the C. elegans field, the analysis of mutant phenotypes is difficult to follow. The conclusions are based on only one type of experiment. In reference 77, the X chromosome binding was done by ChIP-seq, clearly a superior, complementary method.

      Response: As explained above, since the strain has to be maintained as a heterozygote, we are unable to collect enough mutants for a ChIP-seq experiment. We can perform and better document the experimental replicates and we can better explain the quantification methods used.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors analyzed the ATPase function of an SMC-4 variant required for dosage compensation in C. elegans. They made a single amino acid mutation that significantly reduced ATPase activity of the protein as shown by in vitro ATP hydrolysis. They showed that the mutation results in phenotypic consequences similar to those shown for other DC mutants, as assessed by viability assays, immunofluorescence and DNA FISH. These results demonstrate the important role of ATPase activity in transcription repression.

      Major comments:

      • Are the key conclusions convincing? The key conclusion that DPY-27 has ATPase activity and using a classic mutation that reduces it largely eliminates its function is convincing. The interpretation of the IF experiments to build the model in the final figure requires stronger evidence, as commented below in additional experiment section.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Yes, as explained below.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The main issue with the current model is that the authors assume that the EQ protein that they are analyzing is in complex with the rest of the condensin IDC subunits. However, there is no evidence in the paper suggesting that this occurs. The results are consistent with the possibility that a large portion of the DPY27-EQ is not in a complex.

      IP-western experiments comparing the proportion of other subunits pulled down by the wild type versus the EQ mutant (perhaps extract from ~50% EQ-containing population could be reached) are needed to understand the incorporation of the EQ mutant in the complex. This is particularly important for the interpretation of the data in Figure 4A, where 70% of the nuclei show diffuse CAPG-1 and DPY-27 EQ. Is this signal due to disassembled subunits diffusing freely, or as depicted in the model figure, bound less stably everywhere? The immunofluorescence results are consistent with both EQ mutation 1) forming a full complex and unstably binding or 2) destabilizing the complex but incompletely assembled complexes sustaining a pool of free EQ detected by the immunofluorescence experiments.

      Response: We agree that to conclusively show interactions, an IP would be necessary. However, as explained above for ChIP, it is not possible to collect enough mutants to make enough protein extract for an IP. An IP in heterozygous worms is also not ideal, as it would be nearly impossible to distinguish wild-type protein from the mutant. The antibody we used recognizes the N terminus, which is identical in the two proteins. The only way to distinguish them would be mass spec. However, during the fragmentation process for mass spec, Q can deamidate to E, which would complicate interpretation of our data. To do this experiment properly, we would need to introduce a different tag into the mutant protein. With the current reagents, an IP is not possible.

      Instead, we have to rely on indirect evidence. The fact that DPY-27 and CAPG-1 colocalize (figure 4) does provide some support for the hypothesis. From previous studies, including our recent publication Trombley et al PLoS Genetics 2025, we know that the condensin IDC complex is not stable unless all subunits are present. It is therefore highly unlikely, although not impossible, that what we detect is diffuse individual subunits.

      We can make changes in the text to soften this claim and better discuss the caveats of the experiment and the conclusions.

      Along the same lines, the authors show that EQ protein that binds to the X is incapable of bringing H4K20me1, which is consistent with the possibility that a large portion of the EQ protein is not in a complex: "To our surprise, we observed that there was no discernable enrichment of H4K20me1, even though there is discernable enrichment of DPY-27 EQ on the X chromosomes in the dpy-27 EQ mutants (Figure 8A)."

      Response: There is an important difference. CAPG-1 and DPY-27 are both members of condensin IDC. The five subunits of this complex depend on each other for stability. DPY-21, the protein that introduces the H4K20me1 mark, also localizes to the X chromosomes, but is not part of condensin IDC. Condensin IDC is able to localize to the X chromosomes in the absence of DPY-21, and is not dependent on DPY-21 for stability. However, DPY-21 is dependent on condensin IDC for X localization (Yonker et al 2003). It is then possible that the mutant condensin IDC is X-bound, but it is unable to recruit DPY-21. We can clarify this in the text.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. It is unclear how long it would take to collect enough het/mutant worms for IP-western. Without additional evidence, interpretation of the data would be affected.

      Response: As explained above, collecting enough mutant worms is essentially impossible. Collecting enough heterozygotes is possible, but distinguishing the mutant protein from the wild type in hets is not.

      • Are the data and the methods presented in such a way that they can be reproduced? Yes
      • Are the experiments adequately replicated and statistical analysis adequate? Yes, except the presentation of the test (see minor comment below)

      Minor comments:

      • Specific experimental issues that are easily addressable. The use of letters for statistical test results is confusing and the figure legend is not clear about what actual p values were produced: "Letters represent multiple comparison p values, with different letters indicating statistically significant differences, and any repeated letter demonstrating no significance." Providing the values in a reasonably concise manner in the legend will help the reader a lot.

      Response: P values can be added to the figures, or the legend

      • Are prior studies referenced appropriately? The authors state that "Surprisingly, this mutant did not phenocopy the transgenic EQ mutant in [77], .."; however, in the previous paragraph, the authors state that the transgenic was expressed in the presence of a wild-type copy. Therefore, it is rather expected that the endogenous mutant, rather than the transgenic, shows phenotypes.

      Response: What we referred to were ways in which the protein behaved (for example in ability to bind to the X at all), and not mutant phenotypes of worms. We can clarify this in the text.

      The authors state that "One possible explanation could be that mitotic condensation has multiple drivers of equal consequence including changes in histone modifications [129], whereas condensation of dosage compensated X chromosomes is predominantly dependent on the DCC. " In a dpy-21 mutant, X chromosome decondenses but DPY-27 stays on the chromosome. Therefore, the effect of the EQ mutation may be due to lack of H4K20me1 enrichment in addition to the lack of loop extrusion.

      Response: We can add the role of H4K20me1 to the discussion.

      • Are the text and figures clear and accurate? Yes
      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? The Pearson correlation coefficient for assessing colocalization between SDC-2 and DPY-27 was helpful for quantification, because there is a lot of background signal that makes the support for or lack of colocalization with the X in the other IF/FISH figures difficult to assess. Additionally, please provide information on how chromatic aberration was assessed when analyzing colocalization experiments.

      Response: Chromatic aberration was not considered for these experiments.

      Reviewer #2 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. Although long assumed to be a functional SMC, the demonstration of DPY-27 function depending on ATPase activity is important. This demonstrates that an X-specific condensin retained its SMC activity.

      • Place the work in the context of the existing literature (provide references, where appropriate). The authors do an adequate job in doing this in their discussion.

      • State what audience might be interested in and influenced by the reported findings. The field of 3D genome organization and function would be influenced by the reported findings.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Genomic analyses of 3D genome organization and gene expression.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Nakagawa and colleagues report the observation that YAP is differentially localized, and thus differentially transcriptionally active, in spheroid cultures versus monolayer cultures. YAP is known to play a critical role in the survival of drug-tolerant cancer cells, and as such, the higher levels of basally activated YAP in monolayer cultures lead to higher fractions of surviving drug-tolerant cells relative to spheroid culture (or in vivo culture). The findings of this study, revealed through convincing experiments, are elegantly simple and straightforward, yet they add significantly to the literature in this field by revealing that monolayer cultures may actually be a preferential system for studying residual cell biology simply because the abundance of residual cells in this format is much greater than in spheroid or xenograft models. The potential linkage between matrix density and stiffness and YAP activation, while only speculated upon in this manuscript, is intriguing and a rich starting point for future studies.

      Although this work, like any important study, inspires many interesting follow-on questions, I am limiting my questions to only a few minor ones, which may potentially be explored either in the context of the current study or in separate, follow-on studies.

      We appreciate Reviewer #1's comments that our work is of importance to the field and particularly that it will "...add significantly to the literature in this field by revealing that monolayer cultures may actually be a preferential system for studying residual cell biology..."  We have sought to highlight the importance of how our findings could be applied to study resistance mechanisms at various points in the manuscript.

      Strengths:

      The major strengths of the work are described above.

      Weaknesses:

      Rather than considering the following points as weaknesses, I instead prefer to think of them as areas for future study:

      (1) Given the field's intense interest in the biology and therapeutic vulnerabilities of residual disease cells, I suspect that one major practical implication of this work could be that it inspires scientists interested in working in the residual disease space to model it in monolayer culture. However, this relies upon the assumption that drug-tolerant cells isolated in monolayer culture are at least reasonably similar in nature to drug-tolerant cells isolated from spheroid or xenograft systems. Is this true? An intriguing experiment that could help answer this question would be to perform gene expression profiling on a cell line model in the following conditions: monolayer growth, drug tolerant cells isolated from monolayer growth conditions, spheroid growth, drug tolerant cells isolated from spheroid growth conditions, xenograft tumors, and drug tolerant cells isolated from xenograft tumors. What are the genes and programs shared between drug-tolerant cells cultured in the three conditions above? Which genes and programs differ between these conditions? Data from this exercise could help provide additional, useful context with which to understand the benefits and pitfalls of modeling residual tumor cell growth in monolayer culture.

      We thank the reviewer for suggesting valuable future studies. We agree that the proposed experiments represent important next steps in understanding the role of YAP and other pathways in primary resistance. We believe, however, that these experiments are beyond both the scope of the current manuscript and what can reasonably be addressed in a revision. The distinct challenges associated with comparing in vivo and in vitro conditions would require significant optimization of single-cell approaches, especially given the robust cell death driven by afatinib treatment in vivo. Given the complexity of in vivo experimentation, we are concerned that such studies may not guarantee biologically meaningful insights. Nonetheless, we agree that this is a compelling direction for future research. If common gene expression patterns could be identified despite these challenges, such studies could help validate monolayer culture as a relevant model for investigating residual disease.

      (2) In relation to the point above, there is an interesting and established connection between mesenchymal gene expression and YAP/TAZ signaling. For example, analyses of gene expression data from human tumors and cell lines demonstrate an extremely strong correlation between these two gene expression programs. Further, residual persister cancer cells have often been characterized as having undergone an EMT-like transition. From the analysis above, is there evidence that residual tumor cells with increased YAP signaling also exhibit increased mesenchymal gene expression?

      We agree with the reviewer that a connection between YAP/TAZ activity and EMT is likely, given prior studies exploring correlations between these two gene signatures. We believe, however, that exploring EMT represents a distinct research direction from the primary focus of the current manuscript. We are concerned that exploration of EMT, especially in the absence of corresponding preclinical models or mechanistic data directly linking EMT to therapy resistance in our models, could distract from the main conclusions of the manuscript. While we plan to stain for EMT-associated markers in the residual cancer tissue from the in vivo studies, it remains unclear whether such data would meaningfully contribute to the revised manuscript, regardless of the outcome.

      Reviewer #2 (Public review):

      The manuscript by Nakagawa R, et al. describes a mechanism of how NSCLC cells become resistant to EGFR and KRAS G12C inhibition. Here, the authors focus on the initial cellular changes that occur to confer resistance and identify YAP activation as a non-genetic mechanism of acute resistance.

      The authors performed an initial xenograft study to identify YAP nuclear localization as a potential mechanism of resistance to EGFRi. The increase in the stromal component of the tumors upon Afatinib treatment leads the authors to explore the response to these inhibitors in both 2D and 3D culture. The authors extend their findings to both KRAS G12C and BRAF inhibitors, suggesting that the mechanism of resistance may be shared along this pathway.

      The paper would benefit from additional cell lines to determine the generalizability of the findings they presented. While the change in the localization of YAP upon Afatinib treatment was identified in a xenograft model, the authors do not return to animal models to test their potential mechanism, and the effects of the hyperactivated S127A YAP protein on Afatinib sensitivity in culture are modest. Also, combination studies of YAP inhibitors and EGFR/RAS/RAF inhibitors would have strengthened the studies.

      We thank the reviewer for their insightful comments. In this manuscript, we present data from 5 cell lines representing the EGFR/BRAF/KRAS pathway, demonstrating the generalizability of YAP-driven decreased cancer cell sensitivity to targeted inhibitors when cultured in 2D compared to spheroid counterparts. While expanding this analysis to a larger panel of cell lines is beyond the scope of the current study, we believe our findings provide a strong rationale for future investigations, including high-throughput screens conducted by other research groups and pharmaceutical companies, to recognize the value in screening spheroid cell cultures. We hope this work helps shift the field of cancer therapeutics toward incorporating screening approaches that better reflect tumor biology into drug discovery pipelines, and we believe this could be one of the most impactful and enduring contributions of our study.

      Reviewer #2 also mentions that "...combination studies of YAP inhibitors and EGFR/RAS/RAF inhibitors would have strengthened the studies..." The concept that YAP/TAZ inhibitors (i.e. TEAD inhibitors) could be additive or synergistic in 2D culture is one that is being actively tested across several groups and in pharma. Recent examples include a publication by Hagenbeek, et al., Nat. Cancer, 2023 (PMID: 37277530) showing that a TEAD inhibitor overcomes KRAS G12C inhibitor resistance. Additionally, recent work by Pfeifer, et al., Comm. Biol., 2024 (PMID: 38658677) suggests a similar effect between EGFR inhibitors and a different TEAD inhibitor. While neither of these studies probes cell death pathways as extensively as our studies do, they nevertheless provide strong evidence that combined TEAD and targeted EGFR/RAF/RAS inhibition in 2D has additive, if not synergistic, effects. We feel that these recently published studies affirm our findings and that repeating such experiments is unlikely to add much new information. We thus feel they are beyond the scope of our present studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Olfactory sensory neurons (OSNs) in the olfactory epithelium detect myriads of environmental odors that signal essential cues for survival. OSNs are born throughout life and thus represent one of the few neurons that undergo life-long neurogenesis. Until recently, it was assumed that OSN neurogenesis is strictly stochastic with respect to subtype (i.e. the receptor the OSN chooses to express).

      However, a recent study showed that olfactory deprivation via naris occlusion selectively reduced birthrates of only a fraction of OSN subtypes and indicated that these subtypes appear to have a special capacity to undergo changes in birthrates in accordance with the level of olfactory stimulation. These previous findings raised the interesting question of what type of stimulation influences neurogenesis, since naris occlusion does not only reduce the exposure to potentially thousands of odors but also to more generalized mechanical stimuli via preventing airflow.

      In this study, the authors set out to identify the stimuli that are required to promote the neurogenesis of specific OSN subtypes. Specifically, they aim to test the hypothesis that discrete odorants selectively stimulate the same OSN subtypes whose birthrates are affected. This would imply a highly specific mechanism in which exposure to certain odors can "amplify" OSN subtypes responsive to those odors suggesting that OE neurogenesis serves, in part, an adaptive function.

      To address this question, the authors focused on a family of OSN subtypes that had previously been identified to respond to musk-related odors and that exhibit higher transcript levels in the olfactory epithelium of mice exposed to males compared to mice isolated from males. First, the authors confirm via a previously established cell birth dating assay in unilateral naris occluded mice that this increase in transcript levels actually reflects a stimulus-dependent birthrate acceleration of this OSN subtype family. In a series of experiments using the same assay, they show that one specific subtype of this OSN family exhibits increased birthrates in response to juvenile male exposure while a different subtype shows increased birthrates to adult mouse exposure. In the core experiment of the study, they finally exposed naris occluded mice to a discrete odor (muscone) to test if this odor specifically accelerates the birth rates of OSN types that are responsive to this odor. This experiment reveals a complex relationship between birth rate acceleration and odor concentrations showing that some muscone concentrations affect birth rates of some members of this family and do not affect two unrelated OSN subtypes.

      In addition to the results nicely summarized by the reviewer, which focus on experiments to examine the effects of odor stimulation on unilateral naris occluded (UNO) mice, an important part of the present study is a set of experiments on non-occluded (i.e., non-UNO-treated) mice. These experiments show that: 1) the exposure of non-occluded mice to odors from adolescent male mice selectively increases quantities of newborn OSNs of the musk-responsive subtype Olfr235 (Figure 3G, H; previously Figure 6), 2) the exposure of non-occluded female mice to 2 different musk odorants (muscone, ambretone) selectively increases quantities of newborn OSNs of 3 musk-responsive subtypes: Olfr235, Olfr1440 and Olfr1431 (Figure 4D-F; previously Figure 6), and 3) the exposure of non-occluded adult female mice to a musk odorant selectively increases quantities of newborn OSNs of musk-responsive subtypes (Figure 5; previously Fig. S7). We have reorganized the revised manuscript to more prominently and clearly present the experimental design and findings of these experiments. We have also made changes to clarify (via schematics) the experimental conditions used (i.e., UNO, non-UNO, odor exposure) in each experiment.

      Strengths:

      The scientific question is valid and opens an interesting direction. The previously established cell birth dating assay in naris occluded mice is well performed and accompanied by several control experiments addressing potential other interpretations of the data.

      Weaknesses:

      (1) The main research question of this study was to test if discrete odors specifically accelerate the birth rate of OSN subtypes they stimulate, i.e. does muscone only accelerate the birth rate of OSNs that express muscone-responsive ORs, or vice versa is the birthrate of muscone-responsive OSNs only accelerated by odors they respond to?

      This question is only addressed in Figure 5 of the manuscript and the results only partially support the above claim. The authors test one specific odor (muscone) and find that this odor (only at certain concentrations) accelerates the birth rate of some musk-responsive OSN subtypes, but not two other unrelated control OSN subtypes. This does not at all show that musk-responsive OSN subtypes are only affected by odors that stimulate them and that muscone only affects the birthrate of musk-responsive OSNs, since first, only the odor muscone was tested and second, only two other OSN subtypes were tested as controls, that, importantly, are shown to be generally stimulus-independent OSN subtypes (see Figure 2 and S2).

      At a minimum, the authors should have a) tested whether additional odors that do not activate the three musk-responsive subtypes affect their birthrates and b) chosen 2-3 additional control subtypes that are known to be stimulus-dependent (from their own 2020 study) and tested whether muscone affects their birthrates.

      We appreciate these suggestions. Within the revised manuscript, we have described and included the results from several new experiments:

      (1) As noted by the reviewer, we had previously tested the effects of exposure to only one exogenous musk odorant, muscone, on quantities of newborn OSNs of the musk-responsive subtypes Olfr235, Olfr1440, and Olfr1431. To test whether the effects observed with muscone exposure occur with other musk odorants, we assessed the effects of exposure to ambretone (5-cyclohexadecenone), a musk odorant previously found to robustly activate musk-responsive OSNs (Sato-Akuhara et al., 2016; Shirasu et al., 2014), on quantities of newborn OSNs of 3 musk-responsive subtypes Olfr235, Olfr1440, and Olfr1431, as well as the SBT-responsive subtype Olfr912, in the OEs of non-occluded female mice. Exposure to ambretone was found to significantly increase quantities of newborn OSNs of all 3 musk-responsive subtypes (Figure 4D-F) but not the SBT-responsive subtype (Figure 4–figure supplement 4C-left), indicating that a variety of musk odorants can accelerate the birthrates of musk responsive subtypes.

      (2) To verify that exogenous non-musk odors do not increase quantities of newborn OSNs of musk responsive OSN subtypes (point a, above), we quantified newborn OSNs of 3 musk-responsive subtypes, Olfr235, Olfr1440, and Olfr1431, in non-occluded female mice that were exposed to the non-musk odorants SBT or IAA. As expected, neither of these odorants significantly affected the birthrates of the subtypes tested (Figure 4D-F).

      (3) To confirm that exogenous musk odors do not accelerate the birthrates of non-musk responsive OSN subtypes that were previously found to undergo stimulation-dependent neurogenesis (point b, above), we quantified newborn OSNs of 2 such subtypes, Olfr827 and Olfr1325, in non-occluded female mice that were exposed to muscone. As expected, exposure to muscone did not significantly affect the birthrates of either of these subtypes (Figure 4–figure supplement 4C-middle, right).

      (4) To provide additional confirmation that only some OSN subtypes have a capacity to exhibit increases in newborn OSN quantities in the presence of odors that activate them, we compared quantities of newborn OSNs of the SBT-responsive subtype Olfr912 in non-occluded females that were exposed to 0.1% SBT versus unexposed controls. As expected, exposure to SBT caused no significant increase in quantities of newborn Olfr912 OSNs (Figure 4–figure supplement 4C-left).

      (2) The finding that Olfr1440 expressing OSNs do not show any increase in UNO effect size under any muscone concentration (Figure 5D, no significance in line graph for UNO effect sizes, middle) seems to contradict the main claim of this study that certain odors specifically increase birthrates of OSN subtypes they stimulate. It was shown in several studies that olfr1440 is seemingly the most sensitive OR for muscone, yet, in this study, muscone does not further increase birthrates of OSNs expressing olfr1440. The effect size on birthrate under muscone exposure is the same as without muscone exposure (0%).

      In contrast, the supposedly second most sensitive muscone-responsive OR olfr235 shows a significant increase in UNO effect size between no muscone exposure (0%) and 0.1% as well as 1% muscone.

      The finding that quantities of newborn Olfr1440 OSNs do not show a significantly greater UNO effect size in the OEs from mice exposed to muscone compared to control mice was also somewhat surprising to us. We think that there are two potential explanations for this result: 1) Unlike subtype Olfr235, subtype Olfr1440 exhibits a significant open-side bias in newborn OSN quantities in UNO-treated adolescent females even in the absence of exposure to muscone. We speculate that this subtype (as well as subtype Olfr1431) is stimulated by odors that are emitted by female mice at the adolescent stage, and/or by another environmental source. This may limit the influence of muscone exposure on the UNO effect size. 2) There is compelling evidence that odors within the environment can enter the closed side of the OE transnasally [via the nasopharyngeal canal (Kelemen, 1947)] and/or retronasally (via the nasopharynx) in UNO-treated mice [reviewed in (Coppola, 2012)]. Thus, it is conceivable that chronic exposure of UNO-treated mice to muscone results in the eventual entry on the closed side of the OE of muscone at concentrations sufficient to promote neurogenesis. If Olfr1440 is more sensitive to muscone than Olfr235 [e.g., (Sato-Akuhara et al., 2016; Shirasu et al., 2014)], OSNs of this subtype may be especially sensitive to small amounts of odors that enter the closed side of the OE transnasally and/or retronasally. These explanations are supported by the following results:

      - UNO-treated females exposed to 0.1% muscone show higher quantities of newborn Olfr1440 OSNs on both the open and closed sides of the OE compared to their unexposed counterparts (Figure 4–figure supplement 1A-middle). Similar results were also observed for newborn Olfr235 OSNs (Figure 4C-middle), albeit to a lesser extent, perhaps due to the lower sensitivity of this subtype to muscone.

      - In non-occluded female mice, exposure to 0.1% muscone was found to significantly increase quantities of newborn Olfr1440 OSNs, as well as newborn Olfr235 and Olfr1431 OSNs (Figure 4D-F in revised manuscript; Figure 6 in original version). Similar results were also observed upon exposure to ambretone, another musk odor (Figure 4D-F). These experiments strongly support the hypothesis that musk odors selectively increase birthrates of OSN subtypes that they stimulate.

      We have addressed these points within the results section of the revised manuscript.

      (3) The authors introduce their choice to study this particular family of OSN subtypes with first, the previous finding that transcripts for one of these musk-responsive subtypes (olfr235) are downregulated in mice that are deprived of male odors. Second, musk-related odors are found in the urine of different species. This gives the misleading impression that it is known that musk-related odors are indeed excreted into male mouse urine at certain concentrations. This should be stated more clearly in the introduction (or cited, if indeed data exist that show musk-related odors in male mouse urine) because this would be a very important point from an ethological and mechanistic point of view.

      In addition, this would also be important information to assess if the chosen muscone concentrations fall at all into the natural range.

      These are important points, which we have addressed within the revised manuscript:

      (1) Within the introduction, we have now stated that the emission of musk odors by mice has not been documented. We have also added extensive discussions of what is known about the emission of musk odors by mice in a new subsection within Results, as well as within the Discussion section. Most prominently, we have cited one study (Sato-Akuhara et al., 2016) that noted unpublished evidence for the emission of Olfr1440-activating compounds from male preputial glands: “Indeed, our preliminary experiments suggest that there are unidentified compounds that activate MOR215-1 in mouse preputial gland extracts.” Another study, which used histomorphology, metabolomic and transcriptomic analyses to compare the mouse preputial glands to muskrat scent glands, found that the two glands are similar in many ways, including molecular composition (Han et al., 2022). However, the study did not identify known musk compounds within mouse preputial glands.

      (2) Based on the reviewer’s feedback and our own curiosity, we used GC-MS to analyze both mouse urine and preputial gland extracts for the presence of known musk odorants, particularly those known to activate Olfr235 and Olfr1440 (Sato-Akuhara et al., 2016). Although we were unable to find evidence for known musk odorants in mouse urine extracts (possibly due to insufficient sensitivity of the assay employed), we found that preputial gland extracts contain GC-MS signals that are structurally consistent with known musk odorants. A limitation of this approach, however, is that the conclusive identification of specific musk odorants in extracts derived from mouse urine and tissues requires comparisons to pure standards, many of which we could not readily obtain. For example, we were unable to obtain a pure sample of cycloheptadecanol, a musk molecule with a predicted potential match to a signal identified within preputial gland extracts. Another limitation is that although several known musk odorants have been found to activate Olfr235 and Olfr1440 OSNs, it is conceivable that structurally distinct odorants that have not yet been identified might also activate them. The findings from these experiments have been included in a new figure within the revised manuscript (Appendix 2–figure 1).

      Related: If these are male-specific cues, it is interesting that changes in OR transcripts (Figure 1) can already be seen at the age of P28 where other male-specific cues are just starting to get expressed. This should be discussed.

      We agree that the observed changes in quantities of newborn OSNs of musk-responsive subtypes in mice exposed to juvenile male odors deserve additional discussion. We have included a more extensive discussion of this observation in both the Results and Discussion sections of the revised manuscript.

      (4) Figure 5: Under muscone exposure the number of newborn neurons on the closed sides fluctuates considerably. This doesn't seem to be the case in other experiments and raises some concerns about how reliable the naris occlusion works for strong exposure to monomolecular odors or what other potential mechanisms are at play.

      We agree that the variability in quantities of newborn OSNs of musk-responsive subtypes on the closed side of the OE of UNO-treated mice deserves further discussion. As noted above, we suspect that these fluctuations are due, at least in part, to transnasal and/or retronasal odor transfer via the nasopharyngeal canal (Kelemen, 1947) and nasopharynx, respectively [reviewed in (Coppola, 2012)], which would be expected to result in exposure of the closed OE to odor concentrations that rise with increasing environmental concentrations. In support of this, quantities of newborn Olfr235 and Olfr1440 OSNs increase on both the open and closed sides with increasing muscone concentration (except at the highest concentration, 10%, in the case of Olfr1440) (Figure 4C-middle, Figure 4–figure supplement 1A-middle). It is conceivable that reductions in newborn Olfr1440 OSN quantities observed in the presence of 10% muscone reflect overstimulation-dependent reductions in survival. Our findings from UNO-based experiments are consistent with expectations that naris occlusion does not completely block exposure to odorants on the closed side, particularly at high concentrations. However, they also appear consistent with the hypothesis that exposure to musk odors promotes the neurogenesis of musk-responsive OSN subtypes.

      Considering the limitations of the UNO procedure, it is important to note that the present study also includes experimental exposure of non-occluded animals to both male odors (Figure 3G, H) and exogenous musk odorants (Figures 4D-F). Findings from the latter experiments provide strong evidence that exposure to multiple musk odorants (muscone, ambretone) causes selective increases in the birthrates of multiple musk-responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431).

      We have included within the Results section of the revised manuscript a discussion of how observed effects of muscone exposure of UNO-treated mice may be influenced by transnasal/ retronasal odor transfer to the closed side of the OE.

      (5) In contrast to all other musk-responsive OSN types, the number of newborn OSNs expressing olfr1437 increases on the closed side of the OE relative to the open in UNO-treated male mice (Figure 1). This seems to contradict the presented theory and also does not align with the bulk RNAseq data (Figure S1).

      Subtype Olfr1437 is indeed an outlier among musk-responsive subtypes that were previously found to be more highly represented in the OSN population in 6-month-old sex-separated males compared to females (Appendix 1–figure 1) (C. van der Linden et al., 2018; Vihani et al., 2020). Somewhat unexpectedly, our findings from scRNA-seq experiments show slightly greater quantities of immature Olfr1437 OSNs on the closed side of the OE in juvenile males (Figure 1D, E of the revised manuscript, which now includes data from a second OE). Perhaps more informatively, considering the small number of iOSNs of specific subtypes in the scRNA-seq datasets, EdU birthdating experiments show no difference in newborn Olfr1437 OSN quantities on the 2 sides of the OE from UNO-treated juvenile males (Figure 2G). It is unclear to us why subtype Olfr1437 does not show open-side biases in newborn OSN quantities in juvenile male mice, but potential explanations include:

      - Age: Bulk RNA-seq findings that musk-responsive OSN subtypes are more highly represented in mice exposed to male odors came from mice that were 6 months old (C. van der Linden et al., 2018) or > 9 months old (Vihani et al., 2020) at the time of analysis. By contrast, the present study primarily analyzed mice that were juveniles (PD 28) at the time of scRNA-seq analysis (Figure 1) or EdU labeling (Figure 2G). It is conceivable that different musk-responsive subtypes are selectively responsive to distinct odors that are emitted at different ages. In this scenario, odors that increase the birthrates of Olfr235, Olfr1440, and Olfr1431 OSNs may be emitted starting at the juvenile stage, while those that increase the birthrate of Olfr1437 OSNs may be emitted in adulthood. In potential support of this, juvenile males exposed to their adult parents at the time of EdU labeling showed a slightly greater (although not statistically significantly different) UNO effect size in quantities of newborn Olfr1437 OSNs compared to controls (Figure 3–figure supplement 3).

      - Capacity for stimulation-dependent neurogenesis: It is also conceivable that, unlike other musk-responsive OSN subtypes, Olfr1437 OSNs lack the capacity for stimulation-dependent neurogenesis (like the SBT-responsive subtype Olfr912, for example). If so, this would imply that increased representations of Olfr1437 OSNs observed in mice exposed to male odors for long periods (C. van der Linden et al., 2018; Vihani et al., 2020) may be due to male odor-dependent increases in the lifespans of Olfr1437 OSNs.

      Within the Discussion section of the revised manuscript, we have discussed the findings concerning Olfr1437.

      (6) The authors hypothesize in relation to the accelerated birthrate of musk-responsive OSN subtypes that "the acceleration of the birthrates of specific OSN subtypes could selectively enhance sensitivity to odors detected by those subtypes by increasing their representation within the OE". However, for two other OSN subtypes that detect male-specific odors, they hypothesize the opposite "By contrast, Olfr912 (Or8b48) and Olfr1295 (Or4k45), which detect the male-specific non-musk odors 2-sec-butyl-4,5-dihydrothiazole (SBT) and (methylthio)methanethiol (MTMT), respectively, exhibited lower representation and/or transcript levels in mice exposed to male odors, possibly reflecting reduced survival due to overstimulation."

      Without any further explanation, it is hard to comprehend why exposure to male-derived odors should, on one hand, accelerate birthrates in some OSN subtypes to potentially increase sensitivity to male odors but, on the other hand, lower transcript levels and fail to accelerate birthrates in other OSN subtypes due to overstimulation.

      We agree that this point deserves further explanation. Within the revised manuscript, we have expanded the Introduction and Results to describe evidence from previous studies that exposure to stimulating odors causes two categories of changes to specific OSN subtypes: elevated representations or reduced representations within the OSN population. In one study (C. J. van der Linden et al., 2020), UNO treatment was found to cause a fraction of OSN subtypes to exhibit lower birthrates and representations on the closed side of the OE relative to the open. By contrast, another fraction of OSN subtypes exhibited higher representations on the closed side of the OEs of UNO-treated mice, but no difference in birthrates between the two sides. The latter subtypes were found to be distinguished by their receipt of extremely high levels of odor stimulation, suggesting that reduced odor stimulation via naris occlusion may lengthen their lifespans. In support of the possibility that Olfr912 (and Olfr1295) belongs to this latter category, given that these subtypes detect SBT and MTMT, respectively (Vihani et al., 2020), odorants that are emitted specifically by male mice (Lin et al., 2005; Schwende et al., 1986), UNO treatment was previously found to increase total Olfr912 OSN quantities on the closed side compared to the open side in sex-separated males (C. van der Linden et al., 2018), a finding confirmed in the present study (Figure 3–figure supplement 1H).

      Taken together, findings from previous studies as well as the current one indicate that olfactory stimulation can accelerate the birthrates and/or reduce the lifespans of OSNs, depending on the specific subtypes and odors within the environment. As we have now indicated in the Discussion, we do not yet know what distinguishes subtypes that undergo stimulation-dependent neurogenesis, but it is conceivable that they detect odors with a particular salience to mice. Thus, observations that some odorants (e.g., musks) cause stimulation-dependent neurogenesis while others do not (e.g., SBT) might reflect an animal’s specific need to adapt its sensitivity to the former. Alternatively, it is conceivable that stimulation-dependent reductions in representations of subtypes such as Olfr912 and Olfr1295 reflect a fundamentally different mode of plasticity that is also adaptive, as has been hypothesized (C. van der Linden et al., 2018; Vihani et al., 2020).

      Reviewer #1 (Recommendations For The Authors):

      To support the main claim, several controls are necessary as mentioned under point 1 of the public review.

      As outlined in our responses to the public review, new experiments within the revised manuscript indicate the following:

      (1) Accelerated birthrates of 3 different musk responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431) are observed in non-occluded mice following exposure to multiple exogenous musk odorants (muscone, ambretone) (Figure 4D-F).

      (2) Exposure of non-occluded mice to non-musk odors (SBT, IAA) does not accelerate the birthrates of musk responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431) (Figure 4D-F).

      (3) Exposure of mice to exogenous musk odors (muscone, ambretone) does not accelerate the birthrates of non-musk responsive OSN subtypes (e.g., Olfr912), including those previously found to undergo stimulation-dependent neurogenesis (Olfr827, Olfr1325) (Figure 4–figure supplement 4C).

      (4) Only a fraction of OSN subtypes have a capacity to undergo accelerated neurogenesis in the presence of odors that activate them (e.g., Olfr912 birthrates are not accelerated by SBT exposure) (Figure 4–figure supplement 4C-left).

      In addition, this study could be considerably improved by showing that the proposed mechanism applies beyond a single OSN subtype (olfr235), especially since the most sensitive OR subtype (expressing olfr1440) does not align with the main claim. The introduction states that this is difficult because the ligands for many ORs are unknown including all subtypes previously found to undergo stimulation-dependent neurogenesis referring to your 2020 study. While this reviewer agrees that the lack of deorphanization is a significant hurdle in the field, the 2020 study states that about 4% of all ORs (which should equal >40 ORs) show a stimulus-dependent down-regulation on the closed side, not only the 7 ORs which are closer examined (Figure 1). It would tremendously improve the impact of the current study to show that the proposed effect applies also to one of these other >40 ORs.

      We appreciate this question, as it alerted us to some shortcomings in how our findings were presented within the original manuscript. We respectfully disagree that only findings regarding subtype Olfr235 align with the main hypothesis of this study, which is that discrete odors can selectively promote the neurogenesis of sensory neuron subtypes that they stimulate. Specifically, we would like to draw attention to experiments on non-occluded female mice exposed to exogenous musk odorants (muscone, ambretone; revised Figures 4D-F; previously, Figure 6). Findings from these experiments provide compelling evidence that exposure to musk odorants causes selective increases in the birthrates of three different musk-responsive OSN subtypes: Olfr235, Olfr1440, and Olfr1431. Thus, we would suggest that results from the present study already show that the proposed mechanism applies to more than just the Olfr235 subtype. However, we agree with what we think is the essence of the reviewer’s point: that it is important to determine the extent to which this mechanism applies to OSN subtypes that are responsive to other (i.e., non-musk) odorants. While, as noted by the reviewer, our previous study identified several OSN subtypes that undergo stimulation-dependent neurogenesis (as well as many others that are predicted to do so) (C. J. van der Linden et al., 2020), we are not aware of ligands that have been identified with high confidence for those subtypes. Although we are in the process of conducting experiments to identify additional odor/subtype pairs to which the mechanism described in this study applies, the early-stage nature of these experiments precludes their inclusion in the present manuscript.

      The ethological and mechanistic relevance of the current study could be significantly improved by showing that musk-related odors that activate olfr235 are actually found in male mouse urine (and additionally are not found in female mouse urine). Otherwise, the implicated link between the acceleration of OSN birthrates by exposure to male odors and acceleration by specific monomolecular odors does not hold, raising the question of any natural relevance (e.g. the proposed adaptive function to increase sensitivity to certain odors).

      As noted in our responses to the public review, we have addressed this important point within the revised manuscript as follows:

      (1) We have included an extensive discussion of what is known about the emission of musk-like odors by mice.

      (2) We have used GC-MS to analyze both mouse urine and preputial gland extracts for the presence of known musk compounds. Although inconclusive, we report that preputial glands contain signals that are structurally consistent with known musk compounds. The findings of these experiments have been included in the revised manuscript (new Appendix 2–figure 1), along with a discussion of their limitations.

      Reviewer #2 (Public Review):

      In their paper entitled "In mice, discrete odors can selectively promote the neurogenesis of sensory neuron subtypes that they stimulate" Hossain et al. address lifelong neurogenesis in the mouse main olfactory epithelium. The authors hypothesize that specific odorants act as neurogenic stimuli that selectively promote biased OR gene choice (and thus olfactory sensory neuron (OSN) identity). Hossain et al. employ RNA-seq and scRNA-seq analyses for subtype-specific OSN birthdating. The authors find that exposure to male and musk odors accelerates the birthrates of the respective responsive OSNs. Therefore, Hossain et al. suggest that odor experience promotes selective neurogenesis and, accordingly, OSN neurogenesis may act as a mechanism for long-term olfactory adaptation.

      We appreciate this summary but would like to underscore that a mechanism involving biased OR gene choice is just one of two possibilities proposed in the Discussion section to explain how odorant stimulation of specific subtypes accelerates the birthrates of those subtypes.

      The authors follow a clear experimental logic, based on sensory deprivation by unilateral naris occlusion, EdU labeling of newborn neurons, and histological analysis via OR-specific RNA-FISH. The results reveal robust effects of deprivation on newborn OSN identity. However, the major weakness of the approach is that the results could, in (possibly large) parts, depend on "downregulation" of OR subtype-specific neurogenesis, rather than (only) "upregulation" based on odor exposure. While, in Figure 6, the authors show that the observed effects are, in part, mediated by odor stimulation, it remains unclear whether deprivation plays an "active" role as well. Moreover, as shown in Figure 1C, unilateral naris occlusion has both positive and negative effects in a random subtype sample.

      In our view, the present study involves two distinct and complementary experimental designs: 1) odor exposure of UNO-treated animals and 2) odor exposure of non-occluded animals. Here we address this comment with respect to each of these designs:

      (1) For experiments performed on UNO-treated animals, we agree that observed differences in birthrates on the open and closed sides of the OE reflect, largely, a deceleration (i.e., downregulation) of the birthrates of these subtypes on the closed side relative to the open (as opposed to an acceleration of birthrates on the open side). Our objective in using this design was to test the extent to which specific OSN subtypes undergo stimulation-dependent neurogenesis under various odor exposure conditions. According to the main hypothesis of this study, a lower birthrate of a specific OSN subtype on the closed side of the OE compared to the open is predicted to reflect a lower level of odor stimulation on the closed side received by OSNs of that subtype. However (and as described in our responses to reviewer #1), a limitation of this design is that environmental odorants, especially at high concentrations, are likely to stimulate responsive OSNs on the closed side of the OE in addition to the open side due to transnasal and/or retronasal air flow.

      (2) Experiments performed on non-occluded animals were designed to provide critical complementary evidence that specific OSN subtypes undergo accelerated neurogenesis in the presence of specific odors. Using this design, we have found compelling evidence that:

      - Exposure of non-occluded mice to male odors causes the selective acceleration of the birthrate of Olfr235 OSNs (Figure 3G, H).

      - Exposure of non-occluded female mice to two different musk odorants (muscone and ambretone) selectively accelerates the birthrates of three different musk-responsive subtypes: Olfr235, Olfr1440, and Olfr1431 (Figure 4D-F and Figure 4–figure supplement 4C).

      We have reorganized the revised manuscript to more clearly present the most important experimental findings using these two experimental designs. We have also highlighted (via schematics) the experimental conditions (e.g., UNO, non-occlusion, odor exposure) used for each experiment.

      Another weakness is that the authors build their model (Figure 8), specifically the concept of selectivity, on a receptor-ligand pair (Olfr912 that has been shown to respond, among other odors, to the male-specific non-musk odors 2-sec-butyl-4,5-dihydrothiazole (SBT)) that would require at least some independent experimental corroboration. At least, a control experiment that uses SBT instead of muscone exposure should be performed.

      We agree that this important concern deserves additional control experiments and discussion. We have addressed this concern within the revised manuscript as follows:

      - Within the Results section, we have added multiple new control experiments (detailed in response to Reviewer #1), including the one recommended above. As suggested, we quantified newborn OSNs of the SBT-responsive subtype Olfr912 in non-occluded females that were either exposed to 0.1% SBT or left unexposed (controls). Exposure to SBT was found to cause no significant increase in quantities of newborn Olfr912 OSNs (newly added Figure 4–figure supplement 4C-left). These findings further support the model in Figure 7 (previously Figure 8) that only a fraction of OSN subtypes have a capacity to undergo accelerated neurogenesis in the presence of odors that activate them.

      - Also within the Results section, we have made efforts to better highlight relevant control experiments that were included in the original version, particularly those showing that quantities of newborn Olfr912 OSNs are not affected by UNO in mice exposed to male odors (Figure 2H and Figure 3–figure supplement 1G; previously Figure 2F and Figure 3H) or by exposure of non-occluded females to male odors (Figure 3H; previously Figure 6E). Since Olfr912 is responsive to component(s) of male odors (C. van der Linden et al., 2018; Vihani et al., 2020), these results indicate that this subtype does not have the capacity for stimulation-dependent neurogenesis, which is consistent with our previous findings that only a fraction of subtypes have this capacity (C. J. van der Linden et al., 2020).

      In this context, it is somewhat concerning that some results, which appear counterintuitive (e.g., lower representation and/or transcript levels of Olfr912 and Olfr1295 in mice exposed to male odors) are brushed off as "reflecting reduced survival due to overstimulation." The notion of "reduced survival" could be tested by, for example, a caspase3 assay.

      This is a point that we agree deserves further discussion. Please see the explanation that we have outlined above in response to Reviewer #1.

      Within the revised manuscript, we have expanded the Introduction to describe evidence from previous studies that exposure to stimulating odors causes two categories of changes to specific OSN subtypes: elevated representations or reduced representations within the OSN population. We outline evidence from previous studies that Olfr912 and Olfr1295 belong to the latter category, and that the representations of these subtypes are likely reduced by male odor overstimulation-dependent shortening of OSN lifespan.

      Important analyses that need to be done to better interpret the findings are to present (i) the OR+/EdU+ population of olfactory sensory neurons not just as a count per hemisection, but rather as the ratio of OR+/EdU+ cells among all EdU+ cells; and (ii) the ratio of EdU+ cells among all nuclei (UNO versus open naris). This way, data would be normalized to (i) the overall rate of neurogenesis and (ii) any broad deprivation-dependent epithelial degeneration.

      We have addressed this concern in two ways within the revised manuscript:

      (1) We have noted within the Methods section that the approach of using half-sections for normalization has been used in multiple previous studies for quantifying newborn (OR+/EdU+) and total (OR+) OSN abundances (Hossain et al., 2023; Ibarra-Soria et al., 2017; C. van der Linden et al., 2018; C. J. van der Linden et al., 2020). Additionally, within the figure legends and Methods, we have more thoroughly described the approach used, including that it relies on averaging the quantifications from at least 5 high-quality coronal OE tissue sections that are evenly distributed throughout the anterior-posterior length of each OE and thereby mitigates the effects of section size and cell number variation among sections. In the case of UNO-treated mice, the open and closed sides within the same section are paired, which further reduces the effects of section-to-section variation. We have found that this approach yields reproducible quantities of newborn and total OSNs among biological replicate mice and enables accurate assessment of how quantities of OSNs of specific subtypes change as a result of altered olfactory experience, a key objective of this study.

      (2) To assess whether the use of alternative approaches for normalizing newborn OSN quantities suggested by the reviewers would affect the present study’s findings, we compared three methods for normalizing the effects of exposure to male odors or muscone on quantities of newborn Olfr235 OSNs in the OEs of both UNO-treated and non-occluded mice: 1) OR+/EdU+ OSNs per half-section (used in this study), 2) OR+/EdU+ OSNs per total number of EdU+ cells (reviewer suggestion (i)), and 3) OR+/EdU+ OSNs per unit of DAPI+ area (an approximate measure of nuclei number; reviewer suggestion (ii)). The three normalization methods yielded statistically indistinguishable differences in assessing the effects of exposure of either UNO-treated or non-occluded mice to male odors (newly added Figure 2–figure supplement 2 and Figure 3–figure supplement 2), or of exposure of non-occluded mice to muscone (newly added Figure 4–figure supplement 3). Based on these findings, and the considerable time that would be required to renormalize all data in the manuscript, we have chosen to maintain the use of normalization per half-section.
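
      As a minimal sketch of the comparison described in item (2), the three normalizations can be written as simple ratios. The function name, argument names, DAPI+ area units, and all numbers below are hypothetical and chosen only for illustration; they are not taken from the study's data or analysis code.

```python
def normalize_newborn_counts(or_edu_count, n_half_sections,
                             total_edu_count, dapi_area_mm2):
    """Express a newborn (OR+/EdU+) OSN count under three normalizations.

    or_edu_count    : OR+/EdU+ cells counted across the analyzed half-sections
    n_half_sections : number of half-sections analyzed
    total_edu_count : all EdU+ cells in the same half-sections
    dapi_area_mm2   : DAPI+ area in the same half-sections (units assumed)
    """
    if min(n_half_sections, total_edu_count, dapi_area_mm2) <= 0:
        raise ValueError("denominators must be positive")
    return {
        # 1) per half-section (the normalization retained in the study)
        "per_half_section": or_edu_count / n_half_sections,
        # 2) per EdU+ cell (reviewer suggestion (i))
        "per_edu_cell": or_edu_count / total_edu_count,
        # 3) per unit DAPI+ area (reviewer suggestion (ii))
        "per_dapi_area": or_edu_count / dapi_area_mm2,
    }


# Hypothetical counts for one mouse: 30 OR+/EdU+ cells in 5 half-sections.
example = normalize_newborn_counts(30, 5, 1500, 2.0)
print(example)
```

      If the denominators (EdU+ cells, DAPI+ area) are similar between the conditions being compared, all three normalizations rescale the counts by nearly the same factor, which is one way to read the agreement among methods reported above.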

      Finally, the paper will benefit from improved data presentation and adequate statistical testing. Images in Figures 2 - 7, showing both EdU labeling of newborn neurons and OR-specific RNA-FISH, are hard to interpret. Moreover, t-tests should not be employed when data is not normally distributed (as is the case for most of their samples).

      We have made extensive changes within the revised manuscript to increase the clarity and interpretability of the figures, including:

      (1) Addition of a split-channel, high-magnification view of a representative image that shows the overlap of FISH and EdU signals (Figure 2D).

      (2) Addition of experimental schematics and timelines corresponding to each set of experiments.

      In the revised manuscript, several changes to the statistical tests have been made, as follows:

      (1) To assess deviation from normality of the histological quantifications of newborn and total OSNs of specific subtypes in this study, all datasets were tested using the Shapiro-Wilk test for non-normality and the P values obtained are included in Supplementary file 1 (figure source data). Of the 274 datasets tested, 253 were found to have Shapiro-Wilk P values > 0.05, indicating that the vast majority (92%) do not show evidence of significant deviation from a normal distribution.

      (2) A general lack of deviation of the datasets in this study from a normal distribution is further supported by quantile-quantile (QQ) plots, which compare actual data to a theoretically normal distribution (Appendix 4–figure 1). The datasets analyzed were separated into the following categories:

      a. Quantities of newborn OSNs in UNO treated mice (Appendix 4-figure 1A)

      b. Quantities of total OSNs in UNO treated mice (Appendix 4-figure 1B)

      c. Quantities of newborn OSNs in non-occluded mice (Appendix 4-figure 1C)

      d. UNO effect sizes for newborn or total OSNs (Appendix 4-figure 1D)

      (3) Results of both parametric and non-parametric statistical tests of comparisons in this study have been included in Supplementary file 2 (statistical analyses). In general, the results from parametric and non-parametric tests are in good agreement.

      (4) Statistical analyses of differences in OSN quantities in the OEs of non-occluded mice or UNO effect sizes in UNO-treated mice subjected to more than two different experimental conditions have now been performed using one-way ANOVA tests, FDR-adjusted using the 2-stage linear step-up procedure of Benjamini, Krieger and Yekutieli.
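
      The normality screening in items (1) and (2) can be illustrated with a short standard-library sketch: one helper tallies the fraction of datasets whose normality-test P value exceeds a threshold, and another computes the point pairs of a QQ plot against a normal distribution fitted to the data. The helper names, the (i + 0.5)/n plotting positions, and the example values are assumptions made for illustration; the Shapiro-Wilk test itself is not implemented here.

```python
from statistics import NormalDist, mean, stdev


def fraction_not_rejected(p_values, alpha=0.05):
    """Fraction of datasets whose normality-test P value exceeds alpha."""
    return sum(p > alpha for p in p_values) / len(p_values)


def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot.

    Theoretical quantiles come from a normal distribution with the sample
    mean and standard deviation, evaluated at positions (i + 0.5) / n.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    spread = stdev(values)
    if spread == 0:
        raise ValueError("values must not all be identical")
    fitted = NormalDist(mean(values), spread)
    ordered = sorted(values)
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Tally matching the counts in item (1): 253 of 274 datasets with P > 0.05.
# The individual P values here are placeholders, not the study's values.
share = fraction_not_rejected([0.5] * 253 + [0.01] * 21)
print(f"{share:.0%} of datasets show no significant deviation")

# Points that fall near the line y = x suggest approximate normality.
for theo, obs in qq_points([3.0, 1.0, 2.0, 5.0, 4.0]):
    print(f"{theo:6.2f}  {obs:6.2f}")
```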

      Reviewer #2 (Recommendations for the Authors):

      The manuscript by Hossain et al. would benefit from a thorough revision. Here, we outline several points that should be addressed:

      Figure 3E - I & Figure 4E&F: Red lines that connect mean values are misleading.

      Within the revised manuscript, the UNO effect size graphs have been modified for clarity, including removal of the lines between mean values except for those comparing changes over time post EdU injection (Figure 6 and Figure 6-figure supplement 1). For these latter graphs, we think that lines help to illustrate changes in effect sizes over time.

      Figure 3E - I: UNO effect sizes (right) should be tested via ANOVA.

      In the revised manuscript, statistical analyses of UNO effect sizes in UNO-treated mice subjected to more than two different experimental conditions were done using one-way ANOVA tests, FDR-adjusted using the 2-stage linear step-up procedure of Benjamini, Krieger and Yekutieli (Figure 2-figure supplement 2; Figure 3; Figure 3-figure supplement 1; Figure 4; Figure 4-figure supplements 1, 2). The same tests were used for analysis of differences in OSN quantities in the OEs of non-occluded mice subjected to more than two different experimental conditions (Figure 3; Figure 3-figure supplement 2; Figure 4; Figure 4-figure supplements 3, 4). For comparisons of differences in quantities of newborn OSNs of musk-responsive subtypes at 4 and 7 days post-EdU between non-occluded mice exposed and unexposed to muscone, a two sample ANOVA - fixed-test, using F distribution (right-tailed) was used (Figure 6; Figure 6-figure supplement 1).
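
      For readers unfamiliar with the 2-stage linear step-up procedure named above, the following is a minimal standard-library sketch of its usual textbook form: a Benjamini-Hochberg pass at level q/(1 + q) estimates the number of true null hypotheses, and a second pass uses that estimate to relax the threshold. The function names and the example P values are hypothetical, and this is not the software or code used for the analyses in the manuscript.

```python
def bh_reject_count(sorted_p, q):
    """Number of rejections under the Benjamini-Hochberg step-up rule.

    sorted_p must be in ascending order; the rule rejects the k smallest
    P values, where k is the largest rank i with p_(i) <= i * q / m.
    """
    m = len(sorted_p)
    k = 0
    for i, p in enumerate(sorted_p, start=1):
        if p <= i * q / m:
            k = i
    return k


def two_stage_fdr(p_values, q=0.05):
    """Two-stage linear step-up procedure (Benjamini, Krieger, Yekutieli).

    Returns a list of booleans aligned with p_values (True = discovery).
    """
    m = len(p_values)
    if m == 0:
        return []
    order = sorted(range(m), key=lambda i: p_values[i])
    sorted_p = [p_values[i] for i in order]

    q1 = q / (1 + q)                      # stage 1 level
    r1 = bh_reject_count(sorted_p, q1)
    if r1 == 0:
        n_reject = 0                      # nothing rejected, stop
    elif r1 == m:
        n_reject = m                      # everything rejected, stop
    else:
        m0_hat = m - r1                   # estimated number of true nulls
        n_reject = bh_reject_count(sorted_p, q1 * m / m0_hat)

    rejected = [False] * m
    for rank, idx in enumerate(order):
        rejected[idx] = rank < n_reject
    return rejected


# Hypothetical P values from five pairwise comparisons.
print(two_stage_fdr([0.5, 0.001, 0.04, 0.01, 0.03]))
```

      In this example the first stage rejects two hypotheses, so the second stage runs at a less stringent level and yields four discoveries.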

      Images in Figures 2 - 7, showing both EdU labeling of newborn neurons and OR-specific RNA-FISH: Colabeling is hard / often impossible to discern. Show zoom-ins and better explain the criteria for "colabeling" in the methods.

      In the revised manuscript an enlarged and split-channel view of an image showing multiple newborn Olfr235 OSNs (OR+/EdU+) has been added (Figure 2D). A detailed description of the criteria for OR+/EdU+ OSNs is provided in Methods under the section “Histological quantification of newborn and total OSNs of specific subtypes.”

      Figure 1C: add Olfr912.

      As a control group for iOSN quantities of musk-responsive subtypes in Figure 1, we selected random subtypes that are expressed in the same zones: 2 and 3. Olfr912 OSNs were not included because this subtype was not randomly chosen, nor is it expressed in the same zones (Olfr912 is expressed in zone 4). We also note that the scRNA-seq analysis was done to allow an initial exploration of the hypothesis that some OSN subtypes that are more highly represented in mice exposed to male odors show stimulation-dependent neurogenesis. Considering that the scRNA-seq datasets contain only small numbers of iOSNs of specific subtypes, we think they are more useful for analyzing changes in birthrates within groups of subtypes (e.g., musk responsive, random) rather than individual subtypes.

      The time of OE dissection is different for data shown in Figure 1 (P28) as compared to other figures (P35). Please comment/discuss.

      Within the Results section of the revised manuscript, we have now clarified that the PD 28 timepoint chosen for EdU birthdating in the histological quantification of newborn OSNs of specific subtypes is analogous to the PD 28 timepoint chosen for identification of immature (Gap43-expressing) OSNs in the scRNA-seq samples. In the case of EdU birthdating, it is necessary to provide a chase period of sufficient length to enable robust and stable expression of an OR, which defines the subtype. A chase period of 7 days was chosen based on a previous study (C. J. van der Linden et al., 2020). Hence, a dissection date of PD 35 was chosen.

      Figure 3F&G: please discuss the female → female effects

      In the Results and Discussion sections of the revised manuscript, we discuss our observation that the Olfr1440 and Olfr1431 subtypes show significantly higher quantities of newborn OSNs on the open side compared to closed sides in UNO-treated females. We speculate that these subtypes may receive some odor stimulation in juvenile females, perhaps via musk or related odors emitted by females themselves or from elsewhere within the environment.

      Figure 4E (and other examples): male → male displays two populations (no effect versus effect); please explain/speculate.

      For some UNO effect sizes, there appears to be a high degree of variation among mice, and, in some cases, this diversity appears to cause the data to separate into groups. We assessed whether this diversity might reflect mice that came from different litters, but this is not the case. Rather, we speculate that the observed diversity most likely reflects low representations of newborn OSNs of some subtypes and/or under specific conditions. The data referred to by the reviewer (now Figure 3–figure supplement 3D), for example, shows UNO effect sizes for quantities of newborn Olfr1431 OSNs, which has the lowest representation among the musk-responsive subtypes analyzed in this study.

      Figure 5C-E: It is unclear why strong muscone concentrations (10%) have no effect, whereas no muscone sometimes (D&E) has an effect.

      As discussed in response to comments from Reviewer #1, we speculate that fluctuations in UNO effect sizes in muscone-exposed mice, particularly at high muscone concentrations, may be due, at least in part, to transnasal and/or retronasal air flow [reviewed in (Coppola, 2012)], which would be expected to result in exposure of the closed side of the OE to muscone concentrations that increase with increasing environmental concentrations. In support of this, quantities of newborn Olfr235 (Figure 4C-middle) and Olfr1440 (Figure 4–figure supplement 1A-middle) OSNs increase on both the open and closed sides with increasing muscone concentration (except at the highest concentration, 10%, in the case of Olfr1440). We speculate that reductions in newborn Olfr1440 OSN quantities observed in the presence of 10% muscone may reflect overstimulation-dependent reductions in survival.

      As emphasized above, our study also includes experiments on non-occluded animals (Figures 3, 4, 5). Findings from these experiments provide additional evidence that exposure to multiple musk odorants (muscone, ambretone) causes selective increases in the birthrates of multiple musk-responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431).

      We have included an extensive interpretation of UNO-based experiments, including their limitations, within the Results section of the revised manuscript.

      Figure S1: please explain the large error bars regarding "Transcript level".

      We have clarified that the error bars in this figure, which is now Appendix 1–figure 1, correspond to 95% confidence intervals.

      The figure captions could be improved for ease of reading.

      Figure captions have been revised for increased clarity.

      Figure 4: Include Olfr235 data for consistency.

      All OSN subtypes analyzed for the effects of exposure to adult mice on UNO-induced open-side biases in quantities of newborn OSNs have been included in a single figure, which is now Figure 3–figure supplement 3.

      Figure S6F&G: Do not run statistics on n = 2 (G) or 3 (F) samples.

      We have removed statistical test results from comparisons involving fewer than 4 observations.

      Reviewer #3 (Public Review):

      Summary:

      Neurogenesis in the mammalian olfactory epithelium persists throughout the life of the animal. The process replaces damaged or dying olfactory sensory neurons. It has been tacitly assumed that replacement of the OR subtypes is stochastic, although anecdotal evidence has suggested that this may not be the case. In this study, Santoro and colleagues systematically test this hypothesis by answering three questions: is there enrichment of specific OR subtypes associated with neurogenesis? Is the enrichment dependent on sensory stimulus? Is the enrichment the result of differential generation of the OR type or from differential cell death regulated by neural activity? The authors provide some solid evidence indicating that musk odor stimulus selectively promotes the OR types expressing the musk receptors. The evidence argues against a random selection of ORs in the regenerating neurons.

      Strengths:

      The strength of the study is a thorough and systematic investigation of the expression of multiple musk receptors with unilateral naris occlusion or under different stimulus conditions. The controls are properly performed. This study is the first to formulate the selective promotion hypothesis and the first systematic investigation to test it. The bulk of the study uses in situ hybridization and immunofluorescent staining to estimate the number of OR types. These results convincingly demonstrate the increased expression of musk receptors in response to male odor or muscone stimulation.

      Weaknesses:

      A major weakness of the current study is the single-cell RNASeq result. The authors use this piece of data as a broad survey of receptor expression in response to unilateral nasal occlusion. However, several issues with this data raise serious concerns about the quality of the experiment and the conclusions. First, the proportion of OSNs, including both the immature and mature types, constitutes only a small fraction of the total cells. In previous studies of the OSNs using the scRNASeq approach, OSNs constitute the largest cell population. It is curious why this is the case. Second, the authors did not annotate the cell types, making it difficult to assess the potential cause of this discrepancy. Third, given the small number of OSNs, it is surprising to have multiple musk receptors detected in the open side of the olfactory epithelium whereas almost none in the closed side. Since each OR type only constitutes ~0.1% of OSNs on average, the number of detected musk receptors is too high to be consistent with our current understanding and the rest of the data in the manuscript. Finally, unlike the other experiments, the authors did not describe any method details, nor was there any description of quality controls associated with the experiment. The concerns over the scRNASeq data do not diminish the value of the data presented in the bulk of the study but could be used for further analysis.

      We are grateful to the reviewer for raising these important questions.

      In the revised manuscript, we have clarified that the scRNA-seq dataset presented in the original version of the manuscript (now called dataset OE 1) was published and described in detail in a previous study (C. J. van der Linden et al., 2020). The reviewer is correct that the proportion of OSNs within that dataset was lower than in other datasets that have been published more recently (using updated methods). We think this is likely because of the way that the cells were processed (e.g., from cryopreserved single cells followed by live/dead selection). However, because the open and closed sides were processed identically, we do not expect the ratios of OSNs of specific subtypes to be greatly affected. Hence, the differences observed for specific OSN subtypes on the open versus closed sides are expected to be valid.

      As the reviewer notes, there is a surprisingly large difference between the number of OSNs of musk-responsive subtypes on the open and closed sides within the OE 1 dataset. This difference is a key piece of information that led us to formulate the hypothesis in the study: that musk-responsive subtypes are born at a higher rate in the presence of male/musk odor stimulation. And while it is true that, on average, each subtype represents ~0.1% of the population, it is known that there is wide variance in representations among different subtypes [e.g., (Ibarra-Soria et al., 2017)]. The frequencies of the musk-responsive subtypes among all OSNs on the open side of OE 1 (0.3% for Olfr235, 0.4% for Olfr1440, 0.06% for Olfr1434, 0% for Olfr1431, and 1% for Olfr1437) are in line with previous findings.

      To confirm that the scRNA-seq findings from dataset OE 1 are not an artifact of the cell preparation methods used, we generated a second scRNA-seq dataset, OE 2, which has been added to the revised manuscript (Figure 1). The OE 2 dataset was prepared according to the same experimental timeline as OE 1, but the cells were captured immediately after dissociation and live/dead sorting via FACS. As expected, most cells within the OE 2 dataset are OSNs (77% on the open side, 66% on the closed). Importantly, like the OE 1 dataset, the OE 2 dataset shows higher quantities of iOSNs of musk-responsive subtypes on the open side of the OE compared to the closed (normalized for either total cells or total OSNs) (Figure 1–figure supplement 1D, E).

      A weakness of the experiment assessing musk receptor expression is that the authors do not distinguish immature from mature OSNs. Immature OSNs express multiple receptor types before they commit to the expression of a single type. The experiments do not reveal whether mature OSNs maintain an elevated expression level of musk receptors.

      While it is established that multiple ORs are coexpressed at a low level during OSN differentiation (Bashkirova et al., 2023; Fletcher et al., 2017; Hanchate et al., 2015; Pourmorady et al., 2024; Saraiva et al., 2015; Scholz et al., 2016; Tan et al., 2015), this has been found to occur primarily at the immediate neuronal precursor 3 (INP3) stage (Bashkirova et al., 2023; Fletcher et al., 2017), which is characterized by expression of Tex15 (Fletcher et al., 2017; Pourmorady et al., 2024) and precedes the immature OSN (iOSN) stage, which is characterized by expression of Gap43 (Fletcher et al., 2017; McIntyre et al., 2010; Verhaagen et al., 1989). Within the scRNA-seq datasets in the present study, iOSNs of specific subtypes are identified based on robust expression of Gap43 (Log2 UMI > 1) and a specific OR gene (Log2 UMI > 2), as described in the figures and methods. Thus, the cells defined as iOSNs are expected to express a single OR gene and this expression should be maintained as iOSNs transition to mOSNs. To confirm these predictions, we carried out a detailed analysis of OR expression at three different stages of OSN differentiation: INP3, iOSN, and mOSN (Figure 1–figure supplement 2). The cells chosen for analysis express the musk-responsive ORs Olfr235 or Olfr1440 or a randomly chosen OR Olfr701, in addition to markers that define INP3, iOSN, or mOSN cells. As expected, individual iOSNs and mOSNs of musk-responsive subtypes were found to exhibit robust and singular OR expression on the open and closed sides of OEs from UNO-treated mice. Moreover, and as observed previously, INP3 cells coexpress multiple OR transcripts at low levels. A detailed description of how the analysis was performed is included in the Methods section under Quantification and statistical analysis.
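
      The thresholding rule described above can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: it assumes that "Log2 UMI" means the base-2 logarithm of a raw per-cell UMI count, and the function names (`log2_umi`, `is_iosn_of_subtype`) and the example cells are hypothetical.

```python
import math

# Thresholds quoted in the response: Gap43 Log2 UMI > 1, OR gene Log2 UMI > 2.
GAP43_MIN_LOG2 = 1.0
OR_MIN_LOG2 = 2.0


def log2_umi(count):
    """Assumption: plain log2 of the raw UMI count; zero counts map to -inf."""
    return math.log2(count) if count > 0 else float("-inf")


def is_iosn_of_subtype(cell_umis, or_gene):
    """True if the cell passes both the Gap43 and the OR-gene threshold."""
    gap43_ok = log2_umi(cell_umis.get("Gap43", 0)) > GAP43_MIN_LOG2
    or_ok = log2_umi(cell_umis.get(or_gene, 0)) > OR_MIN_LOG2
    return gap43_ok and or_ok


# Hypothetical cells (gene -> UMI count); values are illustrative, not study data.
cells = {
    "cell_a": {"Gap43": 8, "Olfr235": 16},  # log2 values 3 and 4: passes both
    "cell_b": {"Gap43": 2, "Olfr235": 16},  # Gap43 log2 = 1, not above 1
    "cell_c": {"Gap43": 8, "Olfr235": 4},   # OR log2 = 2, not above 2
}

iosn_ids = [cid for cid, umis in cells.items()
            if is_iosn_of_subtype(umis, "Olfr235")]
```

      Because both comparisons are strict, cells sitting exactly on a threshold are excluded; whether the study's analysis treats boundary values the same way is not stated in the response.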

      Within the histology-based quantifications, newborn OSNs are identified based on their robust RNA-FISH signals corresponding to a specific OR transcript and an EdU label. Considering the EdU chase time of 7 days, most EdU-positive cells are expected to have passed the INP3 stage and be iOSNs or mOSNs. Moreover, considering the low level of OR expression within INP3 cells, it is unlikely OR transcripts are expressed at a high enough level to be detectable and/or counted at this stage and thereby affect newborn OSN quantifications.

      There are also two conceptual issues that are of concern. The first is the concept of selective neurogenesis. The data show an increased expression of musk receptors in response to male odor stimulation. The authors argue that this indicates selective neurogenesis of the musk receptor types. However, it is not clear what the distinction is between elevated receptor expression and a commitment to a specific fate at an early stage of development. As immature OSNs express multiple receptors, a likely scenario is that some newly differentiated immature OSNs have elevated expression of not only the musk receptors but also other receptors. The current experiments do not distinguish the two alternatives. Moreover, as pointed out above, it is not clear whether mature OSNs maintain the increased expression. Although a scRNASeq experiment can clarify it, the authors, unfortunately, did not perform an in-depth analysis to determine at which point of neurogenesis the cells commit to a specific musk receptor type. The quality of the scRNASeq data unfortunately also does not lend confidence for this type of analysis.

      The addition of a second scRNA-seq dataset within the revised manuscript (Figure 1), combined with the new scRNA-seq-based analyses of OR expression in INP3, iOSN, and mOSN cells (Figure 1-figure supplement 2), provide strong evidence that iOSNs and mOSNs robustly express a single OR gene and that cellular expression is stable from the iOSN to the mOSN stage. These analyses do not support a scenario in which odor stimulation causes upregulated expression of multiple ORs and thereby causes apparent increases in quantities of newly generated OSNs that express musk-responsive ORs. Rather, the data firmly support a mechanism in which odor stimulation increases quantities of newly generated OSNs that have stably committed to the robust expression of a single musk-responsive OR.

      A second conceptual issue, the idea of homeostasis in regeneration, which the authors presented in the Introduction, needs clarification. In its current form, it is confusing. It could mean maintenance of the distribution of receptor types, or it could mean the proper replacement of a specific OR type upon the loss of this type. The authors seem to refer to the latter and should define it properly.

      We have revised the Introduction section to clarify our use of the term homeostatic in one instance (paragraph 4) and replace it with more specific language in a second instance (paragraph 5).

      Reviewer #3 (Recommendations For The Authors):

      Concerns over scRNASeq data. It appears that the samples may have included non-OE tissues, which reduced the representation of the OSNs. This experiment may need to be repeated to increase the number of OSNs.

      As outlined in the response to the public comments, we think that the low proportion of OSNs in the OE 1 data set reflects how the cells were prepared and processed. We have now included a second scRNA-seq dataset to address this concern.

      Cell types should be identified in the scRNASeq analysis, and the number of cells documented for each cell type, at least for the OSNs. The data should be made available for general access.

      We have now clarified that the OE 1 dataset was published as part of a previous study (C. J. van der Linden et al., 2020) and was made publicly available as part of that study (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157119). All cell types in the newly generated OE 2 dataset have been annotated (Figure 1) and this dataset has also been made publicly available (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE278693). The numbers and percentages of OSNs within OE 1 and OE 2 datasets have been added to the legend of Figure 1-figure supplement 1.

      The specific OR types should be segregated for mature and immature OSNs. The percentage of a specific OR type should be normalized to the total number of OSNs, rather than the total cells. The current quantification is misleading because it gives the false sense that the muscone receptors represent ~0.1% of cells when the proportion is much higher if only OSNs are considered.

      In the revised manuscript, quantities of iOSNs (Gap43+ cells) of specific subtypes within the OE 1 and OE 2 scRNA-seq datasets are graphed as percentages of both all OSNs (Figure 1E, Figure 1–figure supplement 1D) and all cells (Figure 1–figure supplement 1E). As a percentage of all OSNs, average quantities of iOSNs of musk responsive subtypes on the open side of the OE range from 0.005% (for Olfr1431) to 0.14% (for Olfr1440) (Figure 1E).
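
      The effect of the choice of denominator can be sketched with a small calculation. The counts below are hypothetical and chosen only so that OSNs make up 77% of all cells, the fraction reported for the open side of OE 2; the helper name `percent` is likewise an assumption.

```python
def percent(count, total):
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Hypothetical counts for one side of one dataset (not taken from the study).
total_cells = 10_000
total_osns = 7_700   # 77% of all cells
iosn_subtype = 3     # iOSNs of a single subtype

pct_of_cells = percent(iosn_subtype, total_cells)  # normalized to all cells
pct_of_osns = percent(iosn_subtype, total_osns)    # normalized to OSNs only
```

      Normalizing to OSNs rather than to all cells always gives the larger percentage, and the gap widens as the OSN fraction of the dataset falls, which is why the two denominators are reported separately.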

      Within the feature plots for the two datasets, the differentiation stages of indicated OSNs have been clearly defined within the figures and figure legends. For the OE 1 dataset, iOSNs are differentiated from mOSNs by arrows (Figure 1–figure supplement 1C). For the OE 2 dataset (Figure 1D), only immature OSNs are shown for simplicity.

      Technical details of the scRNASeq should be documented. In the feature plot of musk-responsive receptors (Figure 1D), it is better to use the actual quantity of expression rather than a binarized representation (with or without an OR). If one needs to use on/off to determine the number of cells for a given OR type, then the criteria of selection should be given.

      Technical details of generation of the scRNA-seq datasets have been documented in the “Method details” section (for the OE 2 dataset) and in the method section of our previous publication of the OE 1 dataset (C. J. van der Linden et al., 2020). Details of the scRNA-seq analyses, including the criteria used to define immature OSNs of specific subtypes, are documented within the “Quantification and statistical analysis” section.

      Within the feature plots, we have decided to show OSNs of a given subtype in a binary fashion using specific colors for the sake of simplicity (Figure 1D, Figure 1-figure supplement 1C). To address the reviewer’s concern, we have added a new figure that provides detailed information about OR transcript expression (levels and genes) within iOSNs and mOSNs of two different musk-responsive subtypes and a randomly chosen subtype (Figure 1-figure supplement 2).

      An in-depth analysis of the onset of OR expression in the GBC, INP, immature, and mature OSNs should be performed. It is also important to determine how many other receptors are detected in the cells that express the musk receptors. The current scRNASeq data may not be of sufficiently high quality and the experiment needs to be repeated. It is also important for the authors to take measures to eliminate ambient RNA contamination.

      The revised manuscript includes a second scRNA-seq dataset (OE 2; Figure 1). Details of how both the original (OE 1) and new datasets were generated have been documented within the Methods sections of the corresponding publications [(C. J. van der Linden et al., 2020); present study]. For both datasets, live/dead selection of cells was performed, which was expected to reduce ambient RNA.

      The revised manuscript also includes a new figure that provides detailed information about OR transcript expression within INP3, iOSN and mOSN cells that express one of two different musk responsive ORs or a randomly chosen OR (Figure 1-figure supplement 2). These data reveal, as reported previously (Bashkirova et al., 2023; Fletcher et al., 2017; Pourmorady et al., 2024), that low levels of multiple OR transcripts are detected in INP3 (Tex15+) cells. By contrast, iOSN (Gap43+) and mOSN (Omp+) cells robustly express a single OR, with little or no expression of other ORs.

      Quantification of cells for Figures 2-7 should be changed. Instead of using cell number per 1/2 section, the data should be calculated using density (using the area of the epithelium) or normalized to the total number of cells (based on DAPI staining). This is because multiple sections are taken from the same mouse along the A-P axis. These sections have different sizes and numbers of cells.

      As noted in response to a similar concern of Reviewer #2, this has been addressed in two ways within the revised manuscript:

      (1) We have noted within the Methods section that the approach of using half-sections for normalization has been used in multiple previous studies for quantifying newborn (OR+/EdU+) and total (OR+) OSN abundances (Hossain et al., 2023; Ibarra-Soria et al., 2017; C. van der Linden et al., 2018; C. J. van der Linden et al., 2020). Additionally, within the figure legends and Methods, we have more thoroughly described the approach used, including that it relies on averaging the quantifications from at least 5 high-quality coronal OE tissue sections that are evenly distributed throughout the anterior-posterior length of each OE and thereby mitigates the effects of section size and cell number variation among sections. In the case of UNO-treated mice, the open and closed sides within the same section are paired, which further reduces the effects of section-to-section variation. We have found that this approach yields reproducible quantities of newborn and total OSNs among biological replicate mice and enables accurate assessment of how quantities of OSNs of specific subtypes change as a result of altered olfactory experience, a key objective of this study.

      (2) To assess whether the use of alternative approaches for normalizing newborn OSN quantities suggested by the reviewers would affect the present study’s findings, we compared three methods for normalizing the effects of exposure to male odors or muscone on quantities of newborn Olfr235 OSNs in the OEs of both UNO-treated and non-occluded mice: 1) OR+/EdU+ OSNs per half-section (used in this study), 2) OR+/EdU+ OSNs per total number of EdU+ cells (reviewer suggestion (i)), and 3) OR+/EdU+ OSNs per unit of DAPI+ area (an approximate measure of nuclei number; reviewer suggestion (ii)). The three normalization methods yielded statistically indistinguishable differences in assessing the effects of exposure of either UNO-treated or non-occluded mice to male odors (newly added Figure 2–figure supplement 2 and Figure 3–figure supplement 2), or of exposure of non-occluded mice to muscone (newly added Figure 4–figure supplement 3). Based on these findings, and the considerable time that would be required to renormalize all data in the manuscript, we have chosen to maintain the use of normalization per half-section.
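
      The three normalizations compared in point (2) can be sketched as below. This is an illustration under stated assumptions, not the analysis code used in the study: the function names, the use of mm² for DAPI+ area, and all counts are hypothetical.

```python
def normalize_newborn_counts(or_edu_count, n_half_sections,
                             total_edu_count, dapi_area_mm2):
    """Express one OR+/EdU+ count under the three normalizations.

    Assumption: counts and areas are summed over the same set of
    half-sections for one side of the olfactory epithelium.
    """
    return {
        "per_half_section": or_edu_count / n_half_sections,
        "per_edu_cell": or_edu_count / total_edu_count,
        "per_dapi_area": or_edu_count / dapi_area_mm2,
    }


def open_to_closed_ratio(open_side, closed_side):
    """Open/closed ratio for each normalization method."""
    return {method: open_side[method] / closed_side[method]
            for method in open_side}


# Hypothetical totals over 5 half-sections per side (not study data).
open_side = normalize_newborn_counts(30, 5, 3000, 10.0)
closed_side = normalize_newborn_counts(10, 5, 2900, 9.5)

ratios = open_to_closed_ratio(open_side, closed_side)
```

      With these made-up inputs the open/closed ratio is 3.0 per half-section, about 2.9 per EdU+ cell, and about 2.85 per unit DAPI+ area. The methods diverge only to the extent that total EdU+ counts or DAPI+ area differ between the paired sides, which is consistent with the response's observation that the three approaches gave statistically indistinguishable results.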

      References

      Bashkirova, E. V., Klimpert, N., Monahan, K., Campbell, C. E., Osinski, J., Tan, L., Schieren, I., Pourmorady, A., Stecky, B., Barnea, G., Xie, X. S., Abdus-Saboor, I., Shykind, B. M., Marlin, B. J., Gronostajski, R. M., Fleischmann, A., & Lomvardas, S. (2023). Opposing, spatially-determined epigenetic forces impose restrictions on stochastic olfactory receptor choice. eLife, 12, RP87445. https://doi.org/10.7554/eLife.87445

      Coppola, D. M. (2012). Studies of olfactory system neural plasticity: The contribution of the unilateral naris occlusion technique. Neural Plasticity, 2012, 351752. https://doi.org/10.1155/2012/351752

      Fletcher, R. B., Das, D., Gadye, L., Street, K. N., Baudhuin, A., Wagner, A., Cole, M. B., Flores, Q., Choi, Y. G., Yosef, N., Purdom, E., Dudoit, S., Risso, D., & Ngai, J. (2017). Deconstructing Olfactory Stem Cell Trajectories at Single-Cell Resolution. Cell Stem Cell, 20(6), 817-830.e8. https://doi.org/10.1016/j.stem.2017.04.003

      Han, X., Jiang, Y., Feng, N., Yang, P., Zhang, M., Jin, W., Zhang, T., Huang, Z., Zhao, H., Zhang, K., Liu, S., & Hu, D. (2022). Comparison of the Homology Between Muskrat Scented Gland and Mouse Preputial Gland. Journal of Mammalian Evolution, 29(2), 435–446. https://doi.org/10.1007/s10914-022-09604-w

      Hanchate, N. K., Kondoh, K., Lu, Z., Kuang, D., Ye, X., Qiu, X., Pachter, L., Trapnell, C., & Buck, L. B. (2015). Single-cell transcriptomics reveals receptor transformations during olfactory neurogenesis. Science (New York, N.Y.), 350(6265), 1251–1255. https://doi.org/10.1126/science.aad2456

      Hossain, K., Smith, M., & Santoro, S. W. (2023). A histological protocol for quantifying the birthrates of specific subtypes of olfactory sensory neurons in mice. STAR Protocols, 4(3), 102432. https://doi.org/10.1016/j.xpro.2023.102432

      Ibarra-Soria, X., Nakahara, T. S., Lilue, J., Jiang, Y., Trimmer, C., Souza, M. A., Netto, P. H., Ikegami, K., Murphy, N. R., Kusma, M., Kirton, A., Saraiva, L. R., Keane, T. M., Matsunami, H., Mainland, J., Papes, F., & Logan, D. W. (2017). Variation in olfactory neuron repertoires is genetically controlled and environmentally modulated. eLife, 6. https://doi.org/10.7554/eLife.21476

      Kelemen, G. (1947). The junction of the nasal cavity and the pharyngeal tube in the rat. Archives of Otolaryngology, 45(2), 159–168. https://doi.org/10.1001/archotol.1947.00690010168002

      Lin, D. Y., Zhang, S.-Z., Block, E., & Katz, L. C. (2005). Encoding social signals in the mouse main olfactory bulb. Nature, 434(7032), 470–477. https://doi.org/10.1038/nature03414

      McIntyre, J. C., Titlow, W. B., & McClintock, T. S. (2010). Axon growth and guidance genes identify nascent, immature, and mature olfactory sensory neurons. Journal of Neuroscience Research, 88(15), 3243–3256. https://doi.org/10.1002/jnr.22497

      Pourmorady, A. D., Bashkirova, E. V., Chiariello, A. M., Belagzhal, H., Kodra, A., Duffié, R., Kahiapo, J., Monahan, K., Pulupa, J., Schieren, I., Osterhoudt, A., Dekker, J., Nicodemi, M., & Lomvardas, S. (2024). RNA-mediated symmetry breaking enables singular olfactory receptor choice. Nature, 625(7993), 181–188. https://doi.org/10.1038/s41586-023-06845-4

      Saraiva, L. R., Ibarra-Soria, X., Khan, M., Omura, M., Scialdone, A., Mombaerts, P., Marioni, J. C., & Logan, D. W. (2015). Hierarchical deconstruction of mouse olfactory sensory neurons: From whole mucosa to single-cell RNA-seq. Scientific Reports, 5, 18178. https://doi.org/10.1038/srep18178

      Sato-Akuhara, N., Horio, N., Kato-Namba, A., Yoshikawa, K., Niimura, Y., Ihara, S., Shirasu, M., & Touhara, K. (2016). Ligand Specificity and Evolution of Mammalian Musk Odor Receptors: Effect of Single Receptor Deletion on Odor Detection. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 36(16), 4482–4491. https://doi.org/10.1523/JNEUROSCI.3259-15.2016

      Scholz, P., Kalbe, B., Jansen, F., Altmueller, J., Becker, C., Mohrhardt, J., Schreiner, B., Gisselmann, G., Hatt, H., & Osterloh, S. (2016). Transcriptome Analysis of Murine Olfactory Sensory Neurons during Development Using Single Cell RNA-Seq. Chemical Senses, 41(4), 313–323. https://doi.org/10.1093/chemse/bjw003

      Schwende, F. J., Wiesler, D., Jorgenson, J. W., Carmack, M., & Novotny, M. (1986). Urinary volatile constituents of the house mouse, Mus musculus, and their endocrine dependency. Journal of Chemical Ecology, 12(1), 277–296. https://doi.org/10.1007/BF01045611

      Shirasu, M., Yoshikawa, K., Takai, Y., Nakashima, A., Takeuchi, H., Sakano, H., & Touhara, K. (2014). Olfactory receptor and neural pathway responsible for highly selective sensing of musk odors. Neuron, 81(1), 165–178. https://doi.org/10.1016/j.neuron.2013.10.021

      Tan, L., Li, Q., & Xie, X. S. (2015). Olfactory sensory neurons transiently express multiple olfactory receptors during development. Molecular Systems Biology, 11(12), 844. https://doi.org/10.15252/msb.20156639

      van der Linden, C. J., Gupta, P., Bhuiya, A. I., Riddick, K. R., Hossain, K., & Santoro, S. W. (2020). Olfactory Stimulation Regulates the Birth of Neurons That Express Specific Odorant Receptors. Cell Reports, 33(1), 108210. https://doi.org/10.1016/j.celrep.2020.108210

      van der Linden, C., Jakob, S., Gupta, P., Dulac, C., & Santoro, S. W. (2018). Sex separation induces differences in the olfactory sensory receptor repertoires of male and female mice. Nature Communications, 9(1), 5081. https://doi.org/10.1038/s41467-018-07120-1

      Verhaagen, J., Oestreicher, A. B., Gispen, W. H., & Margolis, F. L. (1989). The expression of the growth associated protein B50/GAP43 in the olfactory system of neonatal and adult rats. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 9(2), 683–691.

      Vihani, A., Hu, X. S., Gundala, S., Koyama, S., Block, E., & Matsunami, H. (2020). Semiochemical responsive olfactory sensory neurons are sexually dimorphic and plastic. eLife, 9, e54501. https://doi.org/10.7554/eLife.54501

    1. And his trails do not fade

      Trails will never fade

      4 - IndyWeb - TrailScape - TrailMarks - ClueTrails - HyperMaps

      of Individual, collaborative Trails blazed by Trail

      Eventually everything connects

      Just connect

      https://hypothes.is/users/gyuri?q=connections+key

      People Ideas and things

      Eventually everything connects — people, ideas, objects… the quality of the connections is the key to quality per se

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study offers a valuable investigation into the role of cholecystokinin (CCK) in thalamocortical plasticity during early development and adulthood, employing a range of experimental techniques. The authors demonstrate that tetanic stimulation of the auditory thalamus induces cortical long-term potentiation (LTP), which can be evoked through either electrical or optical stimulation of the thalamus or by noise bursts. They further show that thalamocortical LTP is abolished when thalamic CCK is knocked down or when cortical CCK receptors are blocked. Interestingly, in 18-month-old mice, thalamocortical LTP was largely absent but could be restored through the cortical application of CCK. The authors conclude that CCK contributes to thalamocortical plasticity and may enhance thalamocortical plasticity in aged subjects.

      While the study presents compelling evidence, I would like to offer several suggestions for the authors' consideration:

      (1) Thalamocortical LTP and NMDA-Dependence:

      It is well established that thalamocortical LTP is NMDA receptor-dependent, and blocking cortical NMDA receptors can abolish LTP. This raises the question of why thalamocortical LTP is eliminated when thalamic CCK is knocked down or when cortical CCK receptors are blocked. If I correctly understand the authors' hypothesis, that CCK promotes LTP through CCKR-intracellular Ca2+-AMPAR, this pathway should not directly interfere with the NMDA-dependent mechanism. A clearer explanation of this interaction would be beneficial.

      Thank you for your question regarding the role of CCK and NMDA receptors (NMDARs) in thalamocortical LTP. We propose that CCK receptor (CCKR) activation enhances intracellular calcium levels, which are crucial for thalamocortical LTP induction. Calcium influx through NMDARs is also essential to reach the threshold required for activating downstream signaling pathways that promote LTP (Heynen and Bear, 2001). Thus, CCKRs and NMDARs may function in a complementary manner to facilitate LTP, with both contributing to the elevation of intracellular calcium.

      However, it is important to note that the postsynaptic mechanisms of thalamocortical LTP in the auditory cortex (ACx) differ from those in other sensory cortices. Studies have shown that thalamocortical LTP in the ACx appears to be less dependent on NMDARs (Chun et al., 2013), which is distinct from somatosensory or visual cortices. Our previous studies also found that while NMDAR antagonists can block HFS-induced LTP in the inner ACx, LTP can still be induced in the presence of CCK even after NMDAR blockade (Chen et al., 2019). These findings suggest that CCK may act through an alternative mechanism involving CCKR-mediated calcium signaling and AMPAR modulation, which partially compensates for the loss of NMDAR signaling. This distinction may reflect functional differences between the ACx and other sensory cortices, as highlighted in previous studies (King and Nelken, 2009).

      While our current study focuses on the role of CCKR-mediated plasticity in the auditory system, further investigations are needed to elucidate how CCKRs and NMDARs interact within the broader framework of thalamocortical neuroplasticity across different cortical regions. Understanding whether similar mechanisms operate in other sensory systems, such as the visual cortex, will be an important direction for future research.

      Heynen, A.J., and Bear, M.F. (2001). Long-term potentiation of thalamocortical transmission in the adult visual cortex in vivo. J Neurosci 21, 9801-9813. 10.1523/jneurosci.21-24-09801.2001.

      Chun, S., Bayazitov, I.T., Blundon, J.A., and Zakharenko, S.S. (2013). Thalamocortical Long-Term Potentiation Becomes Gated after the Early Critical Period in the Auditory Cortex. The Journal of Neuroscience 33, 7345-7357. 10.1523/jneurosci.4500-12.2013.

      Chen, X., Li, X., Wong, Y.T., Zheng, X., Wang, H., Peng, Y., Feng, H., Feng, J., Baibado, J.T., Jesky, R., et al. (2019). Cholecystokinin release triggered by NMDA receptors produces LTP and sound-sound associative memory. Proc Natl Acad Sci U S A 116, 6397-6406. 10.1073/pnas.1816833116.

      King, A. J., & Nelken, I. (2009). Unraveling the principles of auditory cortical processing: can we learn from the visual system? Nature neuroscience, 12(6), 698-701.

      (2) Complexity of the Thalamocortical System:

      The thalamocortical system is intricate, with different cortical and thalamic subdivisions serving distinct functions. In this study, it is not fully clear which subdivisions were targeted for stimulation and recording, which could significantly influence the interpretation of the findings. Clarifying this aspect would enhance the study's robustness.

      Thank you for your valuable feedback. We would like to clarify that stimulation was conducted in the medial geniculate nucleus ventral (MGv), and recording was performed in layer IV of the ACx. Targeting the MGv allows us to investigate the influence of thalamic inputs on auditory cortical responses. Layer IV of the ACx is known to receive direct thalamic projections, making it an ideal site for assessing how thalamic activity influences cortical processing. We will incorporate this clarification into the revised manuscript to enhance the robustness of our study.

      Results section:

      “Stimulation electrodes were placed in the MGB (specifically in the medial geniculate nucleus ventral subdivision, MGv), and recording electrodes were inserted into layer IV of ACx”

      “The recording electrodes were lowered into layer IV of ACx, while the stimulation electrodes were lowered into MGB (MGv subdivision). The final stimulating and recording positions were determined by maximizing the cortical fEPSP amplitude triggered by the ES in the MGB. The accuracy of electrode placement was verified through post-hoc histological examination and electrophysiological responses.”

      (3) Statistical Variability:

      Biological data, including field excitatory postsynaptic potentials (fEPSPs) and LTP, often exhibit significant variability between samples, sometimes resulting in a standard deviation that exceeds 50% of the mean value. The reported standard deviation of LTP in this study, however, appears unusually small, particularly given the relatively limited sample size. Further discussion of this observation might be warranted.

      Thank you for your question. In our experiments, the sample size N represents the number of animals used, while n refers to the number of recordings, with each recording corresponding to a distinct pair of stimulation and recording sites. To adhere to ethical guidelines and minimize animal usage, we often perform multiple recordings within a single animal, such as from different hemispheres of the brain. Although N may appear small, our statistical analyses are based on n, ensuring sufficient data points for reliable conclusions.

      Furthermore, as our experiments are conducted in vivo, we observe lower variability in the increase of fEPSP slopes following LTP induction compared to brain slice preparations, where standard deviations exceeding 50% of the mean are common. This reduced variability likely reflects the robustness of the physiologically intact conditions in the in vivo setup.

      (4) EYFP Expression and Virus Targeting:

      The authors indicate that AAV9-EFIa-ChETA-EYFP was injected into the medial geniculate body (MGB) and subsequently expressed in both the MGB and cortex. If I understand correctly, the authors assume that cortical expression represents thalamocortical terminals rather than cortical neurons. However, co-expression of CCK receptors does not necessarily imply that the virus selectively infected thalamocortical terminals. The physiological data regarding cortical activation of thalamocortical terminals could be questioned if the cortical expression represents cortical neurons or both cortical neurons and thalamocortical terminals.

      Thank you for your question. In Figure 2A, EYFP expression indicates thalamocortical projections, while the co-expression of EYFP with PSD95 confirms the identity of thalamocortical terminals. The CCK-B receptors (CCKBR) are located on postsynaptic cortical neurons. The observed co-labeling of thalamocortical terminals and postsynaptic CCKBR suggests that CCK-expressing neurons in the medial geniculate body (MGB) can release CCK, which subsequently acts on the postsynaptic CCKBR. This evidence supports our interpretation of the functional role of CCK modulating neural plasticity between thalamocortical inputs and cortical neurons. As shown in Figure 2A, we aim to demonstrate that the co-labeling of thalamocortical terminals with CCK receptors accounts for a substantial proportion of the thalamocortical terminals. We will ensure that this clarification is emphasized in the revised manuscript to address your concerns.

      Results section:

      “Cre-dependent AAV9-EFIa-DIO-ChETA-EYFP was injected into the MGB of CCK-Cre mice. EYFP labeling marked CCK-positive neurons in the MGB. The co-expression of EYFP thalamocortical projections with PSD95 confirms the identity of thalamocortical terminals (yellow), which primarily targeted layer IV of the ACx (Figure 2A, upper panel). Immunohistochemistry revealed that a substantial proportion (15 out of 19, Figure 2A lower right panel) of thalamocortical terminals (arrows) colocalize with CCK receptors (CCKBR) on postsynaptic cortical neurons in the ACx (Figure 2A lower panel), supporting the functional role of CCK in modulating thalamocortical plasticity.”

      (5) Consideration of Previous Literature:

      A number of studies have thoroughly characterized auditory thalamocortical LTP during early development and adulthood. It may be beneficial for the authors to integrate insights from this body of work, as reliance on data from the somatosensory thalamocortical system might not fully capture the nuances of the auditory pathway. A more comprehensive discussion of the relevant literature could enhance the study's context and impact.

      Thank you for your valuable feedback. We will enhance our discussion on auditory thalamocortical LTP during early development and adulthood to provide a more comprehensive context for our study.

      (6) Therapeutic Implications:

      While the authors suggest potential therapeutic applications of their findings, it may be somewhat premature to draw such conclusions based on the current evidence. Although speculative discussion is not harmful, it may not significantly add to the study's conclusions at this stage.

      Thank you for your thoughtful feedback. We agree that the therapeutic applications mentioned in our study are speculative at this stage and should be regarded as a forward-looking perspective rather than definitive conclusions. Our intention was to highlight the broader potential of our findings to inspire further research, rather than to propose immediate clinical applications.

      In light of your feedback, we have adjusted the language in the manuscript to reflect a more cautious interpretation. Speculative discussions are now explicitly framed as hypotheses or possibilities for future exploration. We emphasize that our findings provide a foundation for further investigations into CCK-based plasticity and its implications.

      We believe that appropriately framed forward-thinking discussions are valuable in guiding the direction of future research. We sincerely hope that our current and future work will contribute to a deeper understanding of thalamocortical plasticity and, over time, potentially lead to advancements in human health.

      Reviewer #2 (Public review):

      Summary:

      This work used multiple approaches to show that CCK is critical for long-term potentiation (LTP) in the auditory thalamocortical pathway. They also showed that the CCK mediation of LTP is age-dependent and supports frequency discrimination. This work is important because it opens up a new avenue of investigation of the roles of neuropeptides in sensory plasticity.

      Strengths:

      The main strength is the multiple approaches used to comprehensively examine the role of CCK in auditory thalamocortical LTP. Thus, the authors do provide a compelling set of data that CCK mediates thalamocortical LTP in an age-dependent manner.

      Weaknesses:

      The behavioral assessment is relatively limited but may be fleshed out in future work.

      Reviewer #3 (Public review):

      Summary:

      Cholecystokinin (CCK) is highly expressed in auditory thalamocortical (MGB) neurons and CCK has been found to shape cortical plasticity dynamics. In order to understand how CCK shapes synaptic plasticity in the auditory thalamocortical pathway, the authors assessed the role of CCK signaling across multiple mechanisms of LTP induction within the auditory thalamocortical (MGB - layer IV Auditory Cortex) circuit in mice. In these physiology experiments that leverage multiple mechanisms of LTP induction and a rigorous manipulation of CCK and CCK-dependent signaling, they establish that auditory thalamocortical LTP depends on the co-release of CCK from auditory thalamic neurons. By carefully assessing the development of this plasticity over time and CCK expression, they go on to identify a window of time during which CCK is produced throughout early and middle adulthood in auditory thalamocortical neurons, establishing a window for plasticity from 3 weeks to 1.5 years in mice, with limited LTP occurring outside of this window. The authors go on to show that CCK signaling and its effect on LTP in the auditory cortex is also capable of modifying frequency discrimination accuracy in an auditory PPI task. In evaluating the impact of CCK on modulating PPI task performance, it also seems that in mice <1.5 years old CCK-dependent effects on cortical plasticity are almost saturated. While exogenous CCK can modestly improve discrimination of only very similar tones, exogenous focal delivery of CCK in older mice can significantly improve learning in a PPI task to bring their discrimination ability in line with those from young adult mice.

      Strengths:

      (1) The clarity of the results along with the rigorous multi-angled approach provide significant support for the claim that CCK is essential for auditory thalamocortical synaptic LTP. This approach uses a combination of electrical, acoustic, and optogenetic pathway stimulation alongside conditional expression approaches, germline knockout, viral RNA downregulation, and pharmacological blockade. Through the combination of these experimental configurations the authors demonstrate that high-frequency stimulation-induced LTP is reliant on co-release of CCK from glutamatergic MGB terminals projecting to the auditory cortex.

      (2) The careful analysis of the CCK, CCKB receptor, and LTP expression is also a strength that puts the finding into the context of mechanistic causes and potential therapies for age-dependent sensory/auditory processing changes. Similarly, not only do these data identify a fundamental biological mechanism, but they also provide support for the idea that exogenous asynchronous stimulation of the CCKBR is capable of restoring an age-dependent loss in plasticity.

      (3) Although experiments to simultaneously relate LTP and behavioral change or identify a causal relationship between LTP and frequency discrimination are not made, there is still convincing evidence that CCK signaling in the auditory cortex (known to determine synaptic LTP) is important for auditory processing/frequency discrimination. These experiments are key for establishing the relevance of this mechanism.

      Weaknesses:

      (1) Given the magnitude of the evoked responses, one expects that pyramidal neurons in layer IV are primarily those that undergo CCK-dependent plasticity, but the degree to which PV-interneurons and pyramidal neurons participate in this process differently is unclear.

      Thank you for this insightful comment. We agree that the differential roles of PV-interneurons and pyramidal neurons in CCK-dependent thalamocortical plasticity remain unclear and acknowledge this as an important limitation of our study. Our primary focus was on pyramidal neurons, as our in vivo electrophysiological recordings measured the fEPSP slope in layer IV of the auditory cortex, which primarily reflects excitatory synaptic activity. However, we recognize the critical role of the excitatory-inhibitory balance in cortical function and the potential contribution of PV-interneurons to this process. In future studies, we plan to utilize techniques such as optogenetics, two-photon calcium imaging and cell-type-specific recordings to investigate the distinct contributions of PV-interneurons and pyramidal neurons to CCK-dependent thalamocortical plasticity, thereby providing a more comprehensive understanding of how CCK modulates thalamocortical circuits.

      (2) While these data support an important role for CCK in synaptic LTP in the auditory thalamocortical pathway, perhaps temporal processing of acoustic stimuli is as or more important than frequency discrimination. Given the enhanced responsivity of the system, it is unclear whether this mechanism would improve or reduce the fidelity of temporal processing in this circuit. Understanding this dynamic may also require consideration of cell type as raised in weakness #1.

      Thank you for this thoughtful comment. We acknowledge that our study did not directly address the fidelity of temporal processing, which is indeed a critical aspect of auditory function. Our behavioral experiments primarily focused on linking frequency discrimination to the role of CCK in synaptic strengthening within the auditory thalamocortical pathway. However, we agree that enhanced responsivity of the system could also impact temporal processing dynamics, such as the precise timing of auditory responses. Whether this modulation improves or reduces the fidelity of temporal processing remains an open and important question.

      As you noted, understanding these dynamics will require a deeper investigation into the interactions between different cell types, particularly the balance between excitatory and inhibitory neurons. Exploring how CCK modulation affects both the circuit and cellular levels in temporal processing is an important direction for future research, which we plan to pursue. Thank you again for raising this important point.

      Discussion section:

      “While we focused on homosynaptic plasticity at thalamocortical synapses by recording only fEPSPs in layer IV of ACx, it is essential to further explore heterosynaptic effects of CCK released from thalamocortical synapses on intracortical circuits, particularly its role in modulating the excitatory-inhibitory balance. PV-interneurons, as key regulators of cortical inhibition, may contribute to the temporal fidelity of sensory processing, which is critical for auditory perception (Nocon et al., 2023; Cai et al., 2018). Additionally, CCK may facilitate cross-modal plasticity by modulating heterosynaptic plasticity in interconnected cortical areas. Future studies would provide valuable insights into the broader role of CCK in shaping sensory processing and cortical network dynamics.”

      Nocon, J.C., Gritton, H.J., James, N.M., Mount, R.A., Qu, Z., Han, X., and Sen, K. (2023). Parvalbumin neurons enhance temporal coding and reduce cortical noise in complex auditory scenes. Communications Biology 6, 751. 10.1038/s42003-023-05126-0.

      Cai, D., Han, R., Liu, M., Xie, F., You, L., Zheng, Y., Zhao, L., Yao, J., Wang, Y., Yue, Y., et al. (2018). A Critical Role of Inhibition in Temporal Processing Maturation in the Primary Auditory Cortex. Cereb Cortex 28, 1610-1624. 10.1093/cercor/bhx057.

      (3) In Figure 1, an example of increased spontaneous and evoked firing activity of single neurons after HFS is provided. Yet it is surprising that the group data are analyzed only for the fEPSP. It seems that single-neuron data would also be useful at this point to provide insight into how CCK and HFS affect temporal processing and spontaneous activity/excitability, especially given the example in 1F.

      Thank you for your insightful comment. In our in vivo electrophysiological experiments on LTP induction, we recorded neural activity for over 1.5 hours to assess changes in neuronal responses over time, both prior to and following the induction. While single neuron firing data can provide valuable insights, such measurements are inherently more variable due to factors like cortical state fluctuations and the condition of nearby neurons, which makes them less reliable for long-term analysis. For this reason, we focused on fEPSP, as it offers a more stable and robust readout of synaptic activity over extended periods.

      We appreciate your suggestion and recognize the value of single-neuron data in understanding how CCK and HFS affect temporal processing and excitability. In future studies, we will consider incorporating single-neuron analyses to complement our synaptic-level findings and provide a more comprehensive understanding of these mechanisms.

      (4) The authors mention that CCK mRNA was absent in CCK-KO mice, but the data are not provided.

      Thank you for your comment. Data from the CCK-KO mice are presented in Figure 3A (far right) and in the upper panel of Figure 3B (far right). In the lower panel of Figure 3B, data from the CCK-KO group are not shown because the normalized values for this group were essentially zero, as expected due to the absence of CCK mRNA.

      (5) The circuitry that determines PPI requires multiple brain areas, including the auditory cortex. Given the complicated dynamics of this process, it may be helpful to consider what, if anything, is known specifically about how layer IV synaptic plasticity in the auditory cortex may shape this behavior.

      Thank you for raising this important point. Pre-pulse inhibition (PPI) of the acoustic startle response indeed involves multiple brain regions, with the ascending auditory pathway playing a key role (Gómez-Nieto et al., 2020). Within the auditory cortex, layer IV neurons receive tonotopically organized inputs from the medial geniculate nucleus and are critical for integrating thalamic inputs and shaping auditory processing.

      In our behavioral experiments, mice were required to discriminate pre-pulses of varying frequencies against a continuous background sound. Given the role of auditory cortical neurons in integrating thalamic inputs and shaping auditory processing, it is likely that synaptic plasticity in these neurons contributes to the enhanced discrimination of pre-pulses. Supporting this idea, our previous work demonstrated that local infusion of CCK, paired with weak acoustic stimuli, significantly increased auditory responses in the auditory cortex (Li et al., 2014). In the current study, we further showed that CCK release during high-frequency stimulation of the thalamocortical pathway induced LTP in layer IV of the auditory cortex. Together, these findings suggest that CCK-dependent synaptic plasticity in layer IV may amplify the cortical representation of weak auditory inputs, thereby improving pre-pulse detection and enhancing PPI performance.

      It is also worth noting that aged mice with hearing loss typically exhibit PPI deficits due to impaired auditory processing (Ouagazzal et al., 2006; Young et al., 2010). We propose that enhanced plasticity in the thalamocortical pathway, mediated by CCK, might partially compensate for these deficits by amplifying residual auditory signals in aged mice. However, the precise mechanisms by which layer IV synaptic plasticity modulates PPI behavior remain to be fully understood. Given the complex dynamics of sensory processing, future studies could explore how layer IV neurons interact with other cortical and subcortical circuits involved in PPI, as well as the specific contributions of excitatory and inhibitory cell types. These investigations will help provide a more comprehensive understanding of the role of CCK in modulating sensory gating and auditory processing.

      Gómez-Nieto, R., Hormigo, S., & López, D. E. (2020). Prepulse inhibition of the auditory startle reflex assessment as a hallmark of brainstem sensorimotor gating mechanisms. Brain sciences, 10(9), 639.

      Li, X., Yu, K., Zhang, Z., Sun, W., Yang, Z., Feng, J., Chen, X., Liu, C.-H., Wang, H., Guo, Y.P., and He, J. (2014). Cholecystokinin from the entorhinal cortex enables neural plasticity in the auditory cortex. Cell Research 24, 307-330. 10.1038/cr.2013.164.

      Ouagazzal, A. M., Reiss, D., & Romand, R. (2006). Effects of age-related hearing loss on startle reflex and prepulse inhibition in mice on pure and mixed C57BL and 129 genetic background. Behavioural brain research, 172(2), 307-315.

      Young, J. W., Wallace, C. K., Geyer, M. A., & Risbrough, V. B. (2010). Age-associated improvements in cross-modal prepulse inhibition in mice. Behavioral neuroscience, 124(1), 133.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) In Figure 1, the authors used different metrics for fEPSP strength. In Figure 1D, the authors used the slope, while they used the amplitude in Figure 1G. It is known that the two metrics are different from each other. While the slope is calculated from the linear regression between the voltage change per time of the rising phase of the fEPSP, the amplitude represents the voltage value of the fEPSP's peak. Please clarify here and in the method what metric you used, because the two terms are not interchangeable.

      Thank you for pointing out this oversight in our manuscript. We confirm that we used the slope of the fEPSP as the metric for assessing synaptic strength throughout the study, including both Figure 1D and Figure 1G. We will make the necessary corrections to ensure clarity and consistency. Thank you for bringing this to our attention.
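
The distinction the reviewer draws between the two metrics can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the fitting window, and the synthetic trace are assumptions made only for illustration. The slope is taken as the least-squares fit of voltage against time over the rising phase, and the amplitude as the peak deflection from baseline.

```python
from typing import Sequence, Tuple


def fepsp_slope(times_ms: Sequence[float], volts_mv: Sequence[float],
                window: Tuple[float, float]) -> float:
    """Least-squares slope (mV/ms) of the trace inside [start, end] ms.

    The window is assumed to cover the rising phase of the fEPSP;
    how that window is chosen is not specified in the source text.
    """
    start, end = window
    pts = [(t, v) for t, v in zip(times_ms, volts_mv) if start <= t <= end]
    if len(pts) < 2:
        raise ValueError("need at least two samples inside the window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    return sxy / sxx


def fepsp_amplitude(volts_mv: Sequence[float], baseline_mv: float = 0.0) -> float:
    """Signed peak deflection from baseline (mV)."""
    peak = max(volts_mv, key=lambda v: abs(v - baseline_mv))
    return peak - baseline_mv


# Synthetic trace (illustrative values, not data from the study).
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v = [0.0, -0.1, -0.2, -0.3, -0.2]
slope = fepsp_slope(t, v, (0.0, 3.0))   # fitted over the rising phase only
amplitude = fepsp_amplitude(v)          # single peak value
```

Because the slope depends only on the early rising phase while the amplitude depends on the peak, the two can change independently, which is why the figures and methods need to name one metric consistently.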

      (2) The methods give no details about the CCK-KO mice. Please provide such details. Although the authors used the CCK-KO mouse model as a control, I think that it is not a good choice to test the hypothesis mentioned in lines 165 and 166. The experiment was supposed to monitor the CCK-BR activity after HFS of the MGB and answer whether the CCK-BR will get activated by thalamic stimulation, but the CCK-KO mouse does not have CCK to be released after the optogenetic activation of the Chrimson probe. Therefore, it is expected to give nothing, as if the experimenter ran an experiment without intervention. I think that the appropriate way to examine the hypothesis is to compare mice that were either injected with AAV9-Syn-FLEX-ChrimsonR-tdTomato or AAV9-Syn-FLEX-tdTomato. However, CCK-KO would be a perfect model to confirm that LTP can only be generated in a CCK-dependent manner, by simply running the HFS of the MGB together with cortical recording of the fEPSP. This will also rule out the assumption that the authors mentioned in lines 191 and 192.

      Thank you for your valuable feedback. The rationale behind our experimental design was to validate the newly developed CCK sensor and confirm its specificity. We aimed to verify CCK release post-HFS by comparing the responses of the CCK sensor in CCK-KO mice and CCK-Cre mice. This comparison allowed us to determine that the observed increase in fluorescence intensity post-HFS was specifically due to CCK release, rather than other neurotransmitters induced by HFS.

      We appreciate your suggestion to compare mice injected with AAV9-Syn-FLEX-ChrimsonR-tdTomato and AAV9-Syn-FLEX-tdTomato, as it is indeed a valuable approach for directly testing the hypothesis regarding CCK-BR activation. However, we prioritized using the CCK-KO model to validate the CCK sensor's efficacy and specificity. The validation can be inferred by comparing the CCK sensor activity before and after HFS.

      Regarding concerns mentioned in lines 191 and 192 about potential CCK release from other projections via indirect polysynaptic activation, CCK-KO mice were not suitable for this aspect due to their global knockout of CCK. To address this limitation, we utilized shRNA to specifically down-regulate Cck expression in MGB neurons. This approach focused on the necessity of CCK released from thalamocortical projections for the observed LTP and effectively ruled out the possibility of indirect polysynaptic activation.

      We also acknowledge that the methods section lacked sufficient details about the CCK-KO mice, which may have caused confusion. In the revised methods section, we will add the following details:

      (1) The genotype of the CCK-KO mice used in this study (CCK-ires-CreERT2, Jax#012710).

      (2) A brief description of the CCK-KO validation, emphasizing the absence of CCK mRNA in these mice (as shown in Figure 3A and 3B).

      (3) The experimental purpose of using CCK-KO mice to validate the specificity of the CCK sensor.

      We believe these additions will clarify the rationale for using CCK-KO mice and their role in this study. Thank you again for highlighting these important points.

      (3) Figure 3C: The authors should examine if there is a difference in the baseline of fEPSPs across different age groups as the dependence on the normalization in the analysis within each group would hide if there were any difference of the baseline slope of fEPSP between groups which could be related to any misleading difference after HFS. Also, I wonder about the absence of LTP in P20, which is a closer age to the critical period. Could the authors discuss that, please?

      Thank you for your insightful feedback. To address your concern regarding baseline differences in fEPSP slopes across age groups, we conducted additional analysis. Baseline fEPSP slopes across the three groups (P20, 8w, 18m), normalized to the 8w group, were 64.8 ± 13.1%, 100.0 ± 20.4%, and 58.8 ± 10.3%, respectively. While there was a trend suggesting smaller fEPSP slopes in the P20 and 18m groups compared to the young adult group, these differences were not statistically significant due to data variability (P20 vs. 8w, P = 0.319; 8w vs. 18m, P = 0.147; P20 vs. 18m, P = 1.0, one-way ANOVA). These results suggest that baseline variability is unlikely to confound the observed differences in LTP after HFS. Furthermore, we ensured that normalization minimized any potential baseline effects.
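
As a hedged illustration of the normalization described above, the sketch below expresses each group's baseline slopes as a percentage of the reference group's mean and reports mean and SEM. The function name and the numeric inputs are placeholders, not values from the study, and the response does not state whether the reported error terms are SEM or SD; SEM is assumed here.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List, Tuple


def normalize_to_reference(groups: Dict[str, List[float]],
                           reference: str) -> Dict[str, Tuple[float, float]]:
    """Return {group: (mean %, SEM %)} relative to the reference group's mean."""
    ref_mean = mean(groups[reference])
    if ref_mean == 0:
        raise ValueError("reference mean is zero; cannot normalize")
    summary = {}
    for name, slopes in groups.items():
        percents = [100.0 * s / ref_mean for s in slopes]
        # SEM is undefined for a single sample; report 0.0 in that case.
        sem = stdev(percents) / sqrt(len(percents)) if len(percents) > 1 else 0.0
        summary[name] = (mean(percents), sem)
    return summary


# Placeholder slopes in arbitrary units (hypothetical, for illustration only).
example = {"8w": [2.0, 4.0], "P20": [1.0, 2.0]}
result = normalize_to_reference(example, reference="8w")
```

By construction the reference group always averages 100%, so only its spread, and the other groups' means relative to it, carry information about baseline differences.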

      Regarding the absence of LTP in P20, this likely reflects developmental regulation of CCKBR expression in the auditory cortex (ACx). The HFS-induced thalamocortical LTP observed in our study is CCK-dependent and mechanistically distinct from the NMDA-dependent thalamocortical LTP during the critical period. Specifically, correlated pre- and postsynaptic activity can induce NMDA-dependent thalamocortical LTP only during an early critical period corresponding to the first several postnatal days, after which this pairing becomes ineffective starting from the second postnatal week (Crair and Malenka, 1995; Isaac et al., 1997; Chun et al., 2013). In contrast, the CCK-dependent thalamocortical LTP induced by HFS is robust in adult mice but appears absent in P20, likely due to the lack of postsynaptic CCKBR expression in the ACx at this developmental stage.

      We will include these clarifications in the revised manuscript, particularly in the Discussion section, to provide a more comprehensive explanation of our findings. Thank you for your valuable comments and suggestions.

      Crair, M.C., and Malenka, R.C. (1995). A critical period for long-term potentiation at thalamocortical synapses. Nature 375, 325-328. 10.1038/375325a0.

      Isaac, J.T.R., Crair, M.C., Nicoll, R.A., and Malenka, R.C. (1997). Silent Synapses during Development of Thalamocortical Inputs. Neuron 18, 269-280. https://doi.org/10.1016/S0896-6273(00)80267-6.

      Chun, S., Bayazitov, I.T., Blundon, J.A., and Zakharenko, S.S. (2013). Thalamocortical Long-Term Potentiation Becomes Gated after the Early Critical Period in the Auditory Cortex. The Journal of Neuroscience 33, 7345-7357. 10.1523/jneurosci.4500-12.2013.

      (4) Figure 4F: The baseline fEPSPs of the CCK and ACSF groups appear to differ, which raises a concern about baseline differences between treatment groups.

      Thank you for your valuable feedback and for pointing out this important detail. We apologize for any confusion caused by the presentation of the data. As noted in the figure legend, the scale bars for the fEPSPs were different between the left (0.1 mV) and right panels (20 µV). This difference in scale may have created the perception of baseline differences between the CCK and ACSF groups. To enhance clarity and avoid potential misunderstanding, we will unify the scale bar values in the revised figure. This adjustment will provide a clearer and more accurate comparison of fEPSPs between groups. Thank you again for bringing this issue to our attention.

      (5) From Figure S2D, it seems that different animals were injected with the drug and ACSF. Therefore, how did the authors validate the position of the recording electrode relative to the cortical area of a certain CF and relative EF? Also, there is not enough information about the basis of the selection of the EF. Should it be lower than the CF by a certain value? Was the EF determined after the initial tuning curve in each case? To mitigate this difference, it would be appropriate if the authors examined whether the tuning width and CFs differed significantly between animals exposed to ACSF and CCK-4. This would give some validation of a balanced experiment between ACSF and CCK-4. I also wonder why the authors used rats here rather than mice, as it would be easier to interpret results that come from the same species.

      Thank you for your thoughtful comments. The effective frequency (EF) was determined after measuring the initial tuning curve for each case. The EF was selected to elicit a clear sound response while maintaining a sufficient distance from the characteristic frequency (CF) to allow measurable increases in response intensity. Specifically, EF was selected based on the starting point of the tuning peak, which corresponds to the onset of its fastest rising phase. From this point, EF was determined by moving 0.2 or 0.4 octaves toward the CF. While there were individual differences in EF selection among animals, the methodology for determining EF was standardized and applied consistently across both the ACSF and CCK-4 groups.

      Regarding the use of rats in these experiments, these studies were conducted prior to our current work with mice. The findings in rats provide valuable insights that support our current results in mice. Since the rat data are supplementary to the primary findings, we included them as supplementary material to provide additional context and validation. Furthermore, in consideration of animal welfare, we chose not to replicate these experiments in mice, as the findings from rats were sufficient to support our conclusions.

      Methods section:

      “The tuning curve was determined by plotting the lowest intensity at which the neuron responded to different tones. The characteristic frequency (CF) is defined as the frequency corresponding to the lowest point on this curve. The effective frequency (EF) was determined to elicit a clear sound response while maintaining a sufficient distance from the CF to allow measurable increases in response intensity. Specifically, EF was selected based on the starting point of the tuning peak, which corresponds to the onset of its fastest rising phase. From this point, EF was determined by moving 0.2 or 0.4 octaves toward the CF.”
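
The selection rule quoted above can be sketched as follows, under stated assumptions. The CF is taken as the frequency with the lowest response threshold, and the EF is obtained by moving a fixed number of octaves from the tuning-peak onset toward the CF. The function names, the example thresholds, and the choice to clamp at the CF when the step would overshoot it are illustrative assumptions; how the onset of the fastest rising phase is detected is not specified in the text, so it is passed in as an input.

```python
import math
from typing import Dict


def characteristic_frequency(thresholds_db: Dict[float, float]) -> float:
    """Frequency (kHz) with the lowest response threshold on the tuning curve."""
    return min(thresholds_db, key=thresholds_db.get)


def effective_frequency(onset_khz: float, cf_khz: float,
                        step_octaves: float) -> float:
    """Move `step_octaves` from the tuning-peak onset toward the CF.

    One octave is a factor of two in frequency. Clamping at the CF when
    the step would overshoot it is an assumption made for this sketch.
    """
    if onset_khz <= 0 or cf_khz <= 0 or step_octaves < 0:
        raise ValueError("frequencies must be positive and step non-negative")
    distance = math.log2(cf_khz / onset_khz)  # signed octaves from onset to CF
    if abs(distance) <= step_octaves:
        return cf_khz
    direction = 1.0 if distance > 0 else -1.0
    return onset_khz * 2.0 ** (direction * step_octaves)


# Hypothetical tuning curve: frequency (kHz) -> threshold (dB SPL).
curve = {4.0: 60.0, 8.0: 40.0, 16.0: 20.0, 32.0: 50.0}
cf = characteristic_frequency(curve)
ef = effective_frequency(onset_khz=8.0, cf_khz=cf, step_octaves=0.4)
```

With these placeholder values the CF is 16 kHz and a 0.4-octave step from an 8 kHz onset gives an EF of 8 × 2^0.4 kHz, which stays below the CF and so leaves room for a measurable increase in response.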

      (6) Lines 384-386: There are no figures named 5H and I.

      Thank you for pointing this out. The references to Figures 5H and 5I were incorrect and should have referred to Figures 5C and 5D. We sincerely apologize for this oversight and will correct these errors in the revised manuscript to ensure clarity and accuracy. Thank you again for bringing this to our attention.

      (7) The authors should mention the sex of the animals used.

      Thank you for your comment and for highlighting this important detail. The sex of the animals used in this study is specified in the Animals section of the Methods: "In the present study, male mice and rats were used to investigate thalamocortical LTP." We appreciate your careful attention to this point and will ensure that this detail remains clearly stated in the manuscript.

      (8) Lines 534 and 648: These coordinates are difficult to understand. Since the experiment was done on both mice and rats, we need a clear description of the coordinates in both. Also, I think that you should mention the lateral distance from the sagittal suture as the ventral coordinates should be calculated from the surface of the skull above the AC and not from the sagittal suture.

      Thank you for your valuable feedback and for pointing out this important issue. We apologize for any confusion caused by our description of the coordinates. The term “ventral” was deliberately used because the auditory cortex is located on the lateral side of the skull, which may have caused some misunderstanding.

      To provide a clearer and more accurate description of the coordinates, we will revise the text in the manuscript as follows: “A craniotomy was performed at the temporal bone (-2 to -4 mm posterior and -1.5 to -3 mm ventral to bregma for mice; -3.0 to -5.0 mm posterior and -2.5 to -6.5 mm ventral to bregma for rats) to access the auditory cortex.”

      We appreciate your attention to these details and will ensure that the revised manuscript includes this clarification to improve accuracy and eliminate potential confusion. Thank you again for bringing this to our attention.

      (9) Line 536: The author should specify that these coordinates are for the experiment done on mice.

      Thank you for your valuable feedback. We will revise the manuscript to explicitly specify that these coordinates refer to the experiments conducted on mice. This clarification will help improve the clarity and precision of the manuscript. We greatly appreciate your attention to this point and your effort to enhance the quality of our work.

      Methods section:

      “and a hole was drilled in the skull according to the coordinates of the ventral division of the MGB (MGv, AP: -3.2 mm, ML: 2.1 mm, DV: 3.0 mm) for experiments conducted on mice.”

      (10) Line 590: Please add the specifications of the stimulating electrode. Is it unipolar or bipolar? What is the cat.# provided by FHC?

      Thank you for your valuable feedback. The electrodes used in the experiments are unipolar. We will include the catalog number provided by FHC in the revised manuscript for clarity. The revised text will be updated as follows:

      “In HFS-induced thalamocortical LTP experiments, two customized microelectrode arrays with four tungsten unipolar electrodes each, impedance: 0.5-1.0 MΩ (recording: CAT.# UEWSFGSECNND, FHC, U.S.), and 200-500 kΩ (stimulating: CAT.# UEWSDGSEBNND, FHC, U.S.), were used for the auditory cortical neuronal activity recording and MGB ES, respectively.”

      We appreciate your attention to this detail, and we will ensure that the revised manuscript reflects this clarification accurately.

      (11) Lines 612-614: There are no details of how the optic fiber was inserted or post-examined. If there is a word limitation, the authors may reference another study showing these procedures.

      Thank you for your insightful comment and for highlighting this important aspect of the methodology. To address this, we will reference the study by Sun et al. (2024) in the revised manuscript, which provides detailed procedures for optic fiber insertion and post-examination. We believe that this reference will help enhance the clarity and completeness of the methods section.

      Sun, W., Wu, H., Peng, Y., Zheng, X., Li, J., Zeng, D., Tang, P., Zhao, M., Feng, H., Li, H., et al. (2024). Heterosynaptic plasticity of the visuo-auditory projection requires cholecystokinin released from entorhinal cortex afferents. eLife 13, e83356. 10.7554/eLife.83356.

      We appreciate your valuable suggestion, which will contribute to improving the quality of the manuscript.

      Minor concerns:

      (1) The definition of HFS was repeated many times throughout the manuscript. Please mention the defined name for the first time in the manuscript only followed by its abbreviation (HFS).

      Thank you for your suggestion and for pointing out this important detail. We will revise the manuscript to ensure that all abbreviations are defined only upon their first mention in the manuscript, with subsequent mentions using the abbreviations consistently. We appreciate your careful attention to detail and your effort to help improve the manuscript.

      (2) Line 173: There is a difference between here and the methods section (620 nm here and 635 nm there) please correct which wavelength the authors used.

      Thank you for your careful review and for bringing this discrepancy to our attention. We have corrected the inconsistency, and the wavelength has been unified throughout the manuscript to ensure accuracy and clarity. The revised text now reads as follows:

      “The fluorescent signal was monitored for 25s before and 60s after the HFLS (5~10 mW, 620 nm) or HFS application.”

      We appreciate your valuable feedback, which has helped us improve the precision and consistency of the manuscript.

      (3) Line 185: I think the authors should refer to Figure 2G before mentioning the statistical results.

      Thank you for your careful review and for pointing out this oversight. We have now added a reference to Figure 2G at the appropriate location to ensure clarity and logical flow in the manuscript, as recommended.

      (4) Line 202: I think the authors should refer to Figure 2J before mentioning the statistical results.

      Thank you again for your careful review and for highlighting this point. We have revised the manuscript to include a reference to Figure 2J before mentioning the statistical results.

      We appreciate your valuable feedback, which has helped us improve the accuracy and presentation of the results.

      (5) Line 260: Please add appropriate references at the end of the sentence to support the argument.

      Thank you for your valuable suggestion. To address this, we have added appropriate references to support the statement regarding the multiple steps involved between mRNA expression and neuropeptide release. Additionally, we have revised the statement to adopt a more cautious interpretation. The revised text is as follows:

      “It is widely recognized that mRNA levels do not always directly correlate with peptide levels due to multiple steps involved in peptide synthesis and processing, including translation, post-translational modifications, packaging, transportation, and proteolytic cleavage, all of which require various enzymes and regulatory mechanisms (38-41). A disruption at any stage in this process could lead to impaired CCK release, even when Cck mRNA is present.”

      We have included the following references to support this statement:

      38. Mierke, C.T. (2020). Translation and Post-translational Modifications in Protein Biosynthesis. In Cellular Mechanics and Biophysics: Structure and Function of Basic Cellular Components Regulating Cell Mechanics, C.T. Mierke, ed. (Springer International Publishing), pp. 595-665. 10.1007/978-3-030-58532-7_14.

      39. Gualillo, O., Lago, F., Casanueva, F.F., and Dieguez, C. (2006). One ancestor, several peptides post-translational modifications of preproghrelin generate several peptides with antithetical effects. Mol Cell Endocrinol 256, 1-8. 10.1016/j.mce.2006.05.007.

      40. Sossin, W.S., Fisher, J.M., and Scheller, R.H. (1989). Cellular and molecular biology of neuropeptide processing and packaging. Neuron 2, 1407-1417. https://doi.org/10.1016/0896-6273(89)90186-4.

      41. Hook, V., Funkelstein, L., Lu, D., Bark, S., Wegrzyn, J., and Hwang, S.R. (2008). Proteases for processing proneuropeptides into peptide neurotransmitters and hormones. Annu Rev Pharmacol Toxicol 48, 393-423. 10.1146/annurev.pharmtox.48.113006.094812.

      We greatly appreciate your helpful feedback, which has allowed us to improve both the accuracy and the depth of discussion in the manuscript.

      (6) Line 278: The authors mentioned "due to the absence of CCK in aged animals", which was not an appropriate description. It should be a reduction of CCK gene expression or a possible deficient CCK release.

      Thank you for your careful review and for pointing out the inaccuracy in our description. We agree with your suggestion and have revised the statement to more appropriately reflect the findings.

      “Our findings revealed that thalamocortical LTP cannot be induced in aged mice, likely due to insufficient CCK release, despite intact CCKBR expression.”

      This revision ensures a more accurate and precise description of the potential mechanisms underlying the observed phenomenon. We greatly appreciate your valuable feedback, which has helped us improve the clarity and accuracy of the manuscript.

      (7) Line 291: The authors mentioned that "without MGB stimulation", which is confusing. The MGB was stimulated with a single electrical pulse to evoke cortical fEPSPs. Therefore it should be "without HFS of MGB".

      Thank you for pointing this out and for highlighting the potential confusion caused by our original phrasing. Upon review, we recognize that our original phrasing "without MGB stimulation" may have been unclear and could have led to misinterpretation. To clarify, our intention was to describe the period during which CCK was present without any stimulation of the MGB.

      It is important to note that, in the presence of CCK, LTP can be induced even with low-frequency stimulation, including in aged mice. This observation underscores the potent effect of CCK in facilitating thalamocortical LTP, regardless of the specific stimulation protocol used.

      To address this issue, we have revised the sentence for improved clarity as follows:

      "To investigate whether CCK alone is sufficient to induce thalamocortical LTP without activating thalamocortical projections, we infused CCK-4 into the ACx of young adult mice immediately after baseline fEPSPs recording. Stimulation was then paused for 15 min to allow for CCK degradation, after which recording resumed."

      We believe this revision resolves the misunderstanding and provides a clearer and more accurate description of the experimental context. We greatly appreciate your insightful feedback, which has helped us refine the manuscript for clarity and precision.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) Line 99, 134, possibly other locations: "site" to "sites".

      Thank you for your careful review. We appreciate your attention to detail and have made the necessary corrections in the manuscript.

      (2) Throughout the manuscript there are some minor issues with language choice and subtle phrasing errors and I suggest English language editing.

      Thank you for your suggestion. In response, we have thoroughly reviewed the manuscript and addressed issues related to language choice and phrasing. The text has been carefully edited to ensure clarity, precision, and consistency. We believe these revisions have significantly enhanced the overall quality of the manuscript. We greatly appreciate your feedback, which has been invaluable in improving the presentation of our work.

      (3) Based on the experimental configurations, I do not think it is a problematic caveat, but authors should be aware of the high likelihood of AAV9 jumping synapses relative to other AAV serotypes.

      Thank you for bringing up the potential of AAV9 crossing synapses, a recognized characteristic of this serotype. We appreciate your observation regarding its relevance to our experimental design. In our study, we carefully considered the possibility of trans-synaptic transfer during both the experimental design and data interpretation phases. To minimize the likelihood of significant trans-synaptic spread, we implemented several measures, including controlling the injection volume, using a slow injection rate, and limiting the viral expression time. Post-hoc histological analyses confirmed that the expression of AAV9 was largely confined to the intended regions, with limited evidence of synaptic jumping under our experimental conditions.

      While we acknowledge the inherent potential for AAV9 to cross synapses, we believe this effect does not substantially confound the interpretation of our findings in the current study. To address this concern, we have added a brief discussion on this point in the revised manuscript to enhance clarity. We greatly appreciate your insightful comment, which has helped us further refine our work.

      Discussion section:

      “One potential limitation of our study is the trans-synaptic transfer property of AAV9. To mitigate this, we carefully controlled the injection volume, rate, and viral expression time, and conducted post-hoc histological analyses to minimize off-target effects, thereby reducing the likelihood of trans-synaptic transfer confounding the interpretation of our findings.”

      (4) The trace identifiers (1-4) do not seem correctly placed/colored in Figure S1D. Please check others carefully.

      Thank you for your careful review and for bringing this issue to our attention. We have corrected the trace identifiers in Figure S1D. Additionally, we have carefully reviewed all other figures to ensure their accuracy and consistency. We greatly appreciate your attention to detail, which has helped improve the overall quality of the manuscript.

      (5) Please provide a value of the laser power range based on calibrated values.

      Thank you for your suggestion. We have included the calibrated laser power range in the revised manuscript as follows:

      “The laser stimulation was produced by a laser generator (5-20 mW(30), Wavelength: 473 nm, 620 nm; CNI laser, China) controlled by an RX6 system and delivered to the brain via an optic fiber (Thorlabs, U.S.) connected to the generator.”

      We appreciate your feedback, which has helped improve the clarity and precision of our methodological description.

      (6) It would be useful to annotate figures in a way that identifies in which transgenic mice experiments are being performed.

      Thank you for your valuable suggestion. We will add annotations to the figures to explicitly identify the type of mice used in each experiment. We believe this enhancement will improve the clarity and accessibility of our results. We greatly appreciate your input in making our manuscript more informative.

      (7) Please comment on the rigor you use to address the accuracy of viral injections. How often did they spread outside of the MGB/AC?

      Thank you for raising this important question regarding the accuracy of viral injections and the potential spread outside the MGB or AC. Below, we provide details for each set of experiments:

      shRNA Experiments:

      For the shRNA experiments targeting the MGB, our primary goal was to achieve comprehensive coverage of the entire MGB. To this end, we used larger injection volumes and multiple injection sites, which inevitably resulted in some viral spread beyond the MGB. However, this approach was necessary to ensure robust knockdown effects that were representative of the entire MGB. While strict confinement to specific subregions could not be guaranteed, this strategy allowed us to prioritize the effectiveness of the knockdown within the target region.

      Fiber Photometry Experiments:

      For the fiber photometry experiments targeting the auditory cortex (AC), we used larger injection volumes and multiple injection sites to cover its relatively large size. Although this approach might have resulted in some CCK-sensor virus spread outside the AC, the placement of the optic fiber was guided by the location of the auditory cortex. Consequently, any minor viral expression outside the AC would not affect the experimental results, as recordings were confined to the intended area through precise fiber placement.  

      Optogenetic Experiments:

      For the optogenetic experiments targeting the MGB, we specifically injected virus into the MGv subregion. To minimize viral spread, we employed several strategies, including using fine injection needles, waiting for tissue stabilization (7 minutes post-needle insertion), delivering small volumes at a slow rate to prevent backflow, aspirating 5 nL of the solution post-injection, and raising the needle by 100 μm before waiting an additional 5 minutes prior to full retraction. These measures significantly reduced the risk of viral leakage to adjacent regions.

      Histological Validation:

      After the electrophysiological experiments, we systematically verified the accuracy of viral expression by examining histological sections to ensure that the expression was primarily localized within the intended regions.

      Terminology in the Manuscript:

      In the manuscript, we deliberately used the term "MGB" rather than specifically "MGv" to transparently acknowledge the potential for viral spread in some experiments.

      We hope this explanation clarifies the strategies we employed to address the accuracy of viral injections, as well as how we managed potential viral spread. We have also added brief information in the revised manuscript to reflect these points and acknowledge the inherent variability in viral delivery.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their thoughtful and detailed feedback, which we found highly constructive and encouraging. The comments have been invaluable in guiding improvements to the clarity, rigor, and impact of our manuscript. Below, we provide our responses and outline the specific revisions we plan to make in response to each point raised. It was extremely encouraging that all the comments were highly relevant to the study. They demonstrate careful work by experts in the field and truly help to improve the clarity and message of the manuscript.

      2. Description of the planned revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Gizaw et al characterizes the cholesterol biosynthetic pathway and the effect of its knockdown or inhibition on rhabdomyosarcoma tumor properties. The Authors find that the PROX1 transcription factor-mediated cholesterol biosynthesis regulates rhabdomyosarcoma cell growth and proliferation. Blocking the cholesterol biosynthetic pathway leads to reduced proliferation, cell cycle arrest and ER-stress mediated enhanced apoptosis. Detailed transcriptomic analyses indicate gene expression patterns that support these findings.

      Reviewer #1 (Significance (Required)):

      Based on my expertise on rhabdomyosarcoma tumors, the manuscript is clear, concise and provides a significant advance to the field. Detailed mechanistic characterization is lacking, which takes away some of the significance of the findings, but the work done stands alone as description of the effect of the cholesterol biosynthetic pathway in rhabdomyosarcoma. Another aspect to be considered by the Authors is the potential specificity of targeting a ubiquitous pathway such as cholesterol biosynthesis, which is important to most cells and not only cancer cells. Overall, the manuscript may be revised to address the specific comments below.

      Responses to Reviewer #1 comments

      We thank the reviewer for the thoughtful and encouraging comments on our manuscript. We appreciate the recognition of the significance of our findings and the detailed suggestions provided. We are committed to addressing each of the reviewer's points to strengthen the manuscript and ensure clarity and rigor. Below, we outline how we plan to address each comment.

      Major Comments:

      1. Details of the healthy human myoblasts that are used in Figure 1A are not provided and should be updated. Evidence of PROX1 knockdown should be presented. What kind of pathways and gene ontology predictions were associated with the 225 genes that are commonly downregulated between all three cell lines in Figure 1A?

      Response: In the revised manuscript, we will include complete information regarding the origin and characterization of the healthy human myoblasts used in Figure 1A. We will also provide additional data confirming PROX1 knockdown. Furthermore, we will present more details on the gene ontology (GO) and pathway enrichment analyses, and include the full results as supplemental data to highlight key biological processes affected by PROX1 silencing.

      In Figure 2, while the effect of the shRNAs targeting DHCR7 or the DHCR7 inhibitor AY9944 are striking, it is not clear whether these effects are specific to rhabdomyosarcoma cells or cancer cells. A control, human myoblast cell line or another non-cancerous cell line should be used to repeat these experiments quantifying Caspase3/7 activity, cell growth etc. to assess the cancer cell specificity of such treatments. Evidence of DHCR7 knockdown at the protein level would add to the study.


      Response: We fully agree with the reviewer's suggestion and will conduct additional experiments using non-cancerous human myoblasts to assess the specificity of DHCR7 inhibition. These will include assays for Caspase 3/7 activation, cell viability, and proliferation under similar conditions. We have already performed western blot validation of DHCR7 knockdown at the protein level in RMS cell lines and will include this data in the manuscript. We will also highlight in the discussion that RMS cells in our experiments were highly vulnerable when cultured with full media (incl. FBS), whereas previous studies with breast cancer cells have shown that their growth is affected by cholesterol biosynthesis inhibition only if they are cultured without serum (containing cholesterol). We also show that cholesterol supplementation does not rescue RMS cells demonstrating the essential role of de novo cholesterol synthesis.

      Western blots for Caspase3 quantification and a cell proliferation marker such as Cyclin D in shSCR and shDHCR7 tumor lysates would validate the data shown in the Figure 3. Are the shRNA constructs used inducible ones? If not, how do the Authors distinguish the effect of shDHCR7 on tumor engraftment versus tumor proliferation and growth? Many of the graphs need proper labeling of the axes and what the bars represent.


      Response: We will include western blot analysis for cleaved Caspase 3 and Cyclin D1 in tumor lysates to support the observed effects on apoptosis and proliferation. We will clarify in the revised manuscript that the shRNA constructs used were constitutive. To distinguish between effects on tumor engraftment versus tumor growth, we will provide additional detail on how we controlled for initial cell viability and engraftment potential prior to injection. We will also revise figure panels to ensure all axes and error bars are clearly labeled.

      Gene ontology and pathway analysis will add to Figure 4.


      Response: We will expand Figure 4 to include GO and pathway enrichment analyses of the RNA-seq data following DHCR7 knockdown. This will help illustrate the functional significance of the transcriptional changes and further support our conclusions regarding ER stress, apoptosis, and cell cycle regulation.

      In Figure 5A, how do the Authors explain the upregulation of cholesterol biosynthetic pathway genes upon shDHCR7 treatment? Are these effects seen at the protein level and if alternate pathways maintain cholesterol biosynthesis, how do the Authors think this strategy will be viable to treat such tumors? In Figure 5G-H, was a loading control used? If so, blots for that should be included.


      Response: We will expand the discussion to address the compensatory transcriptional upregulation of cholesterol biosynthesis genes following DHCR7 knockdown, likely driven by SREBP-mediated feedback regulation. To support this, we will include western blot data for key enzymes in the pathway. We will also clarify that despite this transcriptional compensation, functional cholesterol synthesis is impaired due to DHCR7 silencing, which cannot be rescued by increased upstream pathway activity. Regarding Figure 5G-H, we will include the missing loading control images in the revised version. Protein normalization was performed using Stain-Free technology, which enables the quantification of total protein in each lane, and was analyzed using ImageLab 6.0.1 software (Bio-Rad). We will include the Stain-Free gel images to demonstrate equal protein loading and will also indicate the molecular weights of the presented proteins in the updated figure legend.

      Lines 286-287 refer to Figure S1G, H; it should be corrected to Figure S1I, J.

      Response: We thank the reviewer for pointing this out. We will correct the figure citation in the revised manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript entitled "Targeting de novo cholesterol synthesis in rhabdomyosarcoma induces cell cycle arrest and triggers apoptosis through ER stress-mediated pathways" Gizaw et al investigate the crucial effect of targeting cholesterol biosynthesis in RMS. While this manuscript gives novel insights into a putative therapeutic approach, there are some comments that should be addressed by the authors.

      Reviewer #2 (Significance (Required)):

      A nice and coherent study. Please see text above.


      Response to Reviewer #2

      We are grateful to the reviewer for the thoughtful and constructive comments on our manuscript. We appreciate your recognition of the novelty and therapeutic potential of our findings, and we thank you for highlighting specific areas that will help further improve the clarity, rigor, and reproducibility of our work. Below, we respond point-by-point to your comments and outline how we plan to address each issue in the revised version of the manuscript.

      Major Comments:

      1. The authors demonstrated a correlation between PROX1 levels and the cholesterol synthesis pathway. Which genes from the pathway are mostly affected? The manuscript could benefit from a graphical representation of the pathway showing up- and downregulated genes from the RNA-seq analysis. This will help in understanding why the authors decided to study HMGCR silencing as shown in Supplementary Figure 1A.

      Response: We fully agree and will include a new graphical figure showing the cholesterol biosynthesis pathway, with up- and downregulated genes from our RNA-seq data visually mapped. This is, indeed, interesting as the whole pathway is consistently downregulated. We chose to focus on two rate-limiting genes in the pathway, DHCR7 and HMGCR. DHCR7 encodes the last enzyme in the mevalonate pathway, and its inhibition does not affect other arms deviating from this pathway. It was also recently found to be highly upregulated in pancreatic cancer, suggesting a role in cancer development/growth. HMGCR was chosen because it is the target of statins, which are widely used to treat high cholesterol and have been shown to be rather safe in clinical use. We will add this rationale to the manuscript to clarify our focus on HMGCR and DHCR7.

      Based on the previous comment, are the genes from the cholesterol synthesis identified in the RNA-seq similar to those detected in the publicly available data set presented in Figure 1E? In addition, validation of changes of these genes should be performed in the RMS cell lines as well as in myoblasts.


      Response: Yes, there is a significant overlap between the cholesterol biosynthesis genes identified in our RNA-seq dataset and those from the public dataset in Figure 1E. In the revised version, we will include this comparative analysis with the inclusion of the schematic figure (see our response #1). We also plan to perform qPCR validation of several key cholesterol biosynthesis genes in additional RMS cell lines and healthy myoblasts to reinforce the disease-specific regulation of this pathway.

      In Figure 3, the authors study the impact of DHCR7-silencing in tumor growth in vivo. Please, provide stainings also for DHCR7 to show that cells indeed have silenced DHCR7.


      Response: Thank you for this important suggestion. We will include immunofluorescence staining for DHCR7 in xenograft tumor sections to confirm DHCR7 knockdown in vivo and visually validate the efficiency of our silencing strategy. We will also add qPCR results from the cells at the time of implantation to confirm the knockdown.

      In Figure 4, the RNA-seq data revealed downregulation in E2F genes as well as genes involved in cell cycle progression. It would be important that the authors provide examples of these genes and validate this data by performing qPCR.


      Response: We will select representative cell cycle-related genes, including members of the E2F family and other G1/S and G2/M regulators, for qPCR validation in RMS cells following DHCR7 knockdown. A comparison with healthy myoblasts will also be performed. This will further substantiate the transcriptomic findings.

      In Figure 4J-M, cell cycle distribution using flow cytometry should be assessed in an additional cell line.


      Response: We will repeat the flow cytometry-based cell cycle analysis in an additional RMS cell line to ensure reproducibility and confirm the generalizability of the observed G2/M arrest phenotype.

      In line 271, the authors described that PROX1 is associated with an increase in DHCR7. However, in the next paragraph they evaluated the effect of silencing HMGCR. Is this enzyme also increased? Please clarify.


      Response: We appreciate the need for clarity. HMGCR expression is also elevated in RMS cells and regulated by PROX1. We will clarify this in the revised manuscript and update the text to explain the rationale behind examining both enzymes: HMGCR as the rate-limiting enzyme at the top of the cholesterol biosynthesis pathway, and DHCR7 as the final step enzyme. See also our response to question #1.

      The authors show that cholesterol biosynthesis is crucial in RMS. Would overexpression of DHCR7 in shDHCR7 cells rescue the anti-tumor effects? A rescue experiment would give information on whether this enzyme has a direct role in driving RMS cell behavior.


      Response: This is an excellent suggestion. We are currently generating a DHCR7 rescue construct and plan to perform these experiments. While these data may not be available in time for the current revision, we will clearly outline this approach as a key next step in our Discussion section and incorporate results if available.

      Minor Comments:

      1. In line 287 "Supplementary Fig.1G and 1H" are mentioned, while it should be "Supplementary Fig.1I and 1J" since it regards the treatment with lovastatin.

      Response: Thank you for catching this. We will correct the figure references accordingly.

      In line 340, authors mentioned the data "Supplementary Figure 4A and 4E", but there is not any corresponding data available in the Supplementary Information.


      Response: We apologize for this oversight. These references will be corrected, and any missing supplementary data will be properly included and labeled.

      In the Legend of Figure 2L, authors mention "PRXO-1 silencing", this should be corrected to "shDHCR7". Also, please change "l" to capital "L".


      Response: This will be corrected in the revised figure legend.

      In Figure 5G-H, please provide the data regarding loading control in the Western blot, as well as the molecular weights of the proteins presented.


      Response: We thank the reviewer for this important point. For the Western blot analysis in Figure 5G-H, normalization was performed by quantifying the total protein in each lane using Bio-Rad's Stain-Free technology and analyzed with ImageLab 6.0.1 software. This approach allows for accurate lane-to-lane comparison without relying on a single housekeeping protein. We will add the Stain-Free total protein images as a supplemental figure (Supplementary Figure) and include the molecular weights for each of the proteins in the figure legend to improve clarity and reproducibility.

      Please, include the information of what black, red etc refer to in each figure. This information is missing in several figures including Figure 2D, 2K, 3C, 3J, 3K, 3L which makes it difficult to follow.


      Response: We agree and will update all relevant figure legends to clearly explain color coding, symbols, and what each bar or line represents to improve figure clarity.

      The authors should indicate the numbers of biological replicates in individual experiments throughout whole figure legends.


      Response: Thank you for the suggestion. We will include the number of biological replicates for each experiment in the figure legends to enhance transparency and reproducibility.


    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates how hearing impairment affects neural encoding of speech, in particular the encoding of hierarchical linguistic information. The current analysis provides incomplete evidence that hearing impairment affects speech processing at multiple levels, since the novel analysis based on HM-LSTM needs further justification. The advantage of this method should also be further explained. The study can also benefit from building a stronger link between neural and behavioral data.

      We sincerely thank the editors and reviewers for their detailed and constructive feedback.

      We have revised the manuscript to address all of the reviewers’ comments and suggestions. The primary strength of our methods lies in the use of the HM-LSTM model, which simultaneously captures linguistic information at multiple levels, ranging from phonemes to sentences. As such, this model can be applied to other questions regarding hierarchical linguistic processing. We acknowledge that our current behavioral results from the intelligibility test may not fully differentiate between the perception of lower-level acoustic/phonetic information and higher-level meaning comprehension. However, it remains unclear what type of behavioral test would effectively address this distinction. We aim to explore this connection further in future studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors are attempting to use the internal workings of a language hierarchy model, comprising phonemes, syllables, words, phrases, and sentences, as regressors to predict EEG recorded during listening to speech. They also use standard acoustic features as regressors, such as the overall envelope and the envelopes in log-spaced frequency bands. This is valuable and timely research, including the attempt to show differences between normal-hearing and hearing-impaired people in these regards. I will start with a couple of broader questions/points, and then focus my comments on three aspects of this study: The HM-LSTM language model and its usage, the time windows of relevant EEG analysis, and the usage of ridge regression.

      Firstly, as far as I can tell, the OSF repository of code, data, and stimuli is not accessible without requesting access. This needs to be changed so that reviewers and anybody who wants or needs to can access these materials. 

      It is my understanding that keeping the repository private during the review process and making it public after acceptance is standard practice. As far as I understand, although the OSF repository was private, anyone with the link should have been able to access it. I have now made the repository public.

      What is the quantification of model fit? Does it mean that you generate predicted EEG time series from deconvolved TRFs, and then give the R2 coefficient of determination between the actual EEG and predicted EEG constructed from the convolution of TRFs and regressors? Whether or not this is exactly right, it should be made more explicit.

      Model fit was quantified as the z-transformed coefficient of determination (R²), and contrasts of the R² timecourses were assessed with spatiotemporal cluster permutation tests (Maris & Oostenveld, 2007). For instance, to assess whether words from the attended stimuli better predict EEG signals during the mixed speech compared to words from the unattended stimuli, we used the 150-dimensional vectors corresponding to the word layer from our LSTM model for the attended and unattended stimuli as regressors. We then fit these regressors to the EEG signals at 9 time points (spanning -100 ms to 300 ms around the sentence offsets, with 50 ms intervals). Finally, we conducted one-tailed two-sample t-tests to determine whether the differences in the contrasts of the R² timecourses were statistically significant. Note that we did not perform TRF analyses. We have clarified this description in the “Spatiotemporal clustering analysis” section of the “Methods and Materials” on p.10 of the manuscript.
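
      As a rough illustration of the latency grid and the R² measure described above, the following minimal Python sketch enumerates the 9 time points and computes a coefficient of determination. The 500 Hz sampling rate, the example sentence-offset sample, and all function names are assumptions made for illustration; they are not taken from the manuscript or its code.

```python
# Illustrative sketch only: the sampling rate, the example offset, and the
# helper names below are assumptions, not details from the manuscript.

def latency_grid(start_ms=-100, stop_ms=300, step_ms=50):
    """Latencies (ms) relative to a sentence offset, inclusive of both ends."""
    return list(range(start_ms, stop_ms + 1, step_ms))

def latency_to_sample(offset_sample, latency_ms, sfreq_hz):
    """Convert a latency around a sentence offset into an EEG sample index."""
    return offset_sample + round(latency_ms * sfreq_hz / 1000)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

latencies = latency_grid()
# Assumed: 500 Hz sampling rate and a sentence offset at sample 10000.
samples = [latency_to_sample(10_000, ms, sfreq_hz=500) for ms in latencies]
```

      With these defaults the grid runs from -100 ms to 300 ms in 50 ms steps, which gives the 9 time points mentioned in the response.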

      About the HM-LSTM:

      • In the Methods paragraph about the HM-LSTM, a lot more detail is necessary to understand how you are using this model. Firstly, what do you mean that you "extended" it, and what was that procedure? 

      The original HM-LSTM model developed by Chung et al. (2017) consists of only two levels: the word level and the phrase level (Figure 1b from their paper). By “extending” the model, we mean that we expanded its architecture to include five levels: phoneme, syllable, word, phrase, and sentence. Since our input consists of phoneme embeddings, we cannot directly apply their model, so we trained our model on the WenetSpeech corpus (Zhang et al., 2021), which provides phoneme-level transcripts. We have added this clarification on p.4 of the manuscript.

      • And generally, this is the model that produces most of the "features", or regressors, whichever word we like, for the TRF deconvolution and EEG prediction, correct? 

      Yes. We extracted the 2048-dimensional hidden-layer activity from the model to represent features for each sentence in our speech stimuli at the phoneme, syllable, word, phrase, and sentence levels. However, we did not perform any TRF deconvolution; instead, we fit these features (reduced to 150 dimensions using PCA) to the EEG signals at 9 time points around the offset of each sentence using ridge regression. We have now added a multivariate TRF (mTRF) analysis following Reviewer 3’s suggestions, and the results showed similar patterns to the current results (see Figure S2). We have added the clarification in the “Ridge regression at different time latencies” section of the “Methods and Materials” on p.10 of the manuscript.

      Results from the mTRF analyses were added on p.7 of the manuscript.
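
      The feature-reduction and ridge-regression step described in this response could be sketched roughly as follows. This is a toy example on synthetic data, not the authors' pipeline: the array sizes, the regularization strength `alpha`, the train/test split, and the function names are all assumptions, and PCA is fit on all sentences purely for brevity.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project mean-centred features onto their top principal components."""
    centred = features - features.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def ridge_fit(X, y, alpha):
    """Closed-form ridge weights (no intercept): (X^T X + alpha I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def r2_score(y, y_hat):
    """Coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
# Toy sizes (assumed); the response describes 2048 hidden units reduced to 150.
n_sentences, n_hidden, n_latencies = 200, 64, 9
hidden = rng.standard_normal((n_sentences, n_hidden))
X = pca_reduce(hidden, n_components=10)

# Synthetic single-channel "EEG": one value per sentence and latency.
true_w = rng.standard_normal(X.shape[1])
eeg = np.stack(
    [X @ true_w + rng.standard_normal(n_sentences) for _ in range(n_latencies)],
    axis=1,
)

# Assumed hold-out split: fit on the first 150 sentences, score on the rest.
train, test = slice(0, 150), slice(150, None)
r2_by_latency = []
for t in range(n_latencies):
    w = ridge_fit(X[train], eeg[train, t], alpha=1.0)
    r2_by_latency.append(r2_score(eeg[test, t], X[test] @ w))
```

      Fitting one ridge model per latency yields an R² timecourse per regressor set, which is the quantity that the contrasts between conditions would then operate on.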

      • A lot more detail is necessary then, about what form these regressors take, and some example plots of the regressors alongside the sentences.

      The linguistic regressors are simply five 150-dimensional vectors, each corresponding to one linguistic level, as shown in Figure 1B.

      • Generally, it is necessary to know what these regressors look like compared to other similar language-related TRF and EEG/MEG prediction studies. Usually, in the case of e.g. Lalor lab papers or Simon lab papers, these regressors take the form of single-sample event markers, surrounded by zeros elsewhere. For example, a phoneme regressor might have a sample up at the onset of each phoneme, and a word onset regressor might have a sample up at the onset of each word, with zeros elsewhere in the regressor. A phoneme surprisal regressor might have a sample up at each phoneme onset, with the value of that sample corresponding to the rarity of that phoneme in common speech. Etc. Are these regressors like that? Or do they code for these 5 linguistic levels in some other way? Either way, much more description and plotting is necessary in order to compare the results here to others in the literature.

      No, these regressors were not like that. They were 150-dimensional vectors (after PCA dimension reduction) extracted from the hidden layers of the HM-LSTM model. After training the model on the WenetSpeech corpus, we ran it on our speech stimuli and extracted representations from the five hidden layers to correspond to the five linguistic levels. As mentioned earlier, we did not perform TRF analyses; instead, we used ridge regression to predict EEG signals around the offset of each sentence, a method commonly employed in the literature (e.g., Caucheteux & King, 2022; Goldstein et al., 2022; Schmitt et al., 2021; Schrimpf et al., 2021). For instance, Goldstein et al. (2022) used word embeddings from GPT-2 to predict ECoG activity surrounding the onset of each word during naturalistic listening. We have included these references on p.3 of the manuscript, and the method is illustrated in Figure 1B.

      • You say that the 5 regressors that are taken from the trained model's hidden layers do not have much correlation with each other. However, the highest correlations are between syllable and sentence (0.22), and syllable and word (0.17). It is necessary to give some reason and interpretation of these numbers. One would think the highest correlation might be between syllable and phoneme, but this one is almost zero. Why would the syllable and sentence regressors have such a relatively high correlation with each other, and what form do those regressors take such that this is the case?

      All the regressors are represented as 2048-dimensional vectors derived from the hidden layers of the trained HM-LSTM model. We applied the trained model to all 284 sentences in our stimulus text, generating a set of 284 × 2048-dimensional vectors. Next, we performed Principal Component Analysis (PCA) on the 2048 dimensions and extracted the first 100 principal components (PCs), resulting in 284 × 100-dimensional vectors for each regressor. These 284 × 100 matrices were then flattened into 28,400-dimensional vectors. Subsequently, we computed the correlation matrix for the z-transformed 28,400-dimensional vectors of our five linguistic regressors. The code for this analysis, lstm_corr.py, can be found in our OSF repository. We have added a section “Correlation among linguistic features” in “Materials and Methods” on p.10 of the manuscript.
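
      As an illustration only, the steps described above (PCA to 100 components, flattening, z-transformation, and pairwise correlation) could be sketched as follows. This is not the lstm_corr.py script from the OSF repository: the function names are hypothetical, the hidden-layer activity is replaced by random numbers, and PCA is done with a plain NumPy SVD as an assumption.

```python
import numpy as np

def pca_reduce(x, n_components):
    """Project the rows of x onto the first n_components principal components."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def zscore(v):
    """Z-transform a 1-D vector."""
    return (v - v.mean()) / v.std()

def level_correlations(features, n_components=100):
    """features maps a level name to an (n_sentences, n_hidden) array."""
    flat = [zscore(pca_reduce(f, n_components).ravel()) for f in features.values()]
    # One row per linguistic level, so the result is a levels x levels matrix.
    return np.corrcoef(np.vstack(flat))

rng = np.random.default_rng(0)
levels = ["phoneme", "syllable", "word", "phrase", "sentence"]
# Random stand-in for hidden-layer activity: 284 sentences x 2048 units per level.
toy = {name: rng.standard_normal((284, 2048)) for name in levels}
corr = level_correlations(toy)  # 5 x 5 correlation matrix
```

      Each flattened vector has 284 × 100 = 28,400 entries, matching the dimensionality stated above.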

      We consider the observed coefficients of 0.17 and 0.22 to be relatively low compared to prior model-brain alignment studies which report correlation coefficients above 0.5 for linguistic regressors (e.g., Gao et al., 2024; Sugimoto et al., 2024). In Chinese, a single syllable can also function as a word, potentially leading to higher correlations between regressors for syllables and words. However, we refrained from overinterpreting the results to suggest a higher correlation between syllable and sentence compared to syllable and word. A paired t-test of the syllable-word coefficients versus syllable-sentence coefficients across the 284 sentences revealed no significant difference (t(28399)=-3.96, p=1). We have incorporated this information into p.5 of the manuscript.

      • If these regressors are something like the time series of zeros along with single sample event markers as described above, with the event marker samples indicating the onset of the relevant thing, then one would think e.g. the syllable regressor would be a subset of the phoneme regressor because the onset of every syllable is a phoneme. And the onset of every word is a syllable, etc.

      All the regressors are aligned to 9 time points surrounding sentence offsets (-100 ms to 300 ms with a 50 ms interval). This is because all our regressors are taken from the HM-LSTM model, where the input is the phoneme representation of a sentence (e.g., “zh ə_4 y ie_3 j iəu_4 x iaŋ_4 sh uei_3 y ii_2 y aŋ_4”). For each unit in the sentence, the model generates five 2048-dimensional vectors, each corresponding to one of the five linguistic levels of the entire sentence. We have added the clarification on p.11 of the manuscript.

      For the time windows of analysis:

      • I am very confused, because sometimes the times are relative to "sentence onset", which would mean the beginning of sentences, and sometimes they are relative to "sentence offset", which would mean the end of sentences. It seems to vary which is mentioned. Did you use sentence onsets, offsets, or both, and what is the motivation?

      • If you used onsets, then the results at negative times would not seem to mean anything, because that would be during silence unless the stimulus sentences were all back to back with no gaps, which would also make that difficult to interpret.

      • If you used offsets, then the results at positive times would not seem to mean anything, because that would be during silence after the sentence is done. Unless you want to interpret those as important brain activity after the stimuli are done, in which case a detailed discussion of this is warranted.

      Thank you very much for pointing this out. All instances of “sentence onset” were typos and should be corrected to “sentence offset.” We chose offset because the regressors are derived from the hidden layer activity of our HM-LSTM model, which processes the entire sentence before generating outputs. We have now corrected all the typos. In continuous speech, there is no distinct silence period following sentence offsets. Additionally, lexical or phrasal processing typically occurs 200 ms after stimulus offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Therefore, we included a 300 ms interval after sentence offsets in our analysis, as our regressors encompass linguistic levels up to the sentence level. We have added this motivation on p.11 of the manuscript.

      • For the plots in the figures where the time windows and their regression outcomes are shown, it needs to be explicitly stated every time whether those time windows are relative to sentence onset, offset, or something else.

      We completely agree, and thank you very much for the suggestion. We have now added this information to Figures 4-6.

      • Whether the running correlations are relative to sentence onset or offset, the fact that you can have numbers outside of the time of the sentence (negative times for onset, or positive times for offset) is highly confusing. Why would the regressors have values outside of the sentence, meaning before or after the sentence/utterance? In order to get the running correlations, you presumably had the regressor convolved with the TRF/impulse response to get the predicted EEG first. In order to get running correlation values outside the sentence to correlate with the EEG, you would have to have regressor values at those time points, correct? How does this work?

      As mentioned earlier, we did not perform TRF analyses or convolve the regressors. Instead, we conducted regression analyses at each of the 9 time points surrounding the sentence offsets, following standard methods commonly used in model-brain alignment studies (e.g., Gao et al., 2024; Goldstein et al., 2022). The time window of -100 to 300 ms was selected based on prior findings that lexical and phrasal processing typically occurs 200–300 ms after word offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Additionally, we included the -100 to 200 ms time period in our analysis to examine phoneme and syllable level processing (cf. Gwilliams et al., 2022). We have added the clarification on p. of the manuscript.

      • In general, it seems arbitrary to choose sentence onset or offset, especially if the comparison is the correlation between predicted and actual EEG over the course of a sentence, with each regressor. What is going on with these correlations during the middle of the sentences, for example? In ridge regression TRF techniques for EEG/MEG, the relevant measure is often the overall correlation between the predicted and actual, calculated over a longer period of time, maybe the entire experiment. Here, you have calculated a running comparison between predicted and actual, and thus the time windows you choose to actually analyze can seem highly cherry-picked, because this means that most of the data is not actually analyzed.

      The rationale for choosing sentence offsets instead of onsets is that we are aligning the HM-LSTM model’s activity with EEG responses, and the input to the model consists of phoneme representations of the entire sentence at one time. In other words, the model needs to process the whole sentence before generating representations at each linguistic level. Therefore, the corresponding EEG responses should also align with the sentence offsets, occurring after participants have heard the complete sentence. The ridge regression followed the common practice in model-brain alignment studies (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021), and the time window is not cherry-picked but based on prior literature reporting lexical and sublexical processing in these time periods (e.g., Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Gwilliams et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021).

      • In figures 5 and 6, some of the time window portions that are highlighted as significant between the two lines have the lines intersecting. This looks like, even though you have found that the two lines are significantly different during that period of time, the difference between those lines is not of a constant sign, even during that short period. For instance, in figure 5, for the syllable feature, the period of 0 - 200 ms is significantly different between the two populations, correct? But between 0 and 50, normal-hearing are higher, between 50 and 150, hearing-impaired are higher, and between 150 and 200, normal-hearing are higher again, correct? But somehow they still end up significantly different overall between 0 and 200 ms. More explanation of occurrences like these is needed.

      The intersecting lines in Figures 5 and 6 represent the significant time windows for within-group comparisons (i.e., significant model fit compared to 0). They do not depict between-group comparisons, as no significant contrasts were found between the groups. For example, in Figure 1, the significant time windows for the acoustic models are shown separately for the hearing-impaired and normal-hearing groups. No significant differences were observed, as indicated by the sensor topography. We have now clarified this point in the captions for Figures 5 and 6.

      Using ridge regression:

      • What software package(s) and procedure(s) were specifically done to accomplish this? If this is ridge regression and not just ordinary least squares, then there was at least one non-zero regularization parameter in the process. What was it, how did it figure in the modeling and analysis, etc.?

      The ridge regression was performed using custom Python code, making heavy use of the sklearn (v1.12.0) package. We used ridge regression instead of ordinary least squares regression because all our linguistic regressors are 150-dimensional dense vectors, and our acoustic regressors are 130-dimensional vectors (see “Acoustic features of the speech stimuli” in “Materials and Methods”). We kept the default regularization parameter (i.e., 1). This ridge regression method is commonly used in model-brain alignment studies, where the regressors are high-dimensional vectors taken from language models (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021). The code ridge_lstm.py can be found in our OSF repository, and we have added a more detailed description on p.11 of the manuscript.
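
      For readers who want a concrete picture, a minimal sketch of a per-latency ridge fit with a regularization parameter of 1 is given below. It is not the ridge_lstm.py script: it uses a NumPy closed-form solution rather than sklearn, the data are random placeholders for one sensor, the helper names are hypothetical, and scoring the fit in-sample is an assumption made only to keep the example short.

```python
import numpy as np

def ridge_fit(x, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - x_mean, y - y_mean
    # Solve (X'X + alpha * I) w = X'y on the centered data.
    w = np.linalg.solve(xc.T @ xc + alpha * np.eye(x.shape[1]), xc.T @ yc)
    return w, y_mean - x_mean @ w

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

latencies_ms = np.arange(-100, 301, 50)  # 9 latencies around sentence offset
rng = np.random.default_rng(0)
n_sentences, n_features = 284, 150
features = rng.standard_normal((n_sentences, n_features))  # placeholder regressors
eeg = rng.standard_normal((n_sentences, latencies_ms.size))  # placeholder, one sensor

# Fit one model per latency and store its R^2 (in-sample, as an assumption).
r2_by_latency = {}
for i, lat in enumerate(latencies_ms):
    w, b = ridge_fit(features, eeg[:, i], alpha=1.0)
    r2_by_latency[int(lat)] = r2_score(eeg[:, i], features @ w + b)
```

      Repeating the loop over sensors and participants would give an R<sup>2</sup> value for every sensor and latency, which is the kind of array a spatiotemporal cluster test operates on.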

      • It sounds like the regressors are the hidden layer activations, which you reduced from 2,048 to 150 non-acoustic, or linguistic, regressors, per linguistic level, correct? So you have 150 regressors, for each of 5 linguistic levels. These regressors collectively contribute to the deconvolution and EEG prediction from the resulting TRFs, correct? This sounds like a lot of overfitting. How much correlation is there from one of these 150 regressors to the next? Elsewhere, it sounds like you end up with only one regressor for each of the 5 linguistic levels. So these aspects need to be clarified.

      • For these regressors, you are comparing the "regression outcomes" for different conditions; "regression outcomes" are the R2 between predicted and actual EEG, which is the coefficient of determination, correct? If this is R2, how is it that you have some negative numbers in some of the plots? R2 should be only positive, between 0 and 1.

      Yes, we reduced the 2048-dimensional vectors for each of the 5 linguistic levels to 150 dimensions using PCA, mainly to save computational resources. We used ridge regression, following the standard practice in the field (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021).

      Yes, the regression outcomes are the R<sup>2</sup> values representing the fit between the predicted and actual EEG data. However, we reported normalized R<sup>2</sup> values, which are z-transformed, in the plots. All our spatiotemporal cluster permutation analyses were conducted using the z-transformed R<sup>2</sup> values. We have added this clarification both in the figure captions and on p.11 of the manuscript. As a side note, R<sup>2</sup> values can be negative because they are not the square of a correlation coefficient. Rather, R<sup>2</sup> compares the fit of the chosen model to that of a horizontal straight line (the null hypothesis). If the chosen model fits the data worse than the horizontal line, then the R<sup>2</sup> value becomes negative: https://www.graphpad.com/support/faq/how-can-rsup2sup-be-negative
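
      The point about negative R<sup>2</sup> values can be checked with a small numerical example. The numbers below are made up for illustration; the function simply implements R<sup>2</sup> = 1 - SS<sub>res</sub>/SS<sub>tot</sub>.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, where SS_tot is taken around the mean of y_true."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])

# Predictions close to the data: SS_res = 0.1, SS_tot = 5, so R^2 is about 0.98.
good = r_squared(y, np.array([1.1, 1.9, 3.2, 3.8]))

# Predictions worse than the mean line: SS_res = 20, SS_tot = 5, so R^2 = -3.
bad = r_squared(y, np.array([4.0, 3.0, 2.0, 1.0]))
```

      Whenever the residual sum of squares exceeds the variance around the mean, the ratio is greater than 1 and R<sup>2</sup> falls below zero.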

      Reviewer #2 (Public Review):

      This study compares neural responses to speech in normal-hearing and hearing-impaired listeners, investigating how different levels of the linguistic hierarchy are impacted across the two cohorts, both in a single-talker and multi-talker listening scenario. It finds that, while normal-hearing listeners have a comparable cortical encoding of speech-in-quiet and attended speech from a multi-talker mixture, participants with hearing impairment instead show a reduced cortical encoding of speech when it is presented in a competing listening scenario. When looking across the different levels of the speech processing hierarchy in the multi-talker condition, normal-hearing participants show a greater cortical encoding of the attended compared to the unattended stream in all speech processing layers - from acoustics to sentence-level information. Hearing-impaired listeners, on the other hand, only have increased cortical responses to the attended stream for the word and phrase levels, while all other levels do not differ between attended and unattended streams.

      The methods for modelling the hierarchy of speech features (HM-LSTM) and the relationship between brain responses and specific speech features (ridge-regression) are appropriate for the research question, with some caveats on the experimental procedure. This work offers an interesting insight into the neural encoding of multi-talker speech in listeners with hearing impairment, and it represents a useful contribution towards understanding speech perception in cocktail-party scenarios across different hearing abilities. While the conclusions are overall supported by the data, there are limitations and certain aspects that require further clarification.

      (1) In the multi-talker section of the experiment, participants were instructed to selectively attend to the male or the female talker, and to rate the intelligibility, but they did not have to perform any behavioural task (e.g., comprehension questions, word detection or repetition), which could have demonstrated at least an attempt to comply with the task instructions. As such, it is difficult to determine whether the lack of increased cortical encoding of Attended vs. Unattended speech across many speech features in hearing-impaired listeners is due to a different attentional strategy, which might be more oriented at "getting the gist" of the story (as the increased tracking of only word and phrase levels might suggest), or instead it is due to hearing-impaired listeners completely disengaging from the task and tuning back in for selected key-words or word combinations. Especially the lack of Attended vs. Unattended cortical benefit at the level of acoustics is puzzling and might indicate difficulties in performing the task. I think this caveat is important and should be highlighted in the Discussion section.

      Thank you very much for the suggestion. We acknowledge that the hearing-impaired listeners might adopt different attentional strategies or potentially disengage from the task due to comprehension difficulties. However, we would like to emphasize that our hearing-impaired participants have extended high-frequency (EHF) hearing loss, with impairment only at frequencies above 8 kHz. Their condition is likely not severe enough to cause them to adopt a markedly different attentional strategy for this task. Moreover, it is possible that our normal-hearing listeners may also adopt varying attentional strategies, yet the comparison still revealed notable differences. We have added the caveat in the Discussion section on p.8 of the manuscript.

      (2) In the EEG recording and preprocessing section, you state that the EEG was filtered between 0.1Hz and 45Hz. Why did you choose this very broadband frequency range? In the literature, speech responses are robustly identified between 0.5Hz/1Hz and 8Hz. Would these results emerge using a narrower and lower frequency band? Considering the goal of your study, it might also be interesting to run your analysis pipeline on conventional frequency bands, such as Delta and Theta, since you are looking into the processing of information at different temporal scales.

      We have indeed decomposed the epoched EEG time series for each section into classic frequency-band components (delta 1–3 Hz, theta 4–7 Hz, alpha 8–12 Hz, beta 12–20 Hz, gamma 30–45 Hz) by convolving the data with complex Morlet wavelets as implemented in MNE-Python (version 0.24.0). The number of cycles in the Morlet wavelets was set to frequency/4 for each frequency bin. The power values for each time point and frequency bin were obtained by taking the square root of the resulting time-frequency coefficients. These power values were normalized to reflect relative changes (expressed in dB) with respect to the 500 ms pre-stimulus baseline. This yielded a power value for each time point and frequency bin for each section. We specifically examined the delta and theta bands, and computed the correlation between the regression outcomes for the five linguistic predictors obtained from these bands and those obtained using data from all frequency bands (the R<sup>2</sup> values, with shape subjects × sensors × time points, were flattened before computing the correlation). The results showed high correlation coefficients (see the correlation matrix in Supplementary Figure S2 for the attended and unattended speech). Therefore, we opted to use the epoched EEG data from all frequency bands for our analyses. We have added this clarification in the Results section on p.5 and the “EEG recording and preprocessing” section in “Materials and Methods” on p.11 of the manuscript.

      (3) A paragraph with more information on the HM-LSTM would be useful to understand the model used without relying on the Chung et al. (2017) paper. In particular, I think the updating mechanism of the model should be clarified. It would also be interesting to modify the updating factor of the model, along the lines of Schmitt et al. (2021), to assess whether a HM-LSTM with faster or slower updates can better describe the neural activity of hearing-impaired listeners. That is, perhaps the difference between hearing-impaired and normal-hearing participants lies in the temporal dynamics, and not necessarily in a completely different attentional strategy (or disengagement from the stimuli, as I mentioned above).

      Thank you for the suggestion. We have added more details on our HM-LSTM model on p.10 “Hierarchical multiscale LSTM model” in “Materials and Methods”: Our HM-LSTM model consists of 4 layers. At each layer, the model implements a COPY or UPDATE operation at each time step t. The COPY operation maintains the current cell state without any changes until it receives a summarized input from the lower layer. The UPDATE operation occurs when a linguistic boundary is detected in the layer below, but no boundary was detected at the previous time step t-1. In this case, the cell updates its summary representation, similar to standard RNNs. We agree that exploring modifications to the model’s updating factor would be an interesting direction. However, since we have already observed contrasts between normal-hearing and hearing-impaired listeners using the current model’s update parameters, we believe discussing additional hypotheses would overextend the scope of this paper.

      (4) When explaining how you extracted phoneme information, you mention that "the inputs to the model were the vector representations of the phonemes". It is not clear to me whether you extracted specific phonetic features (e.g., "p" sound vs. "b" sound), or simply the phoneme onsets. Could you clarify this point in the text, please?

      The model inputs were individual phonemes from two sentences, each transformed into a 1024-dimensional vector using a simple lookup table. This lookup table stores embeddings for a fixed dictionary of all unique phonemes in Chinese. This approach is a foundational technique in many advanced NLP models, enabling the representation of discrete input symbols in a continuous vector space. We have added this clarification on p.10 of the manuscript.
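
      To make the lookup-table idea concrete, a toy sketch is shown below. The phoneme inventory is a small hypothetical subset taken from the example sentence quoted earlier in this response, and the table is filled with random numbers; treating the table as learned during training is an assumption, since the response does not say how the embeddings were obtained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy subset of a phoneme inventory (hypothetical; the real dictionary covers
# all unique phonemes in Chinese).
phoneme_vocab = ["zh", "ə_4", "y", "ie_3", "j", "iəu_4"]
index = {p: i for i, p in enumerate(phoneme_vocab)}

embedding_dim = 1024
# One row per phoneme. Random here; assumed to be learned in a trained model.
table = rng.standard_normal((len(phoneme_vocab), embedding_dim))

def embed(phonemes):
    """Map a sequence of phoneme symbols to their embedding vectors."""
    return table[[index[p] for p in phonemes]]

vectors = embed(["zh", "ə_4", "y", "ie_3"])  # shape: (4, 1024)
```

      Each input phoneme is therefore represented by a dense 1024-dimensional vector that identifies which phoneme it is, rather than by a single onset marker.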

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to investigate how the brain processes different linguistic units (from phonemes to sentences) in challenging listening conditions, such as multi-talker environments, and how this processing differs between individuals with normal hearing and those with hearing impairments. Using a hierarchical language model and EEG data, they sought to understand the neural underpinnings of speech comprehension at various temporal scales and identify specific challenges that hearing-impaired listeners face in noisy settings.

      Strengths:

      Overall, the combination of computational modeling, detailed EEG analysis, and comprehensive experimental design thoroughly investigates the neural mechanisms underlying speech comprehension in complex auditory environments.

      The use of a hierarchical language model (HM-LSTM) offers a data-driven approach to dissect and analyze linguistic information at multiple temporal scales (phoneme, syllable, word, phrase, and sentence). This model allows for a comprehensive neural encoding examination of how different levels of linguistic processing are represented in the brain.

      The study includes both single-talker and multi-talker conditions, as well as participants with normal hearing and those with hearing impairments. This design provides a robust framework for comparing neural processing across different listening scenarios and groups.

      Weaknesses:

      The analyses heavily rely on one specific computational model, which limits the robustness of the findings. The use of a single DNN-based hierarchical model to represent linguistic information, while innovative, may not capture the full range of neural coding present in different populations. A low-accuracy regression model-fit does not necessarily indicate the absence of neural coding for a specific type of information. The DNN model represents information in a manner constrained by its architecture and training objectives, which might fit one population better than another without proving the non-existence of such information in the other group. To address this limitation, the authors should consider evaluating alternative models and methods. For example, directly using spectrograms, discrete phoneme/syllable/word coding as features, and performing feature-based temporal response function (TRF) analysis could serve as valuable baseline models. This approach would provide a more comprehensive evaluation of the neural encoding of linguistic information.

      Our acoustic features are indeed taken directly from the broadband envelopes and the log-mel spectrograms of the speech streams. The amplitude envelope of the speech signal was extracted using the Hilbert transform. The 129-dimension spectrogram and 1-dimension envelope were concatenated to form a 130-dimension acoustic feature at every 10 ms of the speech stimuli. Given the duration of our EEG recordings, which span over 10 minutes, conducting multivariate TRF (mTRF) analysis with such high-dimensional predictors was not feasible. Instead, we used ridge regression to predict EEG responses across 9 temporal latencies, ranging from -100 ms to +300 ms in 50 ms steps around sentence offsets. To evaluate the model's performance, we extracted the R<sup>2</sup> values at each latency, providing a temporal profile of regression performance over the analyzed time period. This approach is conceptually similar to TRF analysis.

      We agree that including baseline models for the linguistic features is important, and we have now added results from mTRF analysis using phoneme, syllable, word, phrase, and sentence rates as discrete predictors (i.e., marking a value of 1 at each unit boundary offset). Our EEG data spans the entire 10-minute duration for each condition, sampled at 10-ms intervals. The TRF results for our main comparison (attended versus unattended conditions) showed similar patterns to those observed using features from our HM-LSTM model. At the phoneme and syllable levels, normal-hearing listeners showed marginally significantly higher TRF weights for attended speech compared to unattended speech at approximately -80 to 150 ms after phoneme offsets (t=2.75, Cohen’s d=0.87, p=0.057), and 120 to 210 ms after syllable offsets (t=3.96, Cohen’s d=0.73, p=0.083). At the word and phrase levels, normal-hearing listeners exhibited significantly higher TRF weights for attended speech compared to unattended speech at 190 to 290 ms after word offsets (t=4, Cohen’s d=1.13, p=0.049), and around 120 to 290 ms after phrase offsets (t=5.27, Cohen’s d=1.09, p=0.045). For hearing-impaired listeners, marginally significant effects were observed at 190 to 290 ms after word offsets (t=1.54, Cohen’s d=0.6, p=0.059), and 180 to 290 ms after phrase offsets (t=3.63, Cohen’s d=0.89, p=0.09). These results have been added on p.7 of the manuscript, and the corresponding figure is included as Supplementary F2.
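
      A discrete rate predictor of the kind described above (a value of 1 at each unit offset, zeros elsewhere, sampled every 10 ms) could be built along the following lines. This is a sketch only: the function name and the offset times are hypothetical and are not taken from the stimuli.

```python
import numpy as np

def rate_predictor(offsets_s, duration_s, step_s=0.01):
    """Return a 0/1 time series with a 1 at the sample nearest each unit offset."""
    n_samples = int(round(duration_s / step_s))
    predictor = np.zeros(n_samples)
    idx = np.round(np.asarray(offsets_s) / step_s).astype(int)
    # Ignore any offsets that fall outside the recording.
    idx = idx[(idx >= 0) & (idx < n_samples)]
    predictor[idx] = 1.0
    return predictor

# Hypothetical word offsets (in seconds) within a 2-second excerpt.
word_offsets = [0.32, 0.71, 1.05]
word_rate = rate_predictor(word_offsets, duration_s=2.0)
```

      One such series per linguistic level (phoneme, syllable, word, phrase, sentence) would give the five predictors entered into an mTRF model.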

      It is not entirely clear if the DNN model used in this study effectively serves the authors' goal of capturing different linguistic information at various layers. Specifically, the results presented in Figure 3C are somewhat confusing. While the phonemes are labeled, the syllables, words, phrases, and sentences are not, making it difficult to interpret how the model distinguishes between these levels of linguistic information. The claim that "Hidden-layer activity for same-vowel sentences exhibited much more similar distributions at the phoneme and syllable levels compared to those at the word, phrase and sentence levels" is not convincingly supported by the provided visualizations. To strengthen their argument, the authors should use more quantified metrics to demonstrate that the model indeed captures phrase, word, syllable, and phoneme information at different layers. This is a crucial prerequisite for the subsequent analyses and claims about the hierarchical processing of linguistic information in the brain.

      Quantitative measures such as mutual information, clustering metrics, or decoding accuracy for each linguistic level could provide clearer evidence of the model's effectiveness in this regard.

      In Figure 3C, we used color-coding to represent the activity of five hidden layers after dimensionality reduction. Each dot on the plot corresponds to one test sentence. Only phonemes are labeled because each syllable in our test sentences contains the same vowels (see Table S1). The results demonstrate that the phoneme layer effectively distinguishes different phonemes, while the higher linguistic layers do not. We believe these findings provide evidence that different layers capture distinct linguistic information. Additionally, we computed the correlation coefficients between each pair of linguistic predictors, as shown in Figure 3B. We think this analysis serves a similar purpose to computing the mutual information between pairs of hidden-layer activities for our constructed sentences. Furthermore, the mTRF results based on rate models of the linguistic features we presented earlier align closely with the regression results using the hidden-layer activity from our HM-LSTM model. This further supports the conclusion that our model successfully captures relevant information across these linguistic levels. We have added the clarification on p.5 of the manuscript.

      The formulation of the regression analysis is somewhat unclear. The choice of sentence offsets as the anchor point for the temporal analysis, and the focus on the [-100ms, +300ms] interval, needs further justification. Since EEG measures underlying neural activity in near real-time, it is expected that lower-level acoustic information, which is relatively transient, such as phonemes and syllables, would be distributed throughout the time course of the entire sentence. It is not evident if this limited time window effectively captures the neural responses to the entire sentence, especially for lower-level linguistic features. A more comprehensive analysis covering the entire time course of the sentence, or at least a longer temporal window, would provide a clearer understanding of how different linguistic units are processed over time. Additionally, explaining the rationale behind choosing this specific time window and how it aligns with the temporal dynamics of speech processing would enhance the clarity and validity of the regression analysis.

      Thank you for pointing this out. We chose this time window as lexical or phrasal processing typically occurs 200 ms after stimulus offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Additionally, we included the -100 to 200 ms time period in our analysis to examine phoneme and syllable level processing (e.g., Gwilliams et al., 2022). Using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentences. This would introduce ambiguity as to whether the EEG responses correspond to the current or the following sentence. We have added this clarification on p.12 of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As I mentioned, I think the OSF repo needs to be changed to give anyone access. I would recommend pursuing the lines of thought I mentioned in the public review to make this study complete and to allow it to fit into the already existing literature to facilitate comparisons.

      Yes, the OSF folder is now public. We have made revisions following all reviewers’ suggestions.

      There are some typos in figure labels, e.g. 2B.

      Thank you for pointing it out! We have now corrected the typo in Figure 2B.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was able to access all of the audio files and code for the study, but no EEG data was shared in the OSF repository. Unless there is some ethical and/or legal constraint, my understanding of eLife's policy is that the neural data should be made publicly available as well.

      The preprocessed EEG data are available in .npy format in the OSF repository.

      (2) The line-plots in Figures 4B, 5B, and 6B have very similar colours. They would be easier to interpret if you changed the line appearance as well as the colours. E.g., dotted line for hearing-impaired listeners, thick line for normal-hearing.

      Thank you for the suggestion! We have now used thicker lines for normal-hearing listeners in all our line plots.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors may consider presenting raw event-related potentials (ERPs) or spatiotemporal response profiles before delving into the more complex regression encoding analysis. This would provide a clearer foundational understanding of the neural activity patterns. For example, it is not clear if the main claims, such as the neural activity in the normal-hearing group encoding phonetic information in attended speech better than in unattended speech, are directly observable. Showing ERP differences or spatiotemporal response pattern differences could support these claims more straightforwardly. Additionally, training pattern classifiers to test if different levels of information can be decoded from EEG activity in specific groups could provide further validation of the findings.

      We have now included results from more traditional mTRF analyses using phoneme, syllable, word, phrase, and sentence rates as baseline models (see p.7 of the manuscript and Figure S3). The results show similar patterns to those observed in our current analyses. While we agree that classification analyses would be very interesting, our regression analyses have already demonstrated distinct EEG patterns for each linguistic level. Consequently, classification analyses would likely yield similar results unless a different method for representing linguistic information at these levels is employed. To the best of our knowledge, no other computational model currently exists that can simultaneously represent these linguistic levels.

      (2) Is there any behavioral metric suggesting that these hearing-impaired participants do have deficits in comprehending long sentences? The self-rated intelligibility is useful, but cannot fully distinguish between perceiving lower-level phonetic information vs longer sentence comprehension.

      In the current study, we included only self-rated intelligibility tests. We acknowledge that this approach might not fully distinguish between the perception of lower-level phonetic information and higher-level sentence comprehension. However, it remains unclear what type of behavioral test would effectively address this distinction. Furthermore, our primary aim was to use the behavioral results to demonstrate that our hearing-impaired listeners experienced speech comprehension difficulties in multi-talker environments, while relying on the EEG data to investigate comprehension challenges at various linguistic levels.

      Minor:

      (1) Page 2, second line in Introduction, "Phonemes occur over ..." should be lowercase.

      According to APA format, the first word after the colon is capitalized if it begins a complete sentence (https://blog.apastyle.org/apastyle/2011/06/capitalization-after-colons.html). Here the sentence is a complete sentence, so we used uppercase for “phonemes”.

      (2) Page 8, second paragraph "...-100ms to 100ms relative to sentence onsets", should it be onsets or offsets?

      This is a typo; it should be offsets. We have now corrected it.

      References

      Bemis, D. K., & Pylkkanen, L. (2011). Simple composition: An MEG investigation into the comprehension of minimal linguistic phrases. Journal of Neuroscience, 31(8), 2801–2814.

      Gao, C., Li, J., Chen, J., & Huang, S. (2024). Measuring meaning composition in the human brain with composition scores from large language models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 11295–11308). Association for Computational Linguistics.

      Goldstein, A., Zada, Z., Buchnik, E., Schain, M., Price, A., Aubrey, B., Nastase, S. A., Feder, A., Emanuel, D., Cohen, A., Jansen, A., Gazula, H., Choe, G., Rao, A., Kim, C., Casto, C., Fanda, L., Doyle, W., Friedman, D., … Hasson, U. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3), Article 3.

      Gwilliams, L., King, J.-R., Marantz, A., & Poeppel, D. (2022). Neural dynamics of phoneme sequences reveal position-invariant code for content and order. Nature Communications, 13(1), Article 1.

      Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E., & Gallant, J. L. (2016). Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600), 453–458.

      Li, J., Lai, M., & Pylkkänen, L. (2024). Semantic composition in experimental and naturalistic paradigms. Imaging Neuroscience, 2, 1–17.

      Li, J., & Pylkkänen, L. (2021). Disentangling semantic composition and semantic association in the left temporal lobe. Journal of Neuroscience, 41(30), 6526–6538.

      Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190.

      Schmitt, L.-M., Erb, J., Tune, S., Rysop, A. U., Hartwigsen, G., & Obleser, J. (2021). Predicting speech from a cortical hierarchy of event-based time scales. Science Advances, 7(49), eabi6070.

      Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45), e2105646118.

      Sugimoto, Y., Yoshida, R., Jeong, H., Koizumi, M., Brennan, J. R., & Oseki, Y. (2024). Localizing Syntactic Composition with Left-Corner Recurrent Neural Network Grammars. Neurobiology of Language, 5(1), 201–224.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer 1 (Public Review):

      I want to reiterate my comment from the first round of reviews: that I am insufficiently familiar with the intricacies of Maxwell’s equations to assess the validity of the assumptions and the equations being used by WETCOW. The work ideally needs assessing by someone more versed in that area, especially given the potential impact of this method if valid.

      We appreciate the reviewer’s candor. Unfortunately, familiarity with Maxwell’s equations is an essential prerequisite for assessing the veracity of our approach and our claims.

      Effort has been made in these revisions to improve explanations of the proposed approach (a lot of new text has been added) and to add new simulations. However, the authors have still not compared their method on real data with existing standard approaches for reconstructing data from sensor to physical space. Refusing to do so because existing approaches are deemed inappropriate (i.e. they “are solving a different problem”) is illogical.

      Without understanding the importance of our model for brain wave activity (cited in the paper) derived from Maxwell’s equations in inhomogeneous and anisotropic brain tissue, it is not possible to critically evaluate the fundamental difference between our method and the standard so-called “source localization” method which the Reviewer feels it is important to compare our results with. Our method is not “source localization” which is a class of techniques based on an inappropriate model for static brain activity (static dipoles sprinkled sparsely in user-defined areas of interest). Just because a method is “standard” does not make it correct. Rather, we are reconstructing a whole brain, time dependent electric field potential based upon a model for brain wave activity derived from first principles. It is comparing two methods that are “solving different problems” that is, by definition, illogical.

      Similarly, refusing to compare their method with existing standard approaches for spatio-temporally describing brain activity, just because existing approaches are deemed inappropriate, is illogical.

      Contrary to the Reviewer’s assertion, we do compare our results with three existing methods for describing spatiotemporal variations of brain activity.

      First, Figures 1, 2, and 6 compare the spatiotemporal variations in brain activity between our method and fMRI, the recognized standard for spatiotemporal localization of brain activity. The statistical comparison in Fig 3 is a quantitative demonstration of the similarity of the activation patterns. It is important to note that these data are simultaneous EEG/fMRI in order to eliminate a variety of potential confounds related to differences in experimental conditions.

      Second, Fig 4 (A-D) compares our method with the most reasonable “standard” spatiotemporal localization method for EEG: mapping of fields in the outer cortical regions of the brain detected at the surface electrodes to the surface of the skull. The consistency of both the location and sign of the activity changes detected by both methods in a “standard” attention paradigm is clearly evident. Further confirmation is provided by comparison of our results with simultaneous EEG/fMRI spatial reconstructions (E-F) where the consistency of our reconstructions between subjects is shown in Fig 5.

      Third, measurements from intra-cranial electrodes, the most direct method for validation, are compared with spatiotemporal estimates derived from surface electrodes and shown to be highly correlated.

      For example, the authors say that “it’s not even clear what one would compare [between the new method and standard approaches]”. How about:

      (1) Qualitatively: compare EEG activation maps. I.e. compare what you would report to a researcher about the brain activity found in a standard experimental task dataset (e.g. their gambling task). People simply want to be able to judge, at least qualitatively on the same data, what the most equivalent output would be from the two approaches. Note, both approaches do not need to be done at the same spatial resolution if there are constraints on this for the comparison to be useful.

      (2) Quantitatively: compare the correlation scores between EEG activation maps and fMRI activation maps

      These comparisons were performed and are already in the paper.

      (1) Fig 4 compares the results with a standard attention paradigm (data and interpretation from Co-author Dr Martinez, who is an expert in both EEG and attention). Additionally, Fig 12 shows detected regions of increased activity in a well-known brain circuit from an experimental task (’reward’) with data provided by Co-author Dr Krigolson, an expert in reward circuitry.

      (2) Correlation scores between EEG and fMRI are shown in Fig 3.

      (3) Very high correlation between the directly measured field from intra-cranial electrodes in an epilepsy patient and those estimated from only the surface electrodes is shown in Fig 9.

      There are an awful lot of typos in the new text in the paper. I would expect a paper to have been proof read before submitting.

      We have cleaned up the typos.

      The abstract claims that there is a “direct comparison with standard state-of-the-art EEG analysis in a well-established attention paradigm”, but no actual comparison appears to have been completed in the paper.

      On the contrary, as mentioned above, Fig 4 compares the results of our method with the state-of-the-art surface spatial mapping analysis, with the state-of-the-art time-frequency analysis, and with the state-of-the-art fMRI analysis.

      Reviewer 2 (Public Review):

      This is a major rewrite of the paper. The authors have improved the discourse vastly.

      There is now a lot of didactics included but they are not always relevant to the paper.

      The technique described in the paper does in fact leverage several novel methods we have developed over the years for analyzing multimodal space-time imaging data. Each of these techniques has been described in detail in separate publications cited in the current paper. However, the Reviewers’ criticisms stated that the methods were non-standard and they were unfamiliar with them. In lieu of the Reviewers’ reading the original publications, we added a significant amount of text indeed intended to be didactic. However, we can assure the Reviewer that nothing presented was irrelevant to the paper. We certainly had no desire to make the paper any longer than it needed to be.

      The section on Maxwell’s equation does a disservice to the literature in prior work in bioelectromagnetism and does not even address the issues raised in classic text books by Plonsey et al. There is no logical “backwardness” in the literature. They are based on the relative values of constants in biological tissues.

      This criticism highlights the crux of our paper. Contrary to the assertion that we have ignored the work of Plonsey, we have referenced it in the new additional text detailing how we have constructed Maxwell’s Equations appropriate for brain tissue, based on the model suggested by Plonsey that allows the magnetic field temporal variations to be ignored but not the time dependence of the electric fields.

      However, the assumption ubiquitous in the vast prior literature of bioelectricity in the brain that the electric field dynamics can be “based on the relative values of constants in biological tissues”, as the Reviewer correctly summarizes, is precisely the problem. Using relative average tissue properties does not take into account the tissue anisotropy necessary to properly account for correct expressions for the electric fields. As our prior publications have demonstrated in detail, taking into account the inhomogeneity and anisotropy of brain tissue in the solution to Maxwell’s Equations is necessary for properly characterizing brain electrical fields, and serves as the foundation of our brain wave theory. This led to the discovery of a new class of brain waves (weakly evanescent transverse cortical waves, WETCOW).

      It is this brain wave model that is used to estimate the dynamic electric field potential from the measurements made by the EEG electrode array. The standard model that ignores these tissue details leads to the ubiquitous “quasi-static approximation” that leads to the conclusion that the EEG signal cannot be spatially reconstructed. It is indeed this critical gap in the existing literature that is the central new idea in the paper.

      There are reinventions of many standard ideas in terms of physics discourses, like Bayesian theory or PCA etc.

      The discussion of Bayesian theory and PCA is in response to the Reviewer’s complaint that they were unfamiliar with our entropy field decomposition (EFD) method and the request that we compare it with other “standard” methods. Again, we have published extensively on this method (as referenced in the manuscript) and therefore felt that extensive elaboration was unnecessary. Having been asked to provide such elaboration and then being pilloried for it therefore feels somewhat inappropriate in our view. This is particularly disappointing as the Reviewer claims we are presenting “standard” ideas when in fact the EFD is a new general framework we developed to overcome the deficiencies in standard “statistical” and probabilistic data analysis methods that are insufficient for characterizing non-linear, nonperiodic, interacting fields that are the rule, rather than the exception, in complex dynamical systems, such as brain electric fields (or weather, or oceans, or ...).

      The EFD is indeed a Bayesian framework, as this is the fundamental starting point for probability theory, but it is developed in a unique and more general fashion than previous data analysis methods. (Again, this is detailed in several references in the paper’s bibliography. The Reviewers requested that an explanation be included in the present paper, however, so we did so.) First, Bayes Theorem is expressed in terms of a field theory that allows an arbitrary number of field orders and coupling terms. This generality comes with a penalty, which is that it’s unclear how to assess the significance of the essentially infinite number of terms. The second feature is the introduction of a method by which to determine the significant number of terms automatically from the data itself, via our theory of entropy spectrum pathways (ESP), which is also detailed in a cited publication, and which produces ranked spatiotemporal modes from the data. Rather than being “reinventions of many standard ideas” these are novel theoretical and computational methods that are central to the EEG reconstruction method presented in the paper.

      I think that the paper remains quite opaque and many of the original criticisms remain, especially as they relate to multimodal datasets. The overall algorithm still remains poorly described. benchmarks.

      It’s not clear how to assess the criticisms that the algorithm is poorly described yet there is too much detail provided that is mistakenly assessed as “standard”. Certainly the central wave equations that are estimated from the data are precisely described, so it’s not clear exactly what the Reviewer is referring to.

      The comparisons to benchmark remain unaddressed and the authors state that they couldn’t get Loreta to work and so aborted that. The figures are largely unaltered, although they have added a few more, and do not clearly depict the ideas. Again, no benchmark comparisons are provided to evaluate the results and the performance in comparison to other benchmarks.

      As we have tried to emphasize in the paper, and in the Response to Reviewers, the standard so-called “source localization” methods are NOT a benchmark, as they are solving an inappropriate model for brain activity. Once again, static dipole “sources” arbitrarily sprinkled on pre-defined regions of interest bear little resemblance to observed brain waves, nor to the dynamic electric field wave equations produced by our brain wave theory derived from a proper solution to Maxwell’s equations in the anisotropic and inhomogeneous complex morphology of the brain.

      The comparison with Loreta was not abandoned because we couldn’t get it to work, but because we could not get it to run under conditions that were remotely similar to whole brain activity described by our theory, or, more importantly, by any rational theory of dynamic brain activity that might reproduce the exceedingly complex electric field activity observed in numerous neuroscience experiments.

      We take issue with the rather dismissive mention of “a few more” figures that “do not clearly depict the idea” when in fact the figures that have been added have demonstrated additional quantitative validation of the method.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      The paper proposes a new source reconstruction method for electroencephalography (EEG) data and claims that it can provide far superior spatial resolution than existing approaches and also superior spatial resolution to fMRI. This primarily stems from abandoning the established quasi-static approximation to Maxwell’s equations. The proposed method brings together some very interesting ideas, and the potential impact is high. However, the work does not provide the evaluations expected when validating a new source reconstruction approach. I cannot judge the success or impact of the approach based on the current set of results. This is very important to rectify, especially given that the work is challenging some long-standing and fundamental assumptions made in the field.

      We appreciate the Reviewer’s efforts in reviewing this paper and have included a significant amount of new text to address their concerns.

      I also find that the clarity of the description of the methods, and how they link to what is shown in the main results hard to follow.

      We have added significantly more detail on the methods, including more accessible explanations of the technical details, and schematic diagrams to visualize the key processing components.

      I am insufficiently familiar with the intricacies of Maxwell’s equations to assess the validity of the assumptions and the equations being used by WETCOW. The work therefore needs assessing by someone more versed in that area. That said, how do we know that the new terms in Maxwell’s equations, i.e. the time-dependent terms that are normally missing from established quasi-static-based approaches, are large enough to need to be considered? Where is the evidence for this?

      The fact that the time-dependent terms are large enough to be considered is essentially the entire focus of the original papers [7,8]. Time-dependent terms in Maxwell’s equations are generally not important for brain electrodynamics at physiological frequencies for homogeneous tissues, but this is not true for areas with strong inhomogeneity and anisotropy.

      I have not come across EFD, and I am not sure many in the EEG field will have. To require the reader to appreciate the contributions of WETCOW only through the lens of the unfamiliar (and far from trivial) approach of EFD is frustrating. In particular, what impact do the assumptions of WETCOW make compared to the assumptions of EFD on the overall performance of SPECTRE?

      We have added an entire new section in the Appendix that provides a very basic introduction to EFD and relates it to more commonly known methods, such as Fourier and Independent Components Analyses.

      The paper needs to provide results showing the improvements obtained when WETCOW or EFD are combined with more established and familiar approaches. For example, EFD can be replaced by a first-order vector autoregressive (VAR) model, i.e. $y_t = A y_{t-1} + e_t$ (where $y_t$ is [$\mathrm{num}_{\mathrm{gridpoints}} \times 1$] and $A$ is [$\mathrm{num}_{\mathrm{gridpoints}} \times \mathrm{num}_{\mathrm{gridpoints}}$] of autoregressive parameters).
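
      For readers unfamiliar with the first-order VAR model named in this comment, the following is a minimal sketch of how its parameter matrix A could be estimated by ordinary least squares. It illustrates the reviewer's suggested baseline only; it is not part of SPECTRE, EFD, or WETCOW, and the function names `fit_var1` and `simulate_var1` are hypothetical.

```python
import numpy as np

def fit_var1(y):
    """Least-squares estimate of A in y_t = A y_{t-1} + e_t.

    y : array of shape (n_timepoints, n_gridpoints), one row per time point.
    """
    past, present = y[:-1], y[1:]
    # In row form the model reads present = past @ A.T + noise,
    # so lstsq returns an estimate of A transposed.
    a_transposed, *_ = np.linalg.lstsq(past, present, rcond=None)
    return a_transposed.T

def simulate_var1(a, n_timepoints, noise_sd, rng):
    """Generate a VAR(1) series with Gaussian innovations (illustration only)."""
    n = a.shape[0]
    y = np.zeros((n_timepoints, n))
    y[0] = rng.standard_normal(n)
    for t in range(1, n_timepoints):
        y[t] = a @ y[t - 1] + noise_sd * rng.standard_normal(n)
    return y
```

      Note that A has one entry per pair of grid points, so the number of parameters grows quadratically with the spatial resolution; the model is also linear by construction.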

      The development of EFD, which is independent of WETCOW, stemmed from the necessity of developing a general method for the probabilistic analysis of finitely sampled non-linear interacting fields, which are ubiquitous in measurements of physical systems, of which functional neuroimaging data (fMRI, EEG) are excellent examples. Standard methods (such as VAR) are inadequate in such cases, as discussed in great detail in our EFD publications (e.g., [12,37]). The new appendix on EFD reviews these arguments. It does not make sense to compare EFD with methods which are inappropriate for the data.

      The authors’ decision not to include any comparisons with established source reconstruction approaches does not make sense to me. They attempt to justify this by saying that the spatial resolution of LORETA would need to be very low compared to the resolution being used in SPECTRE, to avoid compute problems. But how does this stop them from using a spatial resolution typically used by the field that has no compute problems, and comparing with that? This would be very informative. There are also more computationally efficient methods than LORETA that are very popular, such as beamforming or minimum norm.

      The primary reason for not comparing with ’source reconstruction’ (SR) methods is that we are not doing source reconstruction. Our view of brain activity is that it involves continuous dynamical non-linear interacting fields throughout the entire brain. Formulating EEG analysis in terms of reconstructing sources is, in our view, like asking ’what are the point sources of a sea of ocean waves’. It’s just not an appropriate physical model. A pre-chosen limited distribution of static dipoles is just a very bad model for brain activity, so much so that it’s not even clear what one would compare. In our view, as manifest in our computational implementation, one needs to have a very high density of computational locations throughout the entire brain, including white matter, and the reconstructed modes are waves whose extent can be across the entire brain. Our comments about the low resolution of computational methods for SR techniques really express the more overarching concern that they are not capable of, or even designed for, detecting time-dependent fields of non-linear interacting waves that exist everywhere throughout the brain. Moreover, the SR methods always give some answer, but in our view the initial conditions upon which those methods are based (pre-selected regions of activity with a pre-selected number of ’sources’) are a highly influential but artificial set of strong computational constraints that will almost always provide an answer consistent with (i.e., biased toward) the expectations of the person formulating the problem, and is therefore potentially misleading.

      In short, something like the following methods needs to be compared:

      (1) Full SPECTRE (EFD plus WETCOW)

      (2) WETCOW + VAR or standard (“simple regression”) techniques

      (3) Beamformer/min norm plus EFD

      (4) Beamformer/min norm plus VAR or standard (“simple regression”) techniques

      The reason that no one has previously ever been able to solve the EEG inverse problem is due to the ubiquitous use of methods that are too ’simple’, i.e., are poor physical models of brain activity. We have spent a decade carefully elucidating the details of this statement in numerous highly technical and careful publications. It therefore serves no purpose to return to the use of these ’simple’ methods for comparison. We do agree, however, that a clearer overview of the advantages of our methods is warranted and have added significant additional text in this revision towards that purpose.

      This would also allow for more illuminating and quantitative comparisons of the real data. For example, a metric of similarity between EEG maps and fMRI can be computed to compare the performance of these methods. At the moment, the fMRI-EEG analysis amounts to just showing fairly similar maps.

      We disagree with this assessment. The correlation coefficient between the spatially localized activation maps is a conservative sufficient statistic for the measure of statistically significant similarity. These numbers were/are reported in the caption to Figure 5, and have now also been moved to, and highlighted in, the main text.
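
      As an illustration of the similarity metric under discussion, the sketch below computes a Pearson correlation coefficient between two activation maps defined on the same grid. It is a generic example, not the authors' pipeline: the function name `map_correlation` and the optional mask argument are assumptions, and any registration or thresholding applied in the paper is not reproduced here.

```python
import numpy as np

def map_correlation(map_a, map_b, mask=None):
    """Pearson correlation between two activation maps on the same grid.

    map_a, map_b : arrays of identical shape (e.g. 3-D volumes)
    mask         : optional boolean array selecting the voxels to compare
    """
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("maps must be sampled on the same grid")
    if mask is None:
        mask = np.ones(a.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    # Boolean indexing flattens the selected voxels into 1-D vectors.
    return float(np.corrcoef(a[mask], b[mask])[0, 1])
```

      A single coefficient of this kind summarizes linear agreement across voxels; it says nothing by itself about which regions drive the agreement.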

      There are no results provided on simulated data. Simulations are needed to provide quantitative comparisons of the different methods, to show face validity, and to demonstrate unequivocally the new information that SPECTRE can ’potentially’ provide on real data compared to established methods. The paper ideally needs at least 3 types of simulations, where one thing is changed at a time, e.g.:

      (1) Data simulated using WETCOW plus EFD assumptions

      (2) Data simulated using WETCOW plus e.g. VAR assumptions

      (3) Data simulated using standard lead fields (based on the quasi-static Maxwell solutions) plus e.g. VAR assumptions

      These should be assessed with the multiple methods specified earlier. Crucially the assessment should be quantitative showing the ability to recover the ground truth over multiple realisations of realistic noise. This type of assessment of a new source reconstruction method is the expected standard.

      We have now provided results on simulated data, along with a discussion on what entails a meaningful simulation comparison. In short, our original paper on the WETCOW theory included a significant number of simulations of predicted results on several spatial and temporal scales. The most relevant simulation data to compare with the SPECTRE imaging results are the cortical wave loop predicted by WETCOW theory and demonstrated via numerical simulation in a realistic brain model derived from high resolution anatomical (HRA) MRI data. The most relevant data with which to compare these simulations are the SPECTRE reconstruction from the data that provides the closest approximation to a “Gold Standard” - reconstructions from intra-cranial EEG (iEEG). We have now included results (new Fig 8) that demonstrate the ability of SPECTRE to reconstruct dynamically evolving cortical wave loops in iEEG data acquired in an epilepsy patient that match the loop predicted theoretically by WETCOW and demonstrated in realistic numerical simulations.

      The suggested comparison with simple regression techniques serves no purpose, as stated above, since that class of analysis techniques was not designed for non-linear, non-Gaussian, coupled interacting fields predicted by the WETCOW model. The explication of this statement is provided in great detail in our publications on the EFD approach and in the new appendix material provided in this revision. The suggested simulation of the dipole (i.e., quasi-static) model of brain activity also serves no purpose, as our WETCOW papers have demonstrated in great detail that it is not a reasonable model for dynamic brain activity.

      Reviewer 2 (Public Review):

      Strengths:

      If true and convincing, the proposed theoretical framework and reconstruction algorithm can revolutionize the use of EEG source reconstructions.

      Weaknesses:

      There is very little actual information in the paper about either the forward model or the novel method of reconstruction. Only citations to prior work by the authors are cited with absolutely no benchmark comparisons, making the manuscript difficult to read and interpret in isolation from their prior body of work.

      We have now added a significant amount of material detailing the forward model, our solution to the inverse problem, and the method of reconstruction, in order to remedy this deficit in the previous version of the paper.

      Recommendations for the authors:

      Reviewer 1 (Recommendations):

      It is not at all clear from the main text (section 3.1) and the caption, what is being shown in the activity patterns in Figures 1 and 2. What frequency bands and time points etc? How are the values shown in the figures calculated from the equations in the methods?

      We have added detailed information on the frequency bands reconstructed and the activity pattern generation and meaning. Additional information on the simultaneous EEG/fMRI acquisition details has been added to the Appendix.

      How have the activity maps been thresholded? Where are the color bars in Figures 1 and 2?

      We have now included that information in new versions of the figures. In addition, the quantitative comparison between fMRI and EEG is now presented in a new Figure 2 (now Figure 3).

      P30 “This term is ignored in the current paper”. Why is this term ignored, but other (time-dependent) terms are not?

      These terms are ignored because they represent higher-order terms that complicate the processing (and interpretation) but do not substantially change the main results. A note to this effect has been added to the text.

      The concepts and equations in the EFD section are not very accessible (e.g. to someone unfamiliar with IFT).

      We have added a lengthy general and more accessible description of the EFD method in the Appendix.

      Variables in equation 1, and the following equation, are not always defined in a clear, accessible manner. What is ?

      We have added additional information on how Eqn 1 (now Eqn 3) is derived, and the variables therein.

      In the EFD section, what do you mean conceptually by α, i.e. “the coupled parameters α”?

      This sentence has been eliminated, as it was superfluous and confusing.

      How are the EFD and WETCOW sections linked mathematically? What is ψ (in eqn 2) linked to in the WETCOW section (presumably ϕ<sub>ω</sub>?) ?

      We have added more introductory detail at the beginning of the Results to describe the WETCOW theory and how this is related to the inverse problem for EEG.

      What is the difference between data d and signal s in section 6.1.3? How are they related?

      We have added a much more detailed Appendix A where this (and other) details are provided.

      What assumptions have been made to get the form for the information Hamiltonian in eqn3?

      Eq 3 (now Eqn A.5) is actually very general. The approximations come in when constructing the interaction Hamiltonian H<sub>i</sub>.

      P33 “using coupling between different spatio-temporal points that is available from the data itself” I do not understand what is meant by this.

      This was a poorly worded sentence, but this section has now been replaced by Appendix A, which now contains the sentence that prior information “is contained within the data itself”. This refers to the fact that the prior information consists of correlations in the data, rather than some other measurements independent of the original data. This point is emphasized because in many Bayesian applications, prior information consists of knowledge of some quantity that was acquired independently from the data at hand (e.g., mean values from previous experiments).

      Reviewer 2 (Recommendations):

      Abstract

      The first part presents validation from simultaneous EEG/fMRI data, iEEG data, and comparisons with standard EEG analyses of an attention paradigm. Exactly what constitutes adequate validation or what metrics were used to assess performance is surprisingly absent.

      Subsequently, the manuscript examines a large cohort of subjects performing a gambling task and engaging in reward circuits. The claim is that this method offers an alternative to fMRI.

      Introduction

      Provocative statements require strong backing and evidence. In the first paragraph, the “quasi-static” assumption which is dominant in the field of EEG and MEG imaging is questioned with some classic citations that support this assumption. Instead of delving into why exactly the assumption cannot be relaxed, the authors claim that because the assumption was proved with average tissue properties rather than exact, it is wrong. This does not make sense. Citations to the WETCOW papers are insufficient to question the quasi-static assumption.

      The introduction purports to validate a novel theory and inverse modeling method but poorly outlines the exact foundations of both the theory (WETCOW) and the inverse modeling (SPECTRE) work.

      We have added a new introductory subsection (“A physical theory of brain waves”) to the Results section that provides a brief overview of the foundations of the WETCOW theory and an explicit description of why the quasi-static approximation can be abandoned. We have expanded the subsequent subsection (“Solution to the inverse EEG problem”) to more clearly detail the inverse modeling (SPECTRE) method.

      Section 3.2 Validation with fMRI

      Figure 1 supposedly is a validation of this promising novel theoretical approach that defies the existing body of literature in this field. Shockingly, a single subject data is shown in a qualitative manner with absolutely no quantitative comparison anywhere to be found in the manuscript. While there are similarities, there are also differences in reconstructions. What to make out of these discrepancies? Are there distortions that may occur with SPECTRE reconstructions? What are its tradeoffs? How does it deal with noise in the data?

      It is certainly not the case that there are no quantitative comparisons. Correlation coefficients, which are the sufficient statistics for comparison of activation regions, are given in Figure 5 for very specific activation regions. Figure 9 (now Figure 11) shows a t-statistic demonstrating the very high significance of the comparison between multiple subjects. And we have now added a new Figure 7 demonstrating the strongly correlated estimates for full vs surface intra-cranial EEG reconstructions. To make this more clear, we have added a new section “Statistical Significance of the Results”.

      We note that a discussion of the discrepancies between fMRI and EEG was already presented in the Supplementary Material. Therein we discuss the main point that fMRI and EEG are measuring different physical quantities and so should not be expected to be identical. We also highlight the fact that fMRI is prone to significant geometrical distortions from magnetic field inhomogeneities, and to physiological noise. To provide more visibility for this important issue, we have moved this text into the Discussion section.

      We do note that geometric distortions in fMRI data due to suboptimal acquisitions and corrections are all too common. This, coupled with the paucity of open source simultaneous fMRI-EEG data, made it difficult to find good data for comparison. The data on which we performed the quantitative statistical comparison between fMRI and EEG (Fig 5) were collected by co-author Dr Martinez, and were of the highest quality and therefore sufficient for comparison. The data used in Fig 1 and 2 came from a well publicized open source dataset but had significant fMRI distortions that made quantitative comparison (i.e., correlation coefficients between subregions in the Harvard-Oxford atlas) suboptimal. Nevertheless, we wanted to demonstrate the method in more than one source, and feel that visual similarity is a reasonable measure for this data.

      Section 3.2 Validation with fMRI

      Figure 2 Are the sample slices being shown? How to address discrepancies? How to assume that these are validations when there are such a level of discrepancies?

      It’s not clear what “sample slices” means. The issue of discrepancies is addressed in the response to the previous query.

      Section 3.2 Validation with fMRI

      Figure 3 Similar arguments can be made for Figure 3. Here too, a comparison with source localization benchmarks is warranted because many papers have examined similar attention data.

      Regarding the fMRI/EEG comparison, these data are compared quantitatively in the text and in Figure 5.

      Regarding the suggestion to perform standard ’source localization’ analysis, see responses to Reviewer 1.

      Section 3.2 Validation with fMRI

      Figure 4 While there is consistency across 5 subjects, there are also subtle and not-so-subtle differences.

      What to make out of them?

      Discrepancies in activation patterns between individuals are a complex neuroscience question that we feel is well beyond the scope of this paper.

      Section 3.2 Validation with fMRI

      Figures 5 & 6 Figure 5 is also a qualitative figure from two subjects with no appropriate quantification of results across subjects. The same is true for Figure 6.

      On the contrary, Figure 5 contains a quantitative comparison, which is now also described in the text. A quantitative comparison for the epilepsy data in Fig 6 (and C.4-C.6) is now shown in Fig 7.

      Section 3.2 Validation with fMRI

      Given the absence of appropriate “validation” of the proposed model and method, it is unclear how much one can trust results in Section 4.

      We believe that the quantitative comparisons extant in the original text (and apparently missed by the Reviewer) along with the additional quantitative comparisons are sufficient to merit trust in Section 4.

      Section 3.2 Validation with fMRI

      What are the thresholds used in maps for Figure 7? Was correction for multiple comparisons performed? The final arguments at the end of section 4 do not make sense. Is the claim that all results of reconstructions from SPECTRE shown here are significant with no reason for multiple comparison corrections to control for false positives? Why so?

      We agree that the last line in Section 4 is misleading and have removed it.

      Section 3.2 Validation with fMRI

      Discussion is woefully inadequate in addition to the inconclusive findings presented here.

      We have added a significant amount of text to the Discussion to address the points brought up by the Reviewer. And, contrary to the comments of this Reviewer, we believe the statistically significant results presented are not “inconclusive”.

      Supplementary Materials

      This reviewer had an incredibly difficult time understanding the inverse model solution. Even though this has been described in a prior publication by the authors, it is important and imperative that all details be provided here to make the current manuscript complete. The notation itself is so nonstandard. What is Σ<sup>ij</sup>, δ<sup>ij</sup>? Where is the reference for equation (1)? What about the equation for <sup>ˆ</sup>(R)? There are very few details provided on the exact implementation details for the Fourier-space pseudo-spectral approach. What are the dimensions of the problem involved? How were different tissue compartments etc. handled? Equation 1 holds for the entire volume but the measurements are only made on the surface. How was this handled? What is the WETCOW brain wave model? I don’t see any entropy term defined anywhere - where is it?

      We have added more detail on the theoretical and numerical aspects of the inverse problem in two new subsections “Theory” and “Numerical Implementation” in the new section “Solution to the inverse EEG problem”.

      Supplementary Materials

      So, how can one understand even at a high conceptual level what is being done with SPECTRE?

      We have added a new subsection “Summary of SPECTRE” that provides a high conceptual level overview of the SPECTRE method outlined in the preceding sections.

      Supplementary Materials

      In order to understand what was being presented here, it required the reader to go on a tour of the many publications by the authors where the difficulty in understanding what they actually did in terms of inverse modeling remains highly obscure and presents a huge problem for replicability or reproducibility of the current work.

      We have now included more basic material from our previous papers, and simplified the presentation to be more accessible. In particular, we have now moved the key aspects of the theoretic and numerical methods, in a more readable form, from the Supplementary Material to the main text, and added a new Appendix that provides a more intuitive and accessible overview of our estimation procedures.

      Supplementary Materials

      How were conductivity values for different tissue types assigned? Is there an assumption that the conductivity tensor is the same as the diffusion tensor? What does it mean that “in the present study only HRA data were used in the estimation procedure?” Does that mean that diffusion MRI data was not used? What is SYMREG? If this refers to the MRM paper from the authors in 2018, that paper does not include EEG data at all. So, things are unclear here.

      The conductivity tensor is not exactly the same as the diffusion tensor in brain tissues, but they are closely related. While both tensors describe transport properties in brain tissue, they represent different physical processes. The conductivity tensor is often assumed to share the same eigenvectors as the diffusion tensor. There is a strong linear relationship between the conductivity and diffusion tensor eigenvalues, as supported by theoretical models and experimental measurements. For the current study we only used the anatomical data for estimation and assignment of different tissue types, and no diffusion MRI data was used. To register between different modalities, including MNI, HRA, functional MRI, etc., and to transform the tissue assignment into an appropriate space, we used the SYMREG registration method. A comment to this effect has been added to the text.

      Supplementary Materials

      How can reconstructed volumetric time-series of potential be thought of as the EM equivalent of an fMRI dataset? This sentence doesn’t make sense.

      This sentence indeed did not make sense and has been removed.

      Supplementary Materials

      Typical Bayesian inference does not include entropy terms, and entropy estimation doesn’t always lend to computing full posterior distributions. What is an “entropy spectrum pathway”? What is µ∗? Why can’t things be made clear to the reader, instead of incredible jargon used here? How does section 6.1.2 relate back to the previous section?

      It is correct that Bayesian inference typically does not include entropy terms. We believe that their introduction via the theory of entropy spectrum pathways (ESP) is a significant advance in Bayesian estimation, as it provides highly relevant prior information from within the data itself (and therefore always available in spatiotemporal data) that facilitates a practical methodology for the analysis of complex non-linear dynamical systems, as contained in the entropy field decomposition (EFD).

      Section 6.1.3 has now been replaced by a new Appendix A that discusses ESP in a much more intuitive and conceptual manner.

      Supplementary Materials

      Section 6.1.3 describes entropy field decomposition in very general terms. What is “non-period”? This section is incomprehensible. Without reference to exactly where in the process this procedure is deployed it is extremely difficult to follow. There seems to be an abuse of notation of using ϕ for eigenvectors in equation (5) and potentials earlier. How do equations 9-11 relate back to the original problem being solved in section 6.1.1? What are multiple modalities being described here that require JESTER?

      Section 6.1.3 has now been replaced by a new Appendix A that covers this material in a much more intuitive and conceptual manner.

      Supplementary Materials

      Section 6.3 discusses source localization methods. While most forward lead-field models assume quasistatic approximations to Maxwell’s equations, these are perfectly valid for the frequency content of brain activity being measured with EEG or MEG. Even with quasi-static lead fields, the solutions can have frequency dependence due to the data having frequency dependence. Solutions do not have to be insensitive to detailed spatially variable electrical properties of the tissues. For instance, if a FEM model was used to compute the forward model, this model will indeed be sensitive to the spatially variable and anisotropic electrical properties. This issue is not even acknowledged.

      The frequency dependence of the tissue properties is not the issue. Our theoretical work demonstrates that taking into account the anisotropy and inhomogeneity of the tissue is necessary in order to derive the existence of the weakly evanescent transverse cortical waves (WETCOW) that SPECTRE is detecting. We have added more details about the WETCOW model in the new Section “A physical theory of brain waves” to emphasize this point.

      Supplementary Materials

      Arguments to disambiguate deep vs shallow sources can be achieved with some but not all source localization algorithms and do not require a non-quasi-static formulation. LORETA is not even the main standard algorithm for comparison. It is disappointing that there are no comparisons to source localization and that this is dismissed away due to some coding issues.

      Again, we are not doing ’source localization’. The concept of localized dipole sources is anathema to our brain wave model, and so in our view comparing SPECTRE to such methods only propagates the misleading idea that they are doing the same thing. So they are definitely not dismissed due to coding issues. However, because of repeated requests to compare SPECTRE with such methods, we attempted to run a standard source localization method with parameters that would at least provide the closest approximation to what we were doing. This attempt highlighted a serious computational issue in source localization methods that is a direct consequence of the fact that they are not attempting to do what SPECTRE is doing: describing a time-varying wave field, in the technical definition of a ’field’ as an object that has a value at every point in space-time.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary:

      The study identifies two types of activation: one that is cue-triggered and nonspecific to motion directions, and another that is specific to the exposed motion directions but occurs in a reversed manner. The finding that activity in the medial temporal lobe (MTL) preceded that in the visual cortex suggests that the visual cortex may serve as a platform for the manifestation of replay events, which potentially enhance visual sequence learning.

      Evaluations:

      Identifying the two types of activation after exposure to a sequence of motion directions is very interesting. The experimental design, procedures and analyses are solid. The findings are interesting and novel.

      In the original submission, it was not immediately clear to me why the second type of activation was suggested to occur spontaneously. The procedural differences in the analyses that distinguished between the two types of activation need to be a little better clarified. However, this concern has been satisfactorily addressed in the revision.

      We thank the reviewer for his/her positive evaluation and thoughtful comments. 

      Reviewer #2 (Public review):

      This paper shows and analyzes an interesting phenomenon. It shows that when people are exposed to sequences of moving dots (that is, moving dots in one direction, followed by another direction, etc.), showing either the starting movement direction or the ending movement direction causes a coarse-grained brain response that is similar to that elicited by the complete sequence of 4 directions. However, they show by decoding the sensor responses that this brain activity actually does not carry information about the actual sequence and the motion directions, at least not on the time scale of the initial sequence. They also show a reverse replay on a highly-compressed time scale, which is elicited during the period of elevated activity, and activated by the first and last elements of the sequence, but not others. Additionally, these replays seem to occur during periods of cortical ripples, similar to what is found in animal studies.

      These results are intriguing. They are based on MEG recordings in humans, and finding such replays in humans is novel. Also, this is based on what seems to be sophisticated statistical analysis. The statistical methodology seems valid, but due to its complexity it is not easy to understand. The methods, especially those described in figures 3 and 4, should be explained better.

      We thank the reviewer for the detailed evaluation. As suggested, we have further revised the Methods and Results sections, particularly the descriptions related to Figures 3 and 4, to enhance clarity. Please see the revisions highlighted in red in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The most important results here are in Figure 4, and they rely on methods explained in Figure 3. Figure 4 and the results in the figure are confusing.

      What is the red bar in 4B,E. What are the units of the Y axis in figure 4B,E?

      Does sequenceness have units? How do we interpret these magnitudes apart from the line of statistical significance? Shouldn't there be two lines, one for forward replay and the other for backward replay rather than a single line with positive and negative values? The term sequenceness is defined in figure 3, and is key. The replayed sequence in figure 4A,D seems to last about 120 ms.

      What is the meaning of having significance only within a window of 28-36 ms?

      We thank the reviewer for the careful reading and insightful comments. We apologize for the lack of clarity regarding these details in the previous version. As mentioned above, we have revised the Methods and Results sections to enhance clarity throughout the manuscript. For convenience, we provide detailed explanations addressing the specific points raised by the reviewer below.

      First, the red bars in Figures 4B and 4E indicate the lags when the evidence of sequenceness surpassed the statistical significance threshold, as determined by permutation testing. We have now explicitly clarified this in the revised figure captions.

      Second, sequenceness doesn’t have units. It corresponds to the regression coefficient (β) obtained from the second-level GLM in the TDLM framework. Specifically, in the first step of TDLM, we constructed an empirical transition matrix that quantifies the evidence for all possible transitions (e.g., 0° → 90°) at each time lag (Δt). In the second step, we evaluated the extent to which each model transition matrix (e.g., forward or backward transitions) predicts the empirical transition matrix at each Δt, yielding second-level β values. Sequenceness is defined as the difference between the β values for the forward and backward transition models, reflecting the relative strength and directionality of sequential replay. As it is derived from regression coefficients, sequenceness is inherently a unitless measure.

      Regarding the interpretation of sequenceness magnitudes beyond statistical significance, the β values reflect the extent to which the model transition matrix explains variance in the empirical transition matrix. While larger β values suggest stronger sequenceness, absolute magnitudes are influenced by various factors, such as between-participant noise. Therefore, the key criterion for interpreting these values is whether they surpass permutation-based significance thresholds, which indicate that the observed sequenceness is unlikely to have occurred by chance.

      Third, as the reviewer correctly pointed out, we initially computed two separate regression lines, one for forward replay and the other for backward replay. We then defined sequenceness as the contrast between the forward and backward replay (forward minus backward). This contrast approach is commonly used in previous studies to remove between-participant variance in the sequential replay per se, which may arise due to variability in task engagement or measurement sensitivity (Liu et al., 2021; Nour et al., 2021).

      Finally, regarding the duration of replay events, the example sequences shown in Figures 4A and 4D indeed span about 120 ms in total. However, the time lag (Δt) between successive reactivation peaks within these sequences is about 30 ms. This is in line with the findings shown in Figures 4B and 4E, where statistical significance is observed at a time lag window of 28 – 36 ms on the x-axis. It is important to note that the x-axis in these plots represents the time lag (Δt) between sequential reactivations, rather than absolute time.
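
      The two-step logic described above can be sketched in a few lines. This is an illustration only, not the authors' pipeline: the function name `sequenceness`, the toy four-state chain, the simulated reactivation time courses, and the choice of control regressors (self-transitions and a constant) are all assumptions made for the example, and lags are expressed in samples rather than milliseconds.

```python
import numpy as np

def sequenceness(probs, forward, max_lag):
    """Forward-minus-backward second-level beta for lags 1..max_lag (in samples).

    probs   : (time, states) decoded reactivation time courses
    forward : (states, states) model matrix, forward[i, j] = 1 for transition i -> j
    """
    n_states = probs.shape[1]
    backward = forward.T
    # Second-level design: forward model, backward model, self-transitions, constant.
    design = np.column_stack([
        forward.ravel(), backward.ravel(),
        np.eye(n_states).ravel(), np.ones(n_states * n_states),
    ])
    out = np.zeros(max_lag)
    for lag in range(1, max_lag + 1):
        x = probs[:-lag]                      # state evidence at time t
        y = probs[lag:]                       # state evidence at time t + lag
        x1 = np.column_stack([x, np.ones(len(x))])
        # Step 1: empirical transition matrix (row = from-state, column = to-state).
        empirical = np.linalg.lstsq(x1, y, rcond=None)[0][:n_states]
        # Step 2: how well each model matrix predicts the empirical matrix.
        betas = np.linalg.lstsq(design, empirical.ravel(), rcond=None)[0]
        out[lag - 1] = betas[0] - betas[1]    # unitless contrast
    return out

# Toy data: the chain 0 -> 1 -> 2 -> 3 is replayed in reverse, 3 samples apart.
rng = np.random.default_rng(0)
probs = 0.01 * rng.standard_normal((400, 4))
for t0 in range(0, 400, 20):
    for step, state in enumerate([3, 2, 1, 0]):
        probs[t0 + 3 * step, state] += 1.0

forward = np.zeros((4, 4))
forward[[0, 1, 2], [1, 2, 3]] = 1.0

seq = sequenceness(probs, forward, max_lag=10)
best_lag = int(np.argmin(seq)) + 1            # most negative = strongest backward replay
```

      In this toy case the whole replayed sequence spans 9 samples, while the contrast is most negative at the 3-sample lag between successive reactivations, which mirrors the distinction between total sequence duration and the lag axis drawn above.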

      We hope these clarifications address the reviewer’s concerns, and we have revised the manuscript accordingly to make these points clearer to readers.

      The methods here are not simple and not simple to explain. The new version is easier to understand. From the new version it seems that the methodology is sound. It should be still clarified and better explained.

      We have carefully revised the manuscript to better explain the methodology. We appreciate the reviewer’s feedback, which is valuable in improving the clarity of our work.

      Now that I understand what they mean by decoding probability, I think that this term is confusing or even misleading. The decoding accuracy is the probability that the direction of motion classification was correct. It seems the so-called decoding probability is the value of the logistic regression after normalizing the sum to 1. If this is a standard term it can probably be kept; if not, another term would be better.

      We thank the reviewer for this comment. We agree that the term decoding probability may initially seem confusing. However, decoding probability is a commonly used term in the neural decoding literature, particularly in human studies (e.g., Liu et al., 2019; Nour et al., 2021; Turner et al., 2023). To maintain consistency with previous work, we have kept this term in the manuscript. We appreciate the opportunity to clarify this point.
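
      To make the distinction concrete, the quantity as the reviewer describes it (per-class logistic outputs rescaled to sum to 1) can be sketched as follows. The function name, weights, and sensor values are hypothetical and chosen only for illustration; the sketch is not taken from the manuscript.

```python
import numpy as np

def decoding_probability(x, weights, bias):
    """Per-class logistic outputs for one sensor pattern, rescaled to sum to 1.

    x       : (features,) sensor pattern at one time point
    weights : (classes, features) one binary logistic classifier per class
    bias    : (classes,) intercept of each classifier
    """
    p = 1.0 / (1.0 + np.exp(-(weights @ x + bias)))  # independent logistic outputs
    return p / p.sum()                                # normalize across classes

# Hypothetical example: four motion directions, three sensors.
weights = np.array([[ 2.0,  0.0,  0.0],
                    [ 0.0,  2.0,  0.0],
                    [ 0.0,  0.0,  2.0],
                    [-1.0, -1.0, -1.0]])
bias = np.zeros(4)
probs = decoding_probability(np.array([1.5, 0.1, -0.2]), weights, bias)
```

      Decoding accuracy, by contrast, is the fraction of held-out trials for which the most probable class matches the true label, so the two quantities answer different questions.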

      References

      Liu, Y., Dolan, R. J., Higgins, C., Penagos, H., Woolrich, M. W., Ólafsdóttir, H. F., Barry, C., Kurth-Nelson, Z., & Behrens, T. E. (2021). Temporally delayed linear modelling (TDLM) measures replay in both animals and humans. eLife, 10, e66917. https://doi.org/10.7554/eLife.66917

      Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e14. https://doi.org/10.1016/j.cell.2019.06.012

      Nour, M. M., Liu, Y., Arumuham, A., Kurth-Nelson, Z., & Dolan, R. J. (2021). Impaired neural replay of inferred relationships in schizophrenia. Cell, 184(16), 4315-4328.e17. https://doi.org/10.1016/j.cell.2021.06.012

      Turner, W., Blom, T., & Hogendoorn, H. (2023). Visual Information Is Predictively Encoded in Occipital Alpha/Low-Beta Oscillations. Journal of Neuroscience, 43(30), 5537–5545. https://doi.org/10.1523/JNEUROSCI.0135-23.2023

    1. Author response:

      Reviewer 1:

      (1) In general, the representation of target and distractor processing is a bit of a reach. Target processing is represented by SSVEP amplitude, which is most likely going to be related to the contrast of the dots, as opposed to representing coherent motion energy, which is the actual target. These may well be linked (e.g., greater attention to the coherent motion task might increase SSVEP amplitude), but I would call it a limitation of the interpretation. Decoding accuracy of emotional content makes sense as a measure of distractor processing, and the supplementary analysis comparing target SSVEP amplitude to distractor decoding accuracy is duly noted.

      We agree with the reviewer. This is certainly a limitation and will be acknowledged as such in the revised manuscript.

      (2) Comparing SSVEP amplitude to emotional category decoding accuracy feels a bit like comparing apples with oranges. They have different units and scales and probably reflect different neural processes. Is the result the authors find not a little surprising in this context? This relationship does predict performance and is thus intriguing, but I think this methodological aspect needs to be discussed further. For example, is the phase relationship with behaviour a result of a complex interaction between different levels of processing (fundamental contrast vs higher order emotional processing)?

      Traditionally, the SSVEP amplitude at the distractor frequency is used to quantify distractor processing. Given that the target SSVEP amplitude is stronger than that for the distractor, it is possible that the distractor SSVEP amplitude is contaminated by the target SSVEP amplitude due to spectral power leakage; see Figure S4 for a demonstration of this. Because of this issue, we introduce the use of decoding accuracy as an index of distractor processing. This has not been done in the SSVEP literature. The lack of correlation between the distractor SSVEP amplitude and the distractor decoding accuracy, although it is kind of like comparing apples with oranges as pointed out by the reviewer, serves the purpose of showing that these two measures are not co-varying, and that the use of decoding accuracy is free from the influence of the distractor SSVEP amplitude and thereby free from the influence of the target SSVEP amplitude. This is an important point. We will provide a more thorough discussion of this point in the revised manuscript.

      Reviewer 2:

      (1) Incomplete Evidence for Rhythmicity at 1 Hz: The central claim of 1 Hz rhythmic sampling is insufficiently validated. The windowing procedure (0.5s windows with 0.25s step) inherently restricts frequency resolution, potentially biasing toward low-frequency components like 1 Hz. Testing different window durations or providing controls would significantly strengthen this claim.

      This is an important point. We plan to follow the reviewer’s suggestion and repeat our analysis using different window sizes to test the robustness of the observed 1Hz rhythmicity. In addition, we plan to also apply the Hilbert transform to extract time-point-by-time-point amplitude envelopes, which will provide a window-free estimation of the distractor strength and further validate the presence of the low-frequency 1Hz dynamics.
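
      A window-free envelope of the kind mentioned above can be sketched with an FFT-based analytic signal. Everything in the example is an assumption made for illustration: the 250 Hz sampling rate, the 10 Hz carrier standing in for a tagging frequency, and the 1 Hz modulation are hypothetical, and a real analysis would first band-pass the data around the frequency of interest.

```python
import numpy as np

def analytic_envelope(x):
    """Amplitude envelope via the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep Nyquist bin
        gain[1:n // 2] = 2.0           # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))

fs = 250.0                             # hypothetical sampling rate (Hz)
t = np.arange(int(10 * fs)) / fs       # 10 s of data
true_env = 1.0 + 0.5 * np.sin(2 * np.pi * 1.0 * t)    # 1 Hz amplitude modulation
signal = true_env * np.cos(2 * np.pi * 10.0 * t)      # 10 Hz carrier

env = analytic_envelope(signal)

# Dominant frequency of the envelope, estimated at every sample (no sliding window).
freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
power = np.abs(np.fft.rfft(env - env.mean())) ** 2
peak_freq = freqs[np.argmax(power)]
```

      Because the envelope is defined at every time point, its spectrum is not shaped by a window length or step size, which is what makes it a useful cross-check on the sliding-window result.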

      (2) No-Distractor Control Condition: The study lacks a baseline or control condition without distractors. This makes it difficult to determine whether the distractor-related decoding signals or the 1 Hz effect reflect genuine distractor processing or more general task dynamics.

      We agree with the reviewer. This is certainly a limitation and will be acknowledged as such in the revised manuscript.

      (3) Decoding Near Chance Levels: The pairwise decoding accuracies for distractor categories hover close to chance (~55%), raising concerns about robustness. While statistically above chance, the small effect sizes need careful interpretation, particularly when linked to behavior.

      This is a good point. In addition to acknowledging this in the revised manuscript, we will carry out two additional analyses to test this issue further. First, we will implement a random permutation procedure, in which the trial labels are randomly shuffled and the null-hypothesis distribution for decoding accuracy is built, and compare the decoding accuracy from the actual data to this distribution. Second, we will perform a temporal generalization analysis to examine whether the neural representations of the distractor drift over the course of an entire trial, which is 11 seconds long. Recent studies suggest that even when the stimulus stays the same, its neural representation may drift over time.
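
      The label-shuffling procedure can be sketched as below. The nearest-centroid classifier, the interleaved five-fold split, the number of permutations, and the synthetic data (with a deliberately large class difference so the example is easy to read) are all assumptions for illustration and are not the classifier or data used in the study.

```python
import numpy as np

def cv_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy of a two-class (0/1) nearest-centroid classifier."""
    idx = np.arange(len(y))
    correct = 0
    for k in range(n_folds):
        test = idx[k::n_folds]                     # interleaved folds
        train = np.setdiff1d(idx, test)
        c0 = X[train][y[train] == 0].mean(axis=0)  # class centroids from training data
        c1 = X[train][y[train] == 1].mean(axis=0)
        d0 = np.linalg.norm(X[test] - c0, axis=1)
        d1 = np.linalg.norm(X[test] - c1, axis=1)
        correct += np.sum((d1 < d0).astype(int) == y[test])
    return correct / len(y)

def permutation_test(X, y, n_perm=200, seed=0):
    """Observed accuracy, null distribution from shuffled labels, and p-value."""
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(X, y)
    null = np.array([cv_accuracy(X, rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Synthetic trials: 200 trials, 10 features, class 1 shifted by one standard deviation.
rng = np.random.default_rng(1)
y = np.tile([0, 1], 100)
X = rng.standard_normal((200, 10))
X[y == 1] += 1.0

observed, null, p_value = permutation_test(X, y)
```

      Shuffling is applied to the labels before cross-validation, so each permuted accuracy passes through exactly the same pipeline as the observed one; the `+1` terms keep the p-value from being reported as exactly zero.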

      (4) No Clear Correlation Between SSVEP and Behavior: Neither target nor distractor signal strength (SSVEP amplitude) correlates with behavioral accuracy. The study instead relies heavily on relative phase, which - while interesting - may benefit from additional converging evidence.

      We feel that what the reviewer pointed out is actually the main point of our study: it is not the overall target or distractor strength that matters for behavior, but their temporal relationship. This reveals a novel neuroscience principle that has not been reported in the past. We will stress this point further in the revised manuscript.

      (5) Phase-analysis: phase analysis is performed between different types of signals hindering their interpretability (time-resolved SSVEP amplitude and time-resolved decoding accuracy).

      The time-resolved SSVEP amplitude is used to index the temporal dynamics of target processing, whereas the time-resolved decoding accuracy is used to index the temporal dynamics of distractor processing. As such, they can be compared, using relative phase for example, to examine how temporal relations between the two types of processes impact behavior. That said, we do recognize the reviewer’s concern that these two processes are indexed by two different types of signals. We plan to normalize each time course to make it dimensionless and then compute the temporal relations between them.
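
      One way to make two differently scaled time courses comparable before computing their relative phase is sketched below. It is an illustration only: z-scoring as the normalization, a single Fourier coefficient as the phase estimate, and the toy signals are assumptions, and the function names are hypothetical rather than taken from the study.

```python
import cmath
import math
from statistics import mean, pstdev


def zscore(signal):
    """Remove the mean and divide by the standard deviation (dimensionless)."""
    m, s = mean(signal), pstdev(signal)
    return [(v - m) / s for v in signal]


def phase_at(signal, freq_hz, fs_hz):
    """Phase (radians) of the Fourier coefficient of `signal` at `freq_hz`."""
    coeff = sum(
        v * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
        for n, v in enumerate(signal)
    )
    return cmath.phase(coeff)


def relative_phase(a, b, freq_hz, fs_hz):
    """Phase of `a` minus phase of `b`, wrapped to [-pi, pi).

    Positive values mean that `a` peaks earlier than `b` within a cycle.
    """
    diff = phase_at(zscore(a), freq_hz, fs_hz) - phase_at(zscore(b), freq_hz, fs_hz)
    return (diff + math.pi) % (2 * math.pi) - math.pi


# Toy signals (hypothetical): 10 s sampled at 100 Hz, both oscillating at 1 Hz,
# with very different units and offsets; the second lags by a quarter cycle.
fs_hz, freq_hz = 100, 1.0
t = [n / fs_hz for n in range(10 * fs_hz)]
target_amplitude = [5.0 + 3.0 * math.cos(2 * math.pi * freq_hz * x) for x in t]
distractor_accuracy = [
    0.55 + 0.02 * math.cos(2 * math.pi * freq_hz * x - math.pi / 2) for x in t
]
lag = relative_phase(target_amplitude, distractor_accuracy, freq_hz, fs_hz)
```

      Because phase does not depend on a signal's scale or offset, normalization matters mainly for amplitude-based comparisons; it is included here so that both time courses enter the computation on the same footing.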

      Appraisal of Aims and Conclusions:

      The authors largely achieved their stated goal of assessing rhythmic sampling of distractors. However, the conclusions drawn - particularly regarding the presence of 1 Hz rhythmicity - rest on analytical choices that should be scrutinized further. While the observed phase-performance relationship is interesting and potentially impactful, the lack of stronger and convergent evidence on the frequency component itself reduces confidence in the broader conclusions.

      Impact and Utility to the Field:

      If validated, the findings will advance our understanding of attentional dynamics and competition in complex visual environments. Demonstrating that ignored distractors can be rhythmically sampled at similar frequencies to targets has implications for models of attention and cognitive control. However, the methodological limitations currently constrain the paper's impact.

      Thanks for these comments and positive assessment of our work’s potential implications and impact. We will try our best in the revision process to address the concerns.

      Additional Context and Considerations:

      (1) The use of EEG-fMRI is mentioned but not leveraged. If BOLD data were collected, even exploratory fMRI analyses (e.g., distractor modulation in visual cortex) could provide valuable converging evidence.

      Indeed, leveraging fMRI data in EEG studies would be very beneficial, as demonstrated in our previous work. However, given that this study concerns the temporal relationship between target and distractor processing, we feel that fMRI, given its well-known limitation in temporal resolution, has limited potential to contribute. We will explore this rich dataset in other ways in which the two modalities are integrated to gain insights not possible with either modality alone.

      (2) In turn, removal of fMRI artifacts might introduce biases or alter the data. For instance, the authors might consider investigating potential fMRI artifact harmonics around 1 Hz to address concerns regarding induced spectral components.

      We have done extensive work in the area of simultaneous EEG-fMRI and have not encountered artifacts with a 1 Hz rhythmicity. Also, the fact that the temporal relations between target processing and distractor processing at 1 Hz predict behavior is another indication that the 1 Hz rhythmicity is a neuroscientific effect, not an artifact. However, we will look into this carefully and address it in the revision process.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." The proposed mechanisms result in moderate performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Given the high level of complexity of all components of the model, it is not clear which features of which components are most important for its performance. There is also room for improvement in the narrative structure of the manuscript and the organization of concepts and data.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation. That said, the fundamental concepts behind nonlinear feature binding in neurons with compartmentalized dendrites have been explored in previous work, so it is not clear how this study represents a significant conceptual advance. Finally, the presentation of the model, the motivation and justification of each design choice, and the interpretation of each result could be restructured for clarity to be better received by a wider audience.

      Thank you for the feedback! We agree that the complexity of our model can make it challenging to intuitively understand the underlying mechanisms. To address this, we have revised the manuscript to include additional simulations and clearer explanations of the mechanisms at play.

      In the revised introduction, we now explicitly state our primary aim: to assess to what extent a biophysically detailed neuron model can support the theory proposed by Tran-Van-Minh et al. and explore whether such computations can be learned by a single neuron, specifically a projection neuron in the striatum. To achieve this, we focus on several key mechanisms:

      (1) A local learning rule: We develop a learning rule driven by local calcium dynamics in the synapse and by reward signals from the neuromodulator dopamine. This plasticity rule is based on the known synaptic machinery for triggering LTP or LTD in the corticostriatal synapse onto dSPNs (Shen et al., 2008). Importantly, the rule does not rely on supervised learning paradigms, nor does it require separate training and testing phases.

      (2) Robust dendritic nonlinearities: According to Tran-Van-Minh et al. (2015), sufficient supralinear integration is needed to ensure that, e.g., two inputs (i.e., one feature combination in the NFBP, Figure 1A) on the same dendrite generate greater somatic depolarization than if those inputs were distributed across different dendrites. To accomplish this, we generate sufficiently robust dendritic plateau potentials using the approach in Trpevski et al. (2023).

      (3) Metaplasticity: Although not discussed much in more theoretical work, our study demonstrates the necessity of metaplasticity for achieving stable and physiologically realistic synaptic weights. This mechanism ensures that synaptic strengths remain within biologically plausible ranges during training, regardless of initial synaptic weights.
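
      The supralinearity requirement in point (2) can be illustrated with a deliberately abstract toy. In the sketch below, a steep sigmoid stands in for a dendritic plateau potential and the soma simply sums branch outputs; the threshold, steepness, and input strengths are arbitrary assumptions, and none of this reproduces the biophysical model.

```python
import math


def branch_output(drive, threshold=1.5, steepness=8.0):
    """Toy dendritic nonlinearity: near zero below threshold, near one above."""
    return 1.0 / (1.0 + math.exp(-steepness * (drive - threshold)))


def somatic_response(branch_drives):
    """Toy soma: sum of the nonlinear outputs of all branches."""
    return sum(branch_output(d) for d in branch_drives)


# Two inputs of unit strength, either on the same branch or on separate ones.
clustered = somatic_response([2.0, 0.0])    # both inputs on branch 1
distributed = somatic_response([1.0, 1.0])  # one input on each branch
```

      With a linear `branch_output` the two arrangements would give identical somatic responses, so any difference between `clustered` and `distributed` comes entirely from the branch nonlinearity.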

      We have also clarified our design choices and the rationale behind them, as well as restructured the interpretation of our results for greater accessibility. We hope these revisions make our approach and findings more transparent and easier to engage with for a broader audience.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study extends three previous lines of work:  

      (1) Prior computational/phenomenological work has shown that the presence of dendritic nonlinearities can enable single neurons to perform linearly non-separable tasks like XOR and feature binding (e.g. Tran-Van-Minh et al., Front. Cell. Neurosci., 2015).

      Prior computational and phenomenological work, such as Tran-Van-Minh et al. (Front. Cell. Neurosci., 2015), directly inspired our study, as we now explicitly state in the introduction (page 4, lines 19-22). While Tran-Van-Minh et al. theoretically demonstrated that these principles could solve the NFBP, it remains untested to what extent this can be achieved quantitatively in biophysically detailed neuron models using biologically plausible learning rules, which is what we test here.

      (2) This study and a previous biophysical modeling study (Trpevski et al., Front. Cell. Neurosci., 2023) rely heavily on the finding from Chalifoux & Carter, J. Neurosci., 2011 that blocking glutamate transporters with TBOA increases dendritic calcium signals. The proposed model thus depends on a specific biophysical mechanism for dendritic plateau potential generation, where spatiotemporally clustered inputs must be co-activated on a single branch, and the voltage compartmentalization of the branch and the voltage-dependence of NMDARs is not enough, but additionally glutamate spillover from neighboring synapses must activate extrasynaptic NMDARs. If this specific biophysical implementation of dendritic plateau potentials is essential to the findings in this study, the authors have not made that connection clear. If it is a simple threshold nonlinearity in dendrites that is important for the model, and not the specific underlying biophysical mechanisms, then the study does not appear to provide a conceptual advance over previous studies demonstrating nonlinear feature binding with simpler implementations of dendritic nonlinearities.

      We appreciate the feedback on the hypothesized role of glutamate spillover in our model. The current manuscript and Trpevski et al. (2023) emphasize glutamate spillover as a plausible biophysical mechanism for producing sufficiently robust and supralinear plateau potentials. We acknowledge, however, that supralinear dendritic integration might not depend solely on this specific mechanism in other types of neurons. In Trpevski et al. (2023) we found that if dendritic plateaus are too ‘graded’, as they are with the rather shallow Mg-block reported in experiments, it is difficult to solve the NFBP. The conceptual advance of our study lies in demonstrating that sufficiently nonlinear dendritic integration is needed, and that in SPNs this can be accounted for by assuming spillover. Regardless of its biophysical source (e.g. NMDA spillover, steeper NMDA Mg-block activation curves, or other voltage-dependent conductances that cause supralinear dendritic integration), such nonlinearity enables biophysically detailed neurons to solve the nonlinear feature binding problem. To address this point and clarify the generality of our conclusions, we have revised the relevant sections in the manuscript to state this explicitly.

      (3) Prior work has utilized "sliding-threshold," BCM-like plasticity rules to achieve neuronal selectivity and stability in synaptic weights. Other work has shown coordinated excitatory and inhibitory plasticity. The current manuscript combines "metaplasticity" at excitatory synapses with suppression of inhibitory strength onto strongly activated branches. This resembles the lateral inhibition scheme proposed by Olshausen (Christopher J. Rozell, Don H. Johnson, Richard G. Baraniuk, Bruno A. Olshausen; Sparse Coding via Thresholding and Local Competition in Neural Circuits. Neural Comput 2008; 20 (10): 2526-2563. doi: https://doi.org/10.1162/neco.2008.03-07-486). However, the complexity of the biophysical model makes it difficult to evaluate the relative importance of the additional complexity of the learning scheme.

      We initially tried solving the NFBP with only excitatory plasticity, which worked reasonably well, especially if we assume a small population of neurons collaborates under physiological conditions. However, we observed that plateau potentials from distally located inputs were less effective, and we now explain this limitation in the revised manuscript (page 14, lines 23-37).

      To address this, we added inhibitory plasticity inspired by mechanisms discussed in Castillo et al. (2011), Ravasenga et al., and Chapman et al. (2022), as now explicitly stated in the text (page 32, lines 23-26). While our GABA plasticity rule is speculative, it demonstrates that distal GABAergic plasticity can enhance nonlinear computations. These results are particularly encouraging, as they show that implementing these mechanisms at the single-neuron level produces behavior consistent with network-level models such as BCM-like plasticity rules and those proposed by Rozell et al. We hope this will inspire further experimental work on inhibitory plasticity mechanisms.

      P2, paragraph 2: Grammar: "multiple dendritic regions, preferentially responsive to different input values or features, are known to form with close dendritic proximity." The meaning is not clear. "Dendritic regions" do not "form with close dendritic proximity."

      Rewritten (current page 2, line 35)

      P5, paragraph 3: Grammar: I think you mean "strengthened synapses" not "synapses strengthened".

      Rewritten (current page 14, line 36)

      P8, paragraph 1: Grammar: "equally often" not "equally much".

      Updated (current page 10, line 2)

      P8, paragraph 2: "This is because of the learning rule that successively slides the LTP NMDA Ca-dependent plasticity kernel over training." It is not clear what is meant by "sliding," either here or in the Methods. Please clarify.

      We have updated the text and removed the word “sliding” throughout the manuscript to clarify that the calcium dependence of the kernels is in fact updated.

      P10, Figure 3C (left): After reading the accompanying text on P8, para 2, I am left not understanding what makes the difference between the two groups of synapses that both encode "yellow," on the same dendritic branch (d1) (so both see the same plateau potentials and dopamine) but one potentiates and one depresses. Please clarify.

      Some "yellow" and "banana" synapses are initialized with weak conductances, limiting their ability to learn due to the relatively slow dynamics of the LTP kernel. These weak synapses fail to reach the calcium thresholds necessary for potentiation during a dopamine peak, yet they remain susceptible to depression under LTD conditions. Initially, the dynamics of the LTP kernel do not allow significant potentiation, even in the presence of appropriate signals such as plateau potentials and dopamine (page 10, lines 22–26). We have added a more detailed explanation of how the learning rule operates in the section “Characterization of the Synaptic Plasticity Rule” on page 9 and have clarified the specific reason why the weaker yellow synapses undergo LTD (page 11, lines 1–7).

      As shown in Supplementary Figure 6, during subthreshold learning, the initial conductance is also low, which similarly hinders the synapses' ability to potentiate. However, with sufficient dopamine, the LTP kernel adapts by shifting closer to the observed calcium levels, allowing these synapses to eventually strengthen. This dynamic highlights how the model enables initially weak synapses to "catch up" under consistent activation and favorable dopaminergic conditions.
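
      The "catch up" behavior described here can be illustrated with a toy calculation. The sketch below is not the published learning rule: the Gaussian kernel shape, the update rates, the fixed calcium level, and the names `ToySynapse`, `ltp_kernel`, and `dopamine_peak` are all assumptions chosen only to show how a kernel that shifts toward the observed calcium level lets an initially ineffective synapse begin to potentiate.

```python
import math
from dataclasses import dataclass


@dataclass
class ToySynapse:
    weight: float
    ltp_midpoint: float        # calcium level at which the toy LTP kernel peaks
    kernel_width: float = 0.2  # arbitrary units

    def ltp_kernel(self, calcium):
        """Bell-shaped dependence of potentiation on calcium (assumed shape)."""
        return math.exp(-((calcium - self.ltp_midpoint) / self.kernel_width) ** 2)

    def dopamine_peak(self, calcium, learning_rate=0.05, meta_rate=0.1):
        """Potentiate according to the kernel, then shift the kernel midpoint
        toward the calcium level that was just observed (toy metaplasticity)."""
        gain = learning_rate * self.ltp_kernel(calcium)
        self.weight += gain
        self.ltp_midpoint += meta_rate * (calcium - self.ltp_midpoint)
        return gain


# A weak synapse whose calcium signal (0.3) lies far below the kernel midpoint (1.0).
# Calcium is held fixed for simplicity; in a real model it would grow with the weight.
synapse = ToySynapse(weight=0.1, ltp_midpoint=1.0)
gains = [synapse.dopamine_peak(calcium=0.3) for _ in range(30)]
```

      In this toy, the earliest entries of `gains` are close to zero and later entries approach `learning_rate`, mirroring the qualitative picture of a slow start followed by strengthening under repeated dopamine peaks.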

      P9, paragraph 1: The phrase "the metaplasticity kernel" is introduced here without prior explanation or motivation for including this level of complexity in the model. Please set it up before you use it.

      A sentence introducing metaplasticity has been added to the introduction (page 3, lines 36-42) as well as on page 9, where the kernel is introduced (page 9, lines 26-35).

      P10, Figure 3D: "kernel midline" is not explained.

      We have replotted Figure 3 to make it easier to understand what is shown. In addition, an explanation of the kernel midpoint has been added to the legend (current page 12, line 19).

      P11, paragraph 1; P13, Fig. 4C: My interpretation of these data is that clustered connectivity with specific branches is essential for the performance of the model. Randomly distributing input features onto branches (allowing all 4 features to innervate single branches) results in poor performance. This is bad, right? The model can't learn unless a specific pre-wiring is assumed. There is not much interpretation provided at this stage of the manuscript, just a flat description of the result. Tell the reader what you think the implications of this are here.

      Thanks for the suggestion - we have updated this section of the manuscript, adding an interpretation of the results that the model often fails to learn both relevant stimuli if all four features are clustered onto the same dendrite (page 13, lines 31-42). 

      In summary, when multiple feature combinations are encoded in the same dendrite with similar conductances, the ability to determine which combination to store depends on the dynamics of the other dendrite. Small variations in conductance, training order, or other stochastic factors can influence the outcome. This challenge, known as the symmetry-breaking problem, has been previously acknowledged in abstract neuron models (Legenstein and Maass, 2011). To address this, additional mechanisms such as branch plasticity—amplifying or attenuating the plateau potential as it propagates from the dendrite to the soma—can be employed (Legenstein and Maass, 2011). 

      P12, paragraph 2; P13, Figure 4E: This result seems suboptimal, that only synapses at a very specific distance from the soma can be used to effectively learn to solve a NFBP. It is not clear to what extent details of the biophysical and morphological model are contributing to this narrow distance-dependence, or whether it matches physiological data.

      We have added Figure 5—figure supplement 1A to clarify why distal synapses may not optimally contribute to learning. This figure illustrates how inhibitory plasticity improves performance by reducing excessive LTD at distal dendrites, thereby enhancing stimulus discrimination. Relevant explanations have been integrated into Page 18, Lines 25-39 in the revised manuscript.

      P14, paragraph 2: Now the authors are assuming that inhibitory synapses are highly tuned to stimulus features. The tuning of inhibitory cells in the hippocampus and cortex is controversial but seems generally weaker than excitatory cells, commensurate with their reduced number relative to excitatory cells. The model has accumulated a lot of assumptions at this point, many without strong experimental support, which again might make more sense when proposing a new theory, but this stitching together of complex mechanisms does not provide a strong intuition for whether the scheme is either biologically plausible or performant for a general class of problem.

      We acknowledge that it is not currently known whether inhibitory synapses in the striatum are tuned to stimulus features. However, given that the striatum is a purely inhibitory structure, it is plausible that lateral inhibition from other projection neurons could be tuned to features, even if feedforward inhibition from interneurons is not. Therefore, we believe this assumption is reasonable in the context of our model. As noted earlier, the GABA plasticity rule in our study is speculative. However, we hope that our work will encourage further experimental investigations, as we demonstrate that if GABAergic inputs are sufficiently specific, they can significantly enhance computations (this is discussed on page 17, lines 8-15).

      P16, Figure 5E legend: The explanation of the meaning of T_max and T_min in the legend and text needs clarification.

      The abbreviations T<sub>min</sub> and T<sub>max</sub> have been updated to CTL and CTH to better reflect their role in calcium threshold tracking. The Figure 5E legend and relevant text have been revised for clarity. Additionally, the Methods section has been reorganized for better readability.

      P16, Figure 5B, C: When the reader reaches this paper, the conundrums presented in Figure 4 are resolved. The "winner-takes-all" inhibitory plasticity both increases the performance when all features are presented to a single branch and increases the range of somatodendritic distances where synapses can effectively be used for stimulus discrimination. The problem, then, is in the narrative. A lot more setup needs to be provided for the question related to whether or not dendritic nonlinearity and synaptic inhibition can be used to perform the NFBP. The authors may consider consolidating the results of Fig. 4 and 5 so that the comparison is made directly, rather than presenting them serially without much foreshadowing.

      In order to facilitate readability, we have updated the following sections of the manuscript to clarify how inhibitory plasticity resolves challenges from Figure 4:

      Figure 5B and Figure 5–figure supplement 1B: Two new panels illustrate the role of inhibitory plasticity in addressing symmetry problems.

      Figure 5–figure supplement 1A: Shows how inhibitory plasticity extends the effective range of somatodendritic distances.

      P18, Figure 6: This should be the most important figure, finally tying in all the previous complexity to show that NFBP can be partially solved with E and I plasticity even when features are distributed randomly across branches without clustering. However, now bringing in the comparison across spillover models is distracting and not necessary. Just show us the same plateau generation model used throughout the paper, with and without inhibition.

      Figure updated. Accumulative spillover and no-spillover conditions have been removed.

      P18, paragraph 2: "In Fig. 6C, we report that a subset of neurons (5 out of 31) successfully solved the NFBP." This study could be significantly strengthened if this phenomenon could (perhaps in parallel) be shown to occur in a simpler model with a simpler plateau generation mechanism. Furthermore, it could be significantly strengthened if the authors could show that, even if features are randomly distributed at initialization, a pruning mechanism could gradually transition the neuron into the state where fewer features are present on each branch, and the performance could approach the results presented in Figure 5 through dynamic connectivity.

      Modeling structural plasticity is a good suggestion that should be investigated in later work; however, we feel that it goes beyond what we can do in the current manuscript. We now acknowledge that structural plasticity might play a role. For example, we show that if we assume ‘branch-specific’ spillover, which leads to sufficient development of local dendritic nonlinearities, the neuron can also learn with distributed inputs. In reality, structural plasticity is likely important here, as we now state (current page 22, lines 35-42).

      P17, paragraph 2: "As shown in Fig. 6B, adding the hypothetical nonlinearities to the model increases the performance towards solving part of the NFBP, i.e. learning to respond to one relevant feature combination only. The performance increases with the amount of nonlinearity." This is not shown in Figure 6B.

      Sentence removed. We have added Figure 6 - figure supplement 1 to better explain the limitations.

      P22, paragraph 1: The "w" parameter here is used to determine whether spatially localized synapses are co-active enough to generate a plateau potential. However, this is the same w learned through synaptic plasticity. Typically LTP and LTD are thought of as changing the number of postsynaptic AMPARs. Does this "w" also change the AMPAR weight in the model? Do the authors envision this as a presynaptic release probability quantity? If so, please state that and provide experimental justification. If not, please justify modifying the activation of postsynaptic NMDARs through plasticity.

      This is an important remark. Our plasticity model differs from classical LTP models as it depends on the link between LTP and increased spillover as described by Henneberger et al. (2020).

      We have updated the Methods section (page 27, lines 6-11). We acknowledge, however, that in a real cell, learning might first strengthen the AMPA component, while after learning the NMDA/AMPA ratio is unchanged (Watt et al., 2004). This rebalancing between NMDA and AMPA might be a slower process.

      Reviewer #2 (Public Review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanistic insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

      Weaknesses:

      I am concerned that the manuscript was submitted too hastily, as evidenced by the quality and logic of the writing and the presentation of the figures. These issues may compromise the integrity of the work. I would recommend a substantial revision of the manuscript to improve the clarity of the writing, incorporate more experiments, and better define the goals of the study.

      Thanks for the valuable feedback. We have now gone through the whole manuscript, updating the text, improving the figures, and adding some supplementary figures to better explain model mechanisms. In particular, we now state our goal more clearly in the introduction.

      Major Points:

      (1) Quality of Scientific Writing: The current draft does not meet the expected standards. Key issues include:

      i. Mathematical and Implementation Details: The manuscript lacks comprehensive mathematical descriptions and implementation details for the plasticity models (LTP/LTD/Meta) and the SPN model. Given the complexity of the biophysically detailed multicompartment model and the associated learning rules, the inclusion of only nine abstract equations (Eq. 1-9) in the Methods section is insufficient. I was surprised to find no supplementary material providing these crucial details. What parameters were used for the SPN model? What are the mathematical specifics for the extra-synaptic NMDA receptors utilized in this study? For instance, Eq. 3 references [Ca2+]-does this refer to calcium ions influenced by extra-synaptic NMDARs, or does it apply to other standard NMDARs? I also suggest the authors provide pseudocodes for the entire learning process to further clarify the learning rules.

      The model is quite detailed but builds on previous work. For this reason, for model components used in earlier published work (and where models are already available via model repositories, such as ModelDB), we refer the reader to these resources in order to improve readability and to highlight what is novel in this paper: the learning rule itself. The learning rule is now explained in detail. For modelers who want to run the model, we have also provided a GitHub link to the simulation code. We hope this is a reasonable compromise for all readers, i.e., those who only want to understand what is new here (the learning rule) and those who also want to test the model code. We explain this to the readers at the beginning of the Methods section.

      ii. Figure quality. The authors seem not to carefully typeset the images, resulting in overcrowding and varying font sizes in the figures. Some of the fonts are too small and hard to read. The text in many of the diagrams is confusing. For example, in Panel A of Figure 3, two flattened images are combined, leading to small, distorted font sizes. In Panels C and D of Figure 7, the inconsistent use of terminology such as "kernels" further complicates the clarity of the presentation. I recommend that the authors thoroughly review all figures and accompanying text to ensure they meet the expected standards of clarity and quality.

      Thanks for directing our attention to these oversights. We have gone through the entire manuscript, updating the figures where needed, and have made sure that the text and the figure descriptions are clear and adequate and use consistent terminology for all quantities.

      iii. Writing clarity. The manuscript often includes excessive and irrelevant details, particularly in the mathematical discussions. On page 24, within the "Metaplasticity" section, the authors introduce the biological background to support the proposed metaplasticity equation (Eq. 5). However, much of this biological detail is hypothesized rather than experimentally verified. For instance, the claim that "a pause in dopamine triggers a shift towards higher calcium concentrations while a peak in dopamine pushes the LTP kernel in the opposite direction" lacks cited experimental evidence. If evidence exists, it should be clearly referenced; otherwise, these assertions should be presented as theoretical hypotheses. Generally, Eq. 5 and related discussions should be described more concisely, with only a loose connection to dopamine effects until more experimental findings are available.

      The “Metaplasticity” section (pages 30-32) has been updated to be more concise, and the abundant references to dopamine have been removed.

      (2) Goals of the Study: The authors need to clearly define the primary objective of their research. Is it to showcase the computational advantages of the local learning rule, or to elucidate biological functions?

      We have explicitly stated our goal in the introduction (page 4, lines 19-22). Please also see the response to reviewer 1.

      i. Computational Advantage: If the intent is to demonstrate computational advantages, the current experimental results appear inadequate. The learning rule introduced in this work can only solve for four features, whereas previous research (e.g., Bicknell and Hausser, 2021) has shown capability with over 100 features. It is crucial for the authors to extend their demonstrations to prove that their learning rule can handle more than just three features. Furthermore, the requirement to fine-tune the midpoint of the synapse function indicates that the rule modifies the "activation function" of the synapses, as opposed to merely adjusting synaptic weights. In machine learning, modifying weights directly is typically more efficient than altering activation functions during learning tasks. This might account for why the current learning rule is restricted to a limited number of tasks. The authors should critically evaluate whether the proposed local learning rule, including meta-plasticity, actually offers any computational advantage. This evaluation is essential to understand the practical implications and effectiveness of the proposed learning rule.

      Thank you for your feedback. To address the concern regarding feature complexity, we extended our simulations to include learning with 9 and 25 features, achieving accuracies of 80% and 75%, respectively (Figure 6—figure supplement 1A). While our results demonstrate effective performance, the absence of external stabilizers, such as the error-modulated functions used in prior studies like Bicknell and Häusser (2021), means that the model's performance can be more sensitive to occasional incorrect outcomes. For instance, while accuracy might reach 90%, a few errors can significantly affect overall performance due to the lack of mechanisms to stabilize learning.

      In order to clarify the setup of the rule, we have added pseudocode in the revised manuscript (Pages 31-32) detailing how the learning rule and metaplasticity update synaptic weights based on calcium and dopamine signals. Additionally, we have included pseudocode for the inhibitory learning rule on Pages 34-35. In future work, we also aim to incorporate biologically plausible mechanisms, such as dopamine desensitization, to enhance stability.

      ii. Biological Significance: If the goal is to interpret biological functions, the authors should dig deeper into the model behaviors to uncover their biological significance. This exploration should aim to link the observed computational features of the model more directly with biological mechanisms and outcomes.

      As now clearly stated in the introduction, the goal of the study is to see whether and to what quantitative extent the theoretical solution of the NFBP proposed in Tran-Van-Minh et al. (2015) can be achieved with biophysically detailed neuron models and with a biologically inspired learning rule. The problem has so far been solved with abstract and phenomenological neuron models (Schiess et al., 2014; Legenstein and Maass, 2011) and also with a detailed neuron model but with a precalculated voltage-dependent learning rule (Bicknell and Häusser, 2021).

      We have also tried to better explain the model mechanisms by adding supplementary figures.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      (1) The [Ca]NMDA in Figure 2A and 2C can have large values even when very few synapses are activated. Why is that? Is this setting biologically realistic?

      The elevated [Ca²⁺]NMDA with minimal synaptic activation arises from high spine input resistance, small spine volume, and NMDA receptor conductance, which scales calcium influx with synaptic strength. Physiological studies report spine calcium transients typically up to ~1 μM (Franks and Sejnowski 2002, DOI: 10.1002/bies.10193), while our model shows ~7 μM for 0.625 nS and ~3 μM for 0.5 nS, exceeding this range. The calcium levels of the model might therefore be somewhat high compared to biologically measured levels; however, this does not impact the learning rule, as the functional dynamics of the rule remain robust across calcium variations.

      (2) In the distributed synapses section, the study introduces two new mechanisms "Threshold spillover" and "Accumulative spillover". Both mechanisms are not basic concepts but quantitative descriptions of them are missing.

      Thank you for your feedback. Based on the recommendations from Reviewer 1, we have simplified the paper by removing the "Accumulative spillover" and focusing solely on the "Thresholded spillover" mechanism. In the updated version of the paper, we refer to it only as glutamate spillover. However, we acknowledge (page 22, lines 40-42) that to create sufficient non-linearities, other mechanisms, like structural plasticity, might also be involved (although testing this in the model will have to be postponed to future work).

      (3) The learning rule achieves moderate performance when feature-relevant synapses are organized in pre-designed clusters, but for more general distributed synaptic inputs, the model fails to faithfully solve the simple task (with its performance of ~ 75%). Performance results indicate the learning rule proposed, despite its delicate design, is still inefficient when the spatial distribution of synapses grows complex, which is often the case on biological neurons. Moreover, this inefficiency is not carefully analyzed in this paper (e.g. why the performance drops significantly and the possible computation mechanism underlying it).

      The drop in performance when using distributed inputs (to a mean performance of 80%) is similar to the mean performance in the same situation in Bicknell and Häusser (2021); see their Fig. 3C. The drop has two causes: i) the relevant feature combinations are often not colocalized on the same dendrite, so they cannot be strengthened together, and ii) even when they are colocalized, there may not be enough synapses to trigger the supralinear response from the branch spillover mechanism, i.e. the inputs are not summated in a supralinear way (Fig. 6B; most input configurations only reach 75%).

      Because of this, at most one relevant feature combination can be learned. In the several cases where the random distribution of synapses is favorable for both relevant feature combinations to be learned, the NFBP is solved (Fig. 6B, where some performance lines reach 100%, and Fig. 6C, an example of such a case). We have extended the relevant sections of the paper to highlight the above-mentioned mechanisms.

      Further, the theoretical results in Tran-Van-Minh et al. (2015) already show that solving the NFBP with supralinear dendrites requires features to be pre-clustered in order to evoke the supralinear dendritic response, which would activate the soma. The same number of synapses distributed across the dendrites i) would not excite the soma as strongly, and ii) would summate in the soma as in a point neuron, i.e. no supralinear events can be activated, which are necessary to solve the NFBP. Hence, one doesn’t expect distributed synaptic inputs to solve the NFBP with any kind of learning rule.
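
The clustering argument above can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not the biophysical model from the manuscript: `branch_output`, `somatic_drive`, the threshold of 8 co-active synapses and the boost of 20 are all hypothetical choices made for illustration.

```python
# Toy illustration only: the threshold and boost values are hypothetical,
# not parameters from the biophysical model discussed above.

def branch_output(n_active, threshold=8, boost=20.0):
    """Response of one dendritic branch to n_active co-active synapses.

    Below the threshold the branch sums its inputs linearly; at or above
    it, a supralinear event adds a fixed boost on top of the linear sum.
    """
    linear = float(n_active)
    return linear + boost if n_active >= threshold else linear


def somatic_drive(synapses_per_branch):
    """Sum the branch outputs as a simple stand-in for somatic drive."""
    return sum(branch_output(n) for n in synapses_per_branch)


clustered = [10, 0, 0, 0, 0]    # 10 synapses on a single branch
distributed = [2, 2, 2, 2, 2]   # the same 10 synapses spread over 5 branches

clustered_drive = somatic_drive(clustered)      # linear sum plus boost
distributed_drive = somatic_drive(distributed)  # linear sum only
```

With these assumed values, the clustered arrangement crosses the branch threshold and the distributed one does not, so the distributed inputs add up as they would in a point neuron.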

      (4) Figure 5B demonstrates that on average adding inhibitory synapses can enhance the learning capabilities to solve the NFBP for different pattern configurations (2, 3, or 4 features), but since the performance for excitatory-only setup varies greatly between different configurations (Figure 4B, using 2 or 3 features can solve while 4 cannot), can the results be more precise about whether adding inhibitory synapses can help improve the learning with 4 features?

      In response to the question, we added a panel to Figure 5B showing that without inhibitory synapses, 5 out of 13 configurations with four features successfully learn, while with inhibitory synapses, this improves to 7 out of 13. Figure 5—figure supplement 1B provides an explanation for this improvement (page 18, lines 10-24).

      (5) Also, in terms of the possible role of inhibitory plasticity in learning, as only on-site inhibition is studied here, can other types of inhibition be considered, like on-path or off-path? Do they have similar or different effects?

      This is an interesting suggestion for future work. We observed relevant dynamics in Figure 6A, where inhibitory synapses increased their weights on-site when randomly distributed. Previous work by Gidon and Segev (2012) examined the effects of different inhibitory types on NMDA clusters, highlighting the role of on-site and off-path inhibition in shunting. In our context, on-site inhibition in the same branch appears more relevant for maintaining compartmentalized dendritic processing.

      (6) Figure 6A is mentioned in the context of excitatory-only setup, but it depicts the setup when both excitatory and inhibitory synapses are included, which is discussed later in the paper. A correction should be made to ensure consistency.

      We have updated the figure and the text to make it clearer that simulations are run both with and without inhibition in this context (page 21, lines 4-13).

      (7) In the "Ca and kernel dynamics" plots (Fig 3,5), some of the kernel midlines (solid line) are overlapped by dots, e.g. the yellow line in Fig 3D, and some kernel midlines look like dots, which leads to confusion. Suggest to separate plots of Ca and kernel dynamics for clarity. 

      The design of the figures has been updated to improve the visibility of the calcium and kernel dynamics during training.

      (8) The formulations of the learning rule are not well-organized, and the naming of parameters is kind of confusing, e.g. T_min, T_max, which by default represent time, means "Ca concentration threshold" here.

      The abbreviations of the thresholds (T_min, T_max in the initial version) have been updated to CTL and CTH, respectively, to better reflect their role in tracking calcium levels. The mathematical formulations have further been reorganized for better readability. The revised Methods section now follows a more structured flow, first explaining the learning mechanisms, followed by the equations and their dependencies.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with a rich dataset and solid methodology.

      The revisions made by the authors in this version have greatly improved the validity and clarity of the statistical techniques, and as a result the paper's findings are more convincing.

      This paper's primary strengths are: 1) its comprehensive dataset that allows for a snapshot of the dynamics of several related fields; 2) its thorough exploration of how self-citation behavior relates to characteristics of research and researchers.

      Thank you for your positive view of our paper and for your previous comments.

      Its primary weakness is that the study stops short of digging into potential mechanisms in areas where it is potentially feasible to do so - for example, studying international dynamics by identifying and studying researchers who move between countries, or quantifying more or less 'appropriate' self-citations via measures of abstract text similarity.

      We agree that these are limitations of the existing study. We updated the limitations section as follows (page 15, line 539):

      “Similarly, this study falls short in several potential mechanistic insights, such as by investigating citation appropriateness via text similarity or international dynamics in authors who move between countries.”

      Yet while these types of questions were not determined to be in scope for this paper, the study is quite effective at laying the important groundwork for further study of mechanisms and motivations, and will be a highly valuable resource for both scientists within the field and those studying it.

      Reviewer #2 (Public review):

      The study presents valuable findings on self-citation rates in the field of Neuroscience, shedding light on potential strategic manipulation of citation metrics by first authors, regional variations in citation practices across continents, gender differences in early-career self-citation rates, and the influence of research specialization on self-citation rates in different subfields of Neuroscience. While some of the evidence supporting the claims of the authors is solid, some of the analysis seems incomplete and would benefit from more rigorous approaches.

      Thank you for your comments. We have addressed your suggestions presented in the “Recommendations for the authors” section by performing your recommended sensitivity analysis that specifically identifies authors who could be considered neurologists, neuroscientists, and psychiatrists (as opposed to just papers that are published in these fields). Please see the “Recommendations for the authors” section for more details.

      Reviewer #3 (Public review):

      This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. The interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated.

      This issue of interpretability was already raised in my review of the previous revision, where I argued that the authors should take a more explicit causal framework. The authors have now revised some of the language in this revision, in order to downplay causal language. Although this is perfectly fine, this misses the broader point, namely that it is not clear what is being estimated. Perhaps it is best to refer to Lundberg et al. (2021) and ask the authors to clarify "What is your Estimand?" In my view, the theoretical estimands the authors are interested in are causal in nature. Perhaps the authors would argue that their estimands are descriptive. In either case, it would be good if the authors could clarify that theoretical estimand.

      Thank you for your comment and for highlighting this insightful paper. After reading this paper, we believe that our theoretical estimand is descriptive in nature. For example, in the abstract of our paper, we state: “This work characterizes self-citation rates in basic, translational, and clinical Neuroscience literature by collating 100,347 articles from 63 journals between the years 2000-2020.” This goal seems consistent with the idea of a descriptive estimand, as we are not interested in any particular intervention or counterfactual at this stage. Instead, we seek to provide a broad characterization of subgroup differences in self-citations such that future work can ask more focused questions with causal estimands.

      Our analysis included subgroup means and generalized additive models, both of which were described as empirical estimands for a theoretical descriptive estimand in Lundberg et al. We added the following text to the paper (page 3, line 112):

      “Throughout this work, we characterized self-citation rates with descriptive, not causal, analyses. Our analyses included several theoretical estimands that are descriptive [17], such as the mean self-citation rates among published articles as a function of field, year, seniority, country, and gender. We adopted two forms of empirical estimands. First, we showed subgroup means in self-citation rates. We then developed smooth curves with generalized additive models (GAMs) to describe trends in self-citation rates across several variables.”
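
As a minimal sketch of the first empirical estimand (subgroup means), assume each article is a record carrying a grouping variable and a self-citation rate. The record layout, the `self_citation_rate` key, the `subgroup_means` helper and the three example articles are hypothetical and are not taken from the study's data or code.

```python
from collections import defaultdict


def subgroup_means(records, key):
    """Mean self-citation rate for each value of the grouping variable `key`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec[key]] += rec["self_citation_rate"]
        counts[rec[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up records for illustration only.
articles = [
    {"field": "Neurology", "self_citation_rate": 0.10},
    {"field": "Neurology", "self_citation_rate": 0.06},
    {"field": "Psychiatry", "self_citation_rate": 0.04},
]

means_by_field = subgroup_means(articles, "field")
```

The same helper could be called with `"year"`, `"country"` or any other grouping key, which mirrors the descriptive framing: each call summarizes a subgroup and makes no claim about interventions.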

      In addition, we added to the limitations section as follows (page 15, line 539):

      “Yet, this study may lay the groundwork for future works to explore causal estimands.”

      Finally, in my previous review, I raised the issue of when self-citations become "problematic". The authors have addressed this issue satisfactorily, I believe, and now formulate their conclusions more carefully.

      Thank you for your previous comments. We agree that they improved the paper.

      Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory. American Sociological Review, 86(3), 532-565. https://doi.org/10.1177/00031224211004187

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for your thorough revisions and responses to the reviews.

      Reviewer #2 (Recommendations for the authors):

      I appreciate the authors' responses and am satisfied with all their replies except for my second comment. I still find the message conveyed slightly misleading, as the results seem to be generalized to neurologists, neuroscientists, and psychiatrists. It is important to refine the analysis to focus specifically on neuroscientists, identified as first or last authors based on their publication history. This approach is common in the science of science literature and would provide a more accurate representation of the findings specific to neuroscientists, avoiding the conflation with other related fields. This refinement could serve as a robustness check in the supplementary. I think adding this sub-analysis is essential to the validity of the results claimed in this paper.

      Thank you for your comment. We added a sensitivity analysis where fields are defined by an author’s publication history, not by the journal of each article.

      In the main text, we added the following:

      (Page 3, line 129) “When determining fields by each author’s publication history instead of the journal of each article, we observed similar rates of self-citation (Table S7). The 95% confidence intervals for each field definition overlapped in most cases, except for Last Author self-citation rates in Neuroscience (7.54% defined by journal vs. 8.32% defined by author) and Psychiatry (8.41% defined by journal vs. 7.92% defined by author).”

      Further details are provided in the methods section (page 21, line 801):

      “4.11 Journal-based vs. author-based field sensitivity analyses

      We refined our field-based analysis to focus only on authors who could be considered neuroscientists, neurologists, and psychiatrists. For each author, we looked at the number of articles they had in each subfield, as defined by Scopus. We considered 12 subfields that fell within Neurology, Neuroscience, and Psychiatry. These subfields are presented in Table S12. For each First Author and Last Author, we excluded them if any of their three most frequently published subfields did not include one of the 12 subfields of interest. If an author’s top three subfields included multiple broader fields (e.g., both Neuroscience and Psychiatry), then that author was categorized according to the field in which they published the most articles. Among First Authors, there were 86,220 remaining papers, split between 33,054 (38.33%) in Neurology, 23,216 (26.93%) in Neuroscience, and 29,950 (34.73%) in Psychiatry. Among Last Authors, there were 85,954 remaining papers, split between 31,793 (36.98%) in Neurology, 25,438 (29.59%) in Neuroscience, and 28,723 (33.42%) in Psychiatry.”
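
The author-based classification described above could be sketched as follows. This is an illustration under assumptions, not the authors' code: the subfield names in `SUBFIELD_TO_FIELD` are placeholders (the 12 subfields actually used are listed in Table S12 and are not reproduced here), `assign_field` is a hypothetical helper, and the exclusion rule is read literally, so an author is dropped if any of their top three subfields falls outside the subfields of interest.

```python
from collections import Counter

# Placeholder mapping; the real 12 subfields are those in Table S12.
SUBFIELD_TO_FIELD = {
    "Clinical Neurology": "Neurology",
    "Cellular Neuroscience": "Neuroscience",
    "Cognitive Neuroscience": "Neuroscience",
    "Biological Psychiatry": "Psychiatry",
}


def assign_field(subfield_counts):
    """Return the author's broader field, or None if the author is excluded.

    subfield_counts maps each subfield name to the author's article count.
    """
    top_three = [name for name, _ in Counter(subfield_counts).most_common(3)]
    # Assumed (literal) reading of the exclusion rule: every top subfield
    # must be one of the subfields of interest.
    if not top_three or any(name not in SUBFIELD_TO_FIELD for name in top_three):
        return None
    # Assumption: article counts are summed per broader field over the
    # top three subfields, and the largest total decides the category.
    field_totals = Counter()
    for name in top_three:
        field_totals[SUBFIELD_TO_FIELD[name]] += subfield_counts[name]
    return field_totals.most_common(1)[0][0]


author_a = {"Cellular Neuroscience": 12, "Cognitive Neuroscience": 9,
            "Biological Psychiatry": 15}
author_b = {"Clinical Neurology": 20, "Oncology": 5, "Cellular Neuroscience": 3}

field_a = assign_field(author_a)
field_b = assign_field(author_b)
```

In this sketch the first author spans two broader fields and is assigned to the one with more articles, while the second is excluded because one of their top three subfields lies outside the map. A looser reading of the exclusion rule (keep an author if at least one top subfield is of interest) would only require swapping `any` for `all` in the condition.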

      Reviewer #3 (Recommendations for the authors):

      I would like to thank the authors for their responses to the points that I raised. I do not have any new comments or further responses.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame, that authors have discovered and named toiL, located in the leader region of the operon. The authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, caused by the presence of some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because the presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. Even though the model is appealing and several of the experimental data support some aspects of it, several inconsistencies remain to be solved. In addition, even though TopAI was shown to be an inhibitor of topoisomerase I (Yamaguchi & Inouye, 2015, NAR 43:10387), the authors suggest, without offering any experimental support, that, because ribosome-targeting antibiotics act as inducers, expression of the topAI/yjhQ/yjhP operon may confer resistance to these drugs.

      Strengths:

      - There is good experimental support of the transcriptional repression/activation switch aspect of the model, derived from well-designed transcriptional reporters and ChIP-qPCR approaches.

      - There is a clever use of the topAI-lacZ reporter to find the 23S rRNA mutants where expression of topAI was upregulated. This eventually led the authors to identify that translation events occurring at toiL are important to regulate the topAI/yjhQ/yjhP operon. Is there any published evidence that ribosomes with the identified mutations translate slowly (decreased fidelity does not necessarily mean slow translation, does it?)?

      G2253 is in helix 80 of the 23S rRNA, which has been proposed to be involved in correct positioning of the tRNA. Mutations in helix 80 have been reported to cause defects in peptidyl transferase center activity, which could reduce the rate of ribosome movement along the mRNA. If ribosomes are sufficiently slowed when translating toiL, this could induce expression of topAI. G1911 and Ψ1917 are in helix 69 of the 23S rRNA, which is involved in forming the inter-subunit bridge, as well as interactions with release factors. Mutations in helix 69 cause a decrease in the processivity of translation, suggesting that the mutations we identified may increase the occupancy of ribosomes within toiL, thereby inducing expression of topAI. We have added text to the Discussion section to include this speculation.

      - Authors incorporate relevant links to the antibiotic-mediated expression regulation of bacterial resistance genes. Authors can also mention the tryptophan-mediated ribosome stalling at the tnaC leader ORF that activates the expression of tryptophan metabolism genes through blockage of Rho-mediated transcriptional attenuation.

      We have added a citation to a recent structural study of ribosomes translating the tnaC uORF. Specifically, we speculate in the Discussion that toiL may have evolved to sense a ribosome-targeting antibiotic, or another ribosome-targeting small molecule such as an amino acid.

      Weaknesses:

      The main weaknesses of the work are related to several experimental results that are not consistent with the model, or related to a lack of data that needs to be included to support the model.

      The following are a few examples:

      - It is surprising that authors do not mention that several published Ribo-seq data from E. coli cells show active translation of toiL (for example Li et al., 2014, Cell 157: 624). Therefore, it is hard to reconcile with the model that start codon/Shine-Dalgarno mutations in the toiL-lux reporter have no effect on luciferase expression (Figure 2C, bar graphs of the no antibiotic control samples).

      These data are for a topAI-lux reporter construct rather than toiL-lux. In our model, ribosome stalling within toiL is required to induce expression of the downstream genes; preventing translation of toiL by mutating the start codon or Shine-Dalgarno sequence would not cause ribosome stalling, consistent with the lack of an effect on topAI expression.

      - The SHAPE reactivity data shown in Figure 5A are not consistent with the toiL ORF being translated. In addition, it is difficult to visualize the effect of tetracycline on mRNA conformation with the representation used in Figure 5B. It would be better to show SHAPE reactivity without/with Tet (as shown in panel A of the figure).

      We have modified this figure (now Figure 6) so that we no longer show the SHAPE-seq data +/- tetracycline overlaid on the predicted RNA structure, since at best, the predicted structure likely only represents the uninduced state. We have included the predicted structure together with the SHAPE-seq data for untreated cells as a separate panel because it is part of the basis for our model. We have also added a supplementary figure showing a similar RNA structure prediction based on conservation of the topAI upstream region across species (Figure 6 – figure supplement 1), and we describe this in the text.

      - The "increased coverage" of topAI/yjhP/yjhQ in the presence of tetracycline from the Ribo-seq data shown in Figure 6A can be due to activation of translation, transcription, or both. For readers to know which of these possibilities apply, authors need to provide RNA-seq data and show the profiles of the topAI/yjhQ/yjhP genes in control/Tet-treated cells.

      A previous study (Li et al., 2014, PMID 24766808) compared RNA-seq and Ribo-seq data for E. coli to measure normalized ribosome occupancy for each gene. However, sequence coverage for topAI was too low to confidently quantify either the RNA-seq or the Ribo-seq data. Presumably RNA levels were low because of Rho termination. Hence, we were not confident that RNA-seq would provide information on the regulation of topAI-yjhQP. Other data in our study provide strong evidence that regulation is primarily at the level of translation. And the key conclusion from Figure 6 (now Figure 7) is that tetracycline stalls ribosomes on start codons.
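
The coverage problem mentioned above can be made concrete with a small sketch. It assumes that normalized ribosome occupancy is taken as the ratio of Ribo-seq to RNA-seq signal for a gene, and that a gene is left unquantified when either read count is below a cutoff. The function name, the cutoff of 50 reads and the example counts are hypothetical and do not come from Li et al. (2014) or from this study.

```python
def normalized_occupancy(ribo_reads, rna_reads, min_reads=50):
    """Ratio of Ribo-seq to RNA-seq reads for one gene.

    Assumes both counts are already scaled for library size and gene
    length. Returns None when either count is below min_reads, because
    a ratio of two small counts is too noisy to interpret.
    """
    if ribo_reads < min_reads or rna_reads < min_reads:
        return None
    return ribo_reads / rna_reads


well_covered = normalized_occupancy(400, 200)   # both counts pass the cutoff
poorly_covered = normalized_occupancy(3, 5)     # too few reads to quantify
```

A gene whose transcripts are mostly terminated early would fall into the second case: both counts are small, so no reliable ratio can be reported.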

      - Similarly, to support the data of increased ribosomal footprints at the toiL start codon in the presence of Tet (Figure 6B), authors should show the profile of the toiL gene from control and Tet-treated cells.

      Figure 6B shows data for both treated and untreated cells. The overall ribosome occupancy is much lower for untreated cells, making it difficult to draw strong conclusions about the relative distribution of ribosomes across toiL.

      - Representation of the mRNA structures in the model shown in Figure 5, does not help with visualizing 1) how ribosomes translate toiL since the ORF is trapped in double-stranded mRNA, and 2) how ribosome stalling on toiL would lead to the release of the initiation region of topAI to achieve expression activation.

      We now show the predicted structure with only SHAPE-seq data for untreated cells. The comparison of SHAPE-seq +/- tetracycline is shown without reference to the predicted structure.

      - The authors speculate that, because ribosome-targeting antibiotics act as expression inducers [by the way, authors should mention and comment that, more than a decade ago, it had been reported that kanamycin (PMID: 12736533) and gentamycin (PMID: 19013277) are inducers of topAI and yjhQ], the genes of the topAI/yjhQ/yjhP operon may confer resistance to these antibiotics. Such a suggestion can be experimentally checked by simply testing whether strains lacking these genes have increased sensitivity to the antibiotic inducers.

      We thank the reviewer for pointing out these references, which we now cite. The fact that another group found that gentamycin induces topAI expression – it is one of the most highly induced genes in that paper – strongly suggests that we missed the key inducing concentrations for one or more antibiotics, meaning that topAI is induced by even more ribosome-targeting antibiotics than we realized.

      We did some preliminary experiments to look for effects of TopAI, YjhQ, and/or YjhP on antibiotic sensitivity, but generated only negative results. Since these experiments were preliminary and far from exhaustive, we have chosen not to include them in the manuscript. Other studies of genes regulated by ribosome stalling in a uORF have looked at genes whose functions in responding to translation stress were already known, so the environmental triggers were more obvious. With so many possible triggers for topAI-yjhQP, it will likely require considerable effort to find the relevant trigger(s). Hence, we consider this an important question, but beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this important study, Baniulyte and Wade describe how the translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      I appreciate that the authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation. The results are convincing and clearly described.

      Weaknesses:

      I have relatively minor suggestions for improving the manuscript. These mainly relate to the figures.

      Reviewer #3 (Public Review):

      Summary:

      The authors nicely show that the translation and ribosome stalling within the ToiL uORF upstream of the co-transcribed topAI-yjhQ toxin-antitoxin genes unmask the topAI translational initiation site, thereby allowing ribosome loading and preventing premature Rho-dependent transcription termination in the topAI region. Although similar translational/transcriptional attenuation has been reported in other systems, the base pairing between the leader sequence and the repressed region by the long RNA looping is somehow unique in toiL-topAI-yjhQP. The experiments are solidly executed, and the manuscript is clear in most parts with areas that could be improved or better explained. The real impact of such a study is not easy to appreciate due to a lack of investigation on the physiological consequences of topAI-yjhQP activation upon antibiotic exposure (see details below).

      Strengths:

      Conclusion/model is supported by the integrated approaches consisting of genetics, in vivo SHAPE-seq and Ribo-Seq.

      Provide an elegant example of cis-acting regulatory peptides to a growing list of functional small proteins in bacterial proteomes.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Examine the consequences of mutations impeding translation of the topAI/yjhQ/yjhP operon on cell growth in the presence and absence of antibiotics.

      See response to Reviewer 1’s comment.

      (2) Resolve discrepancies between the SHAPE data indicating constitutive sequestration of the toiL Shine Dalgarno sequence with antibiotic-regulated translation of the toiL ORF.

      See response to Reviewer 1’s comment.

      (3) Reconcile published Ribo-Seq data with the model that start codon/Shine-Dalgarno mutations in the toiL-lux reporter have no effect on luciferase expression in the absence of antibiotics.

      See response to Reviewer 1’s comment.

      (4) Clarify whether antibiotic MIC values were employed to select antibiotic concentrations for different experiments.

      The antibiotic concentrations we used are in line with reported MICs for E. coli. We now list the reported ECOFFs/MICs and include relevant citations.

      (5) Provide RNA-seq data to complement the Ribo-Seq data for the topAI/yjhQ/yjhP genes in control vs. Tet-treated cells.

      See response to Reviewer 1’s comment.

      (6) Revise the text to address as many of the reviewers' suggestions as reasonably possible.

      Changes to the text have been made as indicated in the responses to the reviewers’ comments.

      Reviewer #2 (Recommendations for the Authors):

      (1) Page 6: I would have liked to have more information about the 39 suppressor mutations in rho. Do any of the cis-acting mutations give support for the model proposed in Figure 8?

      We only know the specific mutation for some of the strains, and we now list those mutations in the Methods section. For other mutants, we mapped the mutation to either the rho gene or to Rho activity, but we did not sequence the rho gene. Most of the specific mutations we did identify fall within the primary RNA-binding site of Rho and hence should be considered partial-loss-of-function mutations (complete loss of function would be lethal).

      We identified cis-acting mutations by re-transforming the lacZ reporter plasmid into a wild-type strain. We did not sequence any of these plasmids.

      (2) Page 12-13, Section entitled "Mapping ribosome stalling sites induced by different antibiotics": This section should start with a better transition regarding the logic of why the experiments were carried out and should end with an interpretation of the results.

      We have added a few sentences at the start of this section to explain the rationale. We have also added two sentences at the end of this section to summarize the interpretation of the data.

      (3) Page 15: The authors should discuss under what conditions the expression of TopAI (and YjhQ/YjhP) might be induced. Is expression also elevated upon amino acid starvation?

      We have looked through public RNA-seq data but have not identified growth conditions other than antibiotic treatment that induce expression of topAI, yjhQ or yjhP.

      (4) References: The authors should be consistent about capitalization, italics, and abbreviations in the references.

      These formatting errors will be fixed in the proofing stage.

      (5) All graph figures: There should be more uniformity in the sizes of individual data points (some are almost impossible to see) and error bars across the figures.

      We have tried to make the data points and error bars more visible for figures where they were smaller.

      (6) Figure 1B: I do not think the left arrow labeling is very intuitive and suggest renaming these constructs.

      We have removed the arrows to improve clarity.

      (7) Figure 2A: toiL should be introduced at the first mention of Figure 2A.

      We have added a schematic of the topAI-yjhQ-yjhP region as Figure 1A, including the toiL ORF, which we briefly mention in the text. We have opted to split Figure 2C into two panels. In Figure 2C we now only show data for the wild-type construct. Data for the mutant constructs are now shown in a new figure (Figure 5), alongside data for the wild-type constructs. We have simplified Figure 2A, since the mutations are not relevant to this revised figure, and we now show the schematic with the mutations as Figure 5A.

      (8) Figure 3C and 3D: I suggest giving these graphs headings (or changing the color of the bars in Figure 3D) to make it more obvious that different things are measured in the two panels.

      We have added headers to panels B-D to make it clear which graphs show ChIP-qPCR data and which graph shows qRT-PCR data.

      (9) Figure 6: It might be nice to show the topAI-yjhPQ operon here.

      We now show the operon in Figure 1A.

      (10) Figure 8: This figure could be optimized by adding 5' and 3' end labels and having more similarity with the model in Figure 7.

      The constructs shown in Figure 7 lack most of the topAI upstream region, so they aren’t readily comparable to the schematic in Figure 8. However, we have changed the color of the ribosome in Figure 7 to match that in Figure 8. We also indicate the 5’ end of the RNA in Figure 8.

      Reviewer #3 (Recommendations for the Authors):

      Areas to improve:

      (1) While it's important to learn about ToiL-dependent regulation of the downstream topAI-yjhQ toxin-antitoxin genes, the physiological consequence of topAI-yjhQ activation seems to be lost in the manuscript. Everything was done with a reporter lacZ/lux. In the absence of toiL translation (i.e. SD mutant) and/or ribosome stalling, does premature transcription termination result in non-stoichiometric synthesis of toxin vs. antitoxin, leading to growth arrest or other measurable phenotype? Knowing the impact of ToiL in the native topAI-yjhQ context will be valuable.

      See response to Reviewer 1’s comment.

      (2) It was indicated in Figure 4-figure supplement 1 that toiL homologs are found in many other proteobacteria. Do the UR sequences in those species also form a similar inhibitory RNA loop? The nt sequence identity of toiL is likely to be constrained by the base pairing of the topAI 5' region.

      We have added a supplementary figure panel showing an RNA structure prediction for the topAI upstream region based on sequence alignment of homologous regions from other species (Figure 6 – figure supplement 1).

      What is the frequency of the MLENVII hepta-peptide genome-wide in E. coli? Is the sequence disfavored to avoid spurious multi-antibiotic sensing?

      LENVII is not found in any annotated E. coli K-12 protein. However, this is a sufficiently long sequence that we would expect few to no instances in the E. coli proteome.

      (3) Figure 1A, it would be helpful to indicate the location of the toiL (red arrow as in Figure 2A) relative to the putative rut site early in the beginning of the results. Does TSS mark the transcription start site? There is no annotation of TSS in the figure legend. Was TSS previously mapped experimentally? Please include relevant citations.

      We now indicate the position of the TSS relative to the topAI start codon. Similarly, we indicate the position of the start of toiL relative to the topAI start codon in Figure 2A. We now explain “TSS” in the figure legend. There is a reference in the text for the TSS (Thomason et al., 2015).

      (4) Please consider rearranging the results section, perhaps more helpful to introduce the toiL in Figure 1 or earlier. The current format requires readers to switch back-and-forth between Figure 4 and Figure 2.

      We have added a schematic of the topAI upstream region as Figure 1A, and we have separated Figure 2C as described in a response to a comment from Reviewer 2.

      (5) Figure 2A and Figure 2-Figure Suppl 1A, for clarity, please mark the rut site upstream of the red arrow.

      Rather than mark the rut on Figure 2A, which would make for a busy schematic, readers can compare the positions of the rut to those of toiL, which we have now added to Figures 1B (formerly Figure 1A) and 2A.

      (6) The following conclusion seems speculative: "...but does not trigger termination until RNAP ..., >180 nt further downstream…". Shouldn't the authors already know where the termination site is based on their previous Term-seq data (see Ref 1, Adams PP et al 2021)?

      Sites of Rho-dependent transcription termination cannot be mapped precisely from Term-seq data because exoribonucleases rapidly process the unstructured RNA 3’ ends.

      (7) Genetic screen: Please discuss why the 23S rRNA mutations that cause translational infidelity could promote topAI translation. Wouldn't the mutant ribosome be affected in translating toiL?

      See response to Reviewer 1’s comment.

      (8) Although antibiotic concentrations were provided in Figure 2 legend, please provide the MIC values of each antibiotic, e.g., in Table S2, for the tested E. coli strain, to inform readers how specific subinhibitory concentrations were chosen.

      See response to Reviewing Editor.

      (9) Please clarify the calculation of luciferase units in the y-axis of Figure 2A, why the scale is drastically higher than that of Figure 7C using the same antibiotics?

      These reporter assays use different constructs. The reporter construct used for experiments in Figure 7 includes a portion of the ermCL gene and associated downstream sequence. We have enlarged Figure 7A to highlight the difference in reporter constructs.

      (10) Table S4 needs a few more details. It is unclear how those numbers in columns G-H were generated. Do those numbers correspond to ribosome density per nt/ORF?

      We have added footnotes to Table S4 to indicate that the numbers in columns G and H represent sequence read coverage normalized by region length and by the upper quartile of gene expression.
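      The normalization described for columns G and H can be sketched as follows. This is only an illustration under stated assumptions: the function name, the input dictionaries, the order of the two normalization steps, and the choice to take the upper quartile over non-zero per-nucleotide densities are all hypothetical and are not taken from the manuscript.

```python
from statistics import quantiles


def normalized_coverage(read_counts, region_lengths):
    """Scale read coverage by region length, then by the upper quartile.

    read_counts and region_lengths are hypothetical dicts keyed by region name.
    Assumption: the upper quartile is computed over the non-zero
    length-normalized densities of all regions supplied.
    """
    # Step 1: reads per nucleotide for each region.
    density = {name: read_counts[name] / region_lengths[name] for name in read_counts}

    # Step 2: 75th percentile of the non-zero densities (assumed definition).
    non_zero = sorted(d for d in density.values() if d > 0)
    upper_quartile = quantiles(non_zero, n=4, method="inclusive")[2]

    # Step 3: express every region relative to that upper quartile.
    return {name: d / upper_quartile for name, d in density.items()}


# Toy input only; these are not values from Table S4.
counts = {"a": 10, "b": 40, "c": 90, "d": 160, "e": 250}
lengths = {"a": 10, "b": 20, "c": 30, "d": 40, "e": 50}
scaled = normalized_coverage(counts, lengths)
```

      Dividing by an upper-quartile value rather than by total reads is commonly used to make samples comparable when a few highly expressed genes dominate the library.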

      (11) Figure 5, if the SHAPE results were true, the Shine Dalgarno sequence of toiL is sequestered in the hairpin structure with and without tetracycline treatment. It is inconceivable that translational initiation will occur efficiently, please discuss.

      Our representation of the SHAPE-seq data was confusing since we overlaid the SHAPE-seq changes on a predicted structure that likely corresponds to the uninduced state. We hope that the new version of Figure 5 is clearer.

      We presume the reviewer is referring to the Shine-Dalgarno sequence of topAI rather than toiL, since the Shine-Dalgarno sequence of toiL is predicted to be unstructured even in the absence of tetracycline treatment. The ribosome-binding site of topAI is more accessible in cells treated with tetracycline, although the SHAPE-seq data suggest that this is a transient event. The binding of the initiating ribosome may also reduce reactivity in this region under inducing conditions. We now discuss this briefly in the text.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors repeatedly assert that an individual's behavior in the foraging assay depends on its prior history (particularly cultivation conditions). While this seems like a reasonable expectation, it is not fully fleshed out. The work would benefit from studies in which animals are raised on more or less abundant food before the behavioral task.

      Cultivation density: While we agree with the reviewer that testing the effects of varying bacterial density during animal development (cultivation) is an interesting experiment, it is not feasible at this time. We previously attempted this experiment but found it nontrivial to maintain stable bacterial density conditions over long timescales as this requires matching the rate of bacterial growth with the rate of bacterial consumption. Despite our best efforts, we have not been able to identify conditions that satisfy these requirements. Thus, we focused our revised manuscript to include only assertions about the effects of recent experiences and added this inquiry as a future direction (lines 618-624).

      (2) The authors convincingly show that the probability of particular behavioral outcomes occurring upon patch encounter depends on time-associated parameters (time since last patch encounter, time since last patch exploitation). There are two concerns here. First, it is not clear how these values are initialized - i.e., what values are used for the first occurrence of each behavioral state? More importantly, the authors don't seem to consider the simplest time parameter, the time since the start of the assay (or time since worm transfer). Transferring animals to a new environment can be associated with significant mechanical stimulus, and it seems quite possible that transferring animals causes them to enter a state of arousal. This arousal, which certainly could alter sensory function or decision-making, would likely decay with time. It would be interesting to know how well the model performs using time since assay starts as the only time-dependent parameter.

      Parameter Initialization: We thank the reviewer for pointing out an oversight in our methods section regarding the model parameter values used for the first encounter. We clarified the initialization of parameters in the manuscript (lines 1162-1179). In short, for the first patch encounter where k = 1:

      ρ<sub>k</sub> is the relative density of the first patch.

      τ<sub>s</sub> is the duration of time spent off food since the beginning of the recorded experiment. For the first patch, this is equivalent to the total time elapsed.

      ρ<sub>h</sub> is the approximated relative density of the bacterial patch on the acclimation plates (see Assay preparation and recording in Methods). Acclimation plates contained one large 200 µL patch seeded with OD<sub>600</sub> = 1 and grown for a total of ~48 hours. As with all patches, the relative density was estimated from experiments using fluorescent bacteria OP50-GFP as described in Bacterial patch density estimation in Methods.

      ρ<sub>e</sub> is equivalent to ρ<sub>h</sub>.
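      The initialization rules listed above for the first encounter (k = 1) can be written as a small sketch. The class and function names are hypothetical, the reading of ρ<sub>h</sub> and ρ<sub>e</sub> as the most recently encountered and most recently exploited patch densities is our interpretation of the response, and the numeric values are illustrative (a patch of relative density 10 and an acclimation density of roughly 200, as in the example given later in the response).

```python
from dataclasses import dataclass


@dataclass
class EncounterCovariates:
    rho_k: float  # relative density of the current (k-th) patch
    tau_s: float  # time spent off food before this encounter
    rho_h: float  # density of the most recently encountered patch (assumed reading)
    rho_e: float  # density of the most recently exploited patch (assumed reading)


def first_encounter_covariates(first_patch_density, time_elapsed, acclimation_density):
    """Covariates for the first patch encounter (k = 1)."""
    return EncounterCovariates(
        rho_k=first_patch_density,  # density of the first patch
        tau_s=time_elapsed,         # equals total time elapsed in the recording
        rho_h=acclimation_density,  # acclimation-plate patch stands in for history
        rho_e=acclimation_density,  # set equal to rho_h on the first encounter
    )


# Illustrative values only; the time unit is left unspecified here.
first = first_encounter_covariates(
    first_patch_density=10.0, time_elapsed=120.0, acclimation_density=200.0
)
```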

      Transfer Method: We thank the reviewer for their thoughtful comment on how the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We anticipated this possibility and, in order to mitigate the stress of moving, we used an agar plug method where animals were transferred using the flat surface of small cylinders of agar. Importantly, the use of agar as a medium to transfer animals provides minimal disruption to their environment as all physical properties (e.g. temperature, humidity, surface tension) are maintained. Qualitatively, we observed no marked change in behavior from before to after transfer with the agar plug method, especially as compared to the often drastic changes observed when using a metal or eyelash pick. We added these additional methodological details to the methods (lines 791-796).

      Time Parameter: However, the reviewer’s concern that the simplest time parameter (time since start of the assay) might better predict animal behavior is valid. We thank the reviewer for pointing out the need to specifically test whether the time-dependent change in explore-exploit decision-making corresponds better with satiety (time off patch) or arousal (time since transfer/start of assay) state. To test this hypothesis, we ran our model with varying combinations of the satiety term τ<sub>s</sub> and a transfer term τ<sub>t</sub>. We found that when both terms were included in the model, the coefficient of the transfer term was non-significant. This result suggests that the relevant time-dependent term is more likely related to satiety than transfer-induced stress (lines 343-358; Figure 4 - supplement 4D).
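      As a rough illustration of the comparison described above, the sketch below evaluates a logistic model with a satiety term τ<sub>s</sub> and a transfer term τ<sub>t</sub>. The function name, the dictionary-based interface, and every coefficient value are hypothetical; the snippet only shows that a model whose transfer coefficient is zero makes the same predictions as the satiety-only model, which is the sense in which a non-significant transfer term adds little.

```python
import math


def exploit_probability(intercept, coefficients, covariates):
    """Logistic probability of exploiting, given named coefficients and covariates."""
    linear = intercept + sum(
        coefficients[name] * covariates[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients; these are not fitted values from the manuscript.
encounter = {"tau_s": 30.0, "tau_t": 45.0}
p_both = exploit_probability(-1.0, {"tau_s": 0.02, "tau_t": 0.0}, encounter)
p_satiety_only = exploit_probability(-1.0, {"tau_s": 0.02}, encounter)
```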

      (3) Similarly, Figures 2L and M clearly show that the probability of a search event occurring upon a patch encounter decreases markedly with time. Because search events are interpreted as a failure to detect a patch, this implies that the detection of (dilute) patches becomes more efficient with time. It would be useful for the authors to consider this possibility as well as potential explanations, which might be related to the point above.

      Time-dependent changes in sensing: We agree with the reviewer that we observe increased responsiveness to dilute patches with time. Although this is interesting, our primary focus was on what decision an animal made given that they clearly sensed the presence of the bacterial patch. Nonetheless, we added this observation to the discussion as an area of future work to investigate the sensory mechanisms behind this effect (lines 563-568).

      (4) Based on their results with mec-4 and osm-6 mutants, the authors assert that chemosensation, rather than mechanosensation, likely accounts for animals' ability to measure patch density. This argument is not well-supported: mec-4 is required only for the function of the six non-ciliated light-touch neurons (AVM, PVM, ALML/R, PLML/R). In contrast, osm-6 is expected to disrupt the function of the ciliated dopaminergic mechanosensory neurons CEP, ADE, and PDE, which have previously been shown to detect the presence of bacteria (Sawin et al 2000). Thus, the paper's results are entirely consistent with an important role of mechanosensation in detecting bacterial abundance. Along these lines, it would be useful for the authors to speculate on why osm-6 mutants are more, rather than less, likely to "accept" when encountering a patch.

      Sensory mutant behavior: We thank the reviewer for pointing out the error in our interpretation of the behavior of osm-6 and mec-4 animals. We further elaborated on our findings and edited the text to better reflect that osm-6 mutants lack both chemosensory and mechanosensory ciliated sensory neurons (lines 406-448; lines 567-577). Specifically, we provided some commentary on the finding that osm-6 mutants show an augmented ability to detect the presence of bacterial patches but a reduced ability to assess their bacterial density. While this finding seems contradictory, it suggests that in the absence of the ability to assess bacterial density, animals must prioritize exploiting food resources when available.

      (5) While the evidence for the accept-reject framework is strong, it would be useful for the authors to provide a bit more discussion about the null hypothesis and associated expectations. In other words, what would worm behavior in this assay look like if animals were not able to make accept-reject decisions, relying only on exploit-explore decisions that depend on modulation of food-leaving probability?

      Accept-reject vs. stay-switch: We thank the reviewer for alerting us to this gap in our discussion. We have revised the text to further extrapolate upon our point of view on this somewhat philosophical distinction and what it predicts about C. elegans behavior (lines 507-533).

      Reviewer #3 (Public review):

      (1) Sensing vs. non-sensing

      The authors claim that when animals encounter dilute food patches, they do not sense them, as evidenced by the shallow deceleration that occurs when animals encounter these patches. This seems ethologically inaccurate. There is a critical difference between not sensing a stimulus, and not reacting to it. Animals sense numerous stimuli from their environment, but often only behaviorally respond to a fraction of them, depending on their attention and arousal state. With regard to C. elegans, it is well-established that their amphid chemosensory neurons are capable of detecting very dilute concentrations of odors. In addition, the authors provide evidence that osm-6 animals have altered exploit behaviors, further supporting the importance of amphid chemosensory neurons in this behavior.

      Interpretation of “non-sensing” encounters: We thank the reviewer for their comment and agree that we do not know for certain whether the animals sensed these patches or were merely non-responsive to them. We are, however, confident that these encounters lack evidence of sensing. Specifically, we note that our analyses used to classify events as sensing or non-sensing examined whether an animal’s slow-down upon patch entry could be distinguished from either that of events where animals exploited or that of encounters with patches lacking bacteria. We found that  “non-sensing” encounters are indeed indistinguishable from encounters with bacteria-free patches where there are no bacteria to be sensed (see Figure 2 - Supplement 8A-C and Patch encounter classification as sensing or non-responding in Methods). Regardless, we agree with the reviewer that all that can be asserted about these events is that animals do not appear to respond to the bacterial patch in any way that we measured. Therefore, we have replaced the term “non-sensing” with “non-responding” to better indicate the ethological interpretation of these events and clarified the text to reflect this change (lines 193-200; lines 211-212).

      (2) Search vs. sample & sensing vs. non-sensing

      In Figures 2H and 2I, the authors claim that there are three behavioral states based on quantifying average velocity, encounter duration, and acceleration, but I only see two. Based on density distributions alone, there really only seem to be 2 distributions, not 3. The authors claim there are three, but to come to this conclusion, they used a QDA, which inherently is based on the authors training the model to detect three states based on prior annotations. Did the authors perform a model test, such as the Bayesian Information Criterion, to confirm whether 2 vs. 3 Gaussians is statistically significant? It seems like the authors are trying to impose two states on a phenomenon with a broad distribution. This seems very similar to the results observed for roaming vs. dwelling experiments, which again, are essentially two behavioral states.

      Validation of sensing clusters: We are grateful to the reviewer for pointing out the difficulty in visualizing the clusters and the need for additional clarity in explaining the semi-supervised QDA approach. We added additional visualizations and methods to validate the clusters we have discovered. Specifically, we used Silverman’s test to show that the sensing vs. non-responding data were bi-modal (i.e. a two-cluster classification method fits best) and accompanied this statistical test with heat maps which better illustrate the clusters (lines 171-173; lines 190-191; lines 948-972; lines 1003-1005; Figure 2 - supplement 6A-C; Figure 2 - supplement 7C-F).

      Further, it seems that there may be some confusion as to how we arrived at 3 encounter types (i.e. search, sample, exploit). It’s important to note that two methods were used on two different (albeit related) sets of parameters. We first used a two-cluster GMM to classify encounters as explore or exploit. We then used a two-cluster semi-supervised QDA to classify encounters as sensing or non-sensing (now changed to “non-responding”, see above response) using a different set of parameters. We thus separated the explore cluster into two (sensing and non-responding exploratory events) resulting in three total encounter types: exploit, sample (explore/sensing), and search (explore/non-responding).
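      The way the two binary classifications combine into three encounter types can be sketched as below. The function and argument names are hypothetical; the two boolean inputs stand in for the outputs of the two-cluster GMM (explore vs. exploit) and the semi-supervised QDA (sensing vs. non-responding), neither of which is reimplemented here.

```python
def encounter_type(is_exploit, is_sensing):
    """Map the two binary labels onto exploit, sample, or search."""
    if is_exploit:
        # Only the explore cluster is subdivided, so exploit stays one class.
        return "exploit"
    # Exploratory encounters are split by the sensing label.
    return "sample" if is_sensing else "search"


labels = [
    encounter_type(is_exploit=True, is_sensing=True),
    encounter_type(is_exploit=False, is_sensing=True),
    encounter_type(is_exploit=False, is_sensing=False),
]
```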

      (4) History-dependence of the GLM

      The logistic GLM seems like a logical way to model a binary choice, and I think the parameters you chose are certainly important. However, the framing of them seems odd to me. I do not doubt the animals are assessing the current state of the patch with an assessment of past experience; that makes perfect logical sense. However, it seems odd to reduce past experience to the categories of recently exploited patch, recently encountered patch, and time since last exploitation. This implies the animals have some way of discriminating these past patch experiences and committing them to memory. Also, it seems logical that the time on these patches, not just their density, should also matter, just as the time without food matters. Time is inherent to memory. This model also imposes a prior categorization in trying to distinguish between sensed vs. not-sensed patches, which I criticized earlier. Only "sensed" patches are used in the model, but it is questionable whether worms genuinely do not "sense" these patches.

      Model design: We thank the reviewer for their thoughtful comments on the model. We completed a number of analyses involving model selection including model selection criteria (AIC, BIC) and optimization with regularization techniques (LASSO and elastic nets) and found that the problem of model selection was compounded by the enormous array of highly-correlated variables we had to choose from. Additionally, we found that both interaction terms and non-linear terms of our task variables could be predictive of accept-reject decisions but that the precise set of terms selected depended sensitively on which model selection technique was used and generally made rather small contributions to prediction. The diverse array of results and combinatorial number of predictors to possibly include failed to add anything of interpretable value. We therefore chose to take a different approach to this problem. Rather than trying to determine what the “best” model was we instead asked whether a minimal model could be used to answer a set of core questions. Indeed, our goal was not maximal predictive performance but rather to distinguish between the effects of different influences enough to determine if encounter history had a significant, independent effect on decision making. We thus chose to only include task variables that spanned the most basic components of behavioral mechanisms to ask very specific questions. For example, we selected a time variable that we thought best encapsulated satiety. While we could have included many additional terms, or made different choices about which terms to include, based on our analyses these choices would not have qualitatively changed our results. Further, we sought to validate the parameters we chose with additional studies (i.e. food-deprived and sensory mutant animals). We regard our study as an initial foray into demonstrating accept-reject decision-making in nematodes. The exact mechanisms and, consequently, the best model design are therefore beyond the scope of this study.

      Lastly, in regards to the use of only sensed patches in the model; while we acknowledge that we are not certain as to whether the “non-responding” encounters are truly not sensed, we find qualitatively similar results when including all exploratory patches in our analyses. However, we take the position that sensation is necessary for decision-making and thus believe that while our model’s predictive performance may be better using all encounters, the interpretation of our findings is stronger when we only include sensing events. We have added additional commentary about our model to the discussion section (lines 667-695).

      (5) osm-6

      The osm-6 results are interesting. This seems to indicate that the worms are still sensing the food, but are unable to assess quality, therefore the default response is to exploit. How do you think the worms are sensing the food? Clearly, they sense it, but without the amphid sensory neurons, and not mechanosensation. Perhaps feeding is important? Could you speculate on this?

      We thank the reviewer for their thoughtful remarks. We have added additional commentary about the result of our sensory mutant experiments as described above in response to Reviewer #1 under Sensory mutant behavior.

      (7) Impact:

      I think this work will have a solid impact on the field, as it provides tangible variables to test how animals assess their environment and decide to exploit resources. I think this research could be strengthened by a reassessment of their model that would both simplify it and provide testable timescales of satiety/starvation memory.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors title the work as an "ethological study" and emphasize the theme of "foraging in naturalistic environments" in contrast to typical laboratory conditions. The only difference in this study relative to typical laboratory conditions is that the food bacteria is distributed in many small patches as compared to one large patch. First, it is not clear to the reviewer that the size of the food patches in these experiments is more relevant to C. elegans in its natural context than the standard sizes of food patches. Furthermore, all the other highly unnatural conditions typical of laboratory cultivation still apply: the use of a 2D agar substrate, a single food bacteria that is not a component of a naturalistic diet, and the use of a laboratory-adapted strain of C. elegans with behavior quite distinct from that of natural isolates. The reviewer is not suggesting that the authors need to make their experiments more naturalistic, only that the experiments as described here should not be described as naturalistic or ethological as there is no support for such claims.

      Ethological interpretation: We thank the reviewer for their comments about the use of the term ethological to describe this study. We chose to develop a patchy bacterial assay to mimic the naturalistic “boom-or-bust” environment. While we agree with the reviewer that we do not know if the size and distribution of the food patches in these experiments is more relevant to C. elegans, we maintain that these experiments were ecologically-inspired and revealed behavior that is difficult to observe in environments with large, densely-seeded bacterial patches. We have updated our text to better reflect that this study was “ecologically-inspired” rather than truly “ethological” in nature (lines 94, 693).

      The main finding of the paper is that worms explore and then exploit, i.e. they frequently reject several bacterial patches before accepting one. This result requires additional scrutiny to reject other possible interpretations. In particular, when worms are transferred to a new plate we would expect some period of increased arousal due to the stressful handling process. A high arousal state might cause rejection of food patches. Could the measured accept/reject decisions be influenced by this effect? One approach to addressing this concern would be to allow the animals to acclimate to the new plate on a bare region before encountering the new food patches.

      We thank the reviewer for their comment on how the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We addressed this above in response to Reviewer #1 under Transfer Method and Time Parameter. In brief, we used a worm picking method that mitigated stress and added additional analyses showing that a transfer-related term was less predictive than a satiety-related term.

      Related to the above, in what circumstances exactly are the authors claiming that worms first explore and then exploit? After being briefly deprived of food? After being handled?

      Explore-then-exploit: All animals were well-fed and handled gently as described above under Transfer Method (lines 787-795). Our results suggest that the appearance of an explore-then-exploit strategy is a byproduct of being transferred from an environment with high bacterial density to an environment with low bacterial density as described in the manuscript (lines 461-466).

      The authors emphasize their analysis of the accept/reject decision as a critical innovation. However, the accept/reject decision does not strike me as substantially different from the previously described stay/switch decision. When a worm encounters a new patch of bacteria, accepting this bacteria is equivalent to staying on it and rejecting (leaving) it is equivalent to switching away from it. The authors should explain how these concepts are significantly distinct.

      Accept-reject vs. stay-switch: We thank the reviewer for alerting us to this gap in our discussion. We have revised the text to further extrapolate upon our point of view on this somewhat philosophical distinction and what it predicts about C. elegans behavior (lines 507-533).

      During patch encounter classification, the authors computed three of the animals' behavioral metrics (Line 801-804) and claimed that the combination of these three metrics reveals two non-Gaussian clusters representing encounters where animals sensed the patch or did not appear to sense the patch. The authors also refer to a video to demonstrate the two clusters by rotating the 3-dimension scatter plot. However, the supposed clusters, if any, are difficult to see in a 3D (Video 5) or in a 2D scatter plot (Figure 3I). The authors need to clearly demonstrate the distinct clustering as claimed in the paper as this feature is fundamental and necessary for the model implementation and interpretation of results.

      We are grateful to the reviewer for pointing out the difficulty in visualizing the clusters. We added additional visualizations and methods to validate the clusters we have discovered as described in our above response to Reviewer #3 under Validation of sensing clusters.

      When selecting parameters (covariates) for their model, it is critical to avoid overfitting. Therefore, the authors used AIC and BIC (Figure 4- supplement 1) to demonstrate that the full GLM model has a better model performance than the other models which contain only a subset of the full covariates (in a total of 5). However, the authors compare the full set with only 4 other models whereas the total number of models that need to be compared with is 2^5-2. The authors at least need to include the AIC and BIC scores of all possible models in order to draw the conclusion about the performance of the full model.

      Model selection criterion: We thank the reviewer for pointing out this gap in our methodology. We have now run the model with all combinations of subsets of model parameters and have confirmed that the model with all 5 covariates outperforms all other models even when using BIC, the strictest criterion for overfitting (Figure 1 - supplement 1A). The only other model that performs well (though not as often as the 5-term model) is the 4-term model lacking ρ<sub>h</sub>. This result is not surprising as ρ<sub>h</sub> only changes substantially once in an animal’s encounter history for the single-density, multi-patch data that this model was fit to. For example, for an animal foraging on patches of density 10, on the first encounter ρ<sub>h</sub> = ~200 (see Parameter initialization above), but on every subsequent encounter ρ<sub>h</sub> = ~10. As a result, the effect of ρ<sub>h</sub> on the probability of exploiting is somewhat binary on the single-density, multi-patch data set. Nevertheless, we see significantly improved prediction of behavior in the novel multi-density, multi-patch data (Figure 4F) as we observe an effect of the most recently encountered patch. Additionally, we observe a similar impact (i.e., significant coefficient of negative sign) of the ρ<sub>h</sub> term when the model is fit to the multi-density, multi-patch data set (Figure 4 - supplement 4D).
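
      As an illustration of the exhaustive comparison described above, the following minimal sketch enumerates every non-empty subset of five covariates and defines the standard AIC and BIC scores. It is not the analysis code used in the study: the covariate labels other than rho_h are hypothetical placeholders, and the log-likelihood of each fitted model is assumed to be supplied by whatever GLM fitting routine is used.

```python
from itertools import combinations
from math import log


def candidate_subsets(covariates):
    """Return every non-empty subset of the covariates as a tuple."""
    subsets = []
    for size in range(1, len(covariates) + 1):
        subsets.extend(combinations(covariates, size))
    return subsets


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * log(n_obs) - 2 * log_likelihood


# Hypothetical labels: only rho_h is named in the text above.
covariates = ["rho_h", "cov_2", "cov_3", "cov_4", "cov_5"]
subsets = candidate_subsets(covariates)

# 2^5 - 1 = 31 candidate models: the full model plus 30 reduced ones.
print(len(subsets))
```

      Because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is larger than about 7 observations, BIC favors smaller models more strongly than AIC on data sets of realistic size.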

      In any bacterial patch, the edges have a higher density of bacteria than the patch center. Thus, it is possible that a worm scans the patch edge density, on the basis of which it decides to accept or reject the patch whose average density is smaller. This could potentially cause an underestimate of the bacteria density used in the model. Furthermore, the potential inhomogeneity of the patch may further complicate the worm's decision-making, and the discrepancy between the reality and the model assumption will reduce the validity of the model. The authors need to estimate the inhomogeneity of the bacterial patches used in their assays and discuss how the edge effects may affect their results and conclusions.

      Bacterial patch inhomogeneity: We extensively tested the landscape of the bacterial patches by imaging fluorescently-labeled bacteria OP50-GFP (Bacterial Patch Density in Methods; Figure 2 - supplement 1-3). As the reviewer mentions, we observe significantly greater bacterial density at the patch edge. This within-patch spatial inhomogeneity results from areas of active proliferation of bacteria and likely complicates an animal’s ability to accurately assess the quantity of bacteria within a patch and, consequently, our ability to accurately compute a metric related to our assumptions of what the animal is sensing. In our study, we used the relative density of the patch edge where bacterial density is highest as a proxy for an animal’s assessment of bacterial patch density (Figure 2 – supplement 1). This decision was based on a previous finding that the time spent on the edge of a bacterial patch affected the dynamics of subsequent area-restricted search. While within-patch spatial inhomogeneity likely affects an animal’s ability to assess patch density, we do not believe that this qualitatively affects the results of our study. Both the patch densities tested (Figure 2 – supplement 3A) as well as our observations of time-dependent changes in exploitation (Figure 2E,N-O; Figure 3H-I) maintained a monotonic relationship. Therefore, alternative methods of patch density estimation should yield similar results. We have added additional discussion on this topic to our manuscript (lines 578-593).

      The authors claim that their methods (GMM and semi-supervised QDA) are unbiased. This seems unlikely as the QDA involves supervision. The authors need to provide additional explanation on this point.

      Semi-supervised QDA labelling: We have removed the term “unbiased” to avoid any misinterpretation of the methodology and clarified our method of labelling used for “supervising” QDA. Specifically, we made two simple assumptions: 1) animals must have sensed the patch if they exploited it and 2) animals must not have sensed the patch if there were no bacteria to sense. Thus, we labeled encounters as sensing if they were found to be exploitatory, as we assume that sensation is a prerequisite to exploitation; and we labeled encounters as non-sensing for events where animals encountered patches lacking bacteria (OD<sub>600</sub> = 0). All other points were non-labeled prior to learning the model. In this way, our labels were based on the experimental design and the results of the GMM, an unsupervised method, rather than on any expectations we had about what sensing should look like. The semi-supervised QDA method then used these initial labels to iteratively fit a paraboloid that best separated these clusters by minimizing the posterior variance of classification (lines 1012-1021). See Figure 2 - supplement 8A-B for a visualization showing the labelled data.

      Based on the authors' result, worms behaviorally exhibit their preferences toward food abundance (density), which results in a preference scale for a range of densities. Does this scale vary with the worms' initial cultivation states? The author partially verified that by observing starved worms. This hypothesis could be better tested if the authors could analyze the decision-making of the worms that were initially cultivated with different densities of bacterial food.

      While we agree with the reviewer that testing the effects of varying bacterial density during animal development (cultivation) is a very interesting experiment, it is not feasible at this time. We focused our revised manuscript to include only assertions about the effects of recent experiences and added this inquiry as a future direction as described above in our response to Reviewer #1 under Cultivation density.

      It would be helpful to elaborate more on how the framework developed in this paper can be applied more broadly to other behaviors and/or organisms and how it may influence our understanding of decision-making across species.

      We thank the reviewer for alerting us to this gap in our discussion. We have added additional commentary about our model and its utility to the discussion section (lines 667-695).

      Reviewer #3 (Recommendations for the authors):

      Sensing vs. non-sensing

      Perhaps a more ethologically accurate term to describe this behavior would be "ignoring" rather than "not sensing". If the authors feel strongly about using the term "not sensing", then they should provide experimental evidence supporting this claim. However, I think simply changing the terminology negates these experiments.

      We thank the reviewer for their thoughtful comments. While we agree with the reviewer that the term “non-sensing” may not be ethologically accurate (see response to Public Review above under Interpretation of “non-sensing” encounters), we interpret the term “ignoring” to mean that the animal sensed the patches but decided not to react. We have chosen to replace the term “non-sensing” with “non-responding” to best indicate the ethological interpretation of our observation. Nonetheless, we believe that it remains possible that animals are truly not sensing the bacterial patches as our method of classification compared the behavior against encounters with patches lacking bacteria (as described above in response to Reviewer #2 under Semi-supervised QDA labelling).

      History-dependence of the GLM

      Perhaps a simpler approach would be to say the worm senses everything, and this accumulative memory affects the decision to exploit. For example, the animal essentially experiences two feeding states: feeding on patches, and starvation off of patches.

      The level of satiety could be modeled linearly:

      Satiety(t_enter:t_leave) = k_feed*patch_density*delta_t

      Where k_feed is some model parameter for rate of satiety signal accumulation, t_enter is the time the animal entered the patch, t_leave is the time the animal left the patch, and delta_t is the difference between the two. Perhaps you could add a saturation limit to this, but given your data, I doubt that is the case.

      Starvation could be modeled as simply a decay from the last satiety signal:

      Starvation(t_leave:t_enter) = Satiety(t_leave)*exp(-k_starve*delta_t).

      Where k_starve is the rate constant for the decay of the satiety signal.

      For the logistic model, the logistic parameter is simply the difference between the current patch density and the current satiety signal.
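
      The relations above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say whether satiety from a new encounter adds to the decayed signal from earlier encounters (the sketch assumes it does), and the logistic intercept beta0 and slope beta1 are hypothetical parameters that would have to be fit to data.

```python
from math import exp


def satiety_gain(k_feed, patch_density, t_enter, t_leave):
    """Satiety accumulated on a patch: k_feed * patch_density * delta_t."""
    return k_feed * patch_density * (t_leave - t_enter)


def decayed_satiety(satiety_at_leave, k_starve, t_leave, t_enter_next):
    """Exponential decay of the satiety signal while off patch."""
    return satiety_at_leave * exp(-k_starve * (t_enter_next - t_leave))


def exploit_probability(patch_density, satiety, beta0=0.0, beta1=1.0):
    """Logistic function of (patch density - satiety); betas are assumed."""
    x = beta0 + beta1 * (patch_density - satiety)
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    e = exp(x)  # numerically stable branch for very negative x
    return e / (1.0 + e)


def run_encounters(encounters, k_feed, k_starve):
    """encounters: time-ordered list of (t_enter, t_leave, patch_density)."""
    satiety = 0.0
    last_leave = None
    probabilities = []
    for t_enter, t_leave, density in encounters:
        if last_leave is not None:
            satiety = decayed_satiety(satiety, k_starve, last_leave, t_enter)
        probabilities.append(exploit_probability(density, satiety))
        # Assumption: new satiety adds to whatever signal remains.
        satiety += satiety_gain(k_feed, density, t_enter, t_leave)
        last_leave = t_leave
    return probabilities
```

      Under these assumptions a brief encounter contributes a small delta_t and therefore a small satiety gain, so it changes the next exploit probability very little.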

      A nice thing about this approach is that it negates the need to categorize your patches. All patch encounters matter. Brief patch encounters (categorized as non-sensing and not used in the prior GLM) naturally produce a very small satiety signal and contribute very little to the exploit decision. Another nice thing about this approach is that it gives you memory timescales, that are testable. There is a rate of satiety accumulation and a rate of satiety loss. You should be able to predict behavior with lower patch density, assuming the rate constants hold. (I am not advocating you do more experiments here, just pointing out a nice feature of this approach).

      You could possibly apply this to a GLM for velocity on a non-exploited patch as well, though I assume this would be a linear GLM, given the velocity distributions you provided.

      We thank the reviewer for their time and thoughtfulness in thinking about our model. The reviewer’s proposed model seems entirely reasonable and could aid in elucidating the time component of how prior experience affects decision-making. However, we decided to keep our paper focused on using a minimal model to answer a set of core questions (e.g., Does encounter history or satiety influence decision-making?) (see above under Model design for a more detailed response). Future studies investigating the mechanisms of these foraging decisions should open the door for more mechanistically accurate models. We have expanded our discussion of the model to include this assertion (lines 667-695).

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Sample size: If the sample size of the study were increased, more confidence and new insights could be gained about myometrial enhancer-mediated gene regulation in term pregnancy. Such a small sample size (N = 3) limits the statistical power of the study. As mentioned in the manuscript, the failure to identify chromatin loops in the second subject's biopsy was due to limited sample material.

      We agree with the reviewer’s comment about the sample size. We sincerely hope the results of this study will increase the interest of stakeholders in funding future projects on a larger scale.

      (2) Figure quality: There is a lack of good representations of the results (e.g., screenshots of tables as figure panels!) as well as missing interpretations that might add value to the manuscript.

      Figures 1B and 2B have been converted to pie charts.

      (3) Definition of super-enhancer: The definition of super-enhancer is not clear. Also, the computational merging of enhancers to define super-enhancers should be described better.

      We have added more details about the tool and parameter settings to the Methods section “Identification of super enhancers”:

      “Identification of super enhancers

      H3K27ac-positive enhancers were defined as regions of H3K27ac ChIP-seq peaks in each sample. The enhancers within 12.5Kb were merged by using bedtools merge function with parameter “-d 12500”. The combined enhancer regions were called super enhancers if they were larger than 15Kb. The common super enhancers from multiple samples were used for downstream analysis.”

      Reference:

      Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013 Apr 11;153(2):307-19. doi: 10.1016/j.cell.2013.03.035. PMID: 23582322; PMCID: PMC3653129.
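
      For readers who want to see the merging rule spelled out, the following minimal sketch mimics the behavior described in the quoted Methods text (merge enhancers within 12.5 kb, then keep merged regions larger than 15 kb). It is an illustration rather than the pipeline used in the study: the function names and the example coordinates are hypothetical, intervals are assumed to be BED-style (chrom, start, end) tuples, and the intersection of super enhancers across samples is not shown.

```python
def merge_within(intervals, max_gap=12_500):
    """Merge same-chromosome intervals separated by at most max_gap bp."""
    merged = []
    for chrom, start, end in sorted(intervals):
        previous = merged[-1] if merged else None
        if previous and previous[0] == chrom and start - previous[2] <= max_gap:
            previous[2] = max(previous[2], end)  # extend the current cluster
        else:
            merged.append([chrom, start, end])   # start a new cluster
    return [tuple(m) for m in merged]


def call_super_enhancers(intervals, max_gap=12_500, min_size=15_000):
    """Keep merged regions whose length exceeds min_size bp."""
    return [
        region
        for region in merge_within(intervals, max_gap)
        if region[2] - region[1] > min_size
    ]


# Hypothetical H3K27ac peaks, for illustration only.
peaks = [
    ("chr1", 0, 2_000),
    ("chr1", 10_000, 12_000),
    ("chr1", 20_000, 23_000),
    ("chr1", 100_000, 101_000),
]
print(call_super_enhancers(peaks))
```

      In this example the first three peaks are each within 12.5 kb of the previous one and merge into a single 23 kb region, which passes the 15 kb size filter; the isolated fourth peak does not.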

      (4) Assay-Specific Limitations: Each assay employed in the study, such as ChIP-Seq and CRISPRa-based Perturb-Seq, has its limitations, including potential biases, sensitivity issues, and technical challenges, which could impact the accuracy and reliability of the results. These limitations should be addressed properly to avoid false-positive results and improve the interpretability of the results.

      The major limitations of the CRISPRa-based Perturb-Seq protocol in this study are the use of hTERT-HM cells and the two-vector system for transduction. While hTERT-HM cells are a much easier platform in terms of technical operation, primary human myometrial cells are generally considered to retain a molecular context that is closer to that of in vivo tissues. Because the efficiency of having two vectors simultaneously present in the same cell is limited, hTERT-HM cells make the experiment much more affordable and operationally feasible. Future advances in viral vector payload capacity may overcome this challenge and open an avenue for performing the assay on primary human myometrial cells.

      (5) Sample collection and comparison: There is mention of matched gravid term and non-gravid samples whereas no description or use of control samples was found in the results. Also, the comparison of non-labor samples with labor samples would provide a better understanding of epigenomic and transcriptomic events of myometrium leading to laboring events.

      The description has been updated:

      “Collection of myometrial specimens

      Permission to collect human tissue specimens was prospectively obtained from individuals undergoing hysterectomy or cesarean section for benign clinical indications (H-33461). Gravid myometrial tissue was obtained from the margin of the hysterotomy in women undergoing term cesarean sections (>38 weeks estimated gestational age) without evidence of labor. Non-gravid myometrial tissue was collected from pre-menopausal women undergoing hysterectomy for benign conditions. Specimens from gravid women receiving treatment for pre-eclampsia, eclampsia, pregnancy-related hypertension, or pre-term labor were excluded.”

      (6) Lack of clarity:

      (6a) It is written as 'Chromatin Conformation Capture (Hi-C)'. I think Hi-C is Histone Capture and 3C is Chromosome Conformation Capture! This needs clear writing.

      As the reviewer suggested, to make it clear, we have changed the text “A high throughput chromatin conformation capture (Hi-C) assay” to “A High-throughput Chromosome Conformation Capture (Hi-C) assay”.

      (6b) In multiple places, 'PLCL2' gene is written as 'PCLC2'.

      Corrected as suggested.

      (6c) What is the biological relevance of considering 'active' genes with FPKM {greater than or equal to} 1? This needs clarification.

      In RNA-seq analysis, the gene expression levels are often quantified using FPKM (Fragments Per Kilobase of transcript per Million mapped reads). Setting a threshold of FPKM for defining "active" genes in RNA-seq analysis is biologically relevant, because it helps to distinguish between genuinely expressed genes and background noise. It helps researchers focus on genes, which are more likely to have a significant biological impact. A common threshold for defining "active" genes is FPKM ≥ 1. Genes with FPKM values below this threshold may be transcribed at very low levels or could be background noise.
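
      As a worked illustration of the definition above, FPKM divides a gene's fragment count by its length in kilobases and by the library size in millions of mapped fragments. The sketch below uses hypothetical function names and example numbers; it is not the quantification pipeline used in the study.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def is_active(fpkm_value, threshold=1.0):
    """Apply the FPKM >= 1 cutoff used to call a gene 'active'."""
    return fpkm_value >= threshold


# Hypothetical example: 500 fragments on a 2 kb gene in a library of
# 50 million mapped fragments.
value = fpkm(500, 2_000, 50_000_000)
print(value, is_active(value))
```

      With these example numbers the gene has 250 fragments per kilobase and the library has 50 million mapped fragments, giving an FPKM of 5 and placing the gene above the cutoff.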

      (6d) According to the authors, the understanding of differentially methylated genes at promoters is underrated. However, it is unclear why they set DNA methylation aside and selected histone modification as the basis of epigenetic reprogramming in the myometrium.

      DNA methylation indeed plays a crucial role in evaluating the impact of cis-acting elements on gene regulation. Large-scale studies, such as the comprehensive analysis of the myometrial methylome landscape in human biopsies (Paul et al., JCI Insight, 2022, PMID: 36066972), have provided valuable insights. When integrated with histone modification and chromatin looping data, contributed by our group and collaborators, future secondary analyses leveraging machine learning are poised to further elucidate the mechanisms underlying myometrial transcriptional regulation.

      (6e) How does the identification of PGR as an upstream regulator of PLCL2 gene expression in human myometrial cells contribute to our understanding of progesterone signaling in myometrial function?

      In a previous study, we demonstrated a positive correlation between PLCL2 and PGR expression in a mouse model and identified PLCL2's role in negatively modulating oxytocin-induced myometrial cell contraction (Peavy et al., PNAS, 2021, PMID: 33707208). The present study builds on this by providing evidence for a direct regulatory mechanism in which PGR influences PLCL2 transcription, likely through a cis-acting element located 35 kb upstream. These findings suggest that PLCL2 acts as a mediator of PGR-dependent myometrial quiescence prior to labor, rather than merely participating in a parallel pathway. Further in vivo studies are necessary to delineate the extent to which PLCL2 mediates PGR activity, particularly the contraction-dampening function of the PGR-B isoform.

      (7) Grammatical error: The manuscript has numerous grammatical errors. Please correct them.

      Corrections have been made as suggested.

      (8) Use of single-cell data: Though from the Methods section, it can be understood that single-cell RNA-seq was done to identify CRISPRa gRNA expressing cells to characterize the effect of gene activation, some results from single-cell data e.g., cell clustering, cell types, gRNA expression across clusters could be added for better elucidation.

      As the reviewer suggested, we have prepared a file “PerturbSeq_summary.xlsx” (Dataset S9) to provide additional results of the Perturb-seq data analysis. It includes 2 spreadsheets: “Cell_per_gRNA” for clustering and “Protospacer_calls_per_cell” for gRNA expression across clusters.

      Reviewer #2 (Recommendations For The Authors):

      (1) The following are a number of grammatical issues in the abstract. I suggest having a careful read of the entire manuscript to identify additional grammatical issues as I may not be able to highlight all of these issues.

      (1a) "The myometrium plays a critical component during pregnancy." change component to role.

      (1b) "It is responsible for the uterus' structural integrity and force generation at term," → replace "," with "."

      (1c) Also, I suggest rephrasing the first 2 sentences to: The myometrium plays a critical role during pregnancy as it is responsible for both the structural integrity of the uterus and force generation at term.

      (1d) "Here we investigated the human term pregnant nonlabor myometrial biopsies for transcriptome, enhancer histone mark cistrome, and chromatin conformation pattern mapping." Remove "the", and modify to "Here we investigated human term pregnant".

      (1e) Missing period and sentence fragment, "PGR overexpression facilitated PLCL2 gene expression in myometrial cells Using CRISPR activation the functionality of a PGR putative enhancer 35-kilobases upstream of the contractile-restrictive gene PLCL2."

      Corrections have been made as suggested.

      (2) Sentence fragment: Studies on the role of steroid hormone receptors in myometrial remodeling have provided evidence that the withdrawal of functional progesterone signaling at term is due to a stoichiometric increase of progesterone receptor (PGR) A to B isoform-related estrogen receptor (ESR) alpha expression activation at term. (Mesiano, Chan et al. 2002) (Merlino, Welsh et al. 2007) (Nadeem, Shynlova et al. 2016).

      The statement has been updated:

      “Studies on the role of steroid hormone receptors in myometrial remodeling suggest that the withdrawal of functional progesterone signaling at term results from a stoichiometric shift favoring the PGR-A isoform over PGR-B. This shift is associated with increased activation of estrogen receptor alpha (ESR1) expression at term (Mesiano, Chan et al. 2002) (Merlino, Welsh et al. 2007) (Nadeem, Shynlova et al. 2016).”

      (3) FOS:JUN heterodimers are implicated to be critical for the initiation of labor through transcriptional regulation of gap junction proteins such as Cx43 (Nadeem, Farine et al. 2018) (Balducci, Risek et al. 1993).

      Use Gja1 (Gap junction alpha 1) as the current correct gene, not Cx43.

      Also, several references predate Nadeem, Farine et al. 2018 and are more appropriate to use as references for the role of Ap-1 proteins in regulating Gja1; PMID: 15618352 and PMID: 12064606 were the first to show this relationship in myometrial cells.

      The statement has been updated as suggested:

      “FOS:JUN heterodimers are implicated to be critical for the initiation of labor through transcriptional regulation of gap junction proteins such as GJA1 (Nadeem, Farine et al. 2018) (Balducci, Risek et al. 1993)”

      (4) Define PLCL2 on first use.

      Updated as suggested.

      (5) There are a number of issues with this section, "Matched sSpecimens of gravid myometrium were collected at the margin of hysterotomy from women undergoing clinically indicated cesarean section at term (>38 weeks estimated gestation age) without evidence of labor. Specimens of healthy, non-gravid myometrium were also pecimens were collected from uteri removed from pre-menopausal women undergoing hysterectomy for benign clinical indications."

      The description has been updated:

      “Collection of myometrial specimens

      Permission to collect human tissue specimens was prospectively obtained from individuals undergoing hysterectomy or cesarean section for benign clinical indications (H-33461). Gravid myometrial tissue was obtained from the margin of the hysterotomy in women undergoing term cesarean sections (>38 weeks estimated gestational age) without evidence of labor. Non-gravid myometrial tissue was collected from pre-menopausal women undergoing hysterectomy for benign conditions. Specimens from gravid women receiving treatment for pre-eclampsia, eclampsia, pregnancy-related hypertension, or pre-term labor were excluded.”

      (6) Enriched motifs were identified by HOMER (Hypergeometric Optimization of Motif EnRichment) v4.11 (Heinz, Benner et al. 2010).

      Please clarify what background is used for motif enrichment.

      We used the default background sequences generated by HOMER from a set of random genomic sequences matching the input sequences in terms of basic properties, such as GC content and length. We have added more details to the Methods section:

      “DNA-binding factor motif enrichment analysis

      Enriched motifs were identified by HOMER (Hypergeometric Optimization of Motif EnRichment) v4.11 with default background sequences matching the input sequences (Heinz, Benner et al. 2010).”

      (7) "Six of the seven regions are also co-localized with previously published genome occupancy of transcription regulators curated by the ReMap Atlas"

      Please clarify if this Atlas includes myometrial tissues or not and clarify the cell types included in the atlas.

      According to the UCSC Genome Browser and the reference by Hammal et al. (2022), the current ReMap database includes PGR ChIP-seq data from human myometrial biopsies, available under NCBI GEO accession number GSE137550, alongside data from various other cell and tissue types. ReMap provides valuable insights into potential functional cis-acting elements in the genome from a systems biology perspective. However, tissue specificity requires independent validation.

      (8) "Notably, 76% of the putative super-enhancers are co-localized with known PGR-occupied regions in the human myometrial tissue (Figure S2). This is significantly higher than the 20% co-localization in the regular enhancer group (Figure S2)."

      Because there is a huge difference in the size of the putative super enhancer regions and the isolated enhancers this comparison is not appropriate as conducted. The comparison needs to account for the difference in size of the regions. Please provide P values for significance statements.

      We acknowledge the reviewer's concern that our initial statement was overstated and potentially misleading, given the substantial difference in size between putative super-enhancer regions and regular enhancers. Rather than emphasizing the enrichment, it would be more accurate to simply describe our observation that super-enhancers encompass more PGR-occupied regions.

      Here is the updated version:

      “Notably, 76% of the putative super-enhancers co-localize with known PGR-occupied regions in human myometrial tissue, compared to 20% co-localization observed in regular enhancers (Figure S2).”

      Reviewer #3 (Recommendations For The Authors):

      (1) Title is extremely misleading, as here we do not get a view of the epigenomic landscape, but rather sparce data related to H3K27ac and H3K4me (focusing on enhancers) and chromatin conformation associated with the PLCL2 transcription start site (TSS).

      As suggested, the title is modified to “Assessment of the Histone Mark-based Epigenomic Landscape in Human Myometrium at Term Pregnancy”.

      (2) Improve the first result paragraph by providing a clear rationale for the experiments and their objectives, as well as introducing the samples used. Rather than simply listing approaches and end results in Table 1, offer concise explanations for the experiments alongside the supporting data presented in detailed figures. Using appropriate figures/graphs to effectively contextualize these datasets would be greatly appreciated by readers and would add more value to this research. Currently, it is difficult for us to assess and appreciate the quality of the data.

      The following statement is now included at the beginning of the Results section:

      "To better understand the regulatory network shaping the myometrial transcriptome before labor, we analyzed transcriptome and putative enhancers in individual human myometrial specimens. Using RNA-seq, we identified actively expressed RNAs, while ChIP-seq for H3K27ac and H3K4me1 was used to map putative enhancers. Active genes were associated with nearby putative enhancers based on their genomic proximity. Additionally, chromatin looping patterns were mapped using Hi-C to further link active genes and putative enhancers within the same chromatin loops."

      (3) The statistics for every sequencing approach need to be provided for each sample (e.g., RNA-seq: number of total reads, number of mapped reads, % of mapped reads; ChIP-Seq: number of mapped reads, % of mapped reads, % of duplicates).

      We have generated the summary table of each dataset included in this study (Dataset S7) [NGS-summary.xls].

      (4) Figure S1: The rationale behind comparing the Dotts study and yours regarding H3K27ac-positive regions needs to be better defined. Why is this performed if the data will not be used afterwards? What are the conserved regions associated with vs the ones that are variable? Is this biologically relevant? Why not use only the regions conserved between the 6 samples, to have more robust conclusions?

      The purpose of comparing our data with the Dotts dataset is to highlight the degree of variation across studies. In this study, we focused on addressing specific biological questions using our own dataset rather than developing methodologies for meta-analysis. Future advancements in meta-analysis techniques could leverage the combined power of multiple datasets to provide deeper insights.

      (5) Perhaps due to a lack of details, I am unable to ascertain how the putative myometrial enhancers were defined. In Dataset S1, it is stated, "we define the regions that have overlapping H3K27ac and H3K4me1 marks as putative myometrial enhancers at the term pregnant nonlabor stage (Dataset S1)". Within Dataset S1, for subjects 1, 2, and 3, H3K27ac and H3K4me1 double-positive enhancers are shown in term pregnant, non-labor human myometrial specimens, with approximately 100 regions corresponding to 131 (sample 1), 127 (sample 2), and 140 (sample 3) common peaks. However, in Figure 1a, reference is made to the 13114 putative enhancers commonly present across the three specimens. Is Dataset S1 intended to represent only a small fraction of the 13114 putative enhancers? Detailed analyses need to be conducted and better showcased.

      Dataset S1 has been updated to list all 13,114 putative enhancers.

      (6) For the gene expression analyses of RNA-seq data, FPKM values were utilized. However, it is unclear why the gene expression count matrix was normalized based on the ratio of total mapped read pairs in each sample to 56.5 million for the term myometrial specimens. I would recommend exercising caution regarding the use of FPKM expression units, as samples are normalized only within themselves, lacking cross-sample normalization. Consequently, due to external factors unaccounted for by this normalization method, a value of 10 in one sample may not equate to 10 in another.

      We value the reviewer’s input. This question will be addressed in future secondary data analyses with suitable methodologies, as it is beyond the scope of this study.

      (7) In Figure 1b, the authors have categorized their 12157 active genes into 3 bins based on FPKM values: >5 FPKM >1, >15 FPKM >5, and >15 FPKM. However, in the text, they describe these as 'actively high-expressing genes (FPKM >= 15)'. I would advise caution regarding the interpretation of these values, as an FPKM of 15 is not typically associated with highly expressed genes. According to literature and resources such as the Expression Atlas, an FPKM of 15 is generally considered to represent a low to medium expression level.

      We appreciate the reviewer’s feedback. This question will be revisited during secondary data analyses using appropriate methodologies, as it falls outside the scope of the present study.

      To increase readability and clarity, we modified the sentence as follows: More than 40% of the 540 putative super enhancers are located within a 100-kilobase distance to high-expressing genes (FPKM >= 15), while only 7.3% of putative myometrial super enhancers are found near low-expressing genes (5 > FPKM >=1) (Figure 2B).

      (8) Out of the 12157 active genes, approximately two-thirds have an FPKM >15. Was this expected? How does this correspond to what is observed in the literature, particularly in other similar studies (https://pubmed.ncbi.nlm.nih.gov/30988671/ ; https://pubmed.ncbi.nlm.nih.gov/35260533/ ) .

      This is indeed an intriguing question that merits further exploration in future secondary analyses.

      (9) It is also surprising to see that for the motif enrichment analysis (Fig. 1C), the P-values are small. This is probably because the percentage of target sequences with the motif is very similar to the percentage of background sequences with the motif. For instance, for selected genes in Figure 1C: AP-1 (50.68% vs. 46.50%), STAT5 (28.08% vs. 25.04%), PGR (17.90% vs. 16.12%), etc. Can one really say that you have a biologically relevant enrichment for values that are so close between target sequences and background sequences?

      The reviewer’s comment is noted. Biological relevance will be examined experimentally through wet-lab assays in future studies.

      (10) For Figure 2, again not convinced that FPKM >= 15 can be used to say: Compared with the regular putative enhancers, the putative myometrial super-enhancers are found more frequently near active genes that are expressed at relatively higher levels (Figure 1B and Figure 2B). A higher threshold should be used if they want to say this.

      To compare the association of putative enhancers with active genes expressed at different levels, we categorized the active genes into three groups based on their FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values. These groups are defined as follows: the top third active genes (FPKM ≥ 15), the middle third active genes (5 ≤ FPKM < 15), and the bottom third active genes (1 ≤ FPKM < 5). By "active genes expressed at relatively higher levels," we refer specifically to the top third active genes with FPKM values of 15 or higher, indicating their relatively higher expression levels compared to the other groups of active genes.
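
      As an illustration only, the three expression groups defined above can be written as a small classifier. This is a minimal sketch, not the code used in the study: the function name, the gene names, and the FPKM values are hypothetical, and treating FPKM < 1 as "inactive" is an assumption inferred from the lowest bin boundary.

```python
def fpkm_bin(fpkm):
    """Assign an expression group using the thresholds quoted in the response."""
    if fpkm >= 15:
        return "high"      # FPKM >= 15
    if fpkm >= 5:
        return "middle"    # 5 <= FPKM < 15
    if fpkm >= 1:
        return "low"       # 1 <= FPKM < 5
    return "inactive"      # assumed: below the FPKM >= 1 activity cutoff


# Hypothetical genes and values, for illustration only.
example = {"geneA": 0.4, "geneB": 1.0, "geneC": 4.9,
           "geneD": 5.0, "geneE": 15.0, "geneF": 230.0}
bins = {gene: fpkm_bin(value) for gene, value in example.items()}
```

      Writing the boundaries as explicit comparisons makes it clear which group a gene sitting exactly on a threshold (FPKM of 1, 5, or 15) falls into.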

      (11) More detailed explanations and methods are needed regarding how the data for Figure S2 was obtained.

      The following details were added to the methods section:

      “Colocalization of super enhancers and PGR genome occupancy was compared by calling peaks from previously published PGR ChIP-seq data (GSM4081683 and GSM4081684). The percentages of enhancers and super enhancers that manifest PGR occupancy were calculated by overlapping the genomic regions in each category with PGR occupancy regions.”
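
      The percentage calculation described in this passage can be sketched as follows. This is a minimal illustration under stated assumptions: intervals are half-open (chrom, start, end) tuples, any overlap of at least one base counts as occupancy, and all function names and coordinates are hypothetical. The tool and overlap criterion actually used in the study are not specified here.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_with_occupancy(regions, peaks):
    """Percentage of regions that overlap at least one peak."""
    if not regions:
        return 0.0
    hits = sum(1 for region in regions if any(overlaps(region, p) for p in peaks))
    return 100.0 * hits / len(regions)


# Hypothetical coordinates, for illustration only.
enhancers = [("chr1", 100, 200), ("chr1", 500, 600),
             ("chr2", 100, 200), ("chr1", 900, 1000)]
pgr_peaks = [("chr1", 150, 160), ("chr2", 190, 300), ("chr1", 600, 700)]
pct = percent_with_occupancy(enhancers, pgr_peaks)
```

      The same function would be applied once to the enhancer set and once to the super enhancer set to obtain the two percentages being compared. The nested loop is quadratic and is meant for clarity; genome-scale data would normally go through an interval tool instead.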

      (12) In Figure 2C, there is no information provided on the genes used to obtain the results. It would be helpful to include examples of these genes, along with their expression values, for instance.

      The expression levels of the 346 active genes that are associated with myometrial super enhancers are included in Dataset S4, along with results of the updated gene ontology enrichment analysis using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) of Knowledgebase v2024q4. Selected pathways of interest are listed in updated Figure 2C.

      (13) The linking of PLCL2-related data to the first part of the story is lacking, and the rationale behind it is missing. This entire section should be more detailed, and the data should be expanded to better reflect the context.

      As suggested, we included the following statement at the beginning of the section “Cis-acting elements for the control of the contractile gene PLCL2”:

      “We previously demonstrated the positive correlation of PLCL2 and PGR expression in a mouse model and PLCL2’s function in negatively modulating oxytocin-induced myometrial cell contraction (Peavey et al., 2021). However, the mechanism underlying the PGR regulation of PLCL2 remains unclear. Taking advantage of the mapped myometrial cis-acting elements, we aimed to identify the cis-acting elements that may contribute to the PLCL2 transcriptional regulation, with a special interest in the PGR-related enhancers.”

      The context is that our results provide additional evidence to support a direct regulation mechanism of PGR on PLCL2 transcription, likely through the 35-kb upstream cis-acting element. This finding suggests that PLCL2 likely plays a mediator’s role in PGR-dependent myometrial quiescence before labor rather than being a mere passenger on a parallel pathway. Further studies using in vivo models are needed to determine the extent to which PLCL2 mediates the contraction-dampening function of PGR, especially that of the PGR-B isoform.

      (14) The entire Hi-C data should be presented to allow for the assessment of its quality and further value.

      The revised manuscript has included the Hi-C quality control summary in Dataset S8 [HiC-QC-Summary.xlsx].

      (15) The authors state: "For the purpose of functional screening, we focus on H3K27ac signals instead of using H3K27ac/H3K4me1 double positive criterium to cast a wider net." However, it is unclear how many of the targeted regions contained H3K27ac/H3K4me1 peaks. Were enhancers or super-enhancers targeted, and if so, how did they compare to H3K27ac sites?

      The numbers of H3K27ac/H3K4me1 double positive peaks are recorded in Figure 1A. Compared to the numbers of H3K27ac intervals (Table 1), the H3K27ac/H3K4me1 double positive peaks are 62.9%, 70.7%, and 61.2% of corresponding H3K27ac intervals in each individual specimen.

      (16) For the first set of data (Table 1), the authors state, "Together, these results reveal an epigenomic landscape in the human term pregnant myometrial tissue before the onset of labor, which we use as a resource to investigate the molecular mechanisms that prepare the myometrium for subsequent parturition." While it is acknowledged that an epigenetic landscape exists in all tissues, there is a lack of clarity regarding this landscape in the current manuscript, as we are only presented with a table containing numbers.

      This sentence has been revised to: “Together, these results delineate a map of H3K27ac and H3K4me1 positive signals in the human term pregnant myometrial tissue before the onset of labor, which we use as a resource to investigate the molecular mechanisms that prepare the myometrium for subsequent parturition.”

      (17) For S1, the authors conclude: These data together highlight the degree of variation in mapping the epigenome among specimens and datasets. This conclusion seems somewhat perplexing, and I find myself in partial disagreement. Firstly, providing a clear rationale for this section would strengthen the conclusions. It's important to consider what factors may contribute to this variability. It could simply be attributed to differences in experimental settings, such as variations in samples, protocols used, antibodies, sequencing departments, or overall data quality. Deeper analyses of the data could have provided more information.

      We agree with the reviewer that deeper analyses are needed in order to extract more information among studies. However, appropriate methods for meta-analyses should be carefully evaluated and employed for this purpose. We humbly believe that such a task should belong to future studies that may combine available datasets for secondary analyses, leveraging the collective contribution of the reproductive biology community.

      (18) In the methods section, please include an explanation of how enhancers and super-enhancers were defined or add appropriate citations for reference.

      We added more details about the tool and parameter settings to the “Identification of super enhancers” section of the Methods.

      “Identification of super enhancers

      H3K27ac-positive enhancers were defined as regions of H3K27ac ChIP-seq peaks in each sample. The enhancers within 12.5Kb were merged by using bedtools merge function with parameter “-d 12500”. The combined enhancer regions were called super enhancers if they were larger than 15Kb. The common super enhancers from multiple samples were used for downstream analysis.”

      Reference:

      Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013 Apr 11;153(2):307-19. doi: 10.1016/j.cell.2013.03.035. PMID: 23582322; PMCID: PMC3653129.
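
      The stitching rule quoted in the Methods text above can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline used in the study: it treats a gap of at most 12,500 bp between same-chromosome peaks as mergeable (an approximation of the bedtools merge "-d 12500" setting), keeps merged regions longer than 15,000 bp, and does not implement the final step of taking super enhancers common to multiple samples. Function names and coordinates are hypothetical.

```python
def merge_within(intervals, max_gap=12_500):
    """Merge same-chromosome (chrom, start, end) intervals whose gap is <= max_gap."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            merged[-1][2] = max(merged[-1][2], end)   # extend the current region
        else:
            merged.append([chrom, start, end])        # start a new region
    return [tuple(region) for region in merged]


def call_super_enhancers(peaks, max_gap=12_500, min_size=15_000):
    """Keep merged regions that are larger than min_size base pairs."""
    return [r for r in merge_within(peaks, max_gap) if r[2] - r[1] > min_size]


# Hypothetical H3K27ac peaks from one sample, for illustration only.
peaks = [("chr1", 1_000, 3_000), ("chr1", 10_000, 12_000),
         ("chr1", 20_000, 22_000), ("chr1", 50_000, 52_000),
         ("chr2", 1_000, 20_000)]
super_enhancers = call_super_enhancers(peaks)
```

      In this toy input the first three chr1 peaks are stitched into one 21-kb region that passes the size filter, while the isolated 2-kb peak at chr1:50,000 does not.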

      (19) Additional description on the "Inferred myometrial PGR activities and the correlation analysis" method section should be included to enhance clarity and understanding.

      The description has been updated:

      “The inferred PGR activities were represented by the T-score, which was derived by inputting the mouse myometrial Pgr gene signature, based on the differentially expressed genes between control and myometrial Pgr knockout groups at mid-pregnancy (Wu, Wang et al., 2022), into the SEMIPs application (Li, Bushel et al., 2021). The T-scores were computed using this signature alongside the normalized gene expression counts (FPKM) from 43 human myometrial biopsy specimens.”

      (20) How was the qPCR analysis performed? Was the ddCT method utilized, and was a reference gene used for control? Additional information would be beneficial.

      Quantifying relative mRNA levels was performed via the standard curve method.

      The following details were added: “Relative levels of genes of interest were normalized to the 18S rRNA.”
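
      For readers unfamiliar with the standard curve method, the calculation can be sketched as follows. This is a generic illustration, not the analysis code used in the study: it assumes a linear fit of Ct against log10 of the input quantity for a dilution series, one curve per amplicon, and all names and numbers are hypothetical.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to read a relative quantity off a Ct value."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series (relative units) and Ct values.
dilutions = [1000, 100, 10, 1]
target_curve = fit_standard_curve(dilutions, [20.0, 23.3, 26.6, 29.9])
rrna_curve = fit_standard_curve(dilutions, [10.0, 13.3, 16.6, 19.9])

# Relative level of a gene of interest, normalized to 18S rRNA in the same sample.
target_qty = quantity_from_ct(23.3, *target_curve)
rrna_qty = quantity_from_ct(13.3, *rrna_curve)
normalized = target_qty / rrna_qty
```

      Because each amplicon is read off its own curve, this approach does not require assuming equal amplification efficiency for the gene of interest and the reference, which is the main practical difference from the ddCT method.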

      (21) Regarding the RNA-Seq analysis of Provera-treated human Myometrial Specimens, the continued use of FPKM is not ideal due to potential differences in RNA composition between libraries. Additionally, clarification is needed on why Cufflinks 2.0.2 was used, considering it is no longer supported.

      FPKM (Fragments Per Kilobase of transcript per Million mapped reads) is used in RNA-Seq analysis, because it allows for the normalization of gene expression data, accounting for differences in gene length and sequencing depth, and facilitates comparability across different genes and libraries. This makes it one of the essential tools for accurately measuring and comparing gene expression levels in various biological and clinical research contexts.
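
      The two normalizations mentioned above, for gene length and for sequencing depth, follow directly from the definition of FPKM. The sketch below is a worked illustration of that definition only; the function name and the numbers are hypothetical and are not taken from the study.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical gene: 300 fragments on a 2,000-bp transcript in a 20-million-fragment library.
base = fpkm(300, 2_000, 20_000_000)

# Twice the sequencing depth, so twice the fragments: the value is unchanged.
deeper = fpkm(600, 2_000, 40_000_000)

# A transcript twice as long collecting twice the fragments: the value is unchanged.
longer = fpkm(600, 4_000, 20_000_000)
```

      All three calls give the same value because the extra fragments are exactly offset by the larger library or the longer transcript.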

      Cufflinks was once a popular tool for RNA-seq data analysis, transcriptome assembly, and identification of differentially expressed genes (DEGs). Its usage has declined in recent years with the emergence of newer and more advanced tools. We used it because the RNA-seq analysis at the early stage of this study, a few years ago, was performed with Cufflinks; for comparison and consistency, we continued using it for the later RNA-seq analyses. If we were starting a new project now, we would choose newer tools such as HISAT2, Salmon, and DESeq2.

      (22) Overall, sentence structure and typos need to be corrected across the text. Here are some examples:

      Line 17: at term, emerging studies.

      Line 20-22: Here we investigated the human term pregnant nonlabor myometrial biopsies for transcriptome, enhancer histone mark cistrome, and chromatin conformation pattern mapping.

      Line 30-32: PGR overexpression facilitated PLCL2 gene expression in myometrial cells Using CRISPR activation the functionality of a PGR putative enhancer 35-kilobases upstream of the contractile-restrictive gene PLCL2.

      Line 66-70: However, the role of differential myometrial DNA methylation at contractility-driving gene promoter CpG islands in preterm birth is not thought to be major (Mitsuya, Singh et al. 2014), but given that DNA methylation-mediated gene regulation often occurs outside of CpG islands (Irizarry, Ladd-Acosta et al. 2009), there is still work to be done at this interface.

      Line 80-83: Putative enhancers upstream of the PLCL2, a gene encoding for the protein PLCL2 which has been implicated in the modulation of calcium signaling (Uji, Matsuda et al. 2002) and maintenance of myometrial quiescence (Peavey, Wu et al. 2021), transcriptional start site were subject to functional assessment using CRISPR activation based assays.

      Line 290 : sSpecimens

      We appreciate the reviewer’s kind efforts and have made changes accordingly.

    1. Public Reviews: Reviewer #1 (Public Review): Summary: A cortico-centric view is dominant in the study of the neural mechanisms of consciousness. This investigation represents the growing interest in understanding how subcortical regions are involved in conscious perception. To achieve this, the authors engaged in an ambitious and rare procedure in humans of directly recording from neurons in the subthalamic nucleus and thalamus. While participants were in surgery for the placement of deep brain stimulation devices for the treatment of essential tremor and Parkinson's disease, they were awakened and completed a perceptual-threshold tactile detection task. The authors identified individual neurons and analyzed single-unit activity corresponding with the task phases and tactile detection/perception. Among the neurons that were perception-responsive, the authors report changes in firing rate beginning ~150 milliseconds from the onset of the tactile stimulation. Curiously, the majority of the perception-responsive neurons had a higher firing rate for missed/not perceived trials. In summary, this investigation is a valuable addition to the growing literature on the role of subcortical regions in conscious perception. Strengths: The authors achieved the challenging task of recording human single-unit activity while participants performed a tactile perception task. The methods and statistics are clearly explained and rigorous, particularly for managing false positives and non-normal distributions. The results offer new detail at the level of individual neurons in the emerging recognition of the role of subcortical regions in conscious perception. We thank the reviewer for their positive comments. Weaknesses: "Nonetheless, it remains unknown how the firing rate of subcortical neurons changes when a stimulus is consciously perceived." 
(lines 76-77) The authors could be more specific about what exactly single-unit recordings offer for interrogating the role of subcortical regions in conscious perception that is unique from alternative neural activity recordings (e.g., local field potential) or recordings that are used as proxies of neural activity (e.g., fMRI). We agree with the reviewer that the contribution of micro-electrode recordings was not sufficiently put forward in our manuscript. We added the following sentences to the discussion, when discussing the multiple types of neurons we found: Single-unit recordings provide a much higher temporal resolution than functional imaging, which helps assess how the neural correlates of consciousness unfold over time. Contrary to local field potentials, single-unit recordings can expose the variety of functional roles of neurons within subcortical regions, thereby offering a potential for a better mechanistic understanding of perceptual consciousness. Related comment for the following excerpts: "After a random delay ranging from 0.5 to 1 s, a "respond" cue was played, prompting participants to verbally report whether they felt a vibration or not. Therefore, none of the reported analyses are confounded by motor responses." (lines 97-99). "These results show that subthalamic and thalamic neurons are modulated by stimulus onset, irrespective of whether it was reported or not, even though no immediate motor response was required." (lines 188-190). "By imposing a delay between the end of the tactile stimulation window and the subjective report, we ensured that neuronal responses reflected stimulus detection and not mere motor responses." (lines 245-247). It is a valuable feature of the paradigm that the reporting period was initiated hundreds of milliseconds after the stimulus presentation so that the neural responses should not represent "mere motor responses".
However, verbal report of having perceived or not perceived a stimulus is a motor response and because the participants anticipate having to make these reports before the onset of the response period, there may be motor preparatory activity from the time of the perceived stimulus that is absent for the not perceived stimulus. The authors show sensitivity to this issue by identifying task-selective neurons and their discussion of the results that refer to the confound of post-perceptual processing. Still, direct treatment of this possible confound would help the rigor of the interpretation of the results. We agree with the reviewer that direct treatment would have provided the best control. One way to avoid motor preparation is to only provide the stimulus-effector mapping after the stimulus presentation (Bennur & Gold, 2011; Twomey et al., 2016; Fang et al., 2024). Other controls to avoid post-perceptual processing used in consciousness research consist of using no-report paradigms (Tsuchiya et al., 2015) as we did in previous studies (Pereira et al., 2021; Stockart et al., 2024). Unfortunately, neither of these procedures was feasible during the 10 minutes allotted for the research task in an intraoperative setting with auditory cues and vocal responses. We would like to highlight nonetheless that the effects we report are short-lived and incompatible with sustained motor preparation activity. We added the following sentence to the discussion: Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.
"When analyzing tactile perception, we ensured that our results were not contaminated with spurious behavior (e.g. fluctuation of attention and arousal due to the surgical procedure)." (lines 118-117). Confidence in the results would be improved if the authors clarified exactly what behaviors were considered as contaminating the results (e.g., eye closure, saccades, and bodily movements) and how they were determined. This sentence was indeed unclear. It introduced the trial selection procedure we used to compensate for drifts in the perceptual threshold, which can result from fluctuations in attention or arousal. We modified the sentence, which now reads: When analyzing tactile perception, we ensured that our results were not contaminated by fluctuating attention and arousal due to the surgical procedure. Based on objective criteria, we excluded specific series of trials from analyses and focused on time windows for which hits and misses occurred in commensurate proportions (see methods). During the recordings, the experimenter stood next to the patients and monitored their bodily movements, ensuring they did not close their eyes or produce any other bodily movements synchronous with stimulus presentation. The authors' discussion of the thalamic neurons could be more precise. The authors show that only certain areas of the thalamus were recorded (in or near the ventral lateral nucleus, according to Figure S3C). The ventral lateral nucleus has a unique relationship to tactile and motor systems, so do the authors hypothesize these same perception-selective neurons would be active in the same way for visual, auditory, olfactory, and taste perception? Moreover, the authors minimally interpret the location of the task, sensory, and perception-responsive neurons. Figure S3 suggests these neurons are overlapping. 
Did the authors expect this overlap and what does it mean for the functional organization of the ventral lateral nucleus and subthalamic nucleus in conscious perception? These are excellent questions, the answers to which we can only speculate. In rodents, the LT is known as a hub for multisensory processing, as over 90% of LT neurons respond to at least two sensory modalities (for a review, see Yang et al., 2024). Yet, no study has compared how LT neurons in rodents encode perceived and nonperceived stimuli across modalities. Evidence in humans is scarce, with only a few studies documenting supramodal neural correlates of consciousness at the cortical level with noninvasive methods (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022). We now refer to these studies in the revised discussion: Moreover, given the prominent role of the thalamus in multisensory processing, it will be interesting to assess if it is specifically involved in tactile consciousness or if it has a supramodal contribution, akin to what is found in the cortex (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022). Concerning the anatomical overlap of neurons, we could not reconstruct the exact locations of the DBS tracts for all participants. Because of the limited number of recorded neurons, we preferred to refrain from drawing strong conclusions about the functional organization of the ventral lateral nucleus. "We note that, 6 out of 8 neurons had higher firing rates for missed trials than hit trials, although this proportion was not significant (binomial test: p = 0.145)." (lines 215-216). It appears that in the three example neurons shown in Figure 4, 2 out of 3 (#001 and #068) show a change in firing rate predominantly for the missed stimulations. Meanwhile, #034 shows a clear hit response (although there is an early missed response - decreased firing rate - around 150 ms that is not statistically significant).
This is a counterintuitive finding when compared to previous results from the thalamus (e.g., local field potentials and fMRI) that show the opposite response profile (i.e., missed/not perceived trials display no change or reduced response relative to hit/perceived trials). The discussion of the results should address this, including if these seemingly competing findings can be rectified. We thank the reviewer for pointing out this limitation of the discussion. We avoided putting too much emphasis on these aspects due to the limited number of perception-selective neurons. Although subcortical connectivity models would predict that neurons in the thalamus should increase their firing rate for perceived stimuli, we were not surprised to see this heterogeneity as we had previously found neurons decreasing their firing rates for missed stimuli in the posterior parietal cortex (Pereira et al., 2021). We answer these points in response to the reviewer’s last comment below on the latencies of the effects. The authors report 8 perception-responsive neurons, but there are only 5 recording sites highlighted (i.e., filled-in squares and circles) in Figures S3C and 4D. Was this an omission or were three neurons removed from the perception-responsive analysis? Unfortunately, we could not obtain anatomical images for all participants. This information was present in the methods section, although not clearly enough: For 34 / 50 neurons, preoperative MRI and postoperative CT scans (co-registered in patient native space using CranialSuite) were available to precisely reconstruct surgical trajectories and recording locations (for the remaining 16 neurons, localizations were based on neurosurgical planning and confirmed by electrophysiological recordings at various depths). Therefore, we added the following sentence in Figures 2, 3, 4 and S3. [...] for patients for which we could obtain anatomical images. Could the authors speak to the timing of the responses reported in Figure 4? 
The statistically significant intervals suggested both early (~160-200ms) to late responses (~300ms). Some have hypothesized that subcortical regions are early - ahead of cortical activation that may be linked with conscious perception. Do these results say anything about this temporal model for when subcortical regions are active in conscious perception? We agree that response timing could have been better described. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of the two clusters mentioned by the reviewer very clearly. We now include this analysis in a new Figure 5 in the revised manuscript. We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section: We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5). and this section to the methods: To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component. We updated the discussion, including the points made in the comment about higher activity for missed stimuli (above): The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modality. 
It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021). Reviewer #2 (Public Review): The authors have studied subpopulations of individual neurons recorded in the thalamus and subthalamic nucleus (STN) of awake humans performing a simple cognitive task. They have carefully designed their task structure to eliminate motor components that could confound their analyses in these subcortical structures, given that the data was recorded in patients with Parkinson's Disease (PD) and diagnosed with an Essential Tremor (ET). The recorded data represents a promising addition to the field. The analyses that the authors have applied can serve as a strong starting point for exploring the kinds of complex signals that can emerge within a single neuron's activity. Pereira et al. conclude that their results from single neurons indicate that task-related activity occurs, purportedly separate from previously identified sensory signals. These conclusions are a promising and novel perspective for how the field thinks about the emergence of decisions and sensory perception across the entire brain as a unit. We thank the reviewer for these positive comments.
Despite the strength of the data that was obtained and the relevant nature of the conclusions that were drawn, there are certain limitations that must be taken into consideration: (1) The authors make several claims that their findings are direct representations of consciousness identifiable in subcortical structures. The current context for consciousness does not sufficiently define how the consciousness is related to the perceptual task. This is indeed a complex issue in all studies concerned with perceptual consciousness and we were careful not to make such “direct” claims. Instead, we used the state-of-the-art tools available to study consciousness (see below) and only interpreted our findings with respect to consciousness in the discussion. For example, in the abstract, our claim is that “Our results provide direct neurophysiological evidence of the involvement of the subthalamic nucleus and the thalamus for the detection of vibrotactile stimuli, thereby calling for a less cortico-centric view of the neural correlates of consciousness.” In brief, first, we used near-threshold stimuli which allowed us to contrast reported vs. unreported trials while keeping the physical properties of the stimulus comparable. Second, we used subjective reports without incentive for participants to be more conservative or liberal in their response (e.g. through reward). Third, we introduced a random delay before the responses to limit confounding effects due to the report. We also acknowledged that “... it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015)”.
This last sentence now reads (to address a point made by Reviewer 1 about motor preparation): Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness. (2) The current work would benefit greatly from a description and clarification of what all the neurons that have been recorded are doing. The authors' criteria for selecting subpopulations with task-relevant activity are appropriate, but understanding the heterogeneity in a population of single neurons is important for broader considerations that are being studied within the field. We followed the reviewer’s suggestions and added new results regarding the latencies of the reported effects (new Figure 5). We also now show firing rates for hits, misses and overall sensory activity (hits and misses combined) for all perception-selective or sensory-selective (when behavior was good enough; Figure S5). Although a more detailed characterization of the heterogeneity of the neurons identified would have been relevant, it seems beyond the scope of the present study, especially given the relatively small number of neurons we identified, as well as the relative simplicity of the paradigm imposed by the clinical context in which we worked. (3) The authors have omitted a proper set of controls for comparison against the active trials, for example, where a response was not necessary. Please explain why this choice was made and what implications are necessary to consider.
We had mentioned this limitation in the discussion: Nevertheless, it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015). We agree that such a control would have been relevant, but this was not feasible during the 10 minutes allotted for the research task in an intraoperative setting. These constraints are both clinical, to minimize discomfort for patients, and practical, as it is difficult to track neurons in an intraoperative setting for more than 10 minutes. We added a sentence to this effect in the discussion. Reviewer #3 (Public Review): Summary: This important study relies on a rare dataset: intracranial recordings within the thalamus and the subthalamic nucleus in awake humans, while they were performing a tactile detection task. This procedure allowed the authors to identify a small but significant proportion of individual neurons, in both structures, whose activity correlated with the task (e.g. their firing rate changed following the audio cue signalling the start of a trial) and/or with the stimulus presentation (change in firing rate around 200 ms following tactile stimulation) and/or with participant's reported subjective perception of the stimulus (difference between hits and misses around 200 ms following tactile stimulation). Whereas most studies interested in the neural underpinnings of conscious perception focus on cortical areas, these results suggest that subcortical structures might also play a role in conscious perception, notably tactile detection. Strengths: There are two strongly valuable aspects in this study that make the evidence convincing and even compelling.
First, these types of data are exceptional, the authors could have access to subcortical recordings in awake and behaving humans during surgery. Additionally, the methods are solid. The behavioral study meets the best standards of the domain, with a careful calibration of the stimulation levels (staircase) to maintain them around the detection threshold, and an additional selection of time intervals where the behavior was stable. The authors also checked that stimulus intensity was the same on average for hits and misses within these selected periods, which warrants that the effects of detection that are observed here are not confounded by stimulus intensity. The neural data analysis is also very sound and well-conducted. The statistical approach complies with current best practices, although I found that, in some instances, it was not entirely clear which type of permutations had been performed, and I would advocate for more clarity in these instances. Globally the figures are nice, clear, and well presented. I appreciated the fact that the precise anatomical location of the neurons was directly shown in each figure. We thank the reviewer for this positive evaluation. Weaknesses: Some clarification is needed for interpreting Figure 3, top rows: in my understanding the black curve is already the result of a subtraction between stimulus present trials and catch trials, to remove potential drifts; if so, it does not make sense to compare it with the firing rate recorded for catch trials. The black curve represents the firing rate without any subtraction. We only subtracted the firing rates of catch trials in the statistical procedure, as the reviewer noted, to remove potential drift. We added (before baseline correction) to the legend of Figure 3. I also think that the article could benefit from a more thorough presentation of the data and that this could help refine the interpretation which seems to be a bit incomplete in the current version. 
There are 8 stimulus-responsive neurons and 8 perception-selective neurons, with only one showing both effects, resulting in a total of 15 individual neurons being in either category or 13 neurons if we exclude those in which the behavior is not good enough for the hit versus miss analysis (Figure S4A). In my opinion, it should be feasible to show the data for all of them (either in a main figure, or at least in supplementary), but in the present version, we get to see the data for only 3 neurons for each analysis. This very small selection includes the only neuron that shows both effects (neuron #001; which is also cue selective), but this is not highlighted in the text. It would be interesting to see both the stimulus-response data and the hit versus miss data for all 13 neurons as it could help develop the interpretation of exactly how these neurons might be involved in stimulus processing and conscious perception. This should give rise to distinct interpretations for the three possible categories. Neurons that are stimulus-responsive but not perception-selective should show the same response for both hits and misses and hence carry out indifferently conscious and unconscious responses. The fact that some neurons show the opposite pattern is particularly intriguing and might give rise to a very specific interpretation: if the neuron really doesn't tend to respond to the stimulus when hits and misses are put together, it might be a neuron that does not directly respond to the stimulus, but whose spontaneous fluctuations across trials affect how the stimulus is perceived when they occur in a specific time window after the stimulus. Finally, neuron #001 responds with what looks like a real burst of evoked activity to stimulation and also shows a difference between hits and misses, but intriguingly, the response is strongest for misses. 
In the discussion, the interesting interpretation in terms of a specific gating of information by subcortical structures seems to apply well to this last example, but not necessarily to the other categories. We now provide a supplementary Figure showing firing rates for hits, misses and the combination of both. The reviewer’s analysis about whether a perception-selective neuron also has to respond to the stimulus to be involved in gating is interesting. With more data, a finer characterization of these neurons would have been possible. In our study, it is possible that more neurons have similar characteristics as #001 (e.g. #032, #062, #068) but do not show a significant difference with respect to baseline when both hits and misses are considered. We now avoid interpreting null effects, especially considering the low number of trials with near-threshold detection behavior we could collect in 10 minutes. We also realized that we had not updated Figure S7 after the last revision in which we had corrected for possible drifts to obtain sensory-selective neurons. The corrected panel A is provided below. Recommendations for the authors: Reviewer #1 (Recommendations For The Authors): It appears that the correct rejection was low for most participants. It would improve interpretation of the behavioral results if correct rejection was shown as a rate (i.e., # of correct rejection trials / total number of no stimulus/blank trials) rather than or in addition to reporting the number of correct rejection trials (Figure 1C). We added the following figure to the supplementary information. The axis tick marks in Figure 5A late versus early are incorrect (appears the axis was duplicated). Thank you for spotting this, it has been corrected. Reviewer #2 (Recommendations For The Authors): We would like to congratulate the authors on this strongly supported contribution to the field. The manuscript is well-written, although a little bit too concise in sections. 
See the following comments for the methods that could benefit the present conclusions: Thank you for these suggestions that we believe improved our interpretations. Major Points (1) The subpopulations of neurons that are considered are small, but it is not a confounding issue for the conclusions drawn. However, the behavior of the neurons that were excluded should be considered by calculating the percentage of neurons that are selective for the distinct parameters, as a function of time. This would greatly strengthen the understanding of what can be observed in the two subcortical structures. We thank the reviewer for this suggestion. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of two clusters, as shown in the new Figure 5 copied below We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section: We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5). and this section to the methods: To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component. We also updated the discussion: The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). 
Similar electroencephalographic markers are found in the visual and auditory modality. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021). (2) We highly recommend that the authors consider employing some analysis that decodes therepresentations observable in the activity of individual neurons as a function of time (e.g. Shannon's Mutual Information). This would reinforce and emphasize the most relevant conclusions. We thank the reviewers for this suggestion. Unfortunately, such methods would require many more trials than what we were able to collect in the 10-minute slots available in the operating room. (3) Although there are small populations recorded in each of the two subcortical structures, they aresufficient to attempt a study using population dynamics (primarily, PCA can still work with smaller populations). Given the broad range of dynamics that are observed in a population of single units typically involved in decision-making, it would be interesting to consider whether heterogeneity is a hallmark of decision-making, and trying to summarize the variance in the activity of the entire population should provide a certain understanding of the cue-selective versus the perception-selective qualities, as an example. 
      We now present all 13 neurons that were sensory- or perception-selective for which we had good enough behavior to show hit vs. miss differences in Supplementary Figure S5. Although population-level analyses would be relevant, they are not compatible with the number of neurons we identified.

      (4) A stronger presentation of what the expectations are for the results would also benefit the interpretability of the manuscript when added to the introduction and discussion sections.

      Due to the scarcity of single-neuron data related to perceptual consciousness, especially in the subcortical structures we explored, our prior expectations did not exceed finding perception-selective neurons. We would prefer to avoid refining these expectations post-hoc.

      Minor Comments

      (1) Add the shared overlap between differently selective neurons explicitly in the manuscript.

      We added this information at the end of the results section.

      (2) Add a consideration in the methods of why the Wilcoxon test or permutation test was selected for separate uses. How do the results compare?

      Sorry for this misunderstanding. We clarified this in the revised methods:

      To deal with possibly non-parametric distributions, we used Wilcoxon rank sum test or sign test instead of t-tests to test differences between distributions. We used permutation tests instead of Binomial tests to test whether a reported number of neurons could have been obtained by chance.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analysis:

      As suggested already in the public review, it might be worth showing all 13 neurons with either stimulus-responsive or perception-selective behaviour and, based on that, deepen the potential interpretation of the results for the different categories.

      We agree that this information improves the understanding of the underlying data and this addition was also proposed by reviewer 2. We added it in a new supplementary Figure S5.

      Recommendations for improving the writing and presentation

      As mentioned in the public review, I think Figure 3 needs clarification.

      I found that, in some instances, it was not entirely clear which type of analyses or permutation tests had been performed, and I would advocate for more clarity in these instances. For example:

      Page 6 line 146 "permuting trial labels 1000 times": do you mean randomly attributing a trial to a neuron? Or something else?

      We agree that this was somewhat unclear. We modified the sentence to:

      permuting the sign of the trial-wise differences

      We now define a sign permutation test for paired tests and a trial permutation test for two-sample tests in the methods and specify which test was used in the main text.

      Page 7, neurons which have their firing rate modulated by the stimulus: I think you ought to be more explicit about the analysis so that we grasp it on the first read. To understand what is shown in Figure 3 I had to go back and forth between the main text and the method, and I am still not sure I completely understood. You compare the firing rate in sliding windows following stimulus onset with the mean firing rate during the 300 ms baseline. Sliding windows are between 0 and 400 ms post-stim (according to methods?) and a neuron is deemed responsive if you find at least one temporal cluster that shows a significant difference with baseline activity (using cluster permutation). Is that correct? Either way, I would recommend being a bit more precise about the analysis that was carried out in the main text, so that we only need to refer to methods when we need specialized information.

      We agree that the methods section was unclear. We re-wrote the following two paragraphs:

      To identify sensory-selective neurons, we assumed that subcortical signatures of stimulus detection ought to be found early following its onset and looked for differences in the firing rates during the first 400 ms post-stimulus onset compared to a 300 ms pre-stimulus baseline. To correct for possible drifts occurring during the trial, we subtracted the average cue-locked activity of catch trials from the cue-locked activity of each stimulus-present trial before realigning to stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses, as assessed by a non-parametric sign rank test. A putative neuron was considered sensory-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate. Whether for the shuffled data or the observed data, if more than one cluster was obtained, we discarded all but the longest cluster. This permutation test allowed us to control for multiple comparisons across time and participants.

      For perception-selective neurons, we looked for differences in the firing rates between hit and miss trials during the first 400 ms post-stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses as assessed by a nonparametric Wilcoxon rank sum test. As for sensory-selective neurons, a putative neuron was considered perception-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate and we discarded all but the longest cluster.

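      The cluster-length criterion in the rewritten methods can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the per-time-point significance tests have already been run and are given as a list of booleans, and the names `longest_cluster`, `is_selective` and the time step `step_ms` are hypothetical (the response does not state the time resolution). The comparison against shuffled data is left out.

```python
def longest_cluster(significant, step_ms):
    """Return (start_index, length_ms) of the longest run of adjacent significant points.

    `significant` is a sequence of booleans, one per time point.
    `step_ms` is the spacing between time points (an assumed parameter).
    """
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for i, flag in enumerate(significant):
        if flag:
            if run_len == 0:
                run_start = i  # a new run of significant points begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # the run is broken; shorter clusters are discarded
    return best_start, best_len * step_ms


def is_selective(significant, step_ms, min_length_ms=80):
    """Apply the 80 ms criterion quoted above to the longest cluster only."""
    _, length_ms = longest_cluster(significant, step_ms)
    return length_ms > min_length_ms


# Toy example with a hypothetical 10 ms step: one 100 ms run and one 40 ms run.
mask = [False] * 5 + [True] * 10 + [False] * 3 + [True] * 4
print(longest_cluster(mask, step_ms=10))
print(is_selective(mask, step_ms=10))
```

      Keeping only the longest run mirrors the statement that all but the longest cluster were discarded, for observed and shuffled data alike.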
      Minor points:

      Figure 3: inset showing action potentials, please also provide the time scale (in the legend for example), so that it's clear that it is not commensurate with the firing rate curve below, but rather corresponds to the dots of the raster plot.

      We added the text “[...], duration: 2.5 ms” in Figures 2, 3, and 4.

      Line 210: I recommend: “we found 8 neurons [...] showing a significant difference *between hits and misses* after stimulus onset.”

      We made the change.

      Top of page 9, the following sentence is misleading: “This result suggests that neurons in these two subcortical structures have mostly different functional roles”; this could read as meaning that functional roles are different between the two structures. Probably what you mean is rather something along this line: “these two subcortical structures both contain neurons displaying several different functional roles”.

      Changed.

      Line 329: remove double “when”.

      We made the change, thank you for spotting this.

    2. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      A cortico-centric view is dominant in the study of the neural mechanisms of consciousness. This investigation represents the growing interest in understanding how subcortical regions are involved in conscious perception. To achieve this, the authors engaged in an ambitious and rare procedure in humans of directly recording from neurons in the subthalamic nucleus and thalamus. While participants were in surgery for the placement of deep brain stimulation devices for the treatment of essential tremor and Parkinson's disease, they were awakened and completed a perceptual-threshold tactile detection task. The authors identified individual neurons and analyzed single-unit activity corresponding with the task phases and tactile detection/perception. Among the neurons that were perception-responsive, the authors report changes in firing rate beginning ~150 milliseconds from the onset of the tactile stimulation. Curiously, the majority of the perception-responsive neurons had a higher firing rate for missed/not perceived trials. In summary, this investigation is a valuable addition to the growing literature on the role of subcortical regions in conscious perception.

      Strengths:

      The authors achieved the challenging task of recording human single-unit activity while participants performed a tactile perception task. The methods and statistics are clearly explained and rigorous, particularly for managing false positives and non-normal distributions. The results offer new detail at the level of individual neurons in the emerging recognition of the role of subcortical regions in conscious perception.

      We thank the reviewer for their positive comments.

      Weaknesses:

      "Nonetheless, it remains unknown how the firing rate of subcortical neurons changes when a stimulus is consciously perceived." (lines 76-77) The authors could be more specific about what exactly single-unit recordings offer for interrogating the role of subcortical regions in conscious perception that is unique from alternative neural activity recordings (e.g., local field potential) or recordings that are used as proxies of neural activity (e.g., fMRI).

      We agree with the reviewer that the contribution of micro-electrode recordings was not sufficiently put forward in our manuscript. We added the following sentences to the discussion, when discussing the multiple types of neurons we found:

      Single-unit recordings provide a much higher temporal resolution than functional imaging, which helps assess how the neural correlates of consciousness unfold over time. Contrary to local field potentials, single-unit recordings can expose the variety of functional roles of neurons within subcortical regions, thereby offering a potential for a better mechanistic understanding of perceptual consciousness.

      Related comment for the following excerpts:

      "After a random delay ranging from 0.5 to 1 s, a "respond" cue was played, prompting participants to verbally report whether they felt a vibration or not. Therefore, none of the reported analyses are confounded by motor responses." (lines 97-99).

      "These results show that subthalamic and thalamic neurons are modulated by stimulus onset, irrespective of whether it was reported or not, even though no immediate motor response was required." (lines 188-190).

      "By imposing a delay between the end of the tactile stimulation window and the subjective report, we ensured that neuronal responses reflected stimulus detection and not mere motor responses." (lines 245-247).

      It is a valuable feature of the paradigm that the reporting period was initiated hundreds of milliseconds after the stimulus presentation so that the neural responses should not represent "mere motor responses". However, verbal report of having perceived or not perceived a stimulus is a motor response and because the participants anticipate having to make these reports before the onset of the response period, there may be motor preparatory activity from the time of the perceived stimulus that is absent for the not perceived stimulus. The authors show sensitivity to this issue by identifying task-selective neurons and their discussion of the results that refer to the confound of post-perceptual processing. Still, direct treatment of this possible confound would help the rigor of the interpretation of the results.

      We agree with the reviewer that direct treatment would have provided the best control. One way to avoid motor preparation is to only provide the stimulus-effector mapping after the stimulus presentation (Bennur & Gold, 2011; Twomey et al., 2016; Fang et al., 2024). Other controls to avoid post-perceptual processing used in consciousness research consist of using no-report paradigms (Tsuchiya et al., 2015) as we did in previous studies (Pereira et al., 2021; Stockart et al., 2024). Unfortunately, neither of these procedures was feasible during the 10 minutes allotted for the research task in an intraoperative setting with auditory cues and vocal responses. We would like to highlight nonetheless that the effects we report are short-lived and incompatible with sustained motor preparation activity.

      We added the following sentence to the discussion:

      Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.

      "When analyzing tactile perception, we ensured that our results were not contaminated with spurious behavior (e.g. fluctuation of attention and arousal due to the surgical procedure)." (lines 118-117).

      Confidence in the results would be improved if the authors clarified exactly what behaviors were considered as contaminating the results (e.g., eye closure, saccades, and bodily movements) and how they were determined.

      This sentence was indeed unclear. It introduced the trial selection procedure we used to compensate for drifts in the perceptual threshold, which can result from fluctuations in attention or arousal. We modified the sentence, which now reads:

      When analyzing tactile perception, we ensured that our results were not contaminated by fluctuating attention and arousal due to the surgical procedure. Based on objective criteria, we excluded specific series of trials from analyses and focused on time windows for which hits and misses occurred in commensurate proportions (see methods).

      During the recordings, the experimenter stood next to the patients and monitored their bodily movements, ensuring they did not close their eyes or produce any other bodily movements synchronous with stimulus presentation.

      The authors' discussion of the thalamic neurons could be more precise. The authors show that only certain areas of the thalamus were recorded (in or near the ventral lateral nucleus, according to Figure S3C). The ventral lateral nucleus has a unique relationship to tactile and motor systems, so do the authors hypothesize these same perception-selective neurons would be active in the same way for visual, auditory, olfactory, and taste perception? Moreover, the authors minimally interpret the location of the task, sensory, and perception-responsive neurons. Figure S3 suggests these neurons are overlapping. Did the authors expect this overlap and what does it mean for the functional organization of the ventral lateral nucleus and subthalamic nucleus in conscious perception?

      These are excellent questions, the answers to which we can only speculate. In rodents, the LT is known as a hub for multisensory processing, as over 90% of LT neurons respond to at least two sensory modalities (for a review, see Yang et al., 2024). Yet, no study has compared how LT neurons in rodents encode perceived and nonperceived stimuli across modalities. Evidence in humans is scarce, with only a few studies documenting supramodal neural correlates of consciousness at the cortical level with noninvasive methods (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022). We now refer to these studies in the revised discussion:

      Moreover, given the prominent role of the thalamus in multisensory processing, it will be interesting to assess if it is specifically involved in tactile consciousness or if it has a supramodal contribution, akin to what is found in the cortex (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022).

      Concerning the anatomical overlap of neurons, we could not reconstruct the exact locations of the DBS tracts for all participants. Because of the limited number of recorded neurons, we preferred to refrain from drawing strong conclusions about the functional organization of the ventral lateral nucleus.

      "We note that, 6 out of 8 neurons had higher firing rates for missed trials than hit trials, although this proportion was not significant (binomial test: p = 0.145)." (lines 215-216).

      It appears that in the three example neurons shown in Figure 4, 2 out of 3 (#001 and #068) show a change in firing rate predominantly for the missed stimulations. Meanwhile, #034 shows a clear hit response (although there is an early missed response - decreased firing rate - around 150 ms that is not statistically significant). This is a counterintuitive finding when compared to previous results from the thalamus (e.g., local field potentials and fMRI) that show the opposite response profile (i.e., missed/not perceived trials display no change or reduced response relative to hit/perceived trials). The discussion of the results should address this, including if these seemingly competing findings can be rectified.

      We thank the reviewer for pointing out this limitation of the discussion. We avoided putting too much emphasis on these aspects due to the limited number of perception-selective neurons. Although subcortical connectivity models would predict that neurons in the thalamus should increase their firing rate for perceived stimuli, we were not surprised to see this heterogeneity as we had previously found neurons decreasing their firing rates for missed stimuli in the posterior parietal cortex (Pereira et al., 2021). We answer these points in response to the reviewer’s last comment below on the latencies of the effects.
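
      The binomial statistic quoted above (6 of 8 neurons, p = 0.145) can be checked with exact arithmetic. The sketch below assumes a chance level of 0.5; the helper name `binomial_tail` is hypothetical. The one-sided tail probability P(X >= 6) equals 37/256, which rounds to the reported value, whereas doubling it for a two-sided test gives about 0.289. That the reported test was one-sided is an inference from this arithmetic, not something stated in the response.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 6 of 8 perception-selective neurons fired more for misses than for hits.
one_sided = binomial_tail(6, 8)        # 37/256
two_sided = min(1.0, 2 * one_sided)    # symmetric null, so doubling is exact

print(round(one_sided, 3))  # 0.145
print(round(two_sided, 3))  # 0.289
```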

      The authors report 8 perception-responsive neurons, but there are only 5 recording sites highlighted (i.e., filled-in squares and circles) in Figures S3C and 4D. Was this an omission or were three neurons removed from the perception-responsive analysis?

      Unfortunately, we could not obtain anatomical images for all participants. This information was present in the methods section, although not clearly enough:

      For 34 / 50 neurons, preoperative MRI and postoperative CT scans (co-registered in patient native space using CranialSuite) were available to precisely reconstruct surgical trajectories and recording locations (for the remaining 16 neurons, localizations were based on neurosurgical planning and confirmed by electrophysiological recordings at various depths).

      Therefore, we added the following sentence in Figures 2, 3, 4 and S3.

      [...] for patients for which we could obtain anatomical images.

      Could the authors speak to the timing of the responses reported in Figure 4? The statistically significant intervals suggested both early (~160-200 ms) and late (~300 ms) responses. Some have hypothesized that subcortical regions are early - ahead of cortical activation that may be linked with conscious perception. Do these results say anything about this temporal model for when subcortical regions are active in conscious perception?

      We agree that response timing could have been better described. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of the two clusters mentioned by the reviewer very clearly. We now include this analysis in a new Figure 5 in the revised manuscript.

      We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section:

      We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5).

      and this section to the methods:

      To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component.
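
      To make the idea of a two-component fit concrete, here is a minimal sketch. It does not reproduce the procedure described above: the authors minimized a mean square error with an interior-point method and took the best of 20 random initializations, whereas this example swaps in expectation-maximization (maximum likelihood) with a single deterministic start so that it runs on the Python standard library alone. The latencies are synthetic and drawn around made-up centres; the function name and the 1 ms floor on the standard deviation are assumptions.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(latencies, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM (not the authors' MSE fit)."""
    data = sorted(latencies)
    half = len(data) // 2
    groups = [data[:half], data[half:]]
    # Deterministic start: lower and upper halves of the sorted latencies.
    mus = [sum(g) / len(g) for g in groups]
    sigmas = [max(math.sqrt(sum((x - m) ** 2 for x in g) / len(g)), 1.0)
              for g, m in zip(groups, mus)]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each latency belongs to each component.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update the weight, mean and spread of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = max(math.sqrt(var), 1.0)  # assumed 1 ms floor for stability
    return weights, mus, sigmas


# Synthetic latencies in ms (NOT data from the study).
rng = random.Random(0)
synthetic = ([rng.gauss(150.0, 20.0) for _ in range(200)]
             + [rng.gauss(320.0, 25.0) for _ in range(200)])
weights, mus, sigmas = fit_two_component_mixture(synthetic)
print([round(m) for m in mus])
```

      With well-separated clusters such as these, the two fitted means land near the centres used to generate the data. A comparison against a single-component fit, as the authors report, would be the natural next step.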

      We updated the discussion, including the points made in the comment about higher activity for missed stimuli (above):

      The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modality. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021).

      Reviewer #2 (Public Review):

      The authors have studied subpopulations of individual neurons recorded in the thalamus and subthalamic nucleus (STN) of awake humans performing a simple cognitive task. They have carefully designed their task structure to eliminate motor components that could confound their analyses in these subcortical structures, given that the data was recorded in patients with Parkinson's Disease (PD) and diagnosed with an Essential Tremor (ET). The recorded data represents a promising addition to the field. The analyses that the authors have applied can serve as a strong starting point for exploring the kinds of complex signals that can emerge within a single neuron's activity. Pereira et al. conclude that their results from single neurons indicate that task-related activity occurs, purportedly separate from previously identified sensory signals. These conclusions are a promising and novel perspective for how the field thinks about the emergence of decisions and sensory perception across the entire brain as a unit.

      We thank the reviewer for these positive comments.

      Despite the strength of the data that was obtained and the relevant nature of the conclusions that were drawn, there are certain limitations that must be taken into consideration:

      (1) The authors make several claims that their findings are direct representations of consciousness identifiable in subcortical structures. The current context for consciousness does not sufficiently define how the consciousness is related to the perceptual task.

      This is indeed a complex issue in all studies concerned with perceptual consciousness and we were careful not to make such “direct” claims. Instead, we used the state-of-the-art tools available to study consciousness (see below) and only interpreted our findings with respect to consciousness in the discussion. For example, in the abstract, our claim is that “Our results provide direct neurophysiological evidence of the involvement of the subthalamic nucleus and the thalamus for the detection of vibrotactile stimuli, thereby calling for a less cortico-centric view of the neural correlates of consciousness.”

      In brief, first, we used near-threshold stimuli which allowed us to contrast reported vs. unreported trials while keeping the physical properties of the stimulus comparable. Second, we used subjective reports without incentive for participants to be more conservative or liberal in their response (e.g. through reward). Third, we introduced a random delay before the responses to limit confounding effects due to the report. We also acknowledged that “... it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015)”. This last sentence now reads (to address a point made by Reviewer 1 about motor preparation):

      Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.

      (2) The current work would benefit greatly from a description and clarification of what all the neurons that have been recorded are doing. The authors' criteria for selecting subpopulations with task-relevant activity are appropriate, but understanding the heterogeneity in a population of single neurons is important for broader considerations that are being studied within the field.

      We followed the reviewer’s suggestions and added new results regarding the latencies of the reported effects (new Figure 5). We also now show firing rates for hits, misses and overall sensory activity (hits and misses combined) for all perception-selective or sensory-selective (when behavior was good enough; Figure S5). Although a more detailed characterization of the heterogeneity of the neurons identified would have been relevant, it seems beyond the scope of the present study, especially given the relatively small number of neurons we identified, as well as the relative simplicity of the paradigm imposed by the clinical context in which we worked.

      (3) The authors have omitted a proper set of controls for comparison against the active trials, for example, where a response was not necessary. Please explain why this choice was made and what implications are necessary to consider.

      We had mentioned this limitation in the discussion: Nevertheless, it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015). We agree that such a control would have been relevant, but this was not feasible during the 10 minutes allotted for the research task in an intraoperative setting. These constraints are both clinical, to minimize discomfort for patients, and practical, as it is difficult to track neurons in an intraoperative setting for more than 10 minutes.

      We added a sentence to this effect in the discussion.

      Reviewer #3 (Public Review):

      Summary:

      This important study relies on a rare dataset: intracranial recordings within the thalamus and the subthalamic nucleus in awake humans, while they were performing a tactile detection task. This procedure allowed the authors to identify a small but significant proportion of individual neurons, in both structures, whose activity correlated with the task (e.g. their firing rate changed following the audio cue signalling the start of a trial) and/or with the stimulus presentation (change in firing rate around 200 ms following tactile stimulation) and/or with participant's reported subjective perception of the stimulus (difference between hits and misses around 200 ms following tactile stimulation). Whereas most studies interested in the neural underpinnings of conscious perception focus on cortical areas, these results suggest that subcortical structures might also play a role in conscious perception, notably tactile detection.

      Strengths:

      There are two strongly valuable aspects in this study that make the evidence convincing and even compelling. First, these types of data are exceptional, the authors could have access to subcortical recordings in awake and behaving humans during surgery. Additionally, the methods are solid. The behavioral study meets the best standards of the domain, with a careful calibration of the stimulation levels (staircase) to maintain them around the detection threshold, and an additional selection of time intervals where the behavior was stable. The authors also checked that stimulus intensity was the same on average for hits and misses within these selected periods, which warrants that the effects of detection that are observed here are not confounded by stimulus intensity. The neural data analysis is also very sound and well-conducted. The statistical approach complies with current best practices, although I found that, in some instances, it was not entirely clear which type of permutations had been performed, and I would advocate for more clarity in these instances. Globally the figures are nice, clear, and well presented. I appreciated the fact that the precise anatomical location of the neurons was directly shown in each figure.

      We thank the reviewer for this positive evaluation.

      Weaknesses:

      Some clarification is needed for interpreting Figure 3, top rows: in my understanding the black curve is already the result of a subtraction between stimulus present trials and catch trials, to remove potential drifts; if so, it does not make sense to compare it with the firing rate recorded for catch trials.

      The black curve represents the firing rate without any subtraction. We only subtracted the firing rates of catch trials in the statistical procedure, as the reviewer noted, to remove potential drift. We added “(before baseline correction)” to the legend of Figure 3.

      I also think that the article could benefit from a more thorough presentation of the data and that this could help refine the interpretation which seems to be a bit incomplete in the current version. There are 8 stimulus-responsive neurons and 8 perception-selective neurons, with only one showing both effects, resulting in a total of 15 individual neurons being in either category or 13 neurons if we exclude those in which the behavior is not good enough for the hit versus miss analysis (Figure S4A). In my opinion, it should be feasible to show the data for all of them (either in a main figure, or at least in supplementary), but in the present version, we get to see the data for only 3 neurons for each analysis. This very small selection includes the only neuron that shows both effects (neuron #001; which is also cue selective), but this is not highlighted in the text. It would be interesting to see both the stimulus-response data and the hit versus miss data for all 13 neurons as it could help develop the interpretation of exactly how these neurons might be involved in stimulus processing and conscious perception. This should give rise to distinct interpretations for the three possible categories. Neurons that are stimulus-responsive but not perception-selective should show the same response for both hits and misses and hence carry out indifferently conscious and unconscious responses. The fact that some neurons show the opposite pattern is particularly intriguing and might give rise to a very specific interpretation: if the neuron really doesn't tend to respond to the stimulus when hits and misses are put together, it might be a neuron that does not directly respond to the stimulus, but whose spontaneous fluctuations across trials affect how the stimulus is perceived when they occur in a specific time window after the stimulus. 
      Finally, neuron #001 responds with what looks like a real burst of evoked activity to stimulation and also shows a difference between hits and misses, but intriguingly, the response is strongest for misses. In the discussion, the interesting interpretation in terms of a specific gating of information by subcortical structures seems to apply well to this last example, but not necessarily to the other categories.

      We now provide a supplementary Figure showing firing rates for hits, misses and the combination of both. The reviewer’s analysis about whether a perception-selective neuron also has to respond to the stimulus to be involved in gating is interesting. With more data, a finer characterization of these neurons would have been possible. In our study, it is possible that more neurons have similar characteristics as #001 (e.g. #032, #062, #068) but do not show a significant difference with respect to baseline when both hits and misses are considered. We now avoid interpreting null effects, especially considering the low number of trials with near-threshold detection behavior we could collect in 10 minutes. 

      We also realized that we had not updated Figure S7 after the last revision in which we had corrected for possible drifts to obtain sensory-selective neurons. The corrected panel A is provided below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It appears that the correct rejection was low for most participants. It would improve interpretation of the behavioral results if correct rejection was shown as a rate (i.e., # of correct rejection trials / total number of no stimulus/blank trials) rather than or in addition to reporting the number of correct rejection trials (Figure 1C).

      We added the following figure to the supplementary information.

      The axis tick marks in Figure 5A late versus early are incorrect (appears the axis was duplicated).

      Thank you for spotting this, it has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      We would like to congratulate the authors on this strongly supported contribution to the field. The manuscript is well-written, although a little bit too concise in sections. See the following comments for the methods that could benefit the present conclusions:

      Thank you for these suggestions that we believe improved our interpretations.

      Major Points

      (1) The subpopulations of neurons that are considered are small, but it is not a confounding issue for the conclusions drawn. However, the behavior of the neurons that were excluded should be considered by calculating the percentage of neurons that are selective for the distinct parameters, as a function of time. This would greatly strengthen the understanding of what can be observed in the two subcortical structures.

      We thank the reviewer for this suggestion. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of two clusters, as shown in the new Figure 5 copied below.

      We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section:

      We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5).

      and this section to the methods:

      To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component.
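      The fitting procedure quoted above can be illustrated with a minimal sketch. It is not the authors' code: it assumes the latencies are binned into a histogram density before fitting, uses synthetic latencies, and swaps SciPy's bounded L-BFGS-B optimizer in for the interior-point method mentioned in the text. The function names, parameter bounds and bin width are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def mixture_density(x, params):
    """One Gaussian (mu, sd) or a two-Gaussian mixture (w, mu1, sd1, mu2, sd2)."""
    if len(params) == 2:
        return norm.pdf(x, params[0], params[1])
    w, mu1, sd1, mu2, sd2 = params
    return w * norm.pdf(x, mu1, sd1) + (1.0 - w) * norm.pdf(x, mu2, sd2)


def fit_by_mse(centers, density, n_components, n_starts=20, seed=0):
    """Best of n_starts bounded least-squares fits from random starting points."""
    rng = np.random.default_rng(seed)
    lo, hi = float(centers.min()), float(centers.max())
    mu_sd = [(lo, hi), (0.005, 0.2)]  # assumed bounds for mean and sd, in seconds
    bounds = mu_sd if n_components == 1 else [(0.05, 0.95)] + mu_sd + mu_sd

    def mse(params):
        return float(np.mean((mixture_density(centers, params) - density) ** 2))

    best = None
    for _ in range(n_starts):
        start = [rng.uniform(a, b) for a, b in bounds]
        # L-BFGS-B is a stand-in here, not the interior-point method of the text.
        result = minimize(mse, start, bounds=bounds, method="L-BFGS-B")
        if best is None or result.fun < best.fun:
            best = result
    return best.x, best.fun


# Synthetic latencies (illustrative only, not the study's data). Working in
# seconds keeps the density of order 1-10, so the loss is well scaled for the
# optimizer's default tolerances.
rng = np.random.default_rng(1)
latencies = np.concatenate([rng.normal(0.15, 0.02, 500), rng.normal(0.32, 0.025, 500)])
density, edges = np.histogram(latencies, bins=np.linspace(0.0, 0.4, 21), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

params2, mse2 = fit_by_mse(centers, density, n_components=2)
_, mse1 = fit_by_mse(centers, density, n_components=1)
mu_early, mu_late = sorted([params2[1], params2[3]])
print(f"early: {mu_early * 1e3:.0f} ms, late: {mu_late * 1e3:.0f} ms, "
      f"MSE ratio (1 vs. 2 components): {mse1 / mse2:.1f}")
```

      Comparing `mse1 / mse2` against a factor of 4 mirrors the criterion stated in the quoted methods text; with the small number of neurons in the actual dataset the fit would be far noisier than in this synthetic example.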

      We also updated the discussion:

      The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modality. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021).

      (2) We highly recommend that the authors consider employing some analysis that decodes the representations observable in the activity of individual neurons as a function of time (e.g. Shannon's Mutual Information). This would reinforce and emphasize the most relevant conclusions.

      We thank the reviewers for this suggestion. Unfortunately, such methods would require many more trials than what we were able to collect in the 10-minute slots available in the operating room.

      (3) Although there are small populations recorded in each of the two subcortical structures, they are sufficient to attempt a study using population dynamics (primarily, PCA can still work with smaller populations). Given the broad range of dynamics that are observed in a population of single units typically involved in decision-making, it would be interesting to consider whether heterogeneity is a hallmark of decision-making, and trying to summarize the variance in the activity of the entire population should provide a certain understanding of the cue-selective versus the perception-selective qualities, as an example.

      We now present all 13 neurons that were sensory- or perception-selective for which we had good enough behavior to show hit vs. miss differences in Supplementary Figure S5. Although population-level analyses would be relevant, they are not compatible with the number of neurons we identified.

      (4) A stronger presentation of what the expectations are for the results would also benefit the interpretability of the manuscript when added to the introduction and discussion sections.

      Due to the scarcity of single-neuron data related to perceptual consciousness, especially in the subcortical structures we explored, our prior expectations did not exceed finding perception-selective neurons. We would prefer to avoid refining these expectations post-hoc. 

      Minor Comments

      (1) Add the shared overlap between differently selective neurons explicitly in the manuscript.

      We added this information at the end of the results section.

      (2) Add a consideration in the methods of why the Wilcoxon test or permutation test was selected for separate uses. How do the results compare?

      Sorry for this misunderstanding. We clarified this in revised methods:

      To deal with possibly non-normal distributions, we used the Wilcoxon rank sum test or the sign test instead of t-tests to test differences between distributions. We used permutation tests instead of binomial tests to test whether a reported number of neurons could have been obtained by chance.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analysis:

      As suggested already in the public review, it might be worth showing all 13 neurons with either stimulus-responsive or perception-selective behaviour and, based on that, deepen the potential interpretation of the results for the different categories.

      We agree that this information improves the understanding of the underlying data and this addition was also proposed by reviewer 2. We added it in a new supplementary Figure S5.

      Recommendations for improving the writing and presentation

      As mentioned in the public review, I think Figure 3 needs clarification. I found that, in some instances, it was not entirely clear which type of analyses or permutation tests had been performed, and I would advocate for more clarity in these instances. For example:

      Page 6 line 146 "permuting trial labels 1000 times": do you mean randomly attributing a trial to a neuron? Or something else?

      We agree that this was somewhat unclear. We modified the sentence to:

      permuting the sign of the trial-wise differences

      We now define a sign permutation test for paired tests and a trial permutation test for two-sample tests in the methods and specify which test was used in the main text.
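      The distinction between the two tests can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the test statistic (a difference of means), the default of 1000 permutations taken from the reviewer's quote, and the function names are assumptions made for the example.

```python
import random
from statistics import fmean


def sign_permutation_test(diffs, n_perm=1000, seed=0):
    """Paired test: randomly flip the sign of each trial-wise difference."""
    rng = random.Random(seed)
    observed = abs(fmean(diffs))
    extreme = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(fmean(flipped)) >= observed:
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


def trial_permutation_test(a, b, n_perm=1000, seed=0):
    """Two-sample test: shuffle trial labels between the two conditions."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical firing-rate values, for illustration only.
paired_diffs = [2.0 + 0.1 * i for i in range(20)]   # e.g. post-stimulus minus baseline
hits = [10.0 + i for i in range(10)]
misses = [float(i) for i in range(10)]
p_paired = sign_permutation_test(paired_diffs)
p_two_sample = trial_permutation_test(hits, misses)
```

      The sign-flip version suits comparisons in which each trial supplies its own reference value, whereas label shuffling suits comparisons between two independent sets of trials, such as hits and misses.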

      Page 7, neurons which have their firing rate modulated by the stimulus: I think you ought to be more explicit about the analysis so that we grasp it on the first read. To understand what is shown in Figure 3 I had to go back and forth between the main text and the method, and I am still not sure I completely understood. You compare the firing rate in sliding windows following stimulus onset with the mean firing rate during the 300 ms baseline. Sliding windows are between 0 and 400 ms post-stim (according to methods ?) and a neuron is deemed responsive if you find at least one temporal cluster that shows a significant difference with baseline activity (using cluster permutation). Is that correct? Either way, I would recommend being a bit more precise about the analysis that was carried out in the main text, so that we only need to refer to methods when we need specialized information.

      We agree that the methods section was unclear. We re-wrote the following two paragraphs:

      To identify sensory-selective neurons, we assumed that subcortical signatures of stimulus detection ought to be found early following its onset and looked for differences in the firing rates during the first 400 ms post-stimulus onset compared to a 300 ms pre-stimulus baseline. To correct for possible drifts occurring during the trial, we subtracted the average cue-locked activity from catch trials from the cue-locked activity of each stimulus-present trial before realigning to stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different from baseline, as assessed by a non-parametric sign rank test. A putative neuron was considered sensory-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate. Whether for the shuffled data or the observed data, if more than one cluster was obtained, we discarded all but the longest cluster. This permutation test allowed us to control for multiple comparisons across time and participants.

      For perception-selective neurons, we looked for differences in the firing rates between hit and miss trials during the first 400 ms post-stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses as assessed by a nonparametric Wilcoxon rank sum test. As for sensory-selective neurons, a putative neuron was considered perception-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate and we discarded all but the longest cluster.
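      The cluster-length criterion shared by both analyses can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code: it takes one p-value per time point as input, and the significance level (0.05) and time step (1 ms) are hypothetical, whereas the 80 ms minimum cluster length comes from the text. The shuffled-data comparison mentioned in the methods is not reproduced here.

```python
def longest_cluster(significant, step_ms=1.0):
    """Return (start_index, length_ms) of the longest run of True values."""
    best_start, best_len = None, 0
    run_start = None
    # A trailing False acts as a sentinel that closes a run ending at the last bin.
    for i, flag in enumerate(list(significant) + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, best_len * step_ms


def is_selective(p_values, alpha=0.05, step_ms=1.0, min_cluster_ms=80.0):
    """Keep only the longest cluster and require it to exceed min_cluster_ms."""
    mask = [p < alpha for p in p_values]
    start, length_ms = longest_cluster(mask, step_ms)
    return length_ms > min_cluster_ms, start, length_ms


# Hypothetical p-values over a 400 ms window sampled every 1 ms:
# a 90 ms cluster starting at 100 ms and a shorter 30 ms cluster later on.
p_values = [0.5] * 100 + [0.01] * 90 + [0.5] * 50 + [0.01] * 30 + [0.5] * 130
selective, start, length_ms = is_selective(p_values)
```

      In this example only the 90 ms cluster is retained, and the neuron would count as selective because that cluster is longer than 80 ms.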

      Minor points:

      Figure 3: inset showing action potentials, please also provide the time scale (in the legend for example), so that it's clear that it is not commensurate with the firing rate curve below, but rather corresponds to the dots of the raster plot.

      We added the text “[...], duration: 2.5 ms” in Figures 2, 3, and 4.

      Line 210: I recommend: “we found 8 neurons [...] showing a significant difference *between hits and misses* after stimulus onset."

      We made the change.

      Top of page 9, the following sentence is misleading: “This result suggests that neurons in these two subcortical structures have mostly different functional roles”; this could read as meaning that functional roles are different between the two structures. Probably what you mean is rather something along these lines: “these two subcortical structures both contain neurons displaying several different functional roles”.

      Changed.

      Line 329: remove double “when”

      We made the change, thank you for spotting this.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We would like to thank you for your valuable comments and suggestions, which have greatly contributed to improving our manuscript.

      We have carefully addressed all the reviewers' suggestions, and detailed responses for each Reviewer are provided at the end of this letter. In summary:

      • The Introduction has been revised to provide a more focused discussion on results, toning down the speculative discussion on seasonal host shifts.

      • The methodology section has been clarified, particularly the power analysis, which now includes a clearer explanation. The random effects in the models have been better described to ensure transparency.

      • The Results section was reorganized to highlight the key findings more effectively.

      • The Discussion has been restructured for clarity and conciseness, ensuring the interpretation of the results is clearer and better aligned with the study objectives.

      • Minor edits throughout the manuscript were made to improve readability and accuracy.

      We hope you find this revised version of the manuscript satisfactory.

      Reviewer #1 (Public review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision:

      Overall, the manuscript is much improved. However, the introduction and parts of the discussion that talk about addressing the question of seasonal shift in host use pattern of Cx. quin are still way too strong and must be toned down. There is no strong evidence to show this host shift in Argentinian mosquito populations. Therefore, it is just misleading. I suggest removing all this and sticking to discussing only the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quin.

      The Introduction and Discussion have been modified and toned down, and now stick to discussing the results, as suggested.

      Reviewer #1 (Recommendations for the authors):

      Some more minor comments are mentioned below.

      Line 51: Because 'of' this,

      Changed as suggested.

      Line 56: specialists 'or' generalists

      Changed as suggested.

      Line 56: primarily

      Changed as suggested.

      Line 98: Because 'of' this,

      Changed as suggested.

      Reviewer #2 (Public review):

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used generalized linear mixed models to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer's concerns, especially by adding two additional replicates. Several minor concerns remain, especially regarding unclear statements in the discussion.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field.

      Weaknesses:

      (1) The methods would be improved by some additional details. For example, clarifying the number of generations for which mosquitoes were maintained in colony (which was changed from 20 to several) and whether replicates were conducted at different time points.

      Changed as suggested.

      (2) The statistical analysis requires some additional explanation. For example, you suggest that the power analysis was conducted a priori, but this was not mentioned in your first two drafts, so I wonder if it was actually conducted after the first replicate. It would be helpful to include further detail, such as how the parameters were estimated. Also, it would be helpful to clarify why replicate was included as a random effect for fecundity and fertility but as a fixed effect for hatchability. This might explain why there were no significant differences for hatchability given that you were estimating for more parameters.

      The power analysis was conducted a posteriori, as you correctly inferred. While I did not indicate that it was performed a priori, you are right in noting that this was not explicitly mentioned. As you suggested, the methodology for the power analysis has been revised to clarify any potential doubts.

      Regarding the model for hatchability, a model without a random effect variable was used, as all attempts to fit models with random effects resulted in poor validation. These points have now been clarified and explained in the corresponding section.

      (3) A number of statements in the discussion are not clear. For example, what do you mean by a mixed perspective in the first paragraph? Also, why is the expectation mentioned in the second paragraph different from the hypothesis you described in your introduction?

      Changed as suggested.

      (4) According to eLife policy, data must be made freely available (not just upon request).

      Data and code will be publicly available. The corresponding section was modified.

      Reviewer #2 (Recommendations for the authors):

      Your manuscript is much improved by the inclusion of two additional replicates! The results are much more robust when we can see that the trends that you report are replicable across 3 iterations of the experiment. Congratulations on a greatly improved study and paper! I have several minor concerns and suggestions, listed below:

      38-39: I think it is clearer to say "no statistically significant effect of season on hatchability of eggs" ... or specify if you are referring to blood or the interaction of blood and season. It isn't clear which treatment you are referring to here.

      Changed as suggested.

      54-57: This could be stated more succinctly. Instead of citing papers that deal with specific examples of patterns, I would suggest citing a review paper that defines these terms.

      Changed as suggested.

      83-84: What if another migratory bird is the preferred host in Argentina? I would state this more cautiously (e.g. "may not be applicable...").

      Changed as suggested.

      95-96: I don't understand what you mean by this. These hypotheses are specifically meant to understand mosquitoes that DO have a distinct seasonal phenology, so I'm not sure why this caveat is relevant. And naturally this hypothesis is host dependent, since it is based on specific host reproductive investments. I think that the strongest caveat to this hypothesis is simply that it hasn't been proven.

      Changed as suggested.

      97-115: This is a great paragraph! Very clear and compelling.

      Thanks for your words!

      118: Do you have an exact or estimated number of rafts collected?

      Sorry, I do not have the exact number of rafts, but it was more than 20-30.

      135: "over twenty" was changed to "several"; several would imply about 3 generations, so this is misleading. If the colony was actually maintained for over twenty generations, then you should keep that wording.

      Changed as suggested.

      163-164: Can you please clarify whether the replicates were conducted at separate time points?

      Changed as suggested.

      Note: the track changes did not capture all of the changes made; e.g. 163-164 should show as new text but does not.

      You are absolutely right; when I uploaded the last version, I unfortunately deleted all tracked changes and cannot recover them. In this new version, I will ensure that all minimal changes are included as tracked changes.

      186 - 189: the terms should be "fixed effect" and "random effect"

      Changed as suggested.

      191: Edit: linear

      Changed as suggested.

      194: why was replicate not included as a random effect here when it was above? Also, can you please clarify "interaction effects"? Which interactions did you include?

      Changed as suggested. This is explained above and in the methodology. Hatchability models with a random-effect variable fitted and validated poorly. The interaction included for hatchability was four-way (season, blood source, cycle and replicate).

      207-208: I'm not sure what you mean by "aimed to achieve"? Weren't you doing this after you conducted the experiments, so wouldn't this be determining the power of your model (post-hoc power analysis)? Also, I think you should provide the parameter estimates that were used (e.g. effect size - did you use the effect size you estimated across the 3 replicates?).

      Changed as suggested.

      214-215: this should be reworded to acknowledge that this is estimated for the given effect size; for example, something like "This sample size was sufficient to detect the observed effect with a statistical power of 0.8" or something along those lines (unless I am misunderstanding how you conducted this test).

      Changed as suggested.
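
      The distinction raised in the two comments above (a sample size planned for a target power versus the power achieved for a given effect size) can be illustrated with a minimal sketch. It assumes a simple two-group comparison of means with a two-sided test and a normal approximation, which is not the mixed-model analysis used in the manuscript. The function names and the standardized effect size of 0.5 are hypothetical and chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal distribution, used for quantiles and tail probabilities.
_STD_NORMAL = NormalDist()


def required_n_per_group(effect_size, alpha=0.05, power=0.8):
    """Sample size per group needed to detect a standardized effect size
    (assumed, not taken from the manuscript) with the requested power,
    using the normal approximation for a two-sided, two-group test."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def achieved_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power for a given effect size and sample size.
    The contribution of the opposite tail is ignored because it is negligible
    for effects of practical size."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)


if __name__ == "__main__":
    d = 0.5  # illustrative standardized effect size (assumption)
    n = required_n_per_group(d)
    print(f"n per group for power 0.8: {n}")
    print(f"power achieved with n = {n}: {achieved_power(d, n):.3f}")
```

      In this sketch the power is always conditional on the effect size supplied, which is why a statement such as "sufficient to detect the observed effect with a statistical power of 0.8" needs the effect size to be reported alongside it.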

      246: Abbreviate Culex

      Changed as suggested.

      253-255: This sentence isn't clear. What do you mean by mixed? Also, the season really seemed to mainly impact the fitness of mosquitoes fed on mouse blood and here the way it is phrased seems to indicate that season has an impact on the fitness of those fed with chicken blood.

      Changed as suggested.

      258-260: You stated your hypothesis as the relative fitness shifting between seasons, but this statement about the expectation is different from your hypothesis stated earlier. Please clarify.

      You are right. Thank you for noting this. It was changed as suggested.  

      263-266: I also don't understand this sentence; what does the first half of the sentence have to do with the second?

      Changed as suggested.

      269-270: This doesn't align with your observation exactly; you say first AND second are generally most productive, but you observed a drop in the second. Please clarify this.

      Changed as suggested.

      280: I suggest removing "as same as other studies"; your caveats are distinct because your experimental design was unique

      Changed as suggested.

      287: you shouldn't be looking for a "desired" effect; I suggest removing this word

      Changed as suggested.

      288: It wasn't really a priori though, since you conducted it after your first replicate (unless you didn't use the results from the first replicate you reported in the original drafts?)

      It was a posteriori. Changed as suggested.

      290: Why is 290 written here?

      It was a mistype. Deleted as suggested.

      291-298: The meaning of this section of your paragraph is not clear.

      Improved as suggested.

      304-313: These three explanations are directed at different underlying questions. Explanations 1 and 2 are alternative explanations for why host switching occurs if not due to differences in fitness. This isn't really an explanation of your results so much as alternative explanations for a previously reported phenomenon. And the third is an explanation for why you may not have observed the expected effect. I suggest restructuring this to include the fact that Argentinian quinqs may not host switch as part of your previous list of caveats. Then you can include your two alternative explanations for host switching as a possible future direction (although I would say that it is really just one explanation because "vector biology" is too broad of a statement to be testable). Also, you haven't discussed possible explanations for your actual result, which showed that mosquito fitness decreased when feeding on mouse blood in autumn conditions and in the second gonotrophic cycle, while those that fed on chicken did not experience these changes. Why might that be?

      The discussion was restructured to include all these suggested changes. Additionally, some possible explanations for our results are now discussed.

      315-317: This statement is vague without a direct explanation of how this will provide insight. I suggest removing or providing an explanation of how this provides insight to transmission and forecasting.

      Changed as suggested.

      319-320: According to eLife policy, all data should be publicly available. From the guidelines (Data Availability, Purpose and General Principles): "To maintain high standards of research reproducibility, and to promote the reuse of new findings, eLife requires all data associated with an article to be made freely and widely available. These must be in the most useful formats and according to the relevant reporting standards, unless there are compelling legal or ethical reasons to restrict access. The provision of data should comply with FAIR principles (Findable, Accessible, Interoperable, Reusable). Specifically, authors must make all original data used to support the claims of the paper, or that is required to reproduce them, available in the manuscript text, tables, figures or supplementary materials, or at a trusted digital repository (the latter is recommended). This must include all variables, treatment conditions, and observations described in the manuscript. The authors must also provide a full account of the materials and procedures used to collect, pre-process, clean, generate and analyze the data that would enable it to be independently reproduced by other researchers."

      - so you need to make your data available online; I also understand the last sentence to indicate that code should be made available.  

      Data and code will be publicly available.

      Table 1: it is notable that in replicate 2, the autumn:mouse:gonotrophic cycle II fecundity and fertility are actually higher than in the summer, which is the opposite of reps 1 and 3 and the overall effect you reported from the model. This might be worth mentioning in the discussion.

      Mentioned in the discussion as suggested.

      Tables 1 and 2: shouldn't this just be 8 treatments? You included replicate as a random effect, so it isn't really a separate set of treatments.

      This table reflects the output of the whole experiment, which is why it presents all 24 experiments.

      Figure 3: Can you please clarify if this is showing raw data?

      Changed as suggested.

      Note: grammatical copy editing would be beneficial throughout

      Grammar was improved as suggested.