  1. Aug 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      Seon and Chung's study investigates the hypothesis that individuals take more risks when observed by others because they perceive others to be riskier than themselves. To test this, the authors designed an innovative experimental paradigm where participants were informed that their decisions would be observed by a "risky" player and a "safe" player. Participants underwent fMRI scanning during the task. 

      Strengths: 

      The research question is sound, and the experimental paradigm is well-suited to address the hypothesis. 

      Weaknesses:

      I have several concerns. Most notably, the manuscript is difficult to read in parts, and I suggest a thorough revision of the writing for clarity, as some sections are nearly incomprehensible. Additionally, key statistical details are missing, and I have reservations about the choice of ROIs.

      We appreciate the reviewer’s interest in and positive assessment of our work, and we thank the reviewer for the constructive feedback. In the current revision, we have revised the manuscript for clarity and added previously omitted statistical details. Furthermore, in the response letter, we have also provided additional explanations to clarify our approach, including the rationale for the choice and use of ROIs.

      Reviewer #2 (Public review): 

      Summary: 

      This study aims to investigate how social observation influences risky decision-making. Using a gambling task, the study explored how participants adjusted their risk-taking behavior when they believed their decisions were being observed by either a risk-averse or risk-seeking partner. The authors hypothesized that individuals would simulate the choices of their observers based on learned preferences and integrate these simulated choices into their own decision-making. In addition to behavioral experiments, the study employed computational modeling to formalize decision processes and fMRI to identify the neural underpinnings of risky decision-making under social observation. 

      Strengths: 

      The study provides a fresh perspective on social influence in decision-making, moving beyond the simple notion that social observation leads to uniformly riskier behavior. Instead, it shows that individuals adjust their choices depending on their beliefs about the observer's risk preferences, offering a more nuanced understanding of how social contexts shape decision-making. The authors provide evidence using comprehensive approaches, including behavioral data based on a well-designed task, computational modeling, and neuroimaging. The three models are well selected to compare at which level (e.g., computing utility, risk preference shift, and choice probability) the social influence alters one's risky decision-making. This approach allows for a more precise understanding of the cognitive processes underlying decision-making under social observation. 

      Weaknesses: 

      While the neuroimaging results are generally consistent with the behavioral and computational findings, the strength of the neural evidence could be improved. The authors' claims about the involvement of the TPJ and mPFC in integrating social information are plausible, but further analysis, such as model comparisons at the neuroimaging level, is needed to decisively rule out alternative interpretations that other computational models suggest. 

      We appreciate the reviewer’s interest in and positive assessment of our work, and we thank the reviewer for the constructive feedback. In the current revision, we have included neural results from additional analyses, which we believe provide stronger support for our proposed computational model.

      Reviewer #3 (Public review): 

      Summary: 

      This is an important paper using a novel paradigm to examine how observation affects the social contagion of risk preferences. There is a lot of interest in the field about the mechanisms of social influence, and adding in the factor of whether observation also influences these contagion effects is intriguing.

      Strengths:

      (1) There is an impressive combination of a multi-stage behavioural task with computational modelling and neuroimaging.

      (2) The analyses are well conducted and the sample size is reasonable. 

      Weaknesses: 

      (1) Anatomically it would be helpful to more explicitly distinguish between dmPFC and vmPFC. Particularly at the end of the introduction when mPFC and vmPFC are distinguished, as the vmPFC is in the mPFC. 

      (2) The authors' definition of ROIs could be elaborated on further. They suggest that peaks are selected from neurosynth for different terms, but were there not multiple peaks identified within a functional or anatomical brain area? This section could be strengthened by confirming with anatomical ROIs where available, such as the atlases here http://www.rbmars.dds.nl/lab/CBPatlases.html and the Harvard-Oxford atlases. 

      (3) How did the authors ensure there were enough trials to generate a reliable BOLD signal? The scanned part of the study seems relatively short. 

      (4) It would be helpful to add whether any brain areas survived whole-brain correction. 

      (5) There is a concern that mediation cannot be used to make causal inferences and much larger samples are needed to support claims of mediation. The authors should change the term mediation in order to not imply causality (they could talk about indirect effects instead) and highlight that the mediation analyses are exploratory as they would not be sufficiently powered (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843527/). 

      (6) The authors may want to speculate on lifespan differences in this susceptibility to risk preferences given recent evidence that older adults are relatively more susceptible to impulsive social influence (Zhu et al, 2024, comms psychology). 

      We appreciate the reviewer’s interest in and positive assessment of our work, and we thank the reviewer for the constructive feedback. In the response letter below, we address each of the reviewer’s comments, including clarifications regarding the ROIs and the limitations of the current study in interpreting the results.

      Reviewer #1 (Recommendations for the authors):

      (1) The neuroimaging hypotheses seem post hoc to me. First, the term "social inference" is used very loosely. In line 103 the authors mentioned that TPJ has been reported to be involved in inferring other's intentions and learning about others. However, in their task, it is not clear where inference is needed. All participants need to do is recall others' "preferences", rather than inferring a hidden variable or hidden intention. In addition, in some of the studies that the authors have cited (e.g., Park et al. 2021), the hippocampus is the focus of the inference, which gets no mention here.

      How does solving this task require inference (as defined by the authors: inferring others' intentions)? And why do they choose TPJ while inference is not needed in this task?

      We regret any confusion and would like to take this chance to clarify our hypothesis on social inference. As the reviewer pointed out, participants were indeed instructed to predict the demonstrators’ choices, through which we expected them to learn the demonstrators’ preferences. Our computational model suggests that during the main phase of the task, i.e., the Observed phase, participants simulated others’ choices based on these previously learned risk preferences. The gamble choices they encountered there (payoffs and associated probabilities) did not overlap with those in the Learning phase, and therefore, we expected that the cognitive process triggered by the social context involved active simulation (what we describe as making inferences about others) rather than simple ‘recall’ of previously learned information. In line with this reasoning, we hypothesized that the TPJ, a brain region previously implicated in simulating others’ actions and intentions, would play a key role during the Observed phase.

      Regarding the role of the hippocampus, the paper we cited by BoKyung Park et al. (2021), titled “The role of right temporoparietal junction in processing social prediction error across relationship contexts”, highlights the involvement of the rTPJ but does not mention the hippocampus. We are aware of the study by Seongmin A. Park et al. (2021), “Inferences on a multidimensional social hierarchy use a grid-like code”, which shows the involvement of the hippocampus and entorhinal cortex in making inferences about multidimensional social hierarchies; we believe the reviewer may have mistakenly assumed that we cited this article. As the study showed, the involvement of the hippocampus—and the use of its grid-like representation of social information—is likely tied to the multidimensional nature of task states. In our study, the hippocampus was not included as an ROI because we had no specific rationale to hypothesize that such grid-like representations would be recruited by our task.

      (2) Social influence can be motivated informationally (to improve accuracy) or normatively (to be aligned with others). To me, it seems that the authors have studied the latter, because, first, there is no objectively correct response in this task and second, because participants changed their risk preference according to the preference of the observing partner. This distinction has not been made throughout the manuscript. This is important because the two process (information and normative) are supported by different neural processes and it is extremely useful to understand neural basis of which process the authors are studying.

      We thank the reviewer for the opportunity to clarify the anticipated role of social influence in our study. As the reviewer pointed out, the gambling task used in our study does not have objectively correct or incorrect answers, and naturally, any social influence present during the task would align with normative social influence. To clarify this point, we have revised the discussion section as follows:

      [Page 9, Line 345]

      Observational learning and mimicry of others’ behavior are patterns commonly found in social animals, including nonhuman primates (Van de Waal et al., 2013). Such behaviors are thought to be driven either by a motivation to acquire additional information (‘informational conformity’) or by a motivation to align with group norm (‘normative conformity’), even when doing so does not necessarily lead to better outcomes (e.g., higher accuracy) (Cialdini & Goldstein, 2004). Given that there are no objectively correct or incorrect answers in the gambling task used in our study, the observed social influence is more consistent with normative conformity. However, we cannot rule out the possibility that individuals developed false beliefs about a particular observing partner—namely, that the partner had greater control over or insight into the gambling task. Future studies are needed to directly investigate whether individuals’ beliefs about others modulate informational social influence—that is, their motivation to use social information to gain additional insight by inferring others’ potential choices.

      (3) From Line 160 onward, the authors report several findings without providing any effect sizes or statistics. Please add effect size and statistics for each finding.

      We thank the reviewer for pointing this out. We have now added the corresponding effect sizes and statistical values for the reported findings, beginning from Line 160 in the revised manuscript.

      (4) Line 270: "In particular, bilateral TPJ, brain regions not implicated in the Solo phase, positively tracked trial-by-trial model-estimated decision probabilities". How can the authors conclude that TPJ is not involved in the solo phase? As far as I understood from the text, TPJ was not included as one of the ROIs for analysis of the Solo phase. If it was included, it should be mentioned in the text and there should be a direct comparison between the effect sizes of the solo and the observer phase. If not, "not implicated in the Solo phase" is not justified and should be removed.

      We apologize for the confusion. As the reviewer correctly pointed out, the TPJ was not included among the ROIs in our analysis of the Solo phase data; therefore, its involvement during the Solo phase was never directly assessed using an ROI-based approach.

      To examine brain responses during the Observed phase, we first assessed whether regions that tracked decision probabilities during the Solo phase—vmPFC, vStr, and dACC—were also engaged in the Observed phase. The involvement of the TPJ during the Observed phase was revealed through a subsequent whole-brain analysis. To clarify this point, we now have revised the corresponding part as follows:

      [Page 8, Line 276]

      In particular, bilateral TPJ, brain regions not implicated in the Solo phase, positively tracked trial-by-trial model-estimated decision probabilities

      → Notably, bilateral TPJ showed significant positive tracking of decision probabilities ~

      (5) I am a bit puzzled about the PPI analysis. Is the main finding increased connectivity within mPFC in the observing condition? PPI is often done between two separate brain regions. I am not sure what it means that connectivity within mPFC increases in one condition compared to another. What was the motivation for this analysis? Can you also please explain what it means?

      As the reviewer noted, psychophysiological interaction (PPI) analyses examine functional connectivity between brain regions as modulated by a psychological factor. To clarify our result, the reported ‘mPFC-mPFC connectivity’ refers to functional connectivity between the mPFC region responsive to the presence of an observing partner and an adjacent, anatomically distinct region within the mPFC. Note that we have revised the manuscript to refer to this region more specifically as the dorsomedial prefrontal cortex (dmPFC). Please see our response to Reviewer 3, Comment 1, for further details.

      During the Observed phase of our task, social information was processed at two distinct time points. First, at the beginning of each decision trial, individuals were cued with the presence (or absence) of an observing partner (‘Partner presentation’). Second, the gamble options, as well as the observing partner’s identity, were revealed (‘Options revealed’). Because participants had previously learned about the observing partner’s risk preferences, we expected them to simulate the choice the partner would likely make. We hypothesized that if individuals indeed simulated the partner’s choice and incorporated this information into their decision-making process, the brain region involved in recognizing the partner’s presence (dmPFC<sub>contrast</sub>) would be functionally connected to the region responsible for integrating social information into the final decision (TPJ). Our results showed that the two regions were functionally connected via an indirect path through an anatomically adjacent cluster within the mPFC (dmPFC<sub>PPI</sub>). Given that the recognition of the partner’s presence and the simulation of their choice occurred at two distinct time points, we interpreted the functional connectivity between the two dmPFC clusters (dmPFC<sub>contrast</sub> and dmPFC<sub>PPI</sub>) as evidence that the dmPFC<sub>PPI</sub> remained engaged during the decision process to support simulation, rather than being involved solely in the passive recognition of the social context (i.e., observed vs. not observed). Note that, consistent with this interpretation, functional connectivity was stronger in individuals who showed greater reliance on social information ('Social reliance' parameter in our model).

      To avoid confusion, we have now labeled the two dmPFC clusters as dmPFC<sub>contrast</sub>—the seed region identified at partner presentation—and dmPFC<sub>PPI</sub>—the target region identified in the PPI analysis.

      [Page 8, Line 284]

      This cue was intended to dissociate neural responses to the social context per se (i.e., the presence of an observing partner), which we hypothesized would initiate social processing, from the neural processes involved in incorporating this information during the subsequent decision-making phase.

      [Page 8, Line 291]

      We tested whether the dmPFC was also involved in incorporating social information during the decision process under social observation, particularly among individuals who relied more heavily on simulating others’ behavior.

      [Page 8, Line 297]

      We confirmed that the functional connectivity between the dmPFC<sub>contrast</sub>, which is sensitive to cues regarding the presence of an observing partner, and its adjacent, anatomically distinct region within the dmPFC (‘dmPFC<sub>PPI</sub>’ hereafter; x = 3, y = 50, z = 5, k<sub>E</sub> = .74, cluster-level P<sub>FWE, SVC</sub> = 0.011; Fig. 4a, b, Table S5) was positively associated with individuals’ social reliance.

      (6) In Line 107 the authors say "excitatory stimulation of the TPJ improved social cognition". Improved social cognition is too general and unspecific. Please be more specific.

      We agree that the term ‘social cognition’ was too general and unspecific. In the revised manuscript, we have specified that the improvement was observed in tasks specifically involving the control of self-other representation, as demonstrated by Santiesteban et al. (2012).

      [Page 4, Line 106]

      Corroborating with these neuroimaging data, excitatory stimulation of the TPJ improved social cognition (Santiesteban et al., 2012),~

      → Corroborating these neuroimaging findings, excitatory stimulation of the TPJ improved social cognition involving the control of self-other representation (Santiesteban et al., 2012),~

      Writing:

      We thank the reviewer for their thorough evaluation of our manuscript. We have now made the necessary revisions in accordance with the provided comments.

      (7) Line 75: "one risky options" should be one risky option.

      [Page 3, Line 74]

      between one safe (i.e., guaranteed payoff) and one risky options.

      between a safe option (i.e., guaranteed payoff) and a risky option.

      (8) Line 82: were given with the same set of gamble should be "were given the same set of gamble".

      [Page 3, Line 81]

      In the third phase (‘Observed phase’), individuals were given with the same set of gamble choices they faced in the Solo phase,

      In the third phase (‘Observed phase’), individuals were given the same set of gamble choices they faced in the Solo phase,~

      (9) Line 63: and that the extent of such influence depends on the identity of the observer. It is not clear what the authors mean by the "identity of observer". Does it mean the preference of the observer?

      Van Hoorn et al. (2018) showed that the degree of social influence varies depending on whether individuals are being observed by parents or by peers. While one might attribute this difference to divergent preferences typically held by parents and peers, it is important to note that other factors may also differ between these social groups. To avoid overinterpretation while preserving the original meaning, we have revised the sentence as follows:

      [Page 3, Line 61]

      However, recent studies showed that the unidirectional influence of social others’ presence may be also observed in adults (Otterbring, 2021), and that the extent of such influence depends on the identity of the observer (Van Hoorn et al., 2018).  

      However, recent studies showed that the unidirectional influence of social others’ presence can also be observed in adults (Otterbring, 2021), and that the extent of this influence depends on the observer’s identity—specifically, whether the observer is a parent or a peer (Van Hoorn et al., 2018).

      (10) Line 103: "including inferring others' intention and in learning about others." An "in" is missing right before inferring.

      [Page 4, Line 101]

      The temporoparietal junction (TPJ) is another region known to play an important role in social cognitive functions, including inferring others’ intention and in learning about others (Behrens et al., 2008; Boorman et al., 2013; Charpentier et al., 2020; Park et al., 2021; Samson et al., 2004; Saxe & Kanwisher, 2003; Saxe & Kanwisher, 2013; Van Overwalle, 2009; Young et al., 2010).

      The temporoparietal junction (TPJ) is another region known to play an important role in a range of social cognitive functions, including simulating others’ intention and choices, as well as learning about others (Behrens et al., 2008; Boorman et al., 2013; Charpentier et al., 2020; Park et al., 2021; Samson et al., 2004; Saxe & Kanwisher, 2003; Saxe & Kanwisher, 2013; Van Overwalle, 2009; Young et al., 2010).

      (11) 106: "Corroborating with these neuroimaging data." It should be "corroborating these neuroimaging data".

      [Page 4, Line 106]

      Corroborating with these neuroimaging data, ~

      Corroborating these neuroimaging findings, ~

      (12) Lines 113-115. It is not clear what the authors are trying to say here.

      We have now revised the sentence as follows:

      [Page 4, Line 112]

      We hypothesized that even if others’ choices are not explicitly presented, simple presence of social others may trigger inference about others’ potential choices, and the same set of brain regions will play an important role in value-based decision-making.

      We hypothesized that, even in the absence of explicit information about others’ choices, the mere presence of social others could lead participants to conform to the option they believe others would choose. To do so, participants would need to simulate others’ potential choices, particularly when option values vary across trials. As a result, we propose that the same brain regions involved in simulating others’ decisions would also be engaged during value-based decision-making in the presence of social observers.

      (13) Line 151: This sentence is too long and hard to follow:

      We have now revised the sentence as follows:

      [Page 5, Line 154]

      Furthermore, individuals’ prediction responses on subsequent 10 prediction trials where no feedback was provided (Fig. 2b) as well as self-reports about the perceived riskiness of the partners collected at the end of the Learning phase (Fig. 1d) consistently showed that they were able to distinguish one partner from the other, and correctly estimate the partners’ risk preferences (Predicted risk preference: t(42) = -11.46, P = 1.66e-14; Self-report: t(42) = -35.83, P = 4.10e-33).

      Furthermore, individuals’ prediction responses during the subsequent 10 trials without feedback consistently indicated that they could distinguish between the two partners and accurately estimate each partner’s risk preferences (t(42) = -11.46, P = 1.66e-14; Fig. 2b). Self-reported ratings of the partners’ perceived riskiness, collected after the Learning phase, further supported this finding (t(42) = -35.83, P = 4.10e-33; Fig. 1d).

      (14) Line 178: This sentence is very hard to follow. I am not sure what the authors were trying to say here. Please clarify.

      We have now revised the sentence as follows:

      [Page 5, Line 183]

      Various previous studies examined the impacts of social context on decision-making processes, but the suggested mechanisms by which individuals were affected by the social information depended on how the information was presented.

      → Previous studies have shown that social context can influence decision-making processes. However, the underlying mechanisms proposed have varied depending on how the social information was presented.

      (15) Line 183: "when individuals were given with the chances" should be "when individuals were given the chance".

      [Page 5, Line 187]

      On the contrary, when individuals were given with the chances~

      On the contrary, when individuals were given the chances~

      (16) Line 192: "are sensitive to the identity of the currently observing partner...". Do the authors mean are sensitive to the preferences of the currently observing partner? If so, please clarify, it is hard to follow.

      We have now revised the sentence as follows:

      [Page 5, Line 195]

      We hypothesized that if individuals are sensitive to the identity of the currently observing partner, they would take into account the learned preferences of others in computing their choices rather than simply in guiding the direction how to change their own preferences.

      → We hypothesized that if individuals are sensitive to the learned preferences of the observing partner, they would use this information to simulate the partner’s likely choices, rather than simply aligning their own preferences with those of the partner.

      Reviewer #2 (Recommendations for the authors):

      (1) The current neuroimaging findings appear to support the decision processes of all three models. I recommend that the authors provide more detailed evidence of model comparisons in the neuroimaging analysis. This should go beyond simply comparing the goodness of fit of neural activity.

      We acknowledge that neuroimaging data alone often do not provide conclusive evidence for specific information processing. In our study, we examined brain regions that track decision probabilities and are associated with social cognition, such as simulating others’ choice tendencies. Because these processes are general and not tied to a specific computational model, neural responses supporting the occurrence of such processes cannot be used to rule out alternative decision models. For this reason, our approach prioritized a rigorous behavioral model comparison as a critical first step before probing the neural substrates underlying the proposed mechanism. Our behavioral model comparisons, including both quantitative fit indices and qualitative pattern predictions, indicated that the proposed model best accounted for participants' decision patterns across task conditions.

      More importantly, to further validate the model, we conducted a model recovery analysis (see Fig. S2b in SI), which confirmed that our model can be reliably distinguished from alternative accounts even when behavioral differences are subtle. This result suggests that our model captures unique and meaningful characteristics of the decision process that are not equally well explained by competing models.

      With this behavioral foundation, our neuroimaging analyses were designed not to serve as independent model arbiters, but rather to examine whether brain activity in regions of interest reflected the computations specified by the best-fitting model. We believe this two-step approach—first establishing behavioral validity, then linking model-derived variables to neural data—offers a principled framework for identifying the cognitive and neural mechanisms of decision-making.

      Nevertheless, per the reviewer’s suggestion, we further examined whether there is neural encoding of both the participant’s own utility and the observer’s utility—serving as potential neural evidence to differentiate our model from the two alternative models. Please see below for our response to Reviewer 2’s Comment (2).

      (2) Specifically, if participants are combining their own and simulated choices at the level of choice probability, we would expect to see neural encoding of both their own utility and the observer's utility. These may be observed in different areas of the mPFC, as demonstrated by Nicolle et al. (Neuron, 2012). In that study, decisions simulating others' choices were associated with activity in the dorsal mPFC, while one's own decisions were encoded in the vmPFC. On the contrary, if the brain encodes decision values based on the shifted risk preference, rather than encoding each decision's value in separate brain areas, this would support the alternative model.

      We thank the reviewer for this constructive comment. In our Social reliance model, we assumed that the decision probability based on an individual’s own risk preferences, as well as that based on the observing partner’s risk preferences, both contribute to the individual’s final choice. As the reviewer suggested, neural evidence that differentiates our model from the two alternative models—the Risk preference change model and the Other-conferred utility model—would involve demonstrating neural encoding of both the participant’s own utility and the observer’s utility.
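To make the idea of combining two decision probabilities concrete, the following is a minimal sketch and not the authors' fitted model. It assumes a power utility function, a logistic (two-option softmax) choice rule, and a convex combination weighted by a `social_reliance` parameter; the function names, the utility form, and the mixing rule are all illustrative assumptions that the response letter does not specify.

```python
import math


def utility(amount, rho):
    """Power utility (assumed form): rho < 1 is risk-averse, rho > 1 risk-seeking."""
    return amount ** rho


def p_risky(safe_amount, risky_amount, risky_prob, rho, beta):
    """Probability of choosing the risky option under a logistic choice rule."""
    eu_risky = risky_prob * utility(risky_amount, rho)  # expected utility of the gamble
    u_safe = utility(safe_amount, rho)                  # utility of the guaranteed payoff
    return 1.0 / (1.0 + math.exp(-beta * (eu_risky - u_safe)))


def p_risky_observed(safe_amount, risky_amount, risky_prob,
                     rho_self, rho_observer, beta, social_reliance):
    """Mix the self-based and observer-based choice probabilities.

    social_reliance = 0 ignores the observer; 1 follows the simulated observer.
    The convex combination is an assumption made for illustration.
    """
    p_self = p_risky(safe_amount, risky_amount, risky_prob, rho_self, beta)
    p_obs = p_risky(safe_amount, risky_amount, risky_prob, rho_observer, beta)
    return (1.0 - social_reliance) * p_self + social_reliance * p_obs
```

Under these assumptions, a risk-averse participant watched by a risk-seeking partner has a higher probability of choosing the gamble than when deciding alone, and the size of the shift grows with `social_reliance`.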

      The utility differences between chosen and unchosen options from the two perspectives—self and observer—were highly correlated, preventing us from including both as regressors in the same design matrix. Instead, we defined ROIs along the ventral-to-dorsal axis of the mPFC, and examined whether each ROI more strongly reflected one’s own utility or that of the observer. Based on the meta-analysis by Clithero and Rangel (2014), we defined the most ventral mPFC ROI (ROI1) as a 10 mm-radius sphere centered at coordinate [x=-3, y=41, z=-7], a region previously associated with subjective value. From this ventral seed, we defined four additional spherical ROIs (10 mm radius each) at 12 mm intervals along the ventral-to-dorsal axis, resulting in five ROIs in total: ROI2 [x=-3, y=41, z=5], ROI3 [x=-3, y=41, z=17], ROI4 [x=-3, y=41, z=29], ROI5 [x=-3, y=41, z=41].
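The ROI placement described above can be sketched as follows. This is an illustrative reconstruction of the geometry only: the helper names `roi_centers` and `in_sphere` are hypothetical, and the actual ROI masks were presumably built with neuroimaging software that is not named here.

```python
import math


def roi_centers(seed, step_mm=12, n_rois=5):
    """Return sphere centers stepping dorsally (increasing z) from the ventral seed."""
    x, y, z = seed
    return [(x, y, z + i * step_mm) for i in range(n_rois)]


def in_sphere(point, center, radius_mm=10.0):
    """True if a coordinate lies within radius_mm of the sphere center."""
    return math.dist(point, center) <= radius_mm


# Ventral seed taken from the coordinates reported in the text.
centers = roi_centers((-3, 41, -7))
for i, c in enumerate(centers, start=1):
    print(f"ROI{i}: x={c[0]}, y={c[1]}, z={c[2]}")
```

With a 10 mm radius and 12 mm spacing, neighbouring spheres share some volume, so a coordinate such as (-3, 41, 0) falls inside both ROI1 and ROI2 in this sketch.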

      Consistent with Nicolle et al. (2012), the representation of one’s own utility (labelled as ‘Own subjective value’) and that of the observer (‘Observer’s subjective value’) was organized along the ventral-to-dorsal axis of the mPFC. Specifically, utility signals from the participant’s own perspective (SV<sub>chosen, self</sub> – SV<sub>unchosen, self</sub>) were most prominently represented in the ventral-most ROIs (blue), whereas utility signals from the observer’s perspective (SV<sub>chosen, observer</sub> – SV<sub>unchosen, observer</sub>) were most strongly represented in the dorsal-most ROIs (orange).

      (3) Additionally, the authors may be able to detect neural signals related to conflict when the decisions of the individual and the observer differ, compared to when the decisions are congruent. These neural signatures would only be present if social influences are integrated at the choice level, as suggested by the authors.

      If individuals simulate the choices that others might make, they may compare them with the choices they would have made themselves. To investigate this possibility, we categorized task trials as Conflict or No-conflict trials based on greedy choice predictions derived from a softmax decision rule. Conflict trials were those in which the choice predicted from the participant’s own risk preference differed from that predicted for the observer, whereas No-conflict trials involved the same predicted choice from both perspectives. A contrast between Conflict and No-conflict trials revealed that the dACC and dlPFC—regions previously associated with conflict monitoring and cognitive control (Shenhav et al., 2013)—were sensitive to differences in choice tendencies between the self and observer perspectives.

      Author response image 1.

      dACC and dlPFC are associated with the discrepancy between participants’ own choice tendencies and those of observing partners, as estimated based on prior beliefs about the partners’ risk preferences.

      As the reviewer suggested, these results provide evidence in support of the Social Reliance model, which posits that participants simulate the observer's choice and integrate it with their own.
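The Conflict versus No-conflict split described above can be illustrated with a short sketch. It assumes that each trial already has a model-derived probability of choosing the risky option from the participant's own perspective and from the observer's perspective, and that the greedy prediction is simply the option with probability above 0.5. The function names and the tie-breaking rule (a probability of exactly 0.5 counts as "safe") are assumptions for illustration.

```python
def greedy_choice(p_risky):
    """Greedy prediction from a choice probability (ties resolved as 'safe')."""
    return "risky" if p_risky > 0.5 else "safe"


def label_trials(p_self, p_observer):
    """Label each trial by whether the two greedy predictions disagree."""
    labels = []
    for ps, po in zip(p_self, p_observer):
        if greedy_choice(ps) != greedy_choice(po):
            labels.append("Conflict")
        else:
            labels.append("No-conflict")
    return labels


# Hypothetical probabilities for three trials.
print(label_trials([0.8, 0.3, 0.6], [0.2, 0.1, 0.9]))
```

Only the first of the three hypothetical trials is labelled Conflict, because it is the only one where the predicted choices fall on opposite sides of 0.5.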

      (4) Incorporating these additional analyses would provide stronger evidence for distinguishing between the models.

      We again thank the reviewer for these constructive suggestions. Based on the new set of analyses and results, we have made the necessary revisions as noted above. We agree that these revisions provide stronger evidence for distinguishing between the models.

      Reviewer #3 (Recommendations for the authors):

      (1) Anatomically it would be helpful to more explicitly distinguish between dmPFC and vmPFC. Particularly at the end of the introduction when mPFC and vmPFC are distinguished, as the vmPFC is in the mPFC.

      We appreciate the reviewer’s suggestion regarding the anatomical distinction between the dmPFC and vmPFC, particularly in relation to our use of the term “mPFC.” We acknowledge that the dmPFC and vmPFC are subregions of the broader mPFC. In our original manuscript, we referred to one region as mPFC in line with prior studies highlighting its role in social cognition and contextual processing (Behrens et al., 2008; Sul et al., 2015; Wittmann et al., 2016). However, in response to the reviewer’s comment and to more clearly distinguish this region from the ventral portion of the mPFC (i.e., vmPFC), which is canonically associated with subjective valuation, we have now revised the manuscript to refer to this region as the dmPFC. This terminology better reflects its association with social cognition, including model-estimated social reliance and sensitivity to social cues in our study.

      (2) The authors' definition of ROIs could be elaborated on further. They suggest that peaks are selected from neurosynth for different terms, but were there not multiple peaks identified within a functional or anatomical brain area? This section could be strengthened by confirming with anatomical ROIs where available, such as the atlases here http://www.rbmars.dds.nl/lab/CBPatlases.html and the Harvard-Oxford atlases.

      We appreciate the opportunity to clarify how our ROIs were defined. To identify the ROIs, we drew upon both prior literature and results from a term-based meta-analysis using Neurosynth. For each meta-map, we applied an FDR-corrected threshold of p < 0.01 and a cluster extent threshold of k ≥ 100 voxels to identify distinct functional clusters. For each cluster, we constructed a spherical ROI (radius = 10 mm) centered on its center of gravity. Note that for each anatomically distinct brain region, only a single center of gravity was identified and used to define the ROI. The resulting ROIs were subsequently used for small volume correction (SVC) in the second-level fMRI analyses.

      For brain regions associated with decision-making processes, we obtained a meta-analytic activation map associated with the term “decision” from Neurosynth. After applying an FDR-corrected threshold of p < 0.001 and a cluster extent threshold of k ≥ 100 voxels, we identified five distinct clusters: vmPFC [x = -3, y = 38, z = -10]; right vStr [x = 12, y = 11, z = -7]; left vStr [x = -12, y = 8, z = -7]; dACC [x = 3, y = 26, z = 44]; and left Insula [x = -30, y = 23, z = -1]. To identify brain regions involved in decision-making under social observation, we used the Neurosynth meta-map associated with the term “social”, applying the same criteria (FDR p < 0.001, k ≥ 100). This analysis revealed several clusters, including bilateral TPJ: right TPJ [x = 51, y = -52, z = 14]; left TPJ [x = -51, y = -58, z = 17]. To isolate brain regions more specifically associated with social processing rather than valuation, we also constructed a conjunction map using the meta-maps for the terms “social” and “value.” We identified clusters present in the “social” map, but not in the “value” map. This analysis yielded, among others, a cluster in the dmPFC [x = 0, y = 50, z = 14].
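      The spherical ROI construction described above (10 mm radius around a center of gravity) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' pipeline: the helper name `sphere_mask` is hypothetical, and the grid shape and affine are assumed to follow a common 2 mm isotropic MNI152 template layout. The dmPFC coordinate is the one reported in the text.

```python
import numpy as np

def sphere_mask(center_mm, radius_mm, shape, affine):
    """Boolean mask of voxels whose centres lie within radius_mm of center_mm."""
    i, j, k = np.indices(shape)
    # Homogeneous voxel coordinates, one 4-vector per voxel
    vox = np.stack([i, j, k, np.ones(shape)], axis=-1)
    # Map voxel indices to world (mm) coordinates
    xyz = vox @ affine.T
    center = np.asarray(center_mm, dtype=float)
    dist = np.linalg.norm(xyz[..., :3] - center, axis=-1)
    return dist <= radius_mm

# Assumed 2 mm isotropic grid (layout of a common MNI152 template)
affine = np.array([[-2.0, 0.0, 0.0, 90.0],
                   [0.0, 2.0, 0.0, -126.0],
                   [0.0, 0.0, 2.0, -72.0],
                   [0.0, 0.0, 0.0, 1.0]])
shape = (91, 109, 91)

# dmPFC centre reported in the text: [x = 0, y = 50, z = 14]
dmpfc = sphere_mask((0, 50, 14), 10.0, shape, affine)
```

      In practice the mask would be saved in the same space as the statistical maps and passed to the small volume correction step; that part depends on the analysis package and is not shown.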

      To clarify our ROI analysis methods, we have now revised the manuscript to include more detailed information about the procedures used, as follows:

      [Page 19, Line 746]

      Region-of-interest (ROI) analyses. To define ROIs for the neural analyses conducted in the Observed phase, we used significant clusters identified during the Solo phase. Specifically, regions showing significant activation for Prob(chosen) in the DM0 (thresholded at P < 0.001) were selected as ROIs. Three ROI clusters were defined: the vStr (peak voxel at [x = 3, y = 14, z = -10], k<sub>E</sub> = 9), vmPFC (peak voxel at [x = –3, y = 62, z = –13], k<sub>E</sub> = 99), and dACC (peak voxel at [x = 12, y = 32, z = 29], k<sub>E</sub> = 118). These ROIs were then applied in the Observed phase analyses to test whether similar neural representations are also engaged in social contexts.

      Term-based meta-analytic maps from Neurosynth for small volume correction. To reduce the likelihood of false positives arising from random significant activations and to enhance sensitivity within regions of theoretical interest, small volume correction (SVC) was applied using term-based meta-analytic maps from Neurosynth. This approach allows for hypothesis-driven correction by restricting statistical testing to anatomically and functionally defined ROIs. Specifically, three meta-analytic maps were generated using Neurosynth’s term-based analyses (Yarkoni et al., 2011), with a false discovery rate (FDR) corrected P < 0.01 and a cluster size > 100 voxels. For each resulting cluster, we defined a spherical ROI with a 10 mm radius centered on the cluster’s center of gravity. For each anatomically distinct brain region, only a single center of gravity was identified and used to define the corresponding ROI.

      First, to identify regions encoding final decision probabilities during the Solo phase and enhance sensitivity, we used the meta-map associated with the term “decision” to identify neural substrates of value-based decision-making. This yielded three clusters: vmPFC ([x = -3, y = 38, z = -10]), vStr ([x = 12, y = 11, z = -7]), and dACC ([x = 3, y = 26, z = 44]) (Fig. 3a, S7). Second, to examine social processing during the Observed phase, we used the meta-map associated with the term “social” to identify brain regions typically involved in social cognition. This analysis revealed clusters, including the rTPJ ([x = 51, y = -52, z = 14]) and lTPJ ([x = -51, y = -58, z = 17]) (Fig. 3c, S8a). Third, to define an ROI involved in processing social cues independent of valuation, we used a meta-map associated with “social” but excluding “value”, isolating regions specific to non-valuation-related social cognition. This analysis revealed a cluster, including the dmPFC ([x = 0, y = 50, z = 14]) (Fig. 3d, 4a, S8b).

      (3) How did the authors ensure there were enough trials to generate a reliable BOLD signal? The scanned part of the study seems relatively short.

      We appreciate the reviewer’s concern regarding the number of trials and the potential implications for the reliability of the resulting BOLD signals. While we did not conduct formal statistical tests to determine the optimal number of trials, our task design, in general, followed well-established principles in functional neuroimaging. Specifically, we employed a jittered event-related design and used both temporal and dispersion derivatives in the GLM analyses. These strategies are widely recognized for enhancing the efficiency of BOLD signal deconvolution and improving model fit by accounting for inter-subject and inter-regional variability in the hemodynamic response function (HRF). Furthermore, the number of trials per condition in our study was comparable to those reported in previous publications (20-30 trials) that employed similar gambling paradigms to examine individual differences in the neural substrates of value-based decision-making (Chung et al., 2015; Chung et al., 2020).

      (4) It would be helpful to add whether any brain areas survived whole-brain correction.

      No brain regions survived whole-brain correction. Nevertheless, as described in the introduction, we had strong a priori hypotheses. Based on these hypotheses, we defined term-based ROIs using Neurosynth, and conducted small volume correction analyses. Per the reviewer’s suggestion, we have added information indicating that no brain regions survived whole-brain correction, as follows:

      [Page 8, Line 281]

      No additional regions survived whole-brain correction.

      (5) There is a concern that mediation cannot be used to make causal inferences and much larger samples are needed to support claims of mediation. The authors should change the term mediation in order to not imply causality (they could talk about indirect effects instead) and highlight that the mediation analyses are exploratory as they would not be sufficiently powered (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843527/).

      We acknowledge the reviewer’s concerns regarding the causal interpretation of mediation analysis results. Per this comment, we have revised the manuscript as follows to avoid overinterpreting these results and to refrain from implying any causal inference.

      [Page 9, Line 327]

      Given that our sample size is smaller than the recommended threshold for detecting mediation effects (Fritz & MacKinnon, 2007), this significant indirect effect should be interpreted with caution, particularly with respect to causal inference.

      (6) The authors may want to speculate on lifespan differences in this susceptibility to risk preferences given recent evidence that older adults are relatively more susceptible to impulsive social influence (Zhu et al, 2024, comms psychology).

      We thank the reviewer for the thoughtful suggestion—we believe the referenced work is Zhilin Su et al. (2024). As noted in our manuscript, all participants in the current study were young adults aged between 18 and 29 years. Given this limited age range, our dataset does not provide sufficient variability to directly examine age-related differences across the lifespan. However, we are planning a follow-up study using the same task with older adult participants, which we believe will provide a valuable opportunity to address this important gap in understanding susceptibility to social influence across the lifespan.

    1. eLife Assessment

      This fundamental study demonstrates that lipid binding can regulate the dimerization state of the SARS-CoV2 Orf9b protein. The data from biophysical and cellular experiments along with mathematical modeling are compelling. This paper is broadly relevant to those studying coupled equilibria across all aspects of biology.

    2. Reviewer #1 (Public review):

      Summary:

      Felipe and colleagues try to answer an important question in Sarbecovirus Orf9b-mediated interferon signaling suppression, given that this small viral protein adopts two distinct conformations, a dimeric β-sheet-rich fold and a helix-rich monomeric fold when bound by Tom70 protein. Two Orf9b structures determined by X-ray crystallography and Cryo-EM suggest an equilibrium between the two Orf9b conformations, and it is important to understand how this equilibrium relates to its functions. To answer these questions, the authors developed a series of ordinary differential equations (ODE) describing the Orf9b conformation equilibrium between homodimers and monomers binding to Tom70. They used SPR and a fluorescence polarization (FP) peptide displacement assay to identify parameters for the equilibrium and create a theoretical model. They then used the model to characterize the effect of lipid-binding and the effects of Orf9b mutations in homodimer stability, lipid binding, and dimer-monomer equilibrium. They used their model to further analyze dimerization, lipid binding, and Orf9b-Tom70 interactions for truncated Orf9b, Orf9b fusion mutant S53E (blocking Tom70 binding), and Orf9b from a set of SARS-CoV-2 VOCs. They evaluated the ability of different Orf9b variants for binding Tom70 using Co-IP experiments and assessed their activity in suppressing IFN signaling in cells.

      Overall, this work is well designed, the results are of high quality and well-presented; the results support their conclusions.

      Strengths:

      (1) They developed a working biophysical model for analyzing Orf9b monomer-dimer equilibrium and Tom70 binding based on SPR and FP experiments; this is an important tool for future investigation.

      (2) They prepared lipid-free Orf9b homodimer and determined its crystal structure.

      (3) They designed and purified obligate Orf9b monomer, fused-dimer, etc., a very important Orf9b variant for further investigations.

      (4) They identified the lipid bound by Orf9b homodimer using mass spectra data.

      (5) They proposed a working model of Orf9b-Tom70 equilibrium.

      Weaknesses:

      (1) It is difficult to understand why the obligate Orf9b dimer has similar IFN inhibition activity as the WT protein and obligate Orf9b monomer truncations.

      (2) The role of Orf9b homodimer and the role of Orf9b-bound lipid in virus infection remain unknown.

      Comments on revisions:

      In the revised manuscript, the authors have addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This study focuses on Orf9b, a SARS-COV1/2 protein that regulates innate signaling through interaction with Tom70. San Felipe et al use a combination of biophysical methods to characterize the coupling between lipid-binding, dimerization, conformational change, and protein-protein-interaction equilibria for the Orf9b-Tom70 system. Their analysis provides a detailed explanation for previous observations of Orf9b function. In a cellular context, they find other factors may also be important for the biological functioning of Orf9b.

      Strengths:

      San Felipe et al elegantly combine structural biology, biophysics, kinetic modelling, and cellular assays, allowing detailed analysis of the Orf9b-Tom70 system. Such complex systems involving coupled equilibria are prevalent in various aspects of biology, and a quantitative description of them, while challenging, provides a detailed understanding and prediction of biological outcomes. Using SPR to guide initial estimates of the rate constants for solution measurements is an interesting approach.

      Weaknesses:

      This study would benefit from a more quantitative description of uncertainties in the numerous rate constants of the models, either through a detailed presentation of the sensitivity analysis or another approach such as MCMC. Quantitative uncertainty analysis, such as MCMC, is not trivial for ODEs, particularly when they involve many parameters and are to be fitted to numerous data points, as is the case for this study. The authors use sensitivity analysis as an alternative; however, the results of the sensitivity analysis are not presented in detail, and I believe the authors should consider whether there is a way to present this analysis more quantitatively. For example, could the residuals for each +/-10% parameter change for the peptide model be presented as a supplementary figure, and similarly for the more complex models? Further details of the range of rate constants tested would be useful, particularly for the ka and kB parameters.

      The authors build a model that incorporates an α-helix-β-sheet conformational change, but the rate constant for the conversion to the α-helix conformation is required to be second order. Although the authors provide some rationale, I do not find this satisfactorily convincing given the large number of adjustable parameters in the model and the use of manual model fitting. The authors should discuss whether there is any precedent for second-order rate constants for conformational changes in the literature. On page 14, the authors state this rate constant "had to be non-linear in the monomer β-sheet concentration" - how many other models did the authors explore? For example, would αT↔α↔αα↔ββ (i.e., conformational change before dimer dissociation) or α↔βαT↔ββ (i.e., Tom70 binding driving dimer dissociation) be other plausible models for the conformational change that do not require assumptions of second-order rate constants for the conformational change?

      Overall, this study progresses the analysis of coupled equilibria and provides insights into Orf9b function.

      Comments on revisions:

      The authors have done a satisfactory job addressing my concerns.

      Regarding my recommendations to the authors - point 7: "Orf9b-FITC:Tom70" and "PT", representing the same species, are still both used in the equations on page 14, which is confusing for anyone who may wish to re-use the model. I appreciate this is quite a subtle point but given the importance of the model for the manuscript I feel the authors should do their due diligence to ensure it is presented as clearly as possible.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Felipe and colleagues try to answer an important question in Sarbecovirus Orf9b-mediated interferon signaling suppression, given that this small viral protein adopts two distinct conformations, a dimeric β-sheet-rich fold and a helix-rich monomeric fold when bound by Tom70 protein. Two Orf9b structures determined by X-ray crystallography and Cryo-EM suggest an equilibrium between the two Orf9b conformations, and it is important to understand how this equilibrium relates to its functions. To answer these questions, the authors developed a series of ordinary differential equations (ODE) describing the Orf9b conformation equilibrium between homodimers and monomers binding to Tom70. They used SPR and a fluorescence polarization (FP) peptide displacement assay to identify parameters for the equilibrium and create a theoretical model. They then used the model to characterize the effect of lipid-binding and the effects of Orf9b mutations in homodimer stability, lipid binding, and dimer-monomer equilibrium. They used their model to further analyze dimerization, lipid binding, and Orf9b-Tom70 interactions for truncated Orf9b, Orf9b fusion mutant S53E (blocking Tom70 binding), and Orf9b from a set of SARS-CoV-2 VOCs. They evaluated the ability of different Orf9b variants for binding Tom70 using Co-IP experiments and assessed their activity in suppressing IFN signaling in cells.

      Overall, this work is well designed, the results are of high quality and well-presented; the results support their conclusions.

      We thank reviewer #1 for their thoughtful assessment of our work and their constructive feedback.

      Strengths:

      (1) They developed a working biophysical model for analyzing Orf9b monomer-dimer equilibrium and Tom70 binding based on SPR and FP experiments; this is an important tool for future investigation.

      (2) They prepared lipid-free Orf9b homodimer and determined its crystal structure.

      (3) They designed and purified obligate Orf9b monomer, fused-dimer, etc., a very important Orf9b variant for further investigations.

      (4) They identified the lipid bound by Orf9b homodimer using mass spectra data.

      (5) They proposed a working model of Orf9b-Tom70 equilibrium.

      Weaknesses:

      (1) It is difficult to understand why the obligate Orf9b dimer has similar IFN inhibition activity as the WT protein and obligate Orf9b monomer truncations.

      We thank the reviewer for this observation. We agree that the IFN results for the obligate homodimer were not what we expected given our FP kinetic results with the purified obligate homodimer, and we noted our surprise in the discussion. We also propose two possible hypotheses for why this is the case.

      In our discussion, we noted the possible introduction of an increased avidity effect with fused homodimer and have improved it as follows with additions in red:

      “This result was unexpected as we had anticipated the obligate homodimer results to resemble the phosphomimetic. We hypothesize that this may be explained by two possible factors. First, we can’t exclude the introduction of an increased avidity between Orf9b and Tom70 when using the fused homodimer. Although our modeled decrease in the association rate of Orf9b:Tom70 (which increases the K<sub>D</sub> of the complex) suggests that fusing two copies of Orf9b decreases the affinity to Tom70, one copy of the fusion construct could also be capable of either binding to two copies of Tom70, or, one copy of the fusion could undergo rapid rebinding to Tom70. These effects would lead to a much tighter interaction in cellular assays than we modeled in vitro. A second possible explanation is that our assumptions about high lipid binding are not valid for cell based assays.”

      We also noted that a second possible explanation is due to our limitations in isolating the apo-fused homodimer to compare to the lipid-bound fused homodimer and possible differences this could have on our assays and briefly expanded upon this. Again, we improved this with additions in red:

      “As we have shown with both WT and fusion constructs, recombinantly expressed and purified Orf9b is lipid-bound and this can stabilize the homodimer to slow or inhibit the binding to Tom70. For the Orf9b fusion construct, we attempted to isolate the lipid-free species through protein refolding as previously described to compare the effect of lipid-binding on the homodimer fusion (similar to our WT experiments); however, we could not recover the stably folded homodimer. We hypothesize that the discrepancy between our kinetic results and Co-IP/IFN results could be due to subsaturation of the Orf9b fusion homodimers by lipids in cell based assays. While we have shown that lipid-binding occurs in recombinant expression systems, it is possible that in our cell based signaling assays that lipid-binding only affects a minor population of Orf9b. Given that we were unable to isolate the apo-fusion homodimer, we could not directly compare whether there are differences in fusion homodimer stability in the presence or absence of lipid-binding. Therefore, it is possible that the apo-fusion homodimer undergoes unfolding and refolding into alpha helices that lead to Tom70 binding similar to the WT construct.”

      (2) The role of Orf9b homodimer and the role of Orf9b-bound lipid in virus infection remain unknown.

      We agree that we did not try to directly test for the role of the homodimer during infection and this remains an open area of exploration for future studies. We have included this caveat in our discussion but suggested possible experiments and future directions that could help shed light on this:

      “Although we have not directly tested for the role the homodimer conformation plays during infection, we have demonstrated that lipid-binding to the homodimer can bias the equilibrium away from Tom70. Lipids including palmitate have been shown to act as both a signaling molecule as well as a post-translational modification during antiviral innate immune signaling (S Mesquita et al. 2024; Wen et al. 2022; S. Yang et al. 2019). As a post-translational modification (referred to as S-acylation), MAVS, a mitochondrial type 1 IFN signaling protein that associates with Tom70 (X.-Y. Liu et al. 2010; McWhirter, Tenoever, and Maniatis 2005; Seth et al. 2005), has been shown to be post-translationally palmitoylated which affects its ability to localize to the mitochondrial outer membrane during viral infection and is a known target of Orf9b (Bu et al. 2024; Lee et al. 2024). When this is impaired (either by mutation or by depletion of the palmitoylation enzyme ZDHHC24), IFN activation is impaired (Bu et al. 2024). Therefore, future investigations should consider if the homodimer conformation of Orf9b is capable of antagonizing other IFN signaling factors such as MAVS by binding to palmitoyl groups. Indeed, Orf9b has already been shown to be capable of binding to MAVS by Co-IP (Han et al. 2021), however, whether or not this occurs through the palmitoyl modification remains unknown.”

      Reviewer #2 (Public review):

      Summary:

      This study focuses on Orf9b, a SARS-COV1/2 protein that regulates innate signaling through interaction with Tom70. San Felipe et al use a combination of biophysical methods to characterize the coupling between lipid-binding, dimerization, conformational change, and protein-protein-interaction equilibria for the Orf9b-Tom70 system. Their analysis provides a detailed explanation for previous observations of Orf9b function. In a cellular context, they find other factors may also be important for the biological functioning of Orf9b.

      Strengths:

      San Felipe et al elegantly combine structural biology, biophysics, kinetic modelling, and cellular assays, allowing detailed analysis of the Orf9b-Tom70 system. Such complex systems involving coupled equilibria are prevalent in various aspects of biology, and a quantitative description of them, while challenging, provides a detailed understanding and prediction of biological outcomes. Using SPR to guide initial estimates of the rate constants for solution measurements is an interesting approach.

      Weaknesses:

      This study would benefit from a more quantitative description of uncertainties in the numerous rate constants of the models, either through a detailed presentation of the sensitivity analysis or another approach such as MCMC. Quantitative uncertainty analysis, such as MCMC, is not trivial for ODEs, particularly when they involve many parameters and are to be fitted to numerous data points, as is the case for this study. The authors use sensitivity analysis as an alternative; however, the results of the sensitivity analysis are not presented in detail, and I believe the authors should consider whether there is a way to present this analysis more quantitatively. For example, could the residuals for each +/-10% parameter change for the peptide model be presented as a supplementary figure, and similarly for the more complex models? Further details of the range of rate constants tested would be useful, particularly for the ka and kB parameters.

      We thank the reviewer for their constructive feedback. We have generated supplemental figures providing a deeper analysis of the residuals for each model parameter adjusted +/-10% from the reported values, which we have added as Figure 1 - Supplemental 3 and Figure 4 - Supplemental 5.

      We note that there are modest improvements in residual plots where model parameters are individually lowered by 10% from their reported value when considering this single dataset, however, our choice of using the reported values was driven by finding values that were suitable for improving model behavior across multiple concentration series in different datasets. Specifically, we have also included the RMSD values for each model parameter subjected to a +/-10% change from a single concentration time course as well as the percent change in RMSD relative to the RMSD generated by our reported model parameters to illustrate this. We have also included text that makes note of the observed pattern in the residuals from Figure 4 - Supplement 5 and provided some explanations for why this may occur.

      “Inspection of the residuals from the 5uM apo-Orf9b homodimer time course showed clear patterns when individual model parameters were subjected to a 10% increase or decrease from the reported values. While our proposed model qualitatively describes the concentration dependent change in kinetic behavior, the residual plots may suggest that additional binding reactions may also be occurring that are not captured by our model.”

      Figure 1 - Supplemental 3. Plots of residuals from Orf9b peptide model showing effect of an increase or decrease by 10% on each model parameter. All residuals and reporting are with respect to the 100uM of unlabeled Orf9b peptide condition. Blue dots: reported value. Red dots: 10% increase in reported value. Green dots: 10% decrease in reported value. Table reporting of RMSD values for model fits after +/-10% change to model parameter (Left column) and percent change in RMSD relative to reported model RMSD (Right column).

      “As an alternative to attempting to place CIs on the parameters, we performed sensitivity analysis to determine which parameters the model was most sensitive to (see methods and Figure 1 - Supplemental 3). Additionally, we note that the model parameters were derived from the fit of only one concentration (100uM), but fit the other concentrations equally well. We observed that the model parameter that was most sensitive to change was the rate of Orf9b-FITC:Tom70 ([PT]) dissociation when subjected to a 10% increase or decrease whereas all other model parameters showed no sensitivity to change (Figure 1 - Supplemental 3).”
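      The +/-10% sensitivity procedure described above can be sketched as follows. This is a minimal illustration, not the authors' model: the two-reaction competitive displacement scheme (labeled peptide P and unlabeled peptide U competing for Tom70), the parameter names and values, the concentrations, the synthetic noisy "data", and the forward Euler integrator are all assumptions made here. It perturbs one parameter at a time and reports the percent change in RMSD relative to the unperturbed fit.

```python
import math
import random

def simulate(params, p_tot=0.05, t_tot=0.5, u_tot=100.0,
             t_end=60.0, dt=0.005, n_out=31):
    """Return the labeled complex [PT] (uM) at n_out evenly spaced times."""
    kon_p, koff_p = params["kon_p"], params["koff_p"]
    kon_u, koff_u = params["kon_u"], params["koff_u"]
    # Pre-equilibrate labeled peptide with Tom70 (exact 1:1 binding quadratic)
    kd = koff_p / kon_p
    s = p_tot + t_tot + kd
    pt = (s - math.sqrt(s * s - 4.0 * p_tot * t_tot)) / 2.0
    ut = 0.0  # unlabeled peptide is added at time zero
    steps = int(round(t_end / dt))
    every = steps // (n_out - 1)
    trace = [pt]
    for step in range(1, steps + 1):
        free_t = t_tot - pt - ut
        d_pt = kon_p * (p_tot - pt) * free_t - koff_p * pt
        d_ut = kon_u * (u_tot - ut) * free_t - koff_u * ut
        pt += dt * d_pt  # forward Euler step
        ut += dt * d_ut
        if step % every == 0:
            trace.append(pt)
    return trace

def rmsd(xs, ys):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical "reported" parameters (uM^-1 s^-1 for kon, s^-1 for koff)
reported = {"kon_p": 1.0, "koff_p": 0.05, "kon_u": 0.01, "koff_u": 1.0}

# Synthetic stand-in for experimental data: model trace plus seeded noise
rng = random.Random(0)
data = [y + rng.gauss(0.0, 0.0005) for y in simulate(reported)]

base = rmsd(simulate(reported), data)
sensitivity = {}
for name in reported:
    for factor in (0.9, 1.1):
        trial = dict(reported)
        trial[name] = reported[name] * factor
        change = rmsd(simulate(trial), data) - base
        sensitivity[(name, factor)] = 100.0 * change / base  # percent change in RMSD
```

      A table of `sensitivity` values plays the same role as the RMSD tables in the supplemental figures; with real data, the residuals `simulate(trial)[i] - data[i]` would be plotted rather than only summarized.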

      Figure 4 - Supplemental 5: Plot of residuals showing the effect of increasing or decreasing individual model parameters 10% compared to the reported values. All residual plots are with respect to the 5uM apo-Orf9b homodimer condition. Blue dots: reported value. Red dot: 10% increase in reported value. Green dot: 10% decrease in reported value. (Left columns) Table of RMSD values calculated from model fits showing the effect of both +/-10% change to individual model parameters. (Right columns) Percent change in RMSD values subjected to +/-10% change for individual model parameters relative to the RMSD of the reported model.

      We have also included the following revised text to accompany this figure.

      “Further, we repeated the sensitivity analysis described previously for the peptide model and also considered the sensitivity of model parameters by inspecting each individually (Figure 4 - figure supplemental 5). We found that when examining the residuals of the lowest concentration of 5uM, the model was most sensitive to changes in three parameters: the rate of homodimer association and dissociation and the conversion from β to α-monomers.”

      “Therefore, under low concentrations of Orf9b homodimer, binding to Tom70 is limited by the rate of homodimer association and dissociation as well as the conversion of Orf9b monomers to the α-helical conformation.”

      We have also included a supplemental figure, Figure 4 - Supplemental 4, showing how changes in the model parameters ka and kB affect the model's behavior, to help illustrate the range of values tested.

      Figure 4 - Supplemental 4: Plots of model behavior showing the effect of changes to alpha-beta and beta-alpha monomer interconversion rates compared to experimental values. Data are modeled with respect to the apo-Orf9b homodimer 5uM condition. Black line represents reported model fit and values used.

      We have also incorporated the following revised text.

      “The model parameters k<sub>a</sub> and k<sub>B</sub> describe the rate of interchange between the β-sheet and α-helix monomer conformations. These parameters must be estimated by modeling because our assays do not allow us to directly measure the folding rates between these conformations. To identify these values, we performed a scan of k<sub>a</sub> and k<sub>B</sub> values that yielded the best agreement between the model and the experimental conditions (Figure 4 - figure supplemental 4).”
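      A parameter scan of this kind can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' model or fitting code: the reaction scheme (ββ dimer, β monomer, α monomer, Tom70, and α:Tom70 complex), every rate constant and concentration, the scan grid, the forward Euler integrator, and the synthetic target trace are hypothetical. The `order` argument sets whether the β to α step is first or second order in the β-monomer concentration.

```python
import math

def simulate(k_a, k_b, order=2, dimer0=5.0, tom0=1.0, t_end=100.0, dt=0.01,
             n_out=21, k_dis=0.01, k_assoc=1.0, k_on=1.0, k_off=0.01):
    """Return sampled states (dimer, beta, alpha, tom70, complex) in uM."""
    d, b, a, t, at = dimer0, 0.0, 0.0, tom0, 0.0
    steps = int(round(t_end / dt))
    every = steps // (n_out - 1)
    states = [(d, b, a, t, at)]
    for step in range(1, steps + 1):
        v_dimer = k_assoc * b * b - k_dis * d   # net homodimer formation
        v_fold = k_a * b ** order - k_b * a     # net beta -> alpha conversion
        v_bind = k_on * a * t - k_off * at      # net Tom70 binding
        # Forward Euler update; each dimer event consumes two beta monomers
        d += dt * v_dimer
        b += dt * (-2.0 * v_dimer - v_fold)
        a += dt * (v_fold - v_bind)
        t += dt * (-v_bind)
        at += dt * v_bind
        if step % every == 0:
            states.append((d, b, a, t, at))
    return states

def bound(states):
    """Extract the alpha:Tom70 complex trace from sampled states."""
    return [s[4] for s in states]

def rmsd(xs, ys):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Synthetic target trace generated at known (hypothetical) rate constants
target = bound(simulate(0.1, 0.1))

# Coarse grid scan over k_a and k_B; the best pair minimizes RMSD to the target
grid = [0.01, 0.1, 1.0]
scores = {(ka, kb): rmsd(bound(simulate(ka, kb)), target)
          for ka in grid for kb in grid}
best = min(scores, key=scores.get)
```

      With experimental time courses in place of `target`, the same scan would be repeated across concentration series, since a pair of rate constants that fits one concentration may fail at others.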

      The authors build a model that incorporates an α-helix-β-sheet conformational change, but the rate constant for the conversion to the α-helix conformation is required to be second order. Although the authors provide some rationale, I do not find this satisfactorily convincing given the large number of adjustable parameters in the model and the use of manual model fitting. The authors should discuss whether there is any precedent for second-order rate constants for conformational changes in the literature. On page 14, the authors state this rate constant "had to be non-linear in the monomer β-sheet concentration" - how many other models did the authors explore? For example, would αT↔α↔αα↔ββ (i.e., conformational change before dimer dissociation) or α↔βαT↔ββ (i.e., Tom70 binding driving dimer dissociation) be other plausible models for the conformational change that do not require assumptions of second-order rate constants for the conformational change?

      We thank the reviewer for their feedback. During our studies, we tested several models prior to the final one presented in Figure 4A. The first model that we tested, as described in Figure 4 - Supplemental 3, described ββ↔α↔αT with no conformational change. We tested several models that integrated the existing structural data for both Orf9b and Tom70 and found that, while these models could fit individual time series, they explained neither the concentration-dependent changes in subsequent time series nor the changes induced by lipid binding and mutations in VOC.

      With respect to the possibilities of αT↔α↔αα↔ββ and α↔βαT↔ββ models, we have revised our manuscript to mention that we did test additional models before we settled on the model that we presented.

      “We tested different reaction schemes that incorporated the interconversion between β-sheet to α-helix conformations by considering models that described a conformational change in the homodimer leading to Tom70 binding rather than monomers. None of these models adequately described our experimental results, therefore we continued developing our model as outlined in Figure 4D”

      With respect to the second-order rate describing the fold change from β to α, we have added the revised text to the manuscript:

      “We initially tested the impact of keeping the rate constant k<sub>a</sub> first order, just like k<sub>B</sub> which did yield the sigmoidal behavior we observed in the 5uM apo-homodimer condition. However, this assumption failed to describe the data at other concentrations resulting in a substantial overestimation compared to our experimental results when holding k<sub>B</sub> at a constant value throughout. We found that when the β-sheet to α-helix rate (k<sub>a</sub> ) was made a second order rate constant, we were able to hold the rate constant across all concentrations tested suggesting a non-linearity in the monomer β-sheet concentration.”

      While this was surprising to us, we reasoned that a biological explanation for the second-order conversion from β to α is that the β-monomers may transiently self-associate to cooperatively fold into the α-helical conformation. We did acknowledge this choice to make the β to α parameter non-linear (unlike the α to β conversion, which is first order).

      We concede that we could not find specific examples in the literature describing non-linear kinetics comparable to the system we described. However, such systems have been reported for proteins that exhibit high structural plasticity, where transient interactions with another copy of the protein or with another protein altogether drive folding changes. We have revised the manuscript to include additional citations to papers that describe such systems (Zuber et al. 2022; Tuinstra et al. 2008).
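
      The difference between a first-order and a second-order conversion step discussed in this response can be made concrete with textbook rate laws. The sketch below assumes an isolated, irreversible step (dβ/dt = -kβ or dβ/dt = -kβ²), which is far simpler than the coupled model in the manuscript; the rate constant and concentrations are hypothetical.

```python
import math

def half_time_first_order(k):
    """Half-time for d[beta]/dt = -k*[beta]; independent of concentration."""
    return math.log(2.0) / k

def half_time_second_order(k, beta0):
    """Half-time for d[beta]/dt = -k*[beta]^2, from [beta](t) = beta0 / (1 + k*beta0*t)."""
    return 1.0 / (k * beta0)

k = 0.1  # hypothetical rate constant (units depend on the reaction order)
for beta0 in (5.0, 10.0, 20.0):
    print(beta0, half_time_first_order(k), half_time_second_order(k, beta0))
```

      Under these assumptions, doubling the starting concentration leaves the first-order half-time unchanged but halves the second-order one, which is the kind of concentration dependence a single first-order constant cannot reproduce.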

      Overall, this study progresses the analysis of coupled equilibria and provides insights into Orf9b function.

      Reviewer #1 (Recommendations for the authors):

      (1) What was the unlabeled Orf9b peptide that is added to the pre-equilibrated Orf9b-FITC:Tom70 solution as a competitor? Figure 1D illustrates that the competitor was full-length Orf9b.

      We have revised the figure to illustrate that in this experiment, the competitor is the unlabeled FITC peptide and not the full-length Orf9b sequence.

      (2) Figure 2B: what is the higher Mw peak from the refolded Orf9b homodimer?

      We have added the following revised text (highlighted in red) to the manuscript to clarify Figure 2B.

      “The SEC elution profile and retention volume of refolded Orf9b directly overlapped with natively folded homodimeric Orf9b and suggested a high recovery of the refolded homodimer with the early eluting peaks corresponding to either a chaperone-bound species (natively folded) or misfolded protein (refolded) as judged by SDS-PAGE (Figure 2B). Together, the overlap in elution peaks corresponding to the folded homodimer suggested a high recovery of the homodimer from the refolding conditions.”

      (3) Figure 2C, in the main text, the authors state that "...observed that the refolded homodimer structure closely aligned with the lipid-bound reference structure, which shows that the homodimer fold can be recovered after denaturing". Please provide structural comparison details here: the software used, the RMSD, and the Dali Z-score.

      We have added the following revised text (highlighted in red) to the manuscript to clarify Figure 2C.

      “Aligning the structure of the Orf9b homodimer (PDB 6Z4U) with our structure of the refolded Orf9b homodimer (9N55) in PyMOL resulted in an RMSD of 1.1Å. Further, we also searched our structures of the refolded Orf9b homodimer on the Dali server against the existing structures of the lipid-bound Orf9b homodimer, which yielded a Z-score of 2.2, showing good correspondence between the structures.”

      (4) To prove the refolded Orf9b homodimer did not contain lipid, could the authors provide mass spectra data for the refolded Orf9b sample and compare it with the results in Figure 2 - Supplemental 1.

      We do not have complete mass spectra data for the refolded homodimer samples. However, we feel that the native mass spectrometry data provide a good orthogonal comparison between natively folded and refolded samples for the presence or absence of lipids. We concede that we only used mass spectrometry to characterize the four peaks that were unique to the natively folded deconvoluted spectra, which confirmed that the shift in mass relative to the expected homodimer molecular weight corresponded to the two lipids we presented. We would nevertheless expect that performing mass spectrometry on the refolded sample would only further confirm our observations from the crystal structures and the native mass spectrometry.

      (5) Have the authors tried to use analytical ultracentrifugation to analyze the Orf9b dimer-monomer equilibrium, given that AUC provides a much more accurate measurement of molecular mass?

      We thank the reviewer for this suggestion and agree that AUC could be an additional useful strategy for monitoring the dimer-monomer equilibrium and could provide additional validation of the molecular weights of both the monomer and homodimer.

      While we have not performed AUC, we have revised our manuscript to include more discussion about the determination of molecular weights by SEC.

      “For the Orf9b homodimer, the retention volume was consistent with molecular weight standards based on the expected molecular weight of the homodimer (~21kDa) and the standard (~29kDa). In the case of the Orf9b monomer, although we would expect the retention volume of the monomer (~10.6kDa) to be between the molecular weight standards of 13.4kDa and 6.5kDa, the greater retention volume could be explained by non-specific hydrophobic interactions between the monomeric Orf9b and the column.”

      (6) The authors used truncation of 7 C-terminal amino acids to generate an obligate Orf9b monomer for their assays. It would be interesting to mutate residues at the homodimer interface to generate Orf9b monomers rather than deleting residues. For example, mutate 91-96aa (FVVVTV) to negatively charged residues, which will not only disrupt the dimerization interface, but also impair lipid binding. The dimer interface mutant should then be tested in their SPR, FP assays, as well as IFN inhibition assays.

      We thank the reviewer for their suggestion and agree that mutating the 7 C-terminal amino acids to negatively charged residues could be an interesting alternative strategy for generating an obligate Orf9b monomer without truncating the residues. Our choice of the truncated construct was driven by our analysis of the homodimer structure, which reveals that a significant portion of the dimer interface is composed of backbone-backbone hydrogen bonding between the two chains of Orf9b. We reasoned that truncating these residues would be the most effective way to compromise the interface between the two chains and drive predominantly monomeric behavior. Nevertheless, compromising the interface with multiple mutations is an intriguing alternative.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors could comment on the slow monomer-dimer exchange observed by SEC and how it fits with their other analysis.

      We thank the reviewer for their comment and concede that the slow exchange may be a limitation of this experimental setup. Our SPR experiments and modeling indicated that the homodimer may dissociate quickly into monomers: the off rate suggests a half-life for the homodimer on the order of seconds. Nevertheless, we still observe a noticeable dimer species on the chromatograms. We initially allowed the diluted samples to reach equilibrium prior to injection onto the analytical sizing column, but it is possible that the system is still in a pre-equilibrium state at the time of injection. This could be driven by interactions between the protein and the column that prevent full dissociation of the homodimer. While this is a limitation, we note that we did not use the Kd value determined by non-linear regression fitting to the equilibrium observed on the chromatograms for downstream experiments. Instead, we used this value to get a ballpark estimate for the homodimer Kd, which is on the same order as the Kd determined by SPR.
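
      The Kd estimate from the chromatograms rests on a monomer-dimer equilibrium. A minimal sketch of that relationship follows, assuming a simple 2M ⇌ D scheme with Kd = [M]²/[D] and the total concentration expressed in monomer units. The function name and numerical values are illustrative and are not taken from the manuscript.

```python
import math

def monomer_dimer_equilibrium(total, kd):
    """Solve 2M <-> D with Kd = [M]^2 / [D] and total = [M] + 2[D].

    Returns ([M], [D], fraction of protein chains found in the dimer).
    """
    # Substituting [D] = [M]^2 / Kd gives 2[M]^2/Kd + [M] - total = 0,
    # whose positive root is the free monomer concentration.
    monomer = kd * (math.sqrt(1.0 + 8.0 * total / kd) - 1.0) / 4.0
    dimer = monomer ** 2 / kd
    return monomer, dimer, 2.0 * dimer / total

# Hypothetical dilution series in the same (arbitrary) units as Kd.
for total in (0.1, 1.0, 10.0):
    m, d, frac = monomer_dimer_equilibrium(total, kd=1.0)
    print(f"total={total:5.1f}  monomer={m:.3f}  dimer={d:.3f}  dimer fraction={frac:.2f}")
```

      Fitting the dimer fraction observed at several dilutions to this expression is one way to obtain a rough Kd, provided the samples are truly at equilibrium when measured.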

      (2) It might be useful to include the rate constants on the reaction arrows of the schematic representation of the models.

      We have revised Figure 4D to include the rates for both Orf9b monomer binding to Tom70 and Orf9b binding to Orf9b as derived from the SPR experiments as well as the modeled values for the interconversion between α and β monomers. We also revised Figure 7 to include these values as well as the modeled dissociation rate for homodimer when lipid-bound.

      (3) I couldn't find how the sensitivity analysis was performed for the more complex models. Was this the same +/- 10% as per the peptide model?

      We used the same +/- 10% sensitivity analysis as for the peptide model in the more complex equilibrium model and have revised our manuscript to clearly reflect that.
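
      A one-at-a-time +/- 10% sensitivity analysis of the kind mentioned here can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the model is a toy 1:1 equilibrium (fraction bound = C / (C + k_off/k_on)), and the parameter names and values are hypothetical.

```python
def fraction_bound(params, conc):
    """Toy model output: equilibrium fraction bound for 1:1 binding."""
    kd = params["k_off"] / params["k_on"]
    return conc / (conc + kd)

def sensitivity(model, params, conc, delta=0.10):
    """Perturb each parameter by -delta and +delta, one at a time.

    Returns {name: (relative change at -delta, relative change at +delta)}.
    """
    base = model(params, conc)
    result = {}
    for name, value in params.items():
        changes = []
        for factor in (1.0 - delta, 1.0 + delta):
            perturbed = dict(params)  # copy so other parameters stay fixed
            perturbed[name] = value * factor
            changes.append((model(perturbed, conc) - base) / base)
        result[name] = tuple(changes)
    return result

# Hypothetical parameter values, chosen only to make the numbers easy to read.
example_params = {"k_on": 1.0, "k_off": 1.0}
example_result = sensitivity(fraction_bound, example_params, conc=1.0)
for name, (minus, plus) in example_result.items():
    print(f"{name}: -10% -> {minus:+.3f}, +10% -> {plus:+.3f}")
```

      Parameters whose perturbation barely moves the output are poorly constrained by that output, which is useful to know when several parameters are adjusted by hand.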

      (4) Further clarification of "inspection of residuals suggested that the fits were accurate". In Figure 1B, the residuals look to have systematic errors, perhaps indicating other processes occurring.

      We agree that in the SPR kinetic fitting results for the Orf9b peptide binding to Tom70 in Figure 1B, there are some regions where the fit over- or underestimates the experimental results. This is partially the result of limitations in the number of different binding models that we can fit in the analysis software, which is why we reported using a 1:1 Langmuir binding model. It is certainly possible that additional binding reactions occur; however, we limited our use of these specific kinetic results to the peptide model that we proposed in Figure 1D. We did note in the manuscript text that it was necessary to change the model parameter values to some extent in order to fit our experimental results, which may be partially explained by the SPR fitting errors.

      “With the parameter set obtained from the 100µM condition, we then held all parameters fixed and simply changed the peptide concentrations in the model to fit the remaining conditions by hand. We note that this process saw the model parameter values change between 3% at the lowest end up to 70% at the highest end from the experimentally derived values but remained within an order of magnitude of the experimental SPR values. We speculate that this arises due to the differences in experimental setup between SPR and FP-based methods of measuring kinetics.”
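
      The 1:1 Langmuir model and the inspection of residuals raised in this exchange can be sketched as below. The association-phase equation is the standard one for 1:1 binding; the run-counting helper is a simple, hypothetical diagnostic added for illustration and is not the analysis software's method. All numerical values are made up.

```python
import math

def langmuir_association(t, conc, k_on, k_off, r_max):
    """Response during the association phase of a 1:1 Langmuir model."""
    k_obs = k_on * conc + k_off            # observed rate constant
    r_eq = r_max * k_on * conc / k_obs     # plateau response at this concentration
    return r_eq * (1.0 - math.exp(-k_obs * t))

def residuals(times, observed, conc, k_on, k_off, r_max):
    """Observed minus modeled response at each time point."""
    return [y - langmuir_association(t, conc, k_on, k_off, r_max)
            for t, y in zip(times, observed)]

def sign_runs(values):
    """Count runs of consecutive same-sign residuals (zeros are ignored).

    Random scatter gives many short runs; a few long runs hint at systematic misfit.
    """
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for i, s in enumerate(signs) if i == 0 or s != signs[i - 1])

# Hypothetical example: data from one rate constant, "fit" with another.
times = [0.5 * i for i in range(1, 21)]
data = [langmuir_association(t, 1.0, 1.0, 0.5, 100.0) for t in times]
res = residuals(times, data, 1.0, 0.8, 0.5, 100.0)
print(sign_runs(res), "runs across", len(res), "points")
```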

      (5) The manuscript builds logically, but given the sophisticated nature of the system and the modelling could benefit from more clarity/streamlining in the descriptions/illustrations.

      We have revised our manuscript in response to both reviewers' comments and hope that the clarity of the work is improved as a result.

      (6) Figure 4 Supplement 3 - where did the rate constants for Model 1 come from? Was there any attempt to alter them to fit the data better?

      We have clarified in the figure description that the rate constants used in Model 1 were the same values used in Figure 4B (but without the interconversion between beta and alpha rates).

      “Comparison of kinetic model 1 and 2 in describing experimental results from the kinetic binding assay. Experimental results using 10uM of refolded Orf9b homodimer are shown as rings with the predicted behavior of model 1 (equilibrium exchange) shown as a dark blue line. The predicted behavior of model 2 (equilibrium exchange with a conformational change between β-sheet and α-helical monomers) is shown as the light blue line. Model parameter values were the same as described in Figure 4D and kept constant in both model comparisons.”

      (7) What are and [PT] in the second set of equations (page 13)?

      [‘PT] refers to the concentration of “fluorescent probe” (Orf9b-FITC) and Tom70.

      (8) "Additionally, the fused homodimer association rate (which can be viewed as a rate of tertiary complex formation)" - can the authors provide a mathematical proof for this?

      In the case of the fused homodimer kinetic data, we did not develop a separate model to explicitly take into account the differences between using a fused construct versus the WT construct that can dissociate into monomers. We have clarified our interpretation of this in the manuscript.

      “Although our model explicitly describes homodimer dissociation into monomers as a requisite step for Orf9b binding to Tom70, we adapted it for the fusion experimental data. In this case, all model parameters other than the association and dissociation kinetics of the fluorescent probe and Tom70 were adjusted to achieve the best agreement with the experimental data. When applied to the fusion homodimer, the parameters describing homodimer dissociation into separate monomers could instead describe the dissociation of the two β-sheet domains away from each other in the tertiary structure but remaining physically linked through the linker region.”

      (9) "For Lambda and Omicron, the P10S mutation results in the serine being positioned to form several hydrogen bonds between R13 and the backbone carbonyl of A11 and L48 within the same chain..." is this taken from AlphaFold predicted structures of the mutants? If so, it should be made clear that this is derived from predicted structures. And even so, AlphaFold can be poor at determining structures of mutants, and so there is greater uncertainty in the prediction of the bonds.

      For Lambda, Omicron, and Delta mutations, we used PyMOL to examine how the placement of mutations could structurally explain the kinetic differences we observed in our model. We have gone back and clarified in the figure description that these predictions are not derived from AlphaFold.

      (10) "biological replicates" - is this different protein purifications?

      Yes, in this case biological replicates refer to different protein purifications for all variants described and tested.

      (11) Are any of the authors involved in the Berkeley Madonna commercial software used in the manuscript? If so, should this be in the conflict of interest statement?

      Yes, Michael Grabe is an owner of Berkeley Madonna, and we have updated our conflicts of interest statement to reflect this.

    1. eLife Assessment

      This important work describes a set of parameters that give a robust description of shape features of cells in tissues. The evidence for the usefulness of these parameters is solid. The work should be of interest for anybody analyzing epithelial dynamics, but more details about the analysis of experimental images are necessary and some streamlining of the text would increase the accessibility of the material for non-specialists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors’ stated aim is to introduce so-called Minkowski tensors to characterize and quantify the shape of cells in tissues. The authors introduce Minkowski tensors and then define the p-atic order q_p as a cell shape measure, where p is an integer. They also introduce a previously defined measure of p-atic order in the form of the parameter \gamma_p. The authors compute q_p for data obtained by simulating an active vertex model and a multiphase field model, where they focus on p=2 and p=6 - so-called nematic and hexatic order - as the two values of highest biological relevance. Based on their analysis, the authors state that q_2 and q_6 are independent, that there is no crossover for the coarse-grained quantities, that a comparison of q_p for different values of p is not meaningful, and determine the dependence of the mean value of q_2 and q_6 on cell activity and deformability. Subsequently, they apply their method to data from MDCK monolayers and argue that the full range of q_p values needs to be considered to characterize shape and positional order in epithelia.

      Strength:

      The work presents a set of parameters that are useful for analyzing cell shape.

      Weaknesses:

      The introduction of the Minkowski tensors is hardly accessible for typical biologists. Eventually, most quantification is done using q_p, which can be defined without recourse to Minkowski functionals. The relation to Minkowski functionals makes the important properties of robustness and stability evident. However, for an audience of biologists, the derivation of this property could be relegated to an Appendix. Instead, the text could directly go to the results of the analysis of experimental and modeling data.

      Important details about how the cell shapes are extracted from the experimental data are missing. The two data sets the authors consider are not analyzed in the same way.

    3. Reviewer #3 (Public review):

      Hapel et al. present an article entitled Quantifying the shape of cells - from Minkowski tensors to p-atic order. The paper reports the p-atic quantitative method - established in physics - to extract cell full shapes in biological experiments using their images of epithelial MDCK cells (phase contrast) and also images reported in another paper as well as their own simulations based on active vertex model and multiphase phase fields approaches. Authors present the rationale of this new strategy for quantification. They adapt the method of Minkowski tensors and they extract distributions of cell shapes readouts with plots of their distributions. An emphasis is given to changes in cell shapes captured by this method. Higher rank tensors are considered as well as representations with intuitive meanings and q_i orders and their potential correlations or absence of correlations - for example q_2 and q_6, leading to statements about nematic and hexatic orders.

      This analysis and its strength are contrasted with Armengol-Collado et al. (2023), quoted in the paper, who consider polygonal shapes for cells and their shape function 𝛾_p. The authors support the notion of a key improvement thanks to the Minkowski tensor approach and, in doing so, challenge the earlier statements about crossovers and correlations reported in Armengol-Collado et al. (2023). In this context, they argue that a nematic liquid crystal approach is not sufficient to capture cell dynamics in tissues. They also propose that q_2 and q_6 could serve as readouts for the activity and deformability of cells, among other statements related to their approach.

      A variety of analytical methods have been developed to track cells in monolayers in vitro and in vivo during morphogenesis - for example, shear decomposition (from MPI-PKS Dresden) or the approach based on links joining centroids and their neighbours (MSC/Curie Paris), to name a few examples. It will be interesting in the future to perform systematic comparisons between these analytical methods, highlighting their respective advantages and drawbacks. This will allow experimentalists to identify the most relevant methods to address their morphogenetic questions.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      I would suggest that the authors focus on what I think is the main goal of the work, namely, to consider the whole cell contour when characterizing cell shape instead of only some points on the contour. A reference to the connection with Minkowski tensors and the biologically relevant mathematical consequences of this connection would suffice; a detailed definition of the Minkowski tensors does not seem to be necessary. Especially because you do not really use them. You could use the analysis of the simulation data to explain what the γ<sub>p</sub> miss and for which statements they would be sufficient.

      We argue that the explanation of Minkowski tensors is helpful and should remain in the Methods and materials section. There are two reasons. First, our argumentation relies on the robustness and stability properties of Minkowski tensors. Introducing q<sub>p</sub> without the connection to Minkowski tensors would not allow us to make these statements. Second, Minkowski tensors seem not to be well known in the community; otherwise, measures like γ<sub>p</sub> would not have been introduced. Furthermore, readers not interested in the technical details could skip this part of the manuscript and go directly to the Results section. Concerning the questions of what the γ<sub>p</sub> miss and for which statements they would be sufficient, the answer from a purely mathematical point of view is rather simple: as γ<sub>p</sub> does not share robustness and stability, it should not be used in any case! The provided results on computational and experimental data demonstrate the consequences of using such measures. In the case of the proposed nematic-hexatic transition in Armengol-Collado et al. (2023), the consequence is severe, as this transition is specific only to the method used and not to the underlying physics. A second aspect, which we now further highlight, is the influence of approximating a cell by a polygon. We demonstrate that this approximation is responsible for a strong hexatic order on the cellular scale in the considered MDCK data from Armengol-Collado et al. (2023).

      It is not clear to me what we should learn about the two tissue models by using q<sub>2</sub> and q<sub>6</sub> to quantify cell shape. Can you clearly formulate one or more conclusions?

      What we can learn is the dependence of q<sub>p</sub> on the model parameters in the two tissue models:

      increases with higher activity or deformability

      decreases with higher activity or deformability.

      Furthermore, q<sub>2</sub> and q<sub>6</sub> are independent and describe distinct properties. Using these models as a basis to coarse-grain and derive continuous models on the tissue scale, these results indicate that more general p-atic liquid crystal theories should be used and the simplest nematic liquid crystal theories might not be sufficient.

      The experimental data and their analysis does not seem to add anything to the work. Do you report only data from independent measurements, or did you consider all images of a monolayer?

      We now also analyze experimental data from Armengol-Collado et al. (2023), which confirm our findings on the independence of q<sub>2</sub> and q<sub>6</sub> and also confirm that the proposed nematic-hexatic transition is specific to the use of γ<sub>p</sub> for characterizing the shape. Additional experimental data are therefore no longer needed. We skip the detailed analysis of these data and only keep the results in Fig 1 and Fig 2 and the corresponding figures in the appendix as illustrating examples.

      L13: ”P-atic liquid crystal theories offer new perspectives on how cells self-organize (...)” This is a difficult entry, because the average reader of eLife might not be familiar with p-atic liquid crystals.

      We agree that p-atic liquid crystals might not be familiar to the average reader. For this reason we introduce orientational order in the introduction with examples demonstrating that not only nematic, but also tetratic and hexatic order have been identified in tissue and introduce the different symmetries. Furthermore, we provide examples for p-atic liquid crystals from other fields and various references. In the conclusion, we also cite models for p-atic liquid crystal theories. Even if the average reader is not familiar with these theories, it should become evident that nematic order might not be sufficient to describe tissue as other symmetries are present as well.

      L32: ”nematic” needs to be introduced.

      Nematic order is already explained as rotational order under rotations by 180°. The references cited discuss nematic liquid crystals in the context of morphological changes in tissue. We therefore only added a standard textbook as a reference for liquid crystal theories and refrain from introducing it in more detail in the manuscript.

      Figure 1: Why do you show the data for q<sub>3</sub>, q<sub>4</sub>, and q<sub>5</sub>, which you do not really consider in this manuscript? Same for Figure 2. Why not combine the two figures? Furthermore, you show q<sub>p</sub> without having defined them yet.

      We consider all p = 2,3,4,5,6, but focus on p = 2,6 in the main text and p = 3,4,5 in the appendix. Figures 1 and 2 essentially only introduce the subject and help to relate p-atic order to cell shapes and introduce the methodology to analyze the data. Our conclusion is that all p can be important and should be considered in continuous descriptions of tissue.

      Equation 1: The notation is confusing: the domain of integration (C or ∂C) also appears as the variable you integrate.

      The equation is correct. The variable of integration is 1 or H and the domain of integration is C (cell) or ∂C (cell contour).

      L68: ”a snapshot of the considered monolayer of wild-type MDCK cells”. Did you analyse only one monolayer? Please, provide information about the number of monolayers that were imaged and how many cell shapes were analyzed.

      We have analyzed one monolayer and have added the missing information.

      L86: ”field-specific prefactors” I do not understand what is meant by these.

      Different communities, e.g., physics, mathematics, and cosmology, use different prefactors in the definition. We have removed this statement.

      L89: ”Hadwiger’s characterization theorem”. What is this?

      This mathematical result is important to claim robustness and stability; it can be found in the cited reference.

      L104: ”the essential property is the continuity”. Essential for what?

      Essential ”for our purpose” to characterize the shape of cells by a robust method.

      L120: ”the theory also guarantees robust description of p-atic orientation for p = 3,4,5,6,...” I do not understand what you mean.

      The previous examples only consider p = 2. However, the cited theoretical results also hold for p = 3,4,5,6,...

      Equations (5) and (6): You define ψ<sub>p</sub>(C) twice. Are the definitions equivalent? Why do you need both?

      This is not a different definition; equation (6) is a reformulation which is more useful for our purpose. But we indeed define ϑ<sub>p</sub> twice. We now use a new symbol to distinguish ϑ<sub>p</sub> in Equations 7 and 9.

      Figure 4: ”The visualization uses rotationally-symmetric direction fields (known as p-RoSy fields in computer graphics (Vaxman et al., 2016)).” I guess that you have used these fields already in Figure 1, so why introduce them only now?

      We have moved this comment to Figure 1.

      Figure 6: Using a few discrete values cannot illustrate continuity. Also, the ”jump” in γ<sub>p</sub> results from deleting a vertex, so I doubt that this is a fair comparison. Still, I think that it is important to point out to the reader that the value γ<sub>p</sub> depends on the number of vertices (here, I allow that two edges connected by a vertex are aligned).

      We adjusted the caption to make our point clearer. The last image is a triangle and, according to the definition of γ<sub>p</sub>, is therefore described by only three vertices. So, it is indeed a fair comparison. The reviewer is right that the value of γ<sub>p</sub> depends strongly on the number of vertices used; this is exactly the point that we are trying to make with this figure. Also, adding vertices artificially to make γ<sub>p</sub> continuous leads to more problems, as the values for γ<sub>p</sub> change if we change the number of vertices. But an equilateral triangle should be recognized as an equilateral triangle, no matter if there is an artificial fourth vertex or not. The triangle in our picture and the triangle that the reviewer mentioned (so our triangle with an artificial fourth vertex) both have the shape of an equilateral triangle, yet for one it is |γ<sub>3</sub>| = 1.0 and for the other one it is |γ<sub>3</sub>| = 0.935.

      While we agree with the reviewer’s statement about continuity, we did not modify the sentence, as the meaning should be clear.
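
      The triangle example in this response can be checked numerically. The sketch below rests on two assumptions that are not spelled out in this response: that γ<sub>p</sub> is the vertex-based shape function Σ|r<sub>v</sub>|<sup>p</sup>e<sup>ipφ_v</sup> / Σ|r<sub>v</sub>|<sup>p</sup> with positions measured from the mean of the vertices, and that q<sub>p</sub> for a polygon is the edge-length-weighted sum over outward normal angles divided by the perimeter. The placement of the artificial fourth vertex at an edge midpoint is also an assumption.

```python
import cmath
import math

def gamma_p(vertices, p):
    """Vertex-based shape function (assumed form); center = mean of the vertices."""
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    rel = [complex(x - cx, y - cy) for x, y in vertices]
    weights = [abs(r) ** p for r in rel]
    numerator = sum(w * cmath.exp(1j * p * cmath.phase(r))
                    for w, r in zip(weights, rel))
    return numerator / sum(weights)

def q_p(vertices, p):
    """Contour-based p-atic order (assumed polygon form): edge normals weighted by length."""
    psi_p = 0j
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        normal_angle = math.atan2(-dx, dy)  # outward normal for counterclockwise vertices
        psi_p += length * cmath.exp(1j * p * normal_angle)
        perimeter += length
    return abs(psi_p) / perimeter

# Equilateral triangle (counterclockwise) and the same shape with an extra
# vertex at the midpoint of its bottom edge.
top = (0.0, 1.0)
left = (-math.sqrt(3) / 2, -0.5)
right = (math.sqrt(3) / 2, -0.5)
triangle = [top, left, right]
triangle_extra = [top, left, (0.0, -0.5), right]

for name, shape in (("3 vertices", triangle), ("4 vertices", triangle_extra)):
    print(name, round(abs(gamma_p(shape, 3)), 3), round(q_p(shape, 3), 3))
```

      Under these assumptions, |γ<sub>3</sub>| drops from 1.0 to about 0.935 when the midpoint vertex is added, while q<sub>3</sub> stays at 1.0 because the two collinear half-edges share the same normal and their lengths simply add.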

      L160: The definition of the center of mass is incorrect as it is not that of an extended object whose contour is defined by a polygon, but only of the set of vertices. In Figure 6 you write ”the choice of the center of mass highly influences the value of γ<sub>p</sub>” - is there really a choice of the center of mass? I thought that it was uniquely defined.

      We here only repeat the definition from Armengol-Collado et al. (2023) in order to be able to directly compare our analyses with the results presented therein. We adjusted the caption to be more clear.

      L166: What is the weighting you refer to in Equation 9?

      We apologize, the reference is to Equation 8. We have modified this.

      L312: ”Quantifying orientational order in biological tissues can be realized by Minkowsky tensors”. As mentioned above, you do not really use them, but use Equation (7), which can be defined without reference to Minkowski tensors.

      Eq. (7) is part of the irreducible representations of the Minkowski tensors. Therefore, the sentence is correct.

      L318: I do not quite understand the link between being able (or not) to compare q<sub>p</sub>’s for different values of p and the interpretability of q<sub>2</sub> and q<sub>6</sub>. Also, since you introduce q<sub>p</sub>, how can the question about their comparability be a recurrent challenge? Finally, would you agree that even though a comparison between the absolute values of q<sub>2</sub> and q<sub>6</sub> is inappropriate, one can still meaningfully compare relative changes as a parameter is changed or when comparing cells in different conditions?

      We have modified the sentence. Furthermore, we agree that one can still meaningfully compare relative changes as a parameter is changed, as we do. However, our claim that q<sub>2</sub> and q<sub>6</sub> are independent does not allow one to conclude any kind of nematic-hexatic phase transition. We have now provided further evidence using the published data of Armengol-Collado et al. (2023), which unequivocally supports this statement. We would also like to remark that the detection of a phase transition requires a single order parameter, which cannot exist as q<sub>2</sub> and q<sub>6</sub> are independent.

      We have further explained this in the main text.

      Figure 7: The axes are not labeled.

      We added the labels.

      L359: ”q<sub>2</sub> and q<sub>6</sub> values cluster tightly”, L362 ”q<sub>2</sub> and q<sub>6</sub> values become highly scattered” Please, quantify.

      We kept these formulations but have added statistical measures to these qualitative descriptions, see Supplementary Figures to Fig 7 for the distance correlation and the P-values of the distance correlation. These data support our claim of independence.

      L362: ”each q<sub>2</sub> value spans a broad range of q<sub>6</sub> values and vice versa, demonstrating their independence”. Please, use a quantitative test of statistical independence.

      We have added statistical information by using the distance correlation and statistical tests, see Supplementary Figures to Fig 7. Similar results are obtained for the Pearson correlation and corresponding tests. However, they are not included as the distance correlation is more general.
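
      The independence test described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it computes the sample distance correlation of two one-dimensional samples and a permutation P-value under the null hypothesis of independence. The arrays `q2`, `q6_independent` and `q6_dependent` are synthetic stand-ins, and the function names, the number of permutations and the seeds are assumptions made for this example.

```python
import numpy as np


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centred (row, column, grand mean)."""
    v = np.asarray(v, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples, a value in [0, 1]."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = (a * b).mean()                              # squared distance covariance
    dvar2 = np.sqrt((a * a).mean() * (b * b).mean())    # product of distance std. devs.
    return 0.0 if dvar2 == 0 else float(np.sqrt(max(dcov2, 0.0) / dvar2))


def permutation_p_value(x, y, n_perm=500, seed=0):
    """P-value for H0 'x and y are independent', obtained by shuffling y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = distance_correlation(x, y)
    hits = sum(distance_correlation(x, rng.permutation(y)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-ins for per-cell q_2 and q_6 values (NOT data from the manuscript).
rng = np.random.default_rng(1)
q2 = rng.uniform(0.0, 1.0, 150)
q6_independent = rng.uniform(0.0, 1.0, 150)
# Fully determined by q2, although its Pearson correlation with q2 vanishes in expectation.
q6_dependent = (q2 - 0.5) ** 2

print(distance_correlation(q2, q6_independent), permutation_p_value(q2, q6_independent))
print(distance_correlation(q2, q6_dependent), permutation_p_value(q2, q6_dependent))
```

      The second synthetic pair illustrates why the distance correlation is the more general measure: a purely quadratic dependence is invisible to the Pearson correlation in expectation but is detected here.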

      L371: Please, define Q<sub>2</sub> and Q<sub>6</sub> in the main text.

      We have now added the definition to the Materials and methods section.

      L420: A reference seems to be missing.

      Thanks for pointing this out. This was a formatting error; we only wanted to cite Balasubramaniam et al. (2021).

      L425: ”strong dependence of cell shape on cell density”. But q<sub>6</sub> seems to be rather independent of density, see Figure 11. Also, what do you mean by ”strong”? Can you quantify?

      The dependency of the cell shape on the cell density is shown in detail in Eckert et al. (2023). Furthermore, describing the cell shape requires the values for all p, so the change in q<sub>2</sub> already indicates a change in the overall cell shape even though q<sub>6</sub> barely changes. As we have now excluded these experimental results in favor of the experimental data also used in Armengol-Collado et al. (2023), we did not add further evaluations regarding cell density.

      L453 ”These divergences [nonmonotonic dependence of γ<sub>p</sub> on activity or deformability] highlight the limitations of γ<sub>p</sub> in capturing consistent patterns”. I am not sure to follow your argument here.

      Besides the quantitative differences seen when comparing Fig. 1 and Fig. 2 with the corresponding figures in the appendix, these results show qualitative differences. Using a method that is neither robust nor continuous leads to qualitatively different results. The nonmonotonic dependence of γ<sub>p</sub> is specific to the method, not to the underlying physics.

      Appendix 3 - Figure 20: It is not clear how to compare this figure to Figure 3e of Armengol-Collado et al 2023. Please, provide more details.

      Appendix 3 - Figure 20 (Appendix 3 - Figure 25 in the revised version) and Figure 3e in Armengol-Collado et al. (2023) cannot be directly compared. Fig 3e shows results of experiments and multiphase field simulations for one parameter setting, whereas Fig 20 shows results of the active vertex model for various parameter settings. Both, however, are evaluated using γ<sub>p</sub> and Γ<sub>p</sub>. We have added these computations, see Fig. 13, which indeed reproduces the results from Fig 3e. We refrain from considering plots corresponding to Fig 20 for the multiphase field model, as this first requires computing the vertices and no additional information can be expected.

      Reviewer 2:

      The manuscript lacks statistical information. The following should be addressed: How often have the experiments been performed? How many monolayers have been analyzed? How many time steps have been considered and in what duration? How many cells have been included in the analysis? What are the p-values to determine if q<sub>p</sub>’s (Figure 2, panel a) and γ<sub>p</sub>’s (Appendix 3-Figure 17, panel a) are significantly different? Same figures: How many cells and experiments have been considered here? Figure 11: What is the density of cells for each condition? Please provide the corresponding values. How significant are the differences? How many times has the experiment been repeated? Figure 12: Due to cell proliferation, the cell density changes over time. Does this need to be taken into account?

      We agree that the information we provided was only qualitative. We have added the missing information. In particular, we added statistical information using the distance correlation and statistical tests, see Supplementary Figures to Fig. 7. Similar results are obtained for the Pearson correlation and corresponding tests (not included). As we excluded the experimental results previously shown in Figure 11 and Figure 12 from the revised version, in favor of the experimental data already published in Armengol-Collado et al. (2023), we did not add further statistics regarding them. We added the number of frames and cells in the text.

      The image analysis part of the Method section states that time-series were xy-drift corrected, and cells were tracked. However, the manuscript does not contain results of dynamical data, time-dependent analyses, or discussions of how q<sub>p</sub> changes over time. The authors mention that the fluidity of the tissue was confirmed by the MSD, neighbor number variance, and the self-intermediate scattering function, but none of the results are shown in the manuscript. I would like to ask the authors to provide the results and related content in the Method section.

      We have modified the description and removed all parts related to dynamical data. Because the manuscript is already heavily loaded with images, we refrain from providing all the results for the phase diagram that distinguishes the solid and fluid phases. These measures have been provided previously for the considered modeling approaches and serve here only as a side remark. Our results do not depend on an exact localization of a solid-fluid phase boundary.

      Additional information is missing in the Image analysis part of the Method section. Could the authors provide the information on the image analysis steps between obtaining the segmented image and inputting the parameters for the Minkowski tensor? This should include how the normal vectors have been determined and whether this has been done for all pixels along the contour.

      We added further details in the section Extraction of the contour in Experimental setup in Methods and Materials and also provide the code to compute q<sub>p</sub> for segmented images.

      The authors have analyzed low-resolution phase contrast images acquired with a 10x objective to experimentally support their introduced Minkowski tensors. This may have decreased the resolution of the cell boundary detection and its curvature. I strongly suggest imaging the tissue with higher magnification (40x or 63x) and/or fluorescent markers to visualize the cell boundaries in high quality. This would allow the authors to distinguish between circles and circle-like shapes (lines 432-434) and to further investigate differences between MDCK wild-type and MDCK E-cad KO cells.

      We agree that higher resolution of the images would be beneficial. However, we are convinced that this will not influence our findings. Instead of performing the experiments with higher magnification or using fluorescent markers, we have considered the experimental data from Armengol-Collado et al. (2023) to support our results.

      The authors have coarse-grained the shape function, Γ<sub>p</sub>, and have chosen the active vertex model (Appendix 3-Figure 20) for comparison with the Minkowski tensors, Q<sub>p</sub> (Appendix 2 Figure 13). In both figures, the hexatic-nematic crossover does not occur. Armengol-Collado et al. have previously reported that the Voronoi model failed to achieve the hexatic-nematic crossover and argued that this is due to the artificial enhancement of the polygon’s hexagonality, leading to high hexatic order at the tissue scale. Since the authors have used the Voronoi-tailing method (line 196), I would like to ask the authors to compare the multiphase field models for Γ<sub>p</sub> and Q<sub>p</sub> instead.

      We would like to mention that we do not consider a Voronoi model but an active vertex model. A Voronoi model is only used for initialization. The two models are certainly related but not identical, and claims for a Voronoi model do not need to hold for an active vertex model. The suggested comparison for the multiphase field model is not an easy task, as it requires computing the vertices from the phase field variables. There are gaps between cells, and a reliable algorithm to identify the vertices is a task of its own. We therefore refrain from doing these calculations. Instead, we have used the experimental data from Armengol-Collado et al. (2023), for which the polygonal information is provided, see Figure 11. Especially for p = 6, strong differences can be seen by comparing the PDF obtained from the full shape with that obtained from the polygonal shape. Indeed, the strong hexatic order at the cellular scale is only a consequence of the approximation by polygons. Given this result, analysing the multiphase field data with γ<sub>p</sub> does not add any new information, as this first requires an approximation by polygons.

      The authors show the q<sub>p</sub> distributions for the experimental systems (Figure 2, Figure 11). For completeness, I would like to ask the authors to also coarse-grain q<sub>p</sub> and γ<sub>p</sub> of the experimental data as shown for the computational models in Appendix 2 - Figure 13 and Appendix 2 - Figure 14. It would be interesting to see if the hexatic-nematic crossover appears. I would recommend that the authors avoid using the Voronoi tailing of the experimental system, as this may fail to obtain the crossover as explained in (5) above. Instead, I suggest using the real vertex positions for γ<sub>p</sub>, which can be obtained from the segmented images.

      It remains open what is meant by ”the real vertex positions for γ<sub>p</sub>, which can be obtained from the segmented images”. Segmenting the images leads to smooth contours, partly even with gaps between cells. As the magnitude of γ<sub>p</sub> depends on the number of points used in the calculation, it is not meaningful to use all points of the contour for calculating γ<sub>p</sub>, as this would lead to artificially low values for |γ<sub>p</sub>|. Identifying the vertex positions for an approximating polygon is an issue of its own, and the consequence of this approximation is already mentioned above. For comparison, we therefore added the experimental data from Armengol-Collado et al. (2023): we used the provided vertex positions to compute q<sub>p</sub> and γ<sub>p</sub>, and we also segmented the raw data and used the resulting contours to compute q<sub>p</sub>. See Figure 11. These results confirm our findings and show that the proposed nematic-hexatic phase transition is specific to the use of γ<sub>p</sub> to characterize shape.

      In order to show that shape descriptors like the shape function, γ<sub>p</sub>, introduced by Armengol-Collado et al., ’fail to capture the nuance of irregular shapes’ (line 445), the authors have compared γ<sub>p</sub> with the Minkowski tensors, q<sub>p</sub>, using the same dataset (Figure 1 with Appendix 3 - Figure 16, Figure 2 with Appendix 3 - Figure 17, and Figure 4 with Appendix 3 - Figure 15). I agree that γ<sub>p</sub> and q<sub>p</sub> are different, not showing identical values. However, I see no evidence in these figures that q<sub>p</sub> describes the symmetry of a cell better than γ<sub>p</sub>, since the values are similar and vary quite similarly between different p-atic orders. What is the quantitative difference that shows the failure of the shape function to capture the nuance of irregular shapes?

      The statement already follows from the mathematical properties of robustness and stability, which are illustrated in Fig. 6. The mentioned comparisons for simulation and experimental data only demonstrate that the lack of robustness and stability of γ<sub>p</sub> also leads to different results when applied to averages of cell measures. The differences are twofold: first, the approximation of cells by polygons leads to different results, and second, even for polygons different results follow, as only one approach is continuous and the other is not. This has strong consequences for the proposed nematic-hexatic phase transition when the measures are coarse-grained. Our added results for the experimental data from Armengol-Collado et al. (2023) show that this behavior is not a physical feature but is specific to the use of γ<sub>p</sub>.
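
      The robustness property referred to above can be illustrated with a small sketch. It assumes the commonly used polygon form of the irreducible Minkowski tensor measure, ψ<sub>p</sub> = Σ<sub>k</sub> L<sub>k</sub> exp(i p θ<sub>k</sub>) with edge lengths L<sub>k</sub> and outward-normal angles θ<sub>k</sub>, and q<sub>p</sub> = |ψ<sub>p</sub>| divided by the perimeter. This form and its normalisation are assumptions and may differ from Eq. (7) of the manuscript; the function name and the test shapes are hypothetical.

```python
import cmath
import math


def q_p(vertices, p):
    """|sum_k L_k * exp(i*p*theta_k)| / perimeter for a counter-clockwise polygon.

    L_k is the length of edge k and theta_k the angle of its outward normal.
    """
    psi_p = 0j
    perimeter = 0.0
    n = len(vertices)
    for k in range(n):
        (x0, y0), (x1, y1) = vertices[k], vertices[(k + 1) % n]
        edge = complex(x1 - x0, y1 - y0)
        length = abs(edge)
        if length == 0.0:
            continue  # a repeated vertex contributes no boundary
        normal = -1j * edge / length  # edge rotated by -90 degrees: outward for CCW order
        psi_p += length * normal ** p
        perimeter += length
    return abs(psi_p) / perimeter


s = math.sqrt(3) / 2
triangle = [(0.0, 1.0), (-s, -0.5), (s, -0.5)]
# Same outline, with an artificial extra vertex at the midpoint of the bottom edge.
triangle_split = [(0.0, 1.0), (-s, -0.5), (0.0, -0.5), (s, -0.5)]

for p in (2, 3, 6):
    print(p, round(q_p(triangle, p), 6), round(q_p(triangle_split, p), 6))
```

      Because the extra vertex only splits one edge into two collinear pieces with the same normal, the boundary integral is unchanged, so under this assumed definition both polygons give identical values for every p.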

      The authors claim that the Minkowski tensors provide a ’reliable framework’ and that this framework ’opens new pathways for understanding the role of orientational symmetries in tissue mechanics and development’ (line 78-79). However, the p-atic orders in the experimental systems peak at very low orders of q<sub>p</sub> < 0.3, which may not allow conclusions about (non-)dominant orientational symmetry(ies) of cells. Can this framework be applied to experimental systems? Since the Minkowski tensors display the independence of the hexatic and nematic symmetry, the variations of cell shapes in experimental systems are too strong to provide any additional results (line 437), as stated by the authors, and no crossover was found, while the crossover was reported by Armengol-Collado et al., what new pathways can be opened to study tissues?

      We have added a comparison with experimental data from Armengol-Collado et al. (2023) and demonstrate that the proposed nematic-hexatic transition is specific to the use of γ<sub>p</sub> for characterizing the shape. Our results therefore first of all essentially close the ”pathway for understanding the role of orientational symmetries in tissue mechanics and development” that was proposed on the basis of this nematic-hexatic transition. On the other hand, even if q<sub>p</sub> peaks at relatively low values, the results demonstrate independence of the measures for different p, for two different modeling approaches and two different sets of experimental data. This motivates considering p-atic order for different p simultaneously. Such theories of ”multi”-p-atic liquid crystals, as proposed in the conclusions, are the new pathways mentioned.

      In principle, the introduced Minkowski tensors integrate the orientation of the normal vectors (Equation 6) and consider the perimeter of the contour (Equation 1). Do the tensors distinguish between convex and concave curvature since both are present in tissues? Does a square with 4 concave and a square with 4 convex edges (same curvature) have the same q<sub>p</sub> values?

      For the specific situation of a square with 4 concave or 4 convex edges, even p would lead to the same orientation and the same value for q<sub>p</sub>, as even p have a 180 degree symmetry. Odd p would result in the same value for q<sub>p</sub> but in a different orientation ϑ<sub>p</sub>. In more general cases, e.g. shapes with both concave and convex edges, no general statements can be made. In general, the theoretical results on stability of q<sub>p</sub> only hold for convex shapes. However, as discussed in Methods and materials, the known counterexamples for concave shapes are not relevant for cell shapes.

      In lines 169-172 and Figure 6, the authors report a jump in γ<sub>p</sub>. Why has the fourth vertex in the last image been removed? The vertices are essential for the calculation of γ<sub>p</sub>. If the fourth vertex is not removed, the following values result: γ<sub>3</sub> = 0.935 and γ<sub>4</sub> = 0.474, which leads to changes of the same order of magnitude as those of q<sub>p</sub>. I think it is therefore not the choice of the center of mass that ’heavily influences the value of γ<sub>p</sub>’, but the removal of the fourth vertex.

      We adjusted the caption to make our point clearer. The last image is a triangle and, according to the definition of γ<sub>p</sub>, is therefore described by only three vertices. The reviewer is right that the value of γ<sub>p</sub> depends strongly on the number of vertices used; this is exactly the point that we are trying to make with this figure. An equilateral triangle should be recognized as an equilateral triangle, whether or not there is an artificial fourth vertex. The triangle in our picture and the triangle that the reviewer described (our triangle with an artificial fourth vertex) both have the shape of an equilateral triangle, yet for one |γ<sub>3</sub>| = 1.0 and for the other |γ<sub>3</sub>| = 0.935. This can be seen even more clearly if further artificial vertices are added on the outline of the equilateral triangle, which decreases |γ<sub>3</sub>| even more. Furthermore, we think there was a misunderstanding regarding our statement about the center of mass. The general problem of γ<sub>p</sub>, namely the dependence of its values on the number of vertices, is independent of how the center of mass is calculated. The exact values of γ<sub>p</sub>, on the other hand, do depend on this choice. We follow Armengol-Collado et al. (2023) and use the mean of all vertex coordinates as the center of mass. If the reviewer used the center of mass of the equilateral triangle instead and did the same calculations, the resulting values for γ<sub>p</sub> would be different. This is what we meant by ’heavily influences the value of γ<sub>p</sub>’.
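
      The vertex dependence discussed above can be checked numerically with a short sketch. It assumes the shape function has the form γ<sub>p</sub> = Σ<sub>v</sub> |r<sub>v</sub>|<sup>p</sup> exp(i p φ<sub>v</sub>) / Σ<sub>v</sub> |r<sub>v</sub>|<sup>p</sup>, with r<sub>v</sub> measured from the mean of the vertex coordinates, and that the artificial fourth vertex sits at the midpoint of one edge. Both are assumptions, since neither Figure 6 nor the definition is reproduced here; with them, the sketch returns the values 0.935 and 0.474 quoted by the reviewer. The function and variable names are hypothetical.

```python
import math


def gamma_p(vertices, p):
    """Shape function: sum_v |r_v|^p exp(i*p*phi_v) / sum_v |r_v|^p.

    r_v is measured from the mean of the vertex coordinates, so
    |r_v|^p * exp(i*p*phi_v) is simply the complex number r_v ** p.
    """
    n = len(vertices)
    centre = complex(sum(x for x, _ in vertices) / n,
                     sum(y for _, y in vertices) / n)
    r = [complex(x, y) - centre for x, y in vertices]
    return sum(z ** p for z in r) / sum(abs(z) ** p for z in r)


s = math.sqrt(3) / 2
triangle = [(0.0, 1.0), (-s, -0.5), (s, -0.5)]
# Assumed placement: one artificial vertex at the midpoint of the bottom edge.
with_fourth = [(0.0, 1.0), (-s, -0.5), (0.0, -0.5), (s, -0.5)]
# Artificial vertices at the midpoints of all three edges.
with_midpoints = [(0.0, 1.0), (-s / 2, 0.25), (-s, -0.5),
                  (0.0, -0.5), (s, -0.5), (s / 2, 0.25)]

print(f"{abs(gamma_p(triangle, 3)):.3f}")        # 1.000
print(f"{abs(gamma_p(with_fourth, 3)):.3f}")     # 0.935
print(f"{abs(gamma_p(with_fourth, 4)):.3f}")     # 0.474
print(f"{abs(gamma_p(with_midpoints, 3)):.3f}")  # 0.778
```

      All four vertex lists trace the same equilateral triangle, so the differences in |γ<sub>3</sub>| come only from how the outline is discretised and from the resulting shift of the vertex mean.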

      In Appendix 3 - Figure 18, the authors show that the shape function, γ<sub>6</sub>, exhibits a non-monotonic trend as a function of activity and deformability. I have no objection to this statement. However, I would like to ask the authors to check the values for γ<sub>6</sub>. In the bottom-left corner, for example, γ<sub>6</sub> = 0.55. This value seems very low to me. In Appendix 3-Figure 20, |Q<sub>6</sub>| for R/Rcell = 2 is already in this range, while |Q<sub>6</sub>| for R/Rcell = 1 (not shown), corresponding to γ<sub>6</sub>, must be even higher. Also, the parameters p<sub>6</sub> = 3.5 and v<sub>0</sub> = 0.1 should result in a nearly hexagonal lattice, which should be captured with high γ<sub>6</sub> values. I would expect γ<sub>6</sub> to be in the same range as q<sub>6</sub>.

      Many thanks for pointing this out. There are two different points addressed in this question. The first is whether |Γ<sub>p</sub>| is too high. We checked the values: |Γ<sub>p</sub>| = 0.5075 for R/R<sub>cell</sub> = 2, so it is lower than 0.58. The second question is why γ<sub>p</sub> and q<sub>p</sub> are not in the same value range. You are right that for a perfectly hexagonal lattice both should give the same value, namely 1.0. However, even at p<sub>6</sub> = 3.5 and v<sub>0</sub> = 0.1 this is no longer a perfectly hexagonal lattice, and q<sub>6</sub> and |γ<sub>6</sub>| drop at different rates as the shape moves away from a perfect hexagon. As q<sub>p</sub> is stable and changes only slightly for slight changes in the shape, it makes sense that q<sub>p</sub> is still close to 1.0. We included an image, see below, of one time step at these parameters to show that the cells no longer form a perfect hexagonal lattice.

      Reviewer 3:

      Could the authors show why and how this method could bring new information which were missing so far in the understanding of morphogenesis in vitro and in vivo with the current quantification?

      The introduction provides examples of how orientational order and its topological defects can be linked to morphological changes in tissues. The orientational order emerges from the shape of the cells. Most commonly nematic order has been considered, but more recently also hexatic order and even a nematic-hexatic crossover on larger scales. This suggests a mechanical mechanism for morphogenesis, like a phase transition from hexatic to nematic, which would have consequences for the evolution of shape. We demonstrate that the measures q<sub>2</sub> and q<sub>6</sub> are independent. Furthermore, the proposed nematic-hexatic transition is specific to the use of γ<sub>p</sub> for characterizing the shape and to the coarse-graining of the associated order. These measures are not robust and therefore should not be used. Results for the robust measures q<sub>p</sub> suggest considering all p in a coarse-grained theory to model morphological changes in tissues.

      Could authors show quantitative comparisons between available methods with the same sets of data and highlight pros and cons?

      Author response image 1.

      Screenshot from p<sub>6</sub> = 3.5 and v<sub>0</sub> = 0.1

      In addition to what was already done for the simulation data, we have added data from Armengol-Collado et al. (2023) and compared the results for q<sub>p</sub> and Q<sub>p</sub> and for γ<sub>p</sub> and Γ<sub>p</sub>. The theoretical results and the illustrating example in Fig. 6 already show that there are no pros for γ<sub>p</sub>. Other methods belong to the class of bond-order methods and measure neighbor relations instead of shape. We already comment that these methods are inappropriate for classifying shape; see the last sentence of Methods and materials, and Mickel et al. (2013) for a detailed discussion of why these methods are not robust.

      Instead of using phase contrast images, which exhibit curved cell-cell contours, could authors use data with E-cadherin staining instead - as used in many epithelial studies in vitro and in vivo? Could they show both images for wild type and for the E-cadherin KO cell lines with fluorescent readout?

      We are convinced that our results do not depend on the way to visualize the cell contours. Furthermore the images do not provide additional information. To further strengthen the experimental part of the manuscript, we instead analyzed data from Armengol-Collado et al. (2023).

      They confirm our findings.

      The authors acknowledge differences in density between cell lines p. 13 so this calls for new experiments with solid readouts and analysis using comparable experimental conditions.

      Additionally, we analyzed data from Armengol-Collado et al. (2023) which confirm our findings. Our results are now supported by two different modeling approaches and two different experimental settings. Because of redundancy we removed the original experimental data from the revised manuscript.

    1. eLife Assessment

      The study provides valuable technical advances to generate and isolate neural rosettes. The technique is robust, as indicated by both reviewers. The evidence is solid, as shown in orthogonal characterization by flow cytometry, morphology, and scRNA-seq. Comparison with the manual-rosette-picking protocol will enhance the validity of the claims.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to develop a fully scalable, feeder-free protocol for deriving dorsal forebrain neural rosette stem cells (NRSCs) from human pluripotent stem cells, eliminating the need for manual rosette isolation. Using dynamic suspension culture combined with single-SMAD inhibition (RepSox), they sought to generate FOXG1⁺/OTX2⁺ NRSCs within ten days and expand them through at least twelve passages while retaining regional identity. They also aimed to demonstrate the cells' capacity to differentiate into functional neurons, astrocytes, and oligodendrocytes under defined conditions.

      Strengths:

      A key strength is the elimination of labour-intensive manual rosette picking, which significantly reduces operator variability and enhances throughput. The authors provide diverse validation in the form of flow cytometry showing >95% OTX2⁺ over passages 2-12, immunocytochemistry, single-cell RNA-seq, and functional MEA recordings, confirming both regional fidelity and neuronal activity. They also demonstrate glial differentiation and reproducibility across two hESC lines.

      The results convincingly demonstrate that the RepSox/suspension approach yields high-purity dorsal forebrain neural rosette stem cells (NRSCs) that maintain marker expression and multipotency through passage 12 and differentiate into electrophysiologically active neurons and mature glia. Thus, the authors have achieved their primary objectives.

      This protocol addresses a significant bottleneck in neural stem cell production by providing a reproducible, high-throughput alternative that is well-suited to drug screening, disease modelling, and potential cell therapy manufacturing. Standardised, scalable NRSC banks will accelerate neurodevelopmental and neurodegenerative disorder studies, enable automated bioreactor workflows, and encourage the sharing of resources across academia and industry.

      Weaknesses:

      Weaknesses include a lack of direct comparison to conventional manual-selection protocols, and the need to improve the statistical rigor of all quantitative assays by applying appropriate hypothesis tests (e.g., t-tests or ANOVA with multiple-comparison correction) rather than reporting mean ± SD alone.

      Additional Context:

      Beyond the core technical advance, it's important to situate this work within the broader landscape of neural stem cell research and its downstream applications. Traditionally, dorsal forebrain NSCs have been generated via manual rosette picking after dual-SMAD inhibition (Chambers et al., 2009), a process that is labor-intensive, low-throughput, and prone to operator-dependent variability. By eliminating that step, this protocol directly addresses a key barrier to standardizing NSC production under GMP-compatible conditions - critical for both large-scale drug screening and eventual clinical use. Stable, regionally specified forebrain NSCs are especially valuable for modeling early neurodevelopmental disorders (e.g., autism spectrum disorders, microcephaly) and late-onset pathologies (e.g., Alzheimer's disease) in vitro, where precise cortical patterning is essential to recapitulate disease phenotypes. Moreover, establishing long-term epigenetic fidelity (e.g., via future ATAC-seq or histone-mark profiling) will further reassure users that transcriptional consistency reflects preserved regulatory networks, not just transient marker expression. Finally, demonstrating robust cryopreservation viability (>80%) makes these cells a readily shareable resource for the community, accelerating cross-lab reproducibility and comparative studies of patient-derived iPSC lines. This context underscores how scalable, high-purity forebrain NSCs can transform both basic neuroscience research and translational pipelines.

    3. Reviewer #2 (Public review):

      In the present manuscript, Dannulat Frazier et al. provide a novel and advanced protocol for obtaining almost pure populations of neural rosette stem cells (NRSCs) expressing the general markers NES and SOX2. These NSCs are expandable and exhibit dorsal forebrain properties and markers that are maintained throughout passages in culture (at least until passage 12). The authors also demonstrate the multipotency of these NSCs by their ability to differentiate into functional neurons, and precursors of astrocytes and oligodendrocytes.

      This method does not require the usual step of manual rosette selection and allows a greater homogeneity of the NSCs obtained and the standardization of the protocol, which will allow greater advances in the applications of these NSCs in research and as models of disease or compound testing. The manuscript is of great interest for the research area, since it describes a new methodology that can facilitate the research and therapeutic application of NSCs.

      The manuscript is well-written; the results are clear, robust, and well-explained. The conclusions reached in this paper are well-supported by the data, but some aspects could be better clarified.

      (1) The results presented in the present manuscript for the NSCs are performed up to passage 12; it would be interesting to know up to which passage these cells can be expanded while maintaining their initial properties. Have the authors analyzed passages beyond 12?

      (2) In Figure 2A, where different markers are shown in NSCs at different passages, it seems that at passage 12, there is a decrease in TJP1+ zones in relation to earlier passages, which could indicate a reduction in the potential to generate rosettes. Have the authors done any quantification along these lines? Could this be the case, or is it just an effect of the image chosen?

      (3) In Figure 3A, the decrease in the expression of the PAX6 gene at passage 8 relative to passage 2 is very striking and intriguing, as it does not correspond to what is observed at the protein level. Have the authors verified this result using another technique, such as RT-qPCR?

      (4) In Figure 5B, the labeling for GFAP appears rather nuclear, despite GFAP being a cytoskeletal protein. How can the authors explain this?

    4. Author response:

      Reviewer #1 (Public review):

      Thank you for your thoughtful and constructive feedback on our manuscript. We greatly appreciate your insights regarding our work, as they are invaluable in refining our research.

      We are very happy to hear that you recognize the strengths of our method, particularly the elimination of manual rosette picking, which significantly enhances throughput and reduces variability. We are also pleased that our validation efforts—through flow cytometry, immunocytochemistry, single-cell RNA-sequencing, and functional MEA recordings—effectively demonstrate both the identity and functionality of our derived dorsal forebrain neural rosette stem cells (NRSCs).

      Regarding the identified weaknesses, we agree that a direct comparison with conventional manual-selection protocols, specifically those utilizing dual-SMAD inhibition, would be a significant improvement. To address this, we have initiated additional experiments that will directly compare our single-SMAD inhibition approach (RepSox) with dual-SMAD inhibition (SB/LDN), aiming for a comprehensive evaluation of both protocols.

      In terms of statistical rigor, we appreciate your suggestion on improving our quantitative assays. All data were collected from at least three independent experiments and are presented as mean ± standard deviation unless otherwise specified. Because most of the data are qualitative, no formal statistical tests were performed for most of the experiments; for the quantitative measurements that were obtained, the mean and standard deviation were calculated to provide a descriptive summary of the data. Where possible, we will incorporate appropriate statistical tests to present our data more robustly, rather than merely reporting mean ± SD.

      Finally, we recognize the importance of situating our work within the broader landscape of neural stem cell research. We aim to elucidate the potential downstream applications for our protocol, which we believe will significantly impact neurodevelopmental and neurodegenerative disorder studies.

      Thank you again for your valuable suggestions. We look forward to refining our manuscript and enhancing the contribution of our research to the field.

      Reviewer #2 (Public review):

      Thank you for your thoughtful and constructive feedback on our manuscript. We appreciate your recognition of the novelty and potential impact of our protocol for obtaining neural rosette stem cells (NRSCs). Your comments are invaluable in improving our work.

      We are pleased that you found our methodology to be a significant advancement in the field, particularly the elimination of the manual rosette selection step, which hopefully will enhance homogeneity and standardization. We agree that this development has implications for research, disease modelling, and compound testing.

      Regarding your specific points:

      Passage expansion: Thank you for your insightful suggestion regarding the analysis beyond passage 12. We have continued passaging our NRSC line for more than 12 passages while maintaining the rosette structure. Although we do not yet have comprehensive and detailed analyses at these later passages, we will include some data and relevant information on our findings in the revised manuscript.

      TJP1+ zones: We appreciate your observation regarding the decreased TJP1+ zones at passage 12. We have not consistently detected a reduction in the number of rosettes or TJP1+ lumens across our cultures between passages. While some variability has been noted, we occasionally observe minor reductions at specific time points, followed by a recovery of rosettes in subsequent passages. This suggests that monitoring the number of rosettes is indeed a useful indicator of cell culture health. Cultures should be discarded if rosettes are completely lost. We will take a closer look at this aspect and report the findings in the revised manuscript.

      PAX6 Gene expression verification: Thank you for highlighting the discrepancy between PAX6 gene expression levels and protein levels. Unfortunately, we have not yet validated these results using an alternative technique. One potential explanation for this discrepancy may be the phenomenon of negative autoregulation, where increased levels of PAX6 protein can inhibit its own mRNA expression (Manuel et al., 2007). Moreover, Hsieh and Yang (2009) observed that during neurogenesis, PAX6 protein levels may not correlate linearly with mRNA levels, particularly in variable cellular environments. Additionally, post-transcriptional regulatory mechanisms, such as translation initiation mediated by Internal Ribosome Entry Sites (IRES), have been documented in various contexts involving PAX6, suggesting that mRNA levels may not fully represent functional protein levels in developing tissues (Li et al., 2023). We will go deeper into this discussion in the revised manuscript.

      GFAP Labeling: We appreciate your comments regarding the nuclear labeling of GFAP. In our astrocyte cultures, we have indeed observed GFAP localization in both the nucleus and the cytoplasm (Figure 5B). We will investigate this phenomenon further and provide a clearer explanation, supported by relevant literature, in the revised version. Although GFAP is primarily categorized as an intermediate filament protein localized in the cytoplasm, evidence suggests its nuclear localization may indicate additional regulatory roles during astrocyte development, activation, and pathology. This finding highlights the potential complexity of GFAP's role during fetal development and cellular stress, suggesting a broader functional scope that may extend into the nuclear space.

      Once again, thank you for your insightful feedback and for recognizing the potential of our research. We are committed to addressing your comments and enhancing the quality of our manuscript.

      Manuel, M. et al. (2007) ‘Controlled overexpression of Pax6 in vivo negatively autoregulates the Pax6 locus, causing cell-autonomous defects of late cortical progenitor proliferation with little effect on cortical arealization’, Development, 134(3), pp. 545–555. Available at: https://doi.org/10.1242/dev.02764.

      Hsieh, Y.-W. and Yang, X.-J. (2009) ‘Dynamic Pax6 expression during the neurogenic cell cycle influences proliferation and cell fate choices of retinal progenitors’, Neural Development, 4(1), p. 32. Available at: https://doi.org/10.1186/1749-8104-4-32.

      Li, Q. et al. (2023) ‘Translation of paired box 6 (PAX6) mRNA is IRES-mediated and inhibited by cymarin in breast cancer cells’, Genes & Genetic Systems, 98(4), pp. 161–169. Available at: https://doi.org/10.1266/ggs.23-00039.

    1. eLife Assessment

      This study seeks to determine how synaptic relationships between principal cell types in the olfactory system vary with glomerulus selectivity and is therefore valuable to the field. The methodology is solid, and with the caveat that there was a technical need to group all local interneurons, centrifugal neurons and multiglomerular projection neurons into one category ("multiglomerular neurons"), this work reveals some very interesting potential differences in circuit architecture associated with glomerular tuning breadth.

    2. Reviewer #1 (Public review):

      In this manuscript, Gruber et al. perform serial EM sections of the antennal lobe and reconstruct the neurites innervating two types of glomeruli - one that is narrowly tuned to geosmin and one that is broadly tuned to other odours. They quantify and describe various aspects of the innervations of olfactory sensory neurons (OSNs), uniglomerular projection neurons (uPNs), and the multiglomerular local interneurons (LNs) and PNs (mPNs). They find that narrowly tuned glomeruli had stronger connectivity from OSNs to PNs and LNs, and considerably more connections between sister OSNs and sister PNs than the broadly tuned glomeruli. They also had less connectivity with the contralateral glomeruli. These observations are suggestive of strong feed-forward information flow with minimal presynaptic inhibition in narrowly tuned glomeruli, which might be ecologically relevant, for example, while making quick decisions such as avoiding a geosmin-laden landing site. In contrast, information flow in more broadly tuned glomeruli shows much more lateralisation of connectivity to the contralateral glomerulus, as well as to other ipsilateral glomeruli.

      The data are well presented, the manuscript clearly written, and the results will be useful to the olfaction community. I had earlier suggested comparisons with other EM datasets that exist to investigate stereotypy, and am convinced by their efforts and reasons for which these were either not possible to do or not possible within the timeframe of a revision.

      Comments on revisions:

      Thank you for the careful responses to my suggestions. I hope that such approaches will be possible by others going forward.

    3. Reviewer #2 (Public review):

      The chemoreceptor proteins expressed by olfactory sensory neurons differ in their selectivity such that glomeruli vary in the breadth of volatile chemicals to which they respond. Prior work assessing the relationship between tuning breadth and the demographics of principal neuron types that innervate a glomerulus demonstrated that narrowly tuned glomeruli are innervated by more projection neurons (output neurons) and fewer local interneurons relative to more broadly tuned glomeruli. The present study used high resolution electron microscopy to determine which synaptic relationships between principal cell types also vary with glomerulus tuning breadth using a narrowly tuned glomerulus (DA2) and a broadly tuned glomerulus (DL5). The strength of this study lies in the comprehensive, synapse-level resolution of the approach. Furthermore, the authors implement a very elegant approach of using a 2-photon microscope to score the upper and lower bounds of each glomerulus, thus defining the bounds of their restricted regions of interest. Using the approach, the authors identify several architectural motifs that differ between glomeruli with different tuning properties.

      In the revised version of this study the authors discuss several important limitations. There was a technical need to group all local interneurons, centrifugal neurons and multiglomerular projection neurons into one category ("multiglomerular neurons") which complicates interpretations as even multiglomerular projection neurons are very diverse. With only 2 narrowly tuned glomeruli and 1 broadly tuned glomerulus, architecture differences may reflect more than just differences in tuning breadth. Finally, the degree to which inter-animal variability may contribute to differences between glomeruli is discussed. If these caveats are kept in mind, this work reveals some very interesting potential differences in circuit architecture associated with glomerular tuning breadth.

      This work establishes specific hypotheses about network function within the olfactory system that can be pursued using targeted physiological approaches. It also identifies key traits that can be explored using other high resolution EM datasets and other glomeruli that vary in their tuning selectivity. Finally, the laser "branding" technique used in this study establishes a reduced cost procedure for obtaining smaller EM datasets from targeted volumes of interest by leveraging the ability to transgenically label brain regions in Drosophila.

      Comments on revisions:

      I appreciate the thoughtful responses that the authors made regarding the initial assessment of their study. The authors discuss these limitations in their manuscript which should not be viewed as criticisms, but rather caveats to be considered for this study specifically and in some instances, for all connectomics studies.

      I still believe there is a lost opportunity to make use of the FlyWire dataset to make specific strategic comparisons. I do not propose attempting to replicate the comprehensive nature of the main study, but querying cell type based on glomerular innervation would allow the authors to address consistency of observed differences between glomeruli as ORNs and uPNs have been thoroughly annotated and analysis can be limited by neuropil. I agree that it is unclear how many individuals would need to be examined to achieve sufficient statistical power, but some of the circuit motifs revealed in this study can be readily tested in the FlyWire dataset. For instance, the observation from this study that narrowly tuned ORNs receive less synaptic input from LNs is supported in FlyWire, with DL5 ORNs getting far more synaptic input from LNs relative to DA2 and VA1v. I'm not proposing repeating all of the analyses from this study, and there is no doubt that inter-animal variability and technical differences can explain different observations across datasets, but I believe these are considerations of which the readers (who can query these synaptic relationships in FlyWire) should be made aware.

    1. eLife Assessment

      The authors collected valuable time-course RNA-seq data from four tree species in natural environments and analyzed seasonal patterns of gene expression. The genome assemblies and gene expression data across multiple species and tissues are convincing, but the overarching conclusions are inadequately supported due to weaknesses in the study design, which encompasses three different environments and two distinct time periods. This makes it impossible to disentangle genetic effects - which are critical for evolutionary inferences - from environmental influences on gene expression.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed genome assemblies for two Fagaceae species and collected transcriptome data from four natural tree species every month over two years. They identified seasonal gene expression patterns and further analyzed species-specific differences.

      Strengths:

      The study of gene expression patterns in natural environments, as opposed to controlled chambers, is gaining increasing attention. The authors collected RNA-seq data monthly for two years from four tree species and analyzed seasonal expression patterns. The data are novel. The authors could revise the manuscript to emphasize seasonal expression patterns in three species (with one additional species having more limited data). Furthermore, the chromosome-scale genome assemblies for the two Fagaceae species represent valuable resources, although the authors did not cite existing assemblies from closely related species.

      Weaknesses:

      The study design has a fundamental flaw regarding the evaluation of genetic or evolutionary effects. As a basic principle in biology, phenotypes, including gene expression levels, are influenced by genetics, environmental factors, and their interaction. This principle is well-established in quantitative genetics.

      In this study, the four species were sampled from three different sites (see Materials and Methods, lines 543-546), and additionally, two species were sampled from 2019-2021, while the other two were sampled from 2021-2023 (see Figure S2). This critical detail should be clearly described in the Results and Materials and Methods. Due to these variations in sampling sites and periods, environmental conditions are not uniform across species.

      Even in studies conducted in natural environments, there are ways to design experiments that allow genetic effects to be evaluated. For example, by studying co-occurring species, or through transplant experiments, or in common gardens. To illustrate the issue, imagine an experiment where clones of a single species were sampled from three sites and two time periods, similar to the current design. RNA-seq analysis would likely detect differences that could qualitatively resemble those reported in this manuscript.

      One example is in line 197, where genus-specific expression patterns are mentioned. While it may be true that the authors' conclusions (e.g., winter synchronization, phylogenetic constraints) reflect real biological trends, these conclusions are also predictable even without empirical data, and the current dataset does not provide quantitative support.

      If the authors can present a valid method to disentangle genetic and environmental effects from their dataset, that would significantly strengthen the manuscript. However, I do not believe the current study design is suitable for this purpose.

      Unless these issues are addressed, the use of the term "evolution" is inappropriate in this context. The title should be revised, and the result sections starting from "Peak months distribution..." should be either removed or fundamentally revised. The entire Discussion section, which is based on evolutionary interpretation, should be deleted in its current form.

      If the authors still wish to explore genetic or evolutionary analyses, the pair of L. edulis and L. glaber, which were sampled at the same site and over the same period, might be used to analyze "seasonal gene expression divergence in relation to sequence divergence." Nevertheless, the manuscript would benefit from focusing on seasonal expression patterns without framing the study in evolutionary terms.

      To better support the seasonal expression analysis, the early RNA-seq analysis sections should be strengthened. There is little discussion of biological replicate variation or variation among branches of the same individual. These could be important factors to analyze. In line 137, the mapping rate for two species is mentioned, but the rates for each species should be clearly reported. One RNA-seq dataset is based on a species different from the reference genome, so a lower mapping rate is expected. While this likely does not hinder downstream analysis, quantification is important.

      In Figures 2A and 2B, clustering is used to support several points discussed in the Results section (e.g., lines 175-177). However, clustering is primarily a visualization method or a hypothesis-generating tool; it cannot serve as a statistical test. Stronger conclusions would require further statistical testing.

      The quality of the genome assemblies appears adequate, but related assemblies should be cited and discussed. Several assemblies of Fagaceae species already exist, including Quercus mongolica (Ai et al., Mol Ecol Res, 2022), Q. gilva (Front Plant Sci, 2022), and Fagus sylvatica (GigaScience, 2018), among others. Is there any novelty here? Can you compare your results with these existing assemblies?

      Most importantly, Figure 1B-D shows synteny between the two genera but also indicates homology between different chromosomes. Does this suggest paleopolyploidy or another novel feature? These chromosome connections should be interpreted in the main text, even if they could be methodological artifacts.

      In both the Results and Materials and Methods sections, descriptions of genome and RNA-seq data are unclear. In line 128, a paragraph on genome assembly suddenly introduces expression levels. RNA-seq data should be described before this. Similarly, in line 238, the sentence "we assembled high-quality reference genomes" seems disconnected from the surrounding discussion of expression studies. In line 632, Illumina short-read DNA sequencing is mentioned, but it's unclear how these data were used.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores how gene expression evolves in response to seasonal environments, using four evergreen Fagaceae species growing in similar habitats in Japan. By combining chromosome-scale genome assemblies with a two-year RNA-seq time series in leaves and buds, the authors identify seasonal rhythms in gene expression and examine both conserved and divergent patterns. A central result is that winter bud expression is highly conserved across species, likely due to shared physiological demands under cold conditions. One of the intriguing implications of this study is that seasonal cycles might play a role similar to ontogenetic stages in animals. The authors touch on this by comparing their findings to the developmental hourglass model, and indeed, the recurrence of phenological states such as winter dormancy may act as a cyclic form of developmental canalization, shaping expression evolution in a way analogous to embryogenesis in animals.

      Strengths:

      (1) The evolutionary effects of seasonal environments on gene expression are rarely studied at this scale. This paper fills that gap.

      (2) The dataset is extensive, covering two years, two tissues, and four tree species, and is well suited to the questions being asked.

      (3) Transcriptome clustering across species (Figure 2) shows strong grouping by season and tissue rather than species, suggesting that the authors effectively controlled for technical confounders such as batch effects and mapping bias.

      (4) The idea that winter imposes a shared constraint on gene expression, especially in buds, is well argued and supported by the data.

      (5) The discussion links the findings to known concepts like phenological synchrony and the developmental hourglass model, which helps frame the results.

      Weaknesses:

      (1) While the hierarchical clustering shown in Figure 2A largely supports separation by tissue type and season, one issue worth noting is that some leaf samples appear to cluster closely with bud samples. The authors do not comment on this pattern, which raises questions about possible biological overlap between tissues during certain seasonal transitions or technical artifacts such as sample contamination. Clarifying this point would improve confidence in the interpretation of tissue-specific seasonal expression patterns.

      (2) While the study provides compelling evidence of conserved and divergent seasonal gene expression, it does not directly examine the role of cis-regulatory elements or chromatin-level regulatory architecture. Including regulatory genomic or epigenomic data would considerably strengthen the mechanistic understanding of expression divergence.

      (3) The manuscript includes a thoughtful analysis of flowering-related genes and seasonal GO enrichment (e.g., Figure 3C-D), providing an initial link between gene expression timing and phenological functions. However, the analysis remains largely gene-centric, and the study does not incorporate direct measurements of phenological traits (e.g., flowering or bud break dates). As a result, the connection between molecular divergence and phenotypic variation, while suggestive, remains indirect.

      (4) Although species were sampled from similar habitats, one species (Q. acuta) was collected at a higher elevation, and factors such as microclimate or local photoperiod conditions could influence expression patterns. These potential confounding variables are not fully accounted for, and their effects should be more thoroughly discussed or controlled in future analyses.

      (5) Statistical and Interpretive Concerns Regarding Δφ and dN/dS Correlation (Figures 5E and 5F):

      (a) Statistical Inappropriateness: Δφ is a discrete ordinal variable (likely 1-11), making it unsuitable for Pearson correlation, which assumes continuous, normally distributed variables. This undermines the statistical validity of the analysis.

      (b) Biological Interpretability: Even with the substantial statistical power afforded by genome-wide analysis, the observed correlations are extremely weak. This suggests that the relationship, if any, between temporal divergence in expression and protein-coding evolution is negligible.

      Taken together, these issues weaken the case for any biologically meaningful association between Δφ and dN/dS. I recommend either omitting these panels or clearly reframing them as exploratory and statistically limited observations.
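
      The concern in point (5a) can be illustrated with a rank-based alternative to Pearson correlation. The following is a minimal sketch, not the authors' analysis: it computes Spearman's rank correlation with tie-averaged ranks, which is one common choice for ordinal variables with many ties. The variable names and the toy values for Δφ and dN/dS are hypothetical and serve only to show the calculation.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: peak-month differences (ordinal) and dN/dS values.
delta_phi = [1, 1, 2, 3, 3, 5, 6]
dn_ds = [0.10, 0.12, 0.08, 0.20, 0.15, 0.30, 0.25]
rho = spearman(delta_phi, dn_ds)
print(f"Spearman rho = {rho:.3f}")
```

      A rank-based coefficient addresses only the statistical objection in (5a). It does not change the interpretive point in (5b): a very small coefficient can reach significance in a genome-wide sample while explaining almost none of the variance.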

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors performed genome assemblies for two Fagaceae species and collected transcriptome data from four natural tree species every month over two years. They identified seasonal gene expression patterns and further analyzed species-specific differences.

      Strengths:

      The study of gene expression patterns in natural environments, as opposed to controlled chambers, is gaining increasing attention. The authors collected RNA-seq data monthly for two years from four tree species and analyzed seasonal expression patterns. The data are novel. The authors could revise the manuscript to emphasize seasonal expression patterns in three species (with one additional species having more limited data). Furthermore, the chromosome-scale genome assemblies for the two Fagaceae species represent valuable resources, although the authors did not cite existing assemblies from closely related species.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Comment: The study design has a fundamental flaw regarding the evaluation of genetic or evolutionary effects. As a basic principle in biology, phenotypes, including gene expression levels, are influenced by genetics, environmental factors, and their interaction. This principle is well-established in quantitative genetics.

      In this study, the four species were sampled from three different sites (see Materials and Methods, lines 543-546), and additionally, two species were sampled from 2019-2021, while the other two were sampled from 2021-2023 (see Figure S2). This critical detail should be clearly described in the Results and Materials and Methods. Due to these variations in sampling sites and periods, environmental conditions are not uniform across species.

      Even in studies conducted in natural environments, there are ways to design experiments that allow genetic effects to be evaluated. For example, by studying co-occurring species, or through transplant experiments, or in common gardens. To illustrate the issue, imagine an experiment where clones of a single species were sampled from three sites and two time periods, similar to the current design. RNA-seq analysis would likely detect differences that could qualitatively resemble those reported in this manuscript.

      One example is in line 197, where genus-specific expression patterns are mentioned. While it may be true that the authors' conclusions (e.g., winter synchronization, phylogenetic constraints) reflect real biological trends, these conclusions are also predictable even without empirical data, and the current dataset does not provide quantitative support.

      If the authors can present a valid method to disentangle genetic and environmental effects from their dataset, that would significantly strengthen the manuscript. However, I do not believe the current study design is suitable for this purpose.

      Unless these issues are addressed, the use of the term "evolution" is inappropriate in this context. The title should be revised, and the result sections starting from "Peak months distribution..." should be either removed or fundamentally revised. The entire Discussion section, which is based on evolutionary interpretation, should be deleted in its current form.

      If the authors still wish to explore genetic or evolutionary analyses, the pair of L. edulis and L. glaber, which were sampled at the same site and over the same period, might be used to analyze "seasonal gene expression divergence in relation to sequence divergence." Nevertheless, the manuscript would benefit from focusing on seasonal expression patterns without framing the study in evolutionary terms.

      We sincerely thank the reviewer for the detailed and thoughtful comments. We fully recognize the importance of carefully distinguishing genetic and environmental contributions in transcriptomic studies, particularly when addressing evolutionary questions. The reviewer identified two major concerns regarding our study design: (1) the use of different monitoring periods across species, and (2) the use of samples collected from different study sites. We addressed both concerns with additional analyses using 112 new samples and now present new evidence that supports the robustness of our conclusions.

      (1) Monitoring period variation does not bias our conclusions

      To address concerns about the differing monitoring periods, we added new RNA-seq data (42 bud and 42 leaf samples for L. glaber, and 14 bud and 14 leaf samples for L. edulis) collected from November 2021 to November 2022, enabling direct comparison across species within a consistent timeframe. Hierarchical clustering of this expanded dataset (Fig. S6) yielded results consistent with our original findings: winter-collected samples cluster together regardless of species identity. This supports our conclusion that the seasonal synchrony observed in winter is not an artifact of the monitoring period and indicates that our conclusions are robust across datasets.

      (2) Site variation is limited and does not confound our findings

      Although the study included three sites, two of them (Imajuku and Ito Campus) are only 7.3 km apart, share nearly identical temperature profiles (see Fig. S2), and are located at the edge of similar evergreen broadleaf forests. Only Q. acuta was sampled from a higher-altitude, cooler site. To assess whether the higher elevation site of Q. acuta introduced confounding environmental effects, we reanalyzed the data after excluding this species. Hierarchical clustering still revealed that winter bud samples formed a distinct cluster regardless of species identity (Fig. S7), consistent with our original finding.

      Furthermore, we recalculated the molecular phenology divergence index D (Fig. 4C) and the interspecific Pearson’s correlation coefficients (Fig. 5A) without including Q. acuta. These analyses produced results that were similar to those obtained from the full dataset (Fig. S12; Fig. S14), indicating that the observed patterns are not driven by environmental differences associated with elevation.

      (3) Justification for our approach in natural systems

      We agree with the reviewer that experimental approaches such as common gardens, reciprocal transplants, and the use of co-occurring species are valuable for disentangling genetic and environmental effects. In fact, we have previously implemented such designs in studies using the perennial herb Arabidopsis halleri (Komoto et al., 2022, https://doi.org/10.1111/pce.14716) and clonal Someiyoshino cherry trees (Miyawaki-Kuwakado et al., 2024, https://doi.org/10.1002/ppp3.10548) to examine environmental effects on gene expression. However, extending these approaches to long-lived tree species in diverse natural ecosystems poses significant logistical and biological challenges. In this study, we addressed this limitation by including three co-occurring species at the same site, which allowed us to evaluate interspecific differences under comparable environmental conditions. Importantly, even when we limited our analyses to these co-occurring species, the results remained consistent, indicating that the observed variation in transcriptomic profiles cannot be attributed to environmental factors alone and likely reflects underlying genetic influences.

      Accordingly, we added four new figures (Fig. S6, Fig. S7, Fig. S12 and Fig. S14) and revised the manuscript to clarify the limitations and strengths of our design, to tone down the evolutionary claims where appropriate, and to more explicitly define the scope of our conclusions in light of the data. We hope that these efforts sufficiently address the reviewer’s concerns and strengthen the manuscript.

      To better support the seasonal expression analysis, the early RNA-seq analysis sections should be strengthened. There is little discussion of biological replicate variation or variation among branches of the same individual. These could be important factors to analyze. In line 137, the mapping rate for two species is mentioned, but the rates for each species should be clearly reported. One RNA-seq dataset is based on a species different from the reference genome, so a lower mapping rate is expected. While this likely does not hinder downstream analysis, quantification is important.

      We thank Reviewer 1 for the helpful comment. To evaluate the variation among biological replicates, we compared the expression level of each gene across different individuals. We observed high correlation between each pair of individuals (Q. glauca (n=3): an average correlation coefficient r = 0.947; Q. acuta (n=3): r = 0.948; L. glaber (n=3): r = 0.948). This result suggests that the seasonal gene expression pattern is highly synchronized across individuals within the same species. We mentioned this point in the Results section of the revised manuscript. We also calculated the mean mapping rates for each species. As the reviewer expected, the mapping rate was slightly lower in Q. acuta (88.6 ± 2.3%) and L. glaber (84.3 ± 5.4%), whose RNA-Seq data were mapped to reference genomes of related but different species, compared to that in Q. glauca (92.6 ± 2.2%) and L. edulis (89.3 ± 2.7%). However, we minimized the impact of these differences on downstream analysis. These details have been included in the revised main text.
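
      The replicate comparison described above can be sketched as an average of pairwise correlation coefficients. This is a minimal illustration under stated assumptions: the response does not specify the coefficient type or how expression values were arranged, so Pearson correlation over a flat vector of expression values per individual is assumed here. The individual labels and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(profiles):
    """Average Pearson r over all pairs of individuals.

    profiles maps an individual label to its expression values, with the
    same gene and time-point order for every individual.
    """
    pairs = list(combinations(sorted(profiles), 2))
    rs = [pearson(profiles[a], profiles[b]) for a, b in pairs]
    return sum(rs) / len(rs)


# Hypothetical expression values for three individuals of one species.
profiles = {
    "tree_1": [5.1, 7.9, 2.0, 9.4, 3.3],
    "tree_2": [4.8, 8.2, 2.5, 9.0, 3.1],
    "tree_3": [5.5, 7.5, 1.8, 9.9, 3.6],
}
print(f"mean pairwise r = {mean_pairwise_correlation(profiles):.3f}")
```

      With n = 3 individuals there are three pairs, so the reported average summarizes only three coefficients per species; reporting the individual pairwise values alongside the mean would show their spread.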

      In Figures 2A and 2B, clustering is used to support several points discussed in the Results section (e.g., lines 175-177). However, clustering is primarily a visualization method or a hypothesis-generating tool; it cannot serve as a statistical test. Stronger conclusions would require further statistical testing.

      We thank the reviewer for the helpful comment. As noted, we acknowledge that hierarchical clustering (Fig. 2A) is primarily a visualization and hypothesis-generating method. To assess the biological relevance of the clusters identified, we conducted a Mann-Whitney U test or the Steel-Dwass test to evaluate whether the environmental temperatures at the time of sample collection differed significantly among the clusters. This analysis (Fig. 2B) revealed statistically significant differences in temperature for cluster B3 (p < 0.01), indicating that the gene expression clusters are associated with seasonal thermal variation. These results support the interpretation that the clusters reflect coordinated transcriptional responses to environmental temperature. We revised the Results section to clarify this point.
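
      The two-group comparison mentioned above can be sketched as follows. This is an illustrative implementation of the Mann-Whitney U test only, using a tie-corrected normal approximation without continuity correction; that choice is an assumption made here for brevity, and an exact test is generally preferable for small samples. The Steel-Dwass procedure for multiple clusters is not covered. The temperature values and cluster labels are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # pairs with x > y, ties count 0.5
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every observation identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical collection-day temperatures (degrees C) for two clusters.
winter_cluster = [4.1, 5.0, 6.2, 3.8, 7.1, 5.5]
other_cluster = [14.9, 18.2, 22.5, 25.1, 12.3, 20.0]
u, p = mann_whitney_u(winter_cluster, other_cluster)
print(f"U = {u}, two-sided p = {p:.4f}")
```

      A test of this kind asks whether temperature differs between clusters that were already defined from the expression data. It therefore supports an association between clusters and temperature rather than testing the clustering itself.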

      The quality of the genome assemblies appears adequate, but related assemblies should be cited and discussed. Several assemblies of Fagaceae species already exist, including Quercus mongolica (Ai et al., Mol Ecol Res, 2022), Q. gilva (Front Plant Sci, 2022), and Fagus sylvatica (GigaScience, 2018), among others. Is there any novelty here? Can you compare your results with these existing assemblies?

      We agree that genome assemblies of Fagaceae species are becoming increasingly available. However, our study does not aim to emphasize the novelty of the genome assemblies per se. Rather, with the increasing availability of chromosome-level genomes, we regard genome assembly as a necessary foundation for more advanced analyses. The main objective of our study is to investigate how each gene is expressed in response to seasonal environmental changes, and to link genome information with seasonal transcriptomic dynamics. To address the reviewer’s comment in line with this objective, we added a discussion on the syntenic structure of eight genome assemblies spanning four genera within the Fagaceae, including a species from the genus Fagus (Ikezaki et al. 2025, https://doi.org/10.1101/2025.07.31.667835). This addition helps to position our work more clearly within the context of existing genomic resources.

      Most importantly, Figure 1B-D shows synteny between the two genera but also indicates homology between different chromosomes. Does this suggest paleopolyploidy or another novel feature? These chromosome connections should be interpreted in the main text, even if they could be methodological artifacts.

      A previous study on genome size variation in Fagaceae suggested that, given the consistent ploidy level across the family, genome expansion likely occurred through relatively small segmental duplications rather than whole-genome duplications. Because Figure 1B-D supports this view, we cited the following reference in the revised version of the manuscript.

      Chen et al. (2014)  https://doi.org/10.1007/s11295-014-0736-y

      In both the Results and Materials and Methods sections, descriptions of genome and RNA-seq data are unclear. In line 128, a paragraph on genome assembly suddenly introduces expression levels. RNA-seq data should be described before this. Similarly, in line 238, the sentence "we assembled high-quality reference genomes" seems disconnected from the surrounding discussion of expression studies. In line 632, Illumina short-read DNA sequencing is mentioned, but it's unclear how these data were used.

      We relocated the explanation regarding the expression levels of single-copy and multi-copy genes to the section titled “Seasonal gene expression dynamics.” Additionally, we clarified in the Materials and Methods section that short-read sequencing data were used for both genome size estimation and phylogenetic reconstruction.

      Reviewer #2 (Public review):

      Summary:

      This study explores how gene expression evolves in response to seasonal environments, using four evergreen Fagaceae species growing in similar habitats in Japan. By combining chromosome-scale genome assemblies with a two-year RNA-seq time series in leaves and buds, the authors identify seasonal rhythms in gene expression and examine both conserved and divergent patterns. A central result is that winter bud expression is highly conserved across species, likely due to shared physiological demands under cold conditions. One of the intriguing implications of this study is that seasonal cycles might play a role similar to ontogenetic stages in animals. The authors touch on this by comparing their findings to the developmental hourglass model, and indeed, the recurrence of phenological states such as winter dormancy may act as a cyclic form of developmental canalization, shaping expression evolution in a way analogous to embryogenesis in animals.

      Strengths:

      (1) The evolutionary effects of seasonal environments on gene expression are rarely studied at this scale. This paper fills that gap.

      (2) The dataset is extensive, covering two years, two tissues, and four tree species, and is well suited to the questions being asked.

      (3) Transcriptome clustering across species (Figure 2) shows strong grouping by season and tissue rather than species, suggesting that the authors effectively controlled for technical confounders such as batch effects and mapping bias.

      (4) The idea that winter imposes a shared constraint on gene expression, especially in buds, is well argued and supported by the data.

      (5) The discussion links the findings to known concepts like phenological synchrony and the developmental hourglass model, which helps frame the results.

      We are grateful to the reviewer for the detailed and thoughtful review of our manuscript.

      Weaknesses:

      (1) While the hierarchical clustering shown in Figure 2A largely supports separation by tissue type and season, one issue worth noting is that some leaf samples appear to cluster closely with bud samples. The authors do not comment on this pattern, which raises questions about possible biological overlap between tissues during certain seasonal transitions or technical artifacts such as sample contamination. Clarifying this point would improve confidence in the interpretation of tissue-specific seasonal expression patterns.

      The leaf samples that clustered with the bud samples are newly flushed leaves collected in April for Q. glauca, May for Q. acuta, May and June for L. edulis, and August and September for L. glaber. To clarify this point, we marked these newly flushed leaf samples with asterisks in the revised figure (Fig. 2A).

      (2) While the study provides compelling evidence of conserved and divergent seasonal gene expression, it does not directly examine the role of cis-regulatory elements or chromatin-level regulatory architecture. Including regulatory genomic or epigenomic data would considerably strengthen the mechanistic understanding of expression divergence.

      We thank the reviewer for this insightful comment. As noted in the Discussion section, we hypothesize that such genome-wide seasonal expression patterns, and their divergence across species, are likely mediated by cis-regulatory elements and chromatin-level mechanisms. While a direct investigation of regulatory architecture was beyond the scope of the present study, we fully agree that incorporating regulatory genomic and epigenomic data would significantly deepen the mechanistic understanding of expression divergence. In this regard, we are currently working to identify putative cis-regulatory elements in non-coding regions and are collecting epigenetic data from the same tree species using ChIP-seq. We believe the current study provides a foundation for these future investigations into the regulatory basis of seasonal transcriptome variation. We made a minor revision to the Discussion to note that an important future direction is to investigate the evolution of non-coding sequences that regulate gene expression in response to seasonal environmental changes.

      (3) The manuscript includes a thoughtful analysis of flowering-related genes and seasonal GO enrichment (e.g., Figure 3C-D), providing an initial link between gene expression timing and phenological functions. However, the analysis remains largely gene-centric, and the study does not incorporate direct measurements of phenological traits (e.g., flowering or bud break dates). As a result, the connection between molecular divergence and phenotypic variation, while suggestive, remains indirect.

      We would like to note that phenological traits have been observed in the field on a monthly basis throughout the sampling period and the phenological data were plotted together with molecular phenology (e.g. Fig. 2A, C; Fig. 3C, D). Although the temporal resolution is limited, these observations captured species-specific differences in key phenological events such as leaf flushing and flowering times. We revised the manuscript to clarify this point.

      (4) Although species were sampled from similar habitats, one species (Q. acuta) was collected at a higher elevation, and factors such as microclimate or local photoperiod conditions could influence expression patterns. These potential confounding variables are not fully accounted for, and their effects should be more thoroughly discussed or controlled in future analyses.

      We fully agree with the reviewer that local environmental conditions, including microclimate and photoperiod differences, could potentially influence gene expression patterns. To assess whether the higher elevation site of Q. acuta introduced confounding environmental effects, we reanalyzed the data after excluding this species. Hierarchical clustering still revealed that winter bud samples formed a distinct cluster regardless of species identity (Fig. S7), consistent with our original finding.

      Furthermore, we recalculated the molecular phenology divergence index D (Fig. 4C) and the interspecific Pearson’s correlation coefficients (Fig. 5A) without including Q. acuta. These analyses produced results that were qualitatively similar to those obtained from the full dataset (Fig. S12; Fig. S14), indicating that the observed patterns are not driven by environmental differences associated with elevation.

      We believe these additional analyses help to decouple the effects of environment and genetics, and support our conclusion that both seasonal synchrony and phylogenetic constraints play key roles in shaping transcriptome dynamics. We added four new figures (Fig. S6, Fig. S7, Fig. S12 and Fig. S14) and revised the text accordingly to clarify this point and to acknowledge the potential impact of site-specific environmental variation.

      (5) Statistical and Interpretive Concerns Regarding Δφ and dN/dS Correlation (Figures 5E and 5F):

      (a) Statistical Inappropriateness: Δφ is a discrete ordinal variable (likely 1-11), making it unsuitable for Pearson correlation, which assumes continuous, normally distributed variables. This undermines the statistical validity of the analysis.

      We thank the reviewer for the insightful comment. We would like to clarify that the analysis presented in Figures 5E and 5F was based on linear regression, not Pearson’s correlation. Although Δφ is a discrete variable, it takes values from 0 to 6 in 0.5 increments, resulting in 13 levels. We treated it as a quasi-continuous variable for the purposes of linear regression analysis. This approach is commonly adopted in practice when a discrete variable has sufficient resolution and ordering to approximate continuity. To enhance clarity, we revised the manuscript to explicitly state that linear regression was used, and we now report the regression coefficient and associated p-value to support the interpretation of the observed trend.
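
      To illustrate the kind of fit described above, the following is a minimal sketch of an ordinary least-squares regression on a predictor restricted to the 13 levels of Δφ. The function name `linear_regression` and all data values are hypothetical, invented for illustration, and are not taken from the manuscript. The p-value is omitted because it requires a t-distribution (available, for instance, through `scipy.stats.linregress`).

```python
from math import sqrt


def linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r), where r is Pearson's correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# The 13 admissible levels of the predictor: 0, 0.5, ..., 6.0
delta_phi_levels = [0.5 * k for k in range(13)]

# Hypothetical per-gene values, invented purely for illustration
delta_phi = [0.0, 0.5, 1.0, 2.0, 3.5, 4.0, 5.5, 6.0]
dn_ds = [0.10, 0.12, 0.11, 0.15, 0.13, 0.18, 0.16, 0.19]

slope, intercept, r = linear_regression(delta_phi, dn_ds)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, r^2={r * r:.3f}")
```

      Reporting the slope together with r² makes the effect size visible alongside its significance, which is the distinction raised in point (b) below.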

      (b) Biological Interpretability: Even with the substantial statistical power afforded by genome-wide analysis, the observed correlations are extremely weak. This suggests that the relationship, if any, between temporal divergence in expression and protein-coding evolution is negligible.

      Taken together, these issues weaken the case for any biologically meaningful association between Δφ and dN/dS. I recommend either omitting these panels or clearly reframing them as exploratory and statistically limited observations.

      We agree with the reviewer’s comment. While we retained the original panels, we reframed our interpretation to emphasize that, despite statistical significance, the observed correlation is very weak—suggesting that coding region variation is unlikely to be the primary driver of seasonal gene expression patterns. Accordingly, we revised the “Relating seasonal gene expression divergence to sequence divergence” section in the Results, as well as the relevant part of the Discussion.

    1. eLife Assessment

      This important study introduces an advance in multi-animal tracking by reframing identity assignment as a self-supervised contrastive representation learning problem. It eliminates the need for segments of video where all animals are simultaneously visible and individually identifiable, and significantly improves tracking speed, accuracy, and robustness with respect to occlusion. This innovation has implications beyond animal tracking, potentially connecting with advances in behavioral analysis and computer vision. While the strength of support for these advances is solid overall, the presentation could be greatly improved for clarity and broader accessibility; in addition, incorporating more standard metrics in the multi-animal tracking literature would better benchmark the approach against other methods.

    2. Reviewer #1 (Public review):

      Summary:

      This is a strong paper that presents a clear advance in multi-animal tracking. The authors introduce an updated version of idtracker.ai that reframes identity assignment as a contrastive learning problem rather than a classification task requiring global fragments. This change leads to gains in speed and accuracy. The method eliminates a known bottleneck in the original system, and the benchmarking across species is comprehensive and well executed. I think the results are convincing and the work is significant.

      Strengths:

      The main strengths are the conceptual shift from classification to representation learning, the clear performance gains, and the fact that the new version is more robust. Removing the need for global fragments makes the software more flexible in practice, and the accuracy and speed improvements are well demonstrated. The software appears thoughtfully implemented, with GUI updates and integration with pose estimators.

      Weaknesses:

      I don't have any major criticisms, but I have identified a few points that should be addressed to improve the clarity and accuracy of the claims made in the paper.

      (1) The title begins with "New idtracker.ai," which may not age well and sounds more promotional than scientific. The strength of the work is the conceptual shift to contrastive representation learning, and it might be more helpful to emphasize that in the title rather than branding it as "new."

      (2) Several technical points regarding the comparison between TRex (a system evaluated in the paper) and idtracker.ai should be addressed to ensure the evaluation is fair and readers are fully informed.

      (2.1) Lines 158-160: The description of TRex as based on "Protocol 2 of idtracker.ai" overlooks several key additions in TRex, such as posture image normalization, tracklet subsampling, and the use of uniqueness feedback during training. These features are not acknowledged, and it's unclear whether TRex was properly configured - particularly regarding posture estimation, which appears to have been omitted but isn't discussed. Without knowing the actual parameters used to make comparisons, it's difficult to assess how the method was evaluated.

      (2.2) Lines 162-163: The paper implies that TRex gains speed by avoiding Protocol 3, but in practice, idtracker.ai also typically avoids using Protocol 3 due to its extremely long runtime. This part of the framing feels more like a rhetorical contrast than an informative one.

      (2.3) Lines 277-280: The contrastive loss function is written using the label l, but since it refers to a pair of images, it would be clearer and more precise to write it as l_{I,J}. This would help readers unfamiliar with contrastive learning understand the formulation more easily.

      (2.4) Lines 333-334: The manuscript states that TRex can fail to track certain videos, but this may be inaccurate depending on how the authors classify failures. TRex may return low uniqueness scores if training does not converge well, but this isn't equivalent to tracking failure. Moreover, the metric reported by TRex is uniqueness, not accuracy. Equating the two could mislead readers. If the authors did compare outputs to human-validated data, that should be stated more explicitly.

      (2.5) Lines 339-341: The evaluation approach defines a "successful run" and then sums the runtime across all attempts up to that point. If success is defined as simply producing any output, this may not reflect how experienced users actually interact with the software, where parameters are iteratively refined to improve quality.

      (2.6) Lines 344-346: The simulation process involves sampling tracking parameters 10,000 times and selecting the first "successful" run. If parameter tuning is randomized rather than informed by expert knowledge, this could skew the results in favor of tools that require fewer or simpler adjustments. TRex relies on more tunable behavior, such as longer fragments improving training time, which this approach may not capture.

      (2.7) Line 354 onward: TRex was evaluated using two varying parameters (threshold and track_max_speed), while idtracker.ai used only one (intensity_threshold). With a fixed number of samples, this asymmetry could bias results against TRex. In addition, users typically set these parameters based on domain knowledge rather than random exploration.

      (2.8) Figure 2-figure supplement 3: The memory usage comparison lacks detail. It's unclear whether RAM or VRAM was measured, whether shared or compressed memory was included, or how memory was sampled. Since both tools dynamically adjust to system resources, the relevance of this comparison is questionable without more technical detail.
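
      The concern in points 2.5 and 2.6 can be made concrete with a small sketch of one possible reading of the procedure: parameter settings are drawn at random, runtimes are summed until the first run counted as successful, and this is repeated many times. The function name, the sampling with replacement, and the example runs are assumptions for illustration only; the manuscript's actual procedure may differ.

```python
import random


def simulate_total_runtime(runs, n_simulations=10_000, seed=0):
    """Simulate a user who tries random parameter settings until one succeeds.

    runs: list of (runtime, succeeded) tuples, one per parameter setting.
    Returns one accumulated runtime per simulated user.
    """
    if not any(succeeded for _, succeeded in runs):
        raise ValueError("at least one successful run is required")
    rng = random.Random(seed)
    totals = []
    for _ in range(n_simulations):
        total = 0.0
        while True:
            runtime, succeeded = rng.choice(runs)  # uninformed random draw
            total += runtime                       # failed attempts also cost time
            if succeeded:
                break
        totals.append(total)
    return totals


# Hypothetical benchmark: one failing setting (10 s) and one successful setting (5 s)
example_runs = [(10.0, False), (5.0, True)]
totals = simulate_total_runtime(example_runs)
print(f"mean accumulated runtime: {sum(totals) / len(totals):.2f} s")
```

      Under this reading, the accumulated runtime depends on the share of failing settings in the sampled parameter space, so a tool with more tunable parameters is penalized when the draws are uninformed.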

      (3) While the authors cite several key papers on contrastive learning, they do not use the introduction or discussion to effectively situate their approach within related fields where similar strategies have been widely adopted. For example, contrastive embedding methods form the backbone of modern facial recognition and other image similarity systems, where the goal is to map images into a latent space that separates identities or classes through clustering. This connection would help emphasize the conceptual strength of the approach and align the work with well-established applications. Similarly, there is a growing literature on animal re-identification (ReID), which often involves learning identity-preserving representations across time or appearance changes. Referencing these bodies of work would help readers connect the proposed method with adjacent areas using similar ideas, and show that the authors are aware of and building on this wider context.

      (4) Some sections of the Results text (e.g., lines 48-74) read more like extended figure captions than part of the main narrative. They include detailed explanations of figure elements, sorting procedures, and video naming conventions that may be better placed in the actual figure captions or moved to supplementary notes. Streamlining this section in the main text would improve readability and help the central ideas stand out more clearly.

      Overall, though, this is a high-quality paper. The improvements to idtracker.ai are well justified and practically significant. Addressing the above comments will strengthen the work, particularly by clarifying the evaluation and comparisons.

    3. Reviewer #2 (Public review):

      This work introduces a new version of the state-of-the-art idtracker.ai software for tracking multiple unmarked animals. The authors aimed to solve a critical limitation of their previous software, which relied on the existence of "global fragments" (video segments where all animals are simultaneously visible) to train an identification classifier network, in addition to addressing concerns with runtime speed. To do this, the authors have both re-implemented the backend of their software in PyTorch (in addition to numerous other performance optimizations) as well as moving from a supervised classification framework to a self-supervised, contrastive representation learning approach that no longer requires global fragments to function. By defining positive training pairs as different images from the same fragment and negative pairs as images from any two co-existing fragments, the system cleverly takes advantage of partial (but high-confidence) tracklets to learn a powerful representation of animal identity without direct human supervision. Their formulation of contrastive learning is carefully thought out and comprises a series of empirically validated design choices that are both creative and technically sound. This methodological advance is significant and directly leads to the software's major strengths, including exceptional performance improvements in speed and accuracy and a newfound robustness to occlusion (even in severe cases where no global fragments can be detected). Benchmark comparisons show the new software is, on average, 44 times faster (up to 440 times faster on difficult videos) while also achieving higher accuracy across a range of species and group sizes. 
      This new version of idtracker.ai is shown to consistently outperform the closely related TRex software (Walter & Couzin, 2021), which, together with the engineering innovations and usability enhancements (e.g., outputs convenient for downstream pose estimation), positions this tool as an advancement on the state-of-the-art for multi-animal tracking, especially for collective behavior studies.
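
      The pair definition summarized above (positive pairs from within one fragment, negative pairs from two co-existing fragments) can be sketched as follows. The `Fragment` class, the function names, and the representation of an image as a `(fragment_id, frame)` tuple are hypothetical and serve only to illustrate the idea; they are not the software's API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    fragment_id: int
    start: int  # first frame, inclusive
    end: int    # last frame, inclusive


def coexist(a, b):
    """Two distinct fragments co-exist if their frame ranges overlap."""
    return a.fragment_id != b.fragment_id and a.start <= b.end and b.start <= a.end


def sample_positive_pair(fragment, rng):
    """Two different images of the same fragment: same animal by construction."""
    first, second = rng.sample(range(fragment.start, fragment.end + 1), 2)
    return (fragment.fragment_id, first), (fragment.fragment_id, second)


def sample_negative_pair(a, b, rng):
    """One image from each of two co-existing fragments: different animals."""
    if not coexist(a, b):
        raise ValueError("fragments must overlap in time to yield a negative pair")
    return (
        (a.fragment_id, rng.randint(a.start, a.end)),
        (b.fragment_id, rng.randint(b.start, b.end)),
    )


rng = random.Random(0)
f1, f2, f3 = Fragment(0, 0, 99), Fragment(1, 50, 150), Fragment(2, 200, 300)
print(sample_positive_pair(f1, rng))
print(sample_negative_pair(f1, f2, rng))
```

      Fragments that never overlap in time, such as `f1` and `f3` here, yield no label in either direction, which is why no global fragment is needed.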

      Despite these advances, we note a number of weaknesses and limitations that are not well addressed in the present version of this paper:

      (1) The contrastive representation learning formulation

      Contrastive representation learning using deep neural networks has long been used for problems in the multi-object tracking domain, popularized through ReID approaches like DML (Yi et al., 2014) and DeepReID (Li et al., 2014). More recently, contrastive learning has become more popular as an approach for scalable self-supervised representation learning for open-ended vision tasks, as exemplified by approaches like SimCLR (Chen et al., 2020), SimSiam (Chen et al., 2020), and MAE (He et al., 2021) and instantiated in foundation models for image embedding like DINOv2 (Oquab et al., 2023). Given their prevalence, it is useful to contrast the formulation of contrastive learning described here relative to these widely adopted approaches (and why this reviewer feels it is appropriate):

      (1.1) No rotations or other image augmentations are performed to generate positive examples. These are not necessary with this approach since the pairs are sampled from heuristically tracked fragments (which produces sufficient training data, though see weaknesses discussed below) and the crops are pre-aligned egocentrically (mitigating the need for rotational invariance).

      (1.2) There is no projection head in the architecture, like in SimCLR. Since classification/clustering is the only task that the system is intended to solve, the more general "nuisance" image features that this architectural detail normally affords are not necessary here.

      (1.3) There is no stop gradient operator like in BYOL (Grill et al., 2020) or SimSiam. Since the heuristic tracking implicitly produces plenty of negative pairs from the fragments, there is no need to prevent representational collapse due to class asymmetry. Some care is still needed, but the authors address this well through a pair sampling strategy (discussed below).

      (1.4) Euclidean distance is used as the distance metric in the loss rather than cosine similarity as in most contrastive learning works. While cosine similarity coupled with L2-normalized unit hypersphere embeddings has proven to be a successful recipe to deal with the curse of dimensionality (with the added benefit of bounded distance limits), the authors address this through a cleverly constructed loss function that essentially allows direct control over the intra- and inter-cluster distance (D_pos and D_neg). This is a clever formulation that aligns well with the use of K-means for the downstream assignment step.

      No concerns here, just clarifications for readers who dig into the review. Referencing the above literature would enhance the presentation of the paper to align with the broader computer vision literature.
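
      For readers less familiar with this family of losses, the following is a minimal sketch of a generic double-margin contrastive loss on Euclidean distance, of the kind point 1.4 describes. It is an assumption about the general form, not a reproduction of the manuscript's equation, and the default margins are arbitrary placeholders. The pair label is written `l_ij` to make explicit that it belongs to a pair of images.

```python
from math import dist


def contrastive_loss(emb_i, emb_j, l_ij, d_pos=1.0, d_neg=10.0):
    """Double-margin contrastive loss for one pair of embeddings.

    l_ij = 1: positive pair, penalized only if farther apart than d_pos.
    l_ij = 0: negative pair, penalized only if closer together than d_neg.
    d_pos and d_neg are placeholder margins chosen for illustration.
    """
    d = dist(emb_i, emb_j)  # Euclidean distance, no normalization to the unit sphere
    if l_ij == 1:
        return max(0.0, d - d_pos) ** 2
    return max(0.0, d_neg - d) ** 2


print(contrastive_loss([0.0, 0.0], [3.0, 4.0], l_ij=1))  # distance 5, margin 1
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], l_ij=0))  # distance 5, margin 10
```

      In this form the two margins bound the cluster radius and the cluster separation directly in the units of the embedding space, which is the property that fits a K-means assignment step.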

      (2) Network architecture for image feature extraction backbone

      As most of the computations that drive up processing time happen in the network backbone, the authors explored a variety of architectures to assess speed, accuracy, and memory requirements. They land on ResNet18 due to its empirically determined performance. While the experiments that support this choice are solid, the rationale behind the architecture selection is somewhat weak. The authors state that:

      "[W]e tested 23 networks from 8 different families of state-of-the-art convolutional neural network architectures, selected for their compatibility with consumer-grade GPUs and ability to handle small input images (20 × 20 to 100 × 100 pixels) typical in collective animal behavior videos."

      (2.1) Most modern architectures have variants that are compatible with consumer-grade GPUs. This is true of, for example, HRNet (Wang et al., 2019), ViT (Dosovitskiy et al., 2020), SwinT (Liu et al., 2021), or ConvNeXt (Liu et al., 2022), all of which report single GPU training and fast runtime speeds through lightweight configuration or subsequent variants, e.g., MobileViT (Mehta et al., 2021). The authors may consider revising that statement or providing additional support for that claim (e.g., empirical experiments) given that these have been reported to outperform ResNet18 across tasks.

      (2.2) The compatibility of different architectures with small image sizes is configurable. Most convolutional architectures can be readily adapted to work with smaller image sizes, including 20x20 crops. With their default configuration, they lose feature map resolution through repeated pooling and downsampling steps, but this can be readily mitigated by swapping out standard convolutions with dilated convolutions and/or by setting the stride of pooling layers to 1, preserving feature map resolution across blocks. While these are fairly straightforward modifications (and are even compatible with using pretrained weights), an even more trivial approach is to pad and/or resize the crops to the default image size, which is likely to improve accuracy at a possibly minimal memory and runtime cost. These techniques may even improve the performance with the architectures that the authors did test out.

      (2.3) The authors do not report whether the architecture experiments were done with pretrained or randomly initialized weights.

      (2.4) The authors do not report some details about their ResNet18 design, specifically whether a global pooling layer is used and whether the output fully connected layer has any activation function. Additionally, they do not report the version of ResNet18 employed here, namely, whether the BatchNorm and ReLU are applied after (v1) or before (v2) the conv layers in the residual path.
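
      The loss of feature map resolution mentioned in point 2.2 follows from the standard convolution output-size formula. The sketch below applies it to the usual ResNet18 layout (7×7 stride-2 stem, 3×3 stride-2 max pool, four stages with strides 1, 2, 2, 2); only the first convolution of each stage is modeled, since the remaining 3×3 convolutions with padding 1 preserve size. The function names are hypothetical, and the layout is an assumption about the default configuration, not about the authors' variant.

```python
def conv_out(size, kernel, stride, padding, dilation=1):
    """Spatial output size of a convolution or pooling layer along one dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def resnet18_feature_sizes(size, pool_stride=2, stage_strides=(1, 2, 2, 2)):
    """Feature map side length after the stem, the max pool, and each of the four stages."""
    sizes = []
    size = conv_out(size, kernel=7, stride=2, padding=3)            # stem convolution
    sizes.append(size)
    size = conv_out(size, kernel=3, stride=pool_stride, padding=1)  # max pool
    sizes.append(size)
    for stride in stage_strides:                                    # residual stages
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        sizes.append(size)
    return sizes


print(resnet18_feature_sizes(224))  # default ImageNet-sized input
print(resnet18_feature_sizes(20))   # small crop, default strides
print(resnet18_feature_sizes(20, pool_stride=1, stage_strides=(1, 1, 1, 1)))
```

      For a 224-pixel input this gives 112, 56, 56, 28, 14, 7. A 20-pixel crop shrinks to 10, 5, 5, 3, 2, 1, whereas setting every stride after the stem to 1 keeps the map at 10 pixels throughout.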

      (3) Pair sampling strategy

      The authors devised a clever approach for sampling positive and negative pairs that is tailored to the nature of the formulation. First, since the positive and negative labels are derived from the co-existence of pretracked fragments, selection has to be done at the level of fragments rather than individual images. This would not be the case if one of the newer approaches for contrastive learning were employed, but it serves as a strength here (assuming that fragment generation/first pass heuristic tracking is achievable and reliable in the dataset). Second, a clever weighted sampling scheme assigns sampling weights to the fragments that are designed to balance "exploration and exploitation". They weigh samples both by fragment length and by the loss associated with that fragment to bias towards different and more difficult examples.

      (3.1) The formulation described here resembles and uses elements of online hard example mining (Shrivastava et al., 2016), hard negative sampling (Robinson et al., 2020), and curriculum learning more broadly. The authors may consider referencing this literature (particularly Robinson et al., 2020) for inspiration and to inform the interpretation of the current empirical results on positive/negative balancing.
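
      As a rough illustration of weighting fragments by both length and loss, the sketch below mixes the two normalized terms with a single coefficient and draws fragments with `random.choices`. The mixing rule, the `mix` parameter, and the function names are assumptions made for illustration; the manuscript's actual weighting may take a different form.

```python
import random


def fragment_weights(lengths, losses, mix=0.5):
    """Sampling weight per fragment: a blend of relative length and relative loss.

    mix = 1.0 samples by length only; mix = 0.0 samples by loss only.
    """
    total_length = sum(lengths)
    total_loss = sum(losses)
    weights = []
    for length, loss in zip(lengths, losses):
        length_term = length / total_length
        # Fall back to a uniform term if every fragment currently has zero loss
        loss_term = loss / total_loss if total_loss > 0 else 1.0 / len(losses)
        weights.append(mix * length_term + (1.0 - mix) * loss_term)
    return weights


def sample_fragments(lengths, losses, k, mix=0.5, seed=0):
    """Draw k fragment indices (with replacement) according to the blended weights."""
    rng = random.Random(seed)
    weights = fragment_weights(lengths, losses, mix)
    return rng.choices(range(len(lengths)), weights=weights, k=k)


# Hypothetical fragments: equal lengths, the last one is currently the hardest
lengths = [100, 100, 100]
losses = [0.1, 0.1, 0.8]
print(fragment_weights(lengths, losses))
print(sample_fragments(lengths, losses, k=10))
```

      The length term keeps long fragments represented, while the loss term shifts draws toward fragments the network currently separates poorly, which is where the link to hard example mining arises.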

      (4) Speed and accuracy improvements

      The authors report considerable improvements in speed and accuracy of the new idTracker (v6) over the original idTracker (v4?) and TRex. It's a bit unclear, however, which of these are attributable to the engineering optimizations (v5?) versus the representation learning formulation.

      (4.1) Why is there an improvement in accuracy in idTracker v5 (L77-81)? This is described as a port to PyTorch and improvements largely related to the memory and data loading efficiency. This is particularly notable given that the progression went from 97.52% (v4; original) to 99.58% (v5; engineering enhancements) to 99.92% (v6; representation learning), i.e., most of the new improvement in accuracy owes to the "optimizations" which are not the central emphasis of the systematic evaluations reported in this paper.

      (4.2) What about the speed improvements? Relative to the original (v4), the authors report average speed-ups of 13.6x in v5 and 44x in v6. Presumably, the drastic speed-up in v6 comes from a lower Protocol 2 failure rate, but v6 is not evaluated in Figure 2 - figure supplement 2.
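
      The accuracy figures quoted in point 4.1 can be decomposed in two ways, as the short calculation below shows. It uses only the three percentages given above; the variable names are arbitrary.

```python
# Accuracies (percent) quoted in point 4.1
accuracy = {"v4": 97.52, "v5": 99.58, "v6": 99.92}

# Share of the total gain, measured in percentage points
total_gain = accuracy["v6"] - accuracy["v4"]
share_v5 = (accuracy["v5"] - accuracy["v4"]) / total_gain
share_v6 = (accuracy["v6"] - accuracy["v5"]) / total_gain

# The same steps, measured as a reduction of the error rate
error = {version: 100.0 - value for version, value in accuracy.items()}
error_ratio_v5 = error["v4"] / error["v5"]
error_ratio_v6 = error["v5"] / error["v6"]

print(f"total gain: {total_gain:.2f} points")
print(f"share from v5: {share_v5:.1%}, share from v6: {share_v6:.1%}")
print(f"error reduction v4 to v5: {error_ratio_v5:.1f}x")
print(f"error reduction v5 to v6: {error_ratio_v6:.1f}x")
```

      In percentage points, about 86% of the gain arrives with v5. Measured as a reduction of the error rate, the two steps are of comparable size (roughly 6-fold and 5-fold), so the attribution depends on which measure is used.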

      (5) Robustness to occlusion

      A major innovation enabled by the contrastive representation learning approach is the ability to tolerate the absence of a global fragment (contiguous frames where all animals are visible) by requiring only co-existing pairs of fragments owing to the paired sampling formulation. While this removes a major limitation of the previous versions of idtracker.ai, its evaluation could be strengthened. The authors describe an ablation experiment where an arc of the arena is masked out to assess the accuracy under artificially difficult conditions. They find that the v6 works robustly up to significant proportions of occlusions, even when doing so eliminates global fragments.

      (5.1) The experiment setup needs to be more carefully described.
      What does the masking procedure entail? Are the pixels masked out in the original video or are detections removed after segmentation and first pass tracking is done?
      What happens at the boundary of the mask? (Partial segmentation masks would throw off the centroids, and doing it after original segmentation does not realistically model the conditions of entering an occlusion area.)
      Are fragments still linked for animals that enter and then exit the mask area?
      How is the evaluation done? Is it computed with or without the masked region detections?

      (5.2) The circular masking is perhaps not the most appropriate for the mouse data, which is collected in a rectangular arena.

      (5.3) The number of co-existing fragments, which seems to be the main determinant of performance that the authors derive from this experiment, should be reported for these experiments. In particular, a "number of co-existing fragments" vs accuracy plot would support the use of the 0.25(N-1) heuristic and would be especially informative for users seeking to optimize experimental and cage design. Additionally, the number of co-existing fragments can be artificially reduced in other ways other than a fixed occlusion, including random dropout, which would disambiguate it from potential allocentric positional confounds (particularly relevant in arenas where egocentric pose is correlated with allocentric position).
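
      The quantity requested in point 5.3 could be computed along the following lines. The sketch counts, for each fragment, how many other fragments overlap it in time, compares the mean with 0.25(N-1), and includes the random dropout suggested above. The exact definition behind the manuscript's heuristic is not restated here, so the use of the mean, the function names, and the example data are assumptions.

```python
import random


def coexisting_counts(fragments):
    """For each (start, end) fragment, count the other fragments overlapping it in time."""
    counts = []
    for i, (start_i, end_i) in enumerate(fragments):
        count = sum(
            1
            for j, (start_j, end_j) in enumerate(fragments)
            if j != i and start_i <= end_j and start_j <= end_i
        )
        counts.append(count)
    return counts


def meets_heuristic(fragments, n_animals):
    """Compare the mean number of co-existing fragments with 0.25 * (N - 1)."""
    counts = coexisting_counts(fragments)
    mean_count = sum(counts) / len(counts)
    return mean_count >= 0.25 * (n_animals - 1), mean_count


def random_dropout(fragments, keep_fraction, seed=0):
    """Keep each fragment independently with probability keep_fraction."""
    rng = random.Random(seed)
    return [fragment for fragment in fragments if rng.random() < keep_fraction]


# Hypothetical fragments given as inclusive (start_frame, end_frame) ranges
fragments = [(0, 10), (5, 15), (20, 30)]
print(coexisting_counts(fragments))
print(meets_heuristic(fragments, n_animals=3))
```

      Sweeping `keep_fraction` and plotting the resulting mean count against tracking accuracy would give the requested curve without tying fragment loss to a fixed region of the arena.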

      (6) Robustness to imaging conditions

      The authors state that "the new idtracker.ai can work well with lower resolutions, blur and video compression, and with inhomogeneous light (Figure 2 - figure supplement 4)." (L156).

      Despite this claim, there are no speed or accuracy results reported for the artificially corrupted data, only examples of these image manipulations in the supplementary figure.

      (7) Robustness across longitudinal or multi-session experiments

      The authors reference idmatcher.ai as a compatible tool for this use case (matching identities across sessions or long-term monitoring across chunked videos); however, no performance data are presented to support its usage.

      This is relevant as the innovations described here may interact with this setting. While deep metric learning and contrastive learning for ReID were originally motivated by these types of problems (especially individuals leaving and entering the FOV), it is not clear that the current formulation is ideally suited for this use case. Namely, the design decisions described in point 1 of this review are at times at odds with the idea of learning generalizable representations owing to the feature extractor backbone (less scalable), low-dimensional embedding size (less representational capacity), and Euclidean distance metric without hypersphere embedding (possible sensitivity to drift).

      It's possible that data to support point 6 can mitigate these concerns through empirical results on variations in illumination, but a stronger experiment would be to artificially split up a longer video into shorter segments and evaluate how generalizable and stable the representations learned in one segment are across contiguous ("longitudinal") or discontiguous ("multi-session") segments.
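
      One way to frame the split-video experiment is a nearest-centroid test: identity centroids are computed from the embeddings of one segment and then used to label the embeddings of another. The sketch below is a simplified stand-in under stated assumptions, not the idtracker.ai or idmatcher.ai procedure: embeddings are plain coordinate tuples, ground-truth labels are available for both segments, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def centroids(embeddings, labels):
    """Mean embedding per identity label."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = list(vec)
        else:
            sums[lab] = [a + b for a, b in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [x / counts[lab] for x in s] for lab, s in sums.items()}

def cross_segment_accuracy(train_emb, train_lab, test_emb, test_lab):
    """Assign each test embedding to the nearest training centroid (Euclidean distance)."""
    cents = centroids(train_emb, train_lab)
    correct = 0
    for vec, lab in zip(test_emb, test_lab):
        pred = min(cents, key=lambda c: math.dist(vec, cents[c]))
        correct += pred == lab
    return correct / len(test_lab)

# Hypothetical 2-D embeddings for two animals in two video segments.
seg_a = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
lab_a = ["a", "a", "b", "b"]
seg_b = [(1.0, 0.0), (9.0, 10.0)]
lab_b = ["a", "b"]
print(cross_segment_accuracy(seg_a, lab_a, seg_b, lab_b))
```

      Comparing this accuracy for contiguous and discontiguous segment pairs would show how much the learned representation drifts between segments.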

    4. Reviewer #3 (Public review):

      Summary:

      The authors propose a new version of idTracker.ai for animal tracking. Specifically, they apply contrastive learning to embed cropped images of animals into a feature space where clusters correspond to individual animal identities.

      Strengths:

      By doing this, the new software alleviates the requirement for so-called global fragments - segments of the video, in which all entities are visible/detected at the same time - which was necessary in the previous version of the method. In general, the new method reduces the tracking time compared to the previous versions, while also increasing the average accuracy of assigning the identity labels.

      Weaknesses:

      The general impression of the paper is that, in its current form, it is difficult to disentangle the old from the new method and understand the method in detail. The manuscript would benefit from a major reorganization and rewriting of its parts. There are also certain concerns about the accuracy metric and reducing the computational time.

    5. Author response:

      We thank the editor and reviewers for their positive and detailed review of the preprint. We will use these comments to improve the manuscript's revised version, which we plan to submit in the coming weeks, including: a) tests of variants of ResNet, other network architectures and the use of pre-trained weights, b) clarification and justification of the accuracy metrics used in the benchmark, c) an expanded study about the fragment connectivity in Figure 3, and d) a study of the performance of idmatcher.ai with the new idtracker.ai.

    1. eLife Assessment

      This useful study presents interesting observations on the potential importance of extracellular transport of human papillomaviruses along actin protrusions by retrograde flow. The focus on the events of HPV infection between ECM binding and keratinocyte-specific receptor binding is unique and interesting. However, the evidence supporting the conclusions is incomplete, and additional experimental support is needed. Because conclusions drawn regarding HS interactions are largely based on experiments using a single HS mAb, the specificity of this mAb needs to be described in more detail, either based on the literature or further experimentation.

    2. Reviewer #1 (Public review):

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released, and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary. The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage have been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data need to be provided. The model should be fitted into established entry events, or at minimum, these conflicting data, a subset of which is noted below, need to be acknowledged.

      (1) The Sapp lab (Richards et al., 2013) found that HSPG-mediated conformational changes in L1 and L2 allowed the release of the virus from primary binding and allowed secondary receptor engagement in the absence of HS shedding.

      (2) Becker et al. found that furin-precleaved capsids could infect cells independently of HSPG interaction, but this infection was still inhibited with cytochalasin D.

      (3) Other work from the Schelhaas lab showed that cytochalasin D inhibition of infection resulted in the accumulation of capsids in deep invaginations from the cell surface, not on the ECM.

      (4) Selinka et al., 2007, showed that preventing HSPG-induced conformational changes in the capsid surface resulted in noninfectious uptake that was not prevented with cytochalasin D.

      (5) The well-described capsid processing events by KLK8 and furin need to be mechanistically linked to the proposed model. Does inhibition of either of these cleavages prevent engagement with CD151?

      The authors need to consider an explanation for these discrepancies.

      Other issues:

      (1) Line 110-111. The statement about PsVs in the ECM being too far away from the cell surface to make physical contact with the cell surface entry receptors is confusing. ECM binding has not been shown to be an obligatory step for in vitro infection. This idea is referred to again on lines 158-159 and 199. The claim (line 158) that PsV does not interact with the cell within an hour needs to be demonstrated experimentally and seems at odds with multiple laboratories' data. PsV has been shown to directly interact with HSPG on the cell surface in addition to the ECM. Why are these PsVs not detected?

      (2) The experiments shown in Figure 5 need to be better controlled. Why is there no HS staining of the cell surface at the early timepoints? This antibody has been shown to recognize N-sulfated glucosamine residues on HS and, therefore, detects HSPG on the ECM and cell surface. Therefore, the conclusion that this confirms HS coating of PsV during release from the ECM (line 430-431) is unfounded. How do the authors distinguish between "HS-coated virions" and HSPG-associated virions?

      It is difficult to comprehend how the addition of 50 vge/cell of PsV could cause such a global change in HS levels. The claim that the HS levels are decreased in the non-cytochalasin-treated cells due to PsV-induced shedding needs to be demonstrated. If HS is actually shed, staining of the cell periphery could increase with the antibody 3G10, which detects the HS neoepitope created following heparinase cleavage.

    3. Reviewer #2 (Public review):

      Summary:

      Massenberg and colleagues aimed to understand how Human papillomavirus particles that bind to the extracellular matrix (ECM) transfer to the cell body for later uptake, entry, and infection. The binding to ECM is key for getting close to the virus's host cell (basal keratinocytes) after a wounding scenario for later infection in a mouse vaginal challenge model, indicating that this is an important question in the field.

      Strengths:

      The authors take on a conceptually interesting and potentially very important question to understand how initial infection occurs in vivo. The authors confirm previous work that actin-based processes contribute to virus transport to the cell body. The superresolution microscopy methods and data collection are state-of-the-art and provide an interesting new way of analysing the interaction with host cell proteins on the cell surface in certain infection scenarios. The proposed hypothesis is interesting and, if substantiated, could significantly advance the field.

      Weaknesses:

      As a study design, the authors use infection of HaCaT keratinocytes, and follow virus localisation with and without inhibition of actin polymerisation by cytochalasin D (cytoD) to analyse transfer of virions from the ECM to the cell by filopodial structures using important cellular proteins for cell entry as markers.

      First, the data is mostly descriptive besides the use of cytoD, and does not test the main claim of their model, in which virions that are still bound to heparan sulfate proteoglycans are transferred by binding to tetraspanins along filopodia to the cell body.

      Second, using cytoD is a rather broad treatment that not only affects actin retrograde flow, but also virus endocytosis and further vesicular transport in cells, including exocytosis. Inhibition of myosin II, e.g., by blebbistatin, would have been a better choice as it, for instance, does not interfere with endocytosis of the virus.

      Third, the authors aim to study transfer from ECM to the cell body and the effects thereof. However, there are substantial, if not the majority of, viruses that bind to the cell body compared to ECM-bound viruses in close vicinity to the cells. This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. As a consequence, the obtained data from time point experiments is skewed, and remains for the most part unconvincing due to the fact that the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could originate from cell-bound and ECM-transferred virions alike.

      Fourth, the use of fixed images in a time course series also does not allow for understanding the issue of a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin. Or, of cell spreading upon cytoD washout. The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions, which may introduce a bias and skew the analysis.

      Fifth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established. For instance, if one flips an image with half of the image showing the cell body, and half of the image ECM, it is clear that association with cell membrane structures will only be significant in the original. I am rather convinced that using randomisation only on the plasma membrane ROIs will not establish any clear significance of the correlating signals. Also, there should be a higher n for the measurements.

    4. Author response:

      Reviewer #1 (Public review):

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released, and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary.

      Please note that we state in the introduction on line 65/66 ´Two release mechanisms are discussed, that mutually are not exclusive´. This implies that we do not consider the shedding model to be the one accepted model. HS may associate with PsVs despite a decreased affinity, and may translocate to the cell body only after priming (see the ‘priming model’ below).

      Furthermore, we do not state in the discussion either that the shedding model is the preferred one; although it is correct that we refer to the shedding model more extensively, simply because we find HS associated with transferred PsVs, which is in line with this model and requires its citation.

      The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage have been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data need to be provided.

      As outlined above, our finding is compatible with both models, and we do not aim to verify the shedding model or disprove the priming model.

      It appears that the referee wishes for more visibility of the priming model. Inhibition of KLK8 and furin should reduce the translocation to the cell body, no matter whether PsVs carry HS on their surface or not. For revision, we plan an experiment as in Figure 3 (CytD), testing whether either KLK8 or furin inhibition blocks the transfer to the cell body. Our data can then also be discussed in the context of the priming model, thereby increasing its visibility.

      The model should be fitted into established entry events, or at minimum, these conflicting data, a subset of which is noted below, need to be acknowledged.

      (1) The Sapp lab (Richards et al., 2013) found that HSPG-mediated conformational changes in L1 and L2 allowed the release of the virus from primary binding and allowed secondary receptor engagement in the absence of HS shedding.

      (2) Becker et al. found that furin-precleaved capsids could infect cells independently of HSPG interaction, but this infection was still inhibited with cytochalasin D.

      (3) Other work from the Schelhaas lab showed that cytochalasin D inhibition of infection resulted in the accumulation of capsids in deep invaginations from the cell surface, not on the ECM.

      (4) Selinka et al., 2007, showed that preventing HSPG-induced conformational changes in the capsid surface resulted in noninfectious uptake that was not prevented with cytochalasin D.

      (5) The well-described capsid processing events by KLK8 and furin need to be mechanistically linked to the proposed model. Does inhibition of either of these cleavages prevent engagement with CD151?

      The authors need to consider an explanation for these discrepancies.

      That PsVs carry HS-cleavage products doesn't imply that HS cleavage is sufficient or required for infection. Therefore, we do not view our data as being in conflict with the priming model. In fact, our observations are compatible with aspects of both the shedding and the priming model.

      Yet, we acknowledge that the study would gain importance by directly testing the priming model within our experimental system. As requested by the referee, we will discuss the above papers, and further plan to test KLK8 and furin inhibitors.

      Other issues:

      (1) Line 110-111. The statement about PsVs in the ECM being too far away from the cell surface to make physical contact with the cell surface entry receptors is confusing. ECM binding has not been shown to be an obligatory step for in vitro infection.

      Not obligatory, but strongly supportive (Bienkowska-Haba et al., Plos Path., 2018; Surviladze et al., J. Gen. Virol., 2015). As recently published by the Sapp lab (Bienkowska-Haba et al., Plos Path., 2018), ´Direct binding of HPV16 to primary keratinocytes yields very inefficient infection rates for unknown reasons.´ Moreover, the paper shows that HaCaT cell ECM binding of PsVs increases the infection of NHEK by 10-fold and of HFK by almost 50-fold.

      This idea is referred to again on lines 158-159 and 199. The claim (line 158) that PsV does not interact with the cell within an hour needs to be demonstrated experimentally and seems at odds with multiple laboratories' data. PsV has been shown to directly interact with HSPG on the cell surface in addition to the ECM. Why are these PsVs not detected?

      We do not question that in many cellular systems PsVs interact with heparan sulfate proteoglycans (HSPGs) present on the cell surface, or both on the cell surface and the ECM. We stated in the manuscript on line 59 ´While in cell culture virions bind to HS of the cell surface and the ECM, it has been suggested that in vivo they bind predominantly to HS of the extracellular basement membrane (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010).´

      Moreover, we ourselves detect these PsVs. For example, in Figure 5A (CytD, 0 min time point), a handful of PsVs localize to the cell body area. However, the large majority overlaps with the strong HS staining at the cell periphery, likely the ECM. Accurately quantifying the fractions of PsVs bound to the ECM and to the cell body is very difficult, for the following reasons. First, the ECM PsVs are very dense and therefore not microscopically resolved into single PsVs, at least not completely (see Figure 1C; the high-intensity spots are non-resolved PsVs, please see our discussion on lines 148-152). For this reason, by just counting spots we strongly underestimate the ECM PsVs relative to the cell body PsVs. Second, with the available immunostainings we cannot exactly delineate the ECM from the cell body. In particular, at the cell border region (for example, see Figure 4B) we often observe PsV accumulations. If these ´cell border region PsVs´ are assigned entirely to the cell body fraction, a preliminary analysis (correcting for the limitation of non-resolved ECM PsVs) suggests that about a quarter of the PsVs bind to the cell body. If, on the other hand, they are assigned to the ECM, the cell body fraction would be well below 10%. Third, we observe that in regions devoid of ECM and cells, PsVs apparently adhere nonspecifically to the glass coverslip. This suggests that some of the cell body PsVs are just nonspecific background. Subtracting a background PsV density from the ECM and cell body PsV densities will reduce the cell body PsVs relatively more, and consequently decrease the fraction of cell body PsVs even further.

      Moreover, in the course of the project we wondered whether the basolateral membrane has few binding sites in any case. To address this question, in an unpublished experiment, we detached HaCaT cells with trypsin, incubated them with PsVs, and then allowed reattachment to assess the binding in suspension. We detected minimal to no binding, which, however, could also result from apical membrane adherence to the coverslip or trypsin-mediated cleavage of HSPGs. As suggested by the reviewing editor, we agree that repeating this experiment using EDTA for detachment, thus preserving HSPGs, would offer more definitive insight into binding efficiency in the absence of accessibility constraints. In summary, the reason why most PsVs do not bind to the cell surface in our cellular system could be a combination of several factors:

      (1) The primary binding partners are more abundant in the ECM and the polarized HaCaT cells secrete more ECM when compared to other cultured cells used to study HPV infection. This promotes ECM binding.

      (2) In the polarized HaCaT cells, the apical membrane is largely devoid of syndecan-1, CD151 and Itga6, so PsVs infect the cell via the basolateral membrane. However, the accessibility to the basolateral membrane is restricted: PsVs must diffuse through a narrow slit between the glass coverslip and the attached cell to reach HS on the cell surface. This limits cell surface binding.

      (3) If HaCaT cells secrete large amounts of ECM, they may become depleted of cell surface HS. As outlined above, we will try to find out how many PsVs bind to the basolateral membrane in the absence of restricted accessibility. If it turns out that HaCaT cells have few binding sites anyway, this would additionally promote binding to the ECM.

      The outcome of the above issues, and how we will mention them in the revised version of the manuscript, is open. In any case, we would like to point out that PsVs bound to the cell body do not weaken our main conclusion. Still, we recognize that this point merits attention and plan several modifications of the manuscript. Although we already did so, we will now mention more explicitly that PsVs have been shown to directly interact with HSPG on the cell surface, in addition to the ECM, but that it also has been shown that the ECM strongly supports infection in NHEK and HFK (Bienkowska-Haba et al., Plos Path., 2018). The following is a draft version of a paragraph we plan to incorporate, explaining the above issue and why we used HaCaT cells in our experiments:

      ´In vitro, PsVs bind to both the cell surface and the ECM, as has been widely documented. In vivo, however, it has been proposed that initial binding occurs predominantly to the basement membrane ECM, rather than directly to the cell surface (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010). This distinction reinforces the physiological relevance of ECM-bound particles in the early steps of HPV infection. Support for a functional role of ECM-mediated entry comes from a study showing that PsV binding to ECM derived from HaCaT cells significantly enhances infection of primary keratinocytes (Bienkowska-Haba et al., 2018). For these reasons, we specifically chose polarized HaCaT cells as a model system. These cells secrete abundant ECM from which the cells readily collect bound PsVs. On the other hand, the polarization limits the access of PsVs to basolateral receptors such as CD151 and Itgα6, and also to cell body resident Syndecan-1, the most abundant HSPG in keratinocytes (Rapraeger et al., 1986; Hayashi et al., 1987; Kim et al., 1994). Because polarization limits direct cell surface accessibility, it biases binding toward the ECM, which is abundant in this culture system. Hence, in the HaCaT cell culture system, as probably in vivo, PsVs cannot circumvent binding to the ECM, which they can do in unpolarized cell cultures that may not even secrete significant amounts of ECM. Altogether, this experimental situation closely mimics the in vivo situation where PsVs bind preferentially to the ECM (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010).´

      We appreciate the reviewer’s input and believe these additions will strengthen the manuscript with regard to the relevance of the used cellular model system.

      (2) The experiments shown in Figure 5 need to be better controlled. Why is there no HS staining of the cell surface at the early timepoints? This antibody has been shown to recognize N-sulfated glucosamine residues on HS and, therefore, detects HSPG on the ECM and cell surface.

      We have shown all images at the same adjustments of brightness and contrast. As the staining at the periphery is stronger, the impression is given that the cell surface is not stained, although there is some staining. Specific staining is documented in Figure 5D, showing the PCC between PsVs and HS only of the cell body. If there was no HS staining, the PCC would be zero, which is not the case. Yet, it is lower when compared to the PCC measured at the cell border region, with more strongly stained HS.

      We will provide images at different contrast and brightness adjustments enabling the reader to see the staining on the cell surface. We will provide also more overview images to illustrate the strong variability of the HS staining between cells.

      Therefore, the conclusion that this confirms HS coating of PsV during release from the ECM (line 430-431) is unfounded. How do the authors distinguish between "HS-coated virions" and HSPG-associated virions?

      The HS intensity transiently increases on the cell body (Fig. 5D) only after releasing a cohort of PsVs, which can only be explained by PsVs that carry HS from the ECM to the cell body. However, the effect is not significant. Using the antibody 3G10, which detects the HS neoepitope (see the referee's suggestion below), we will reanalyze this point. This should help clarify the issue.

      It is difficult to comprehend how the addition of 50 vge/cell of PsV could cause such a global change in HS levels.

      The distribution of bound PsVs largely varies between cells. Some areas are covered with essentially confluent cells, to which hardly any PsVs are bound, because accessing the basolateral membrane of confluent cells is nearly impossible, and PsVs do not bind to the exposed apical membrane. This is different in cultures of unpolarized cells where we expect that PsVs distribute more equally over cells.

      This means that in our experiments the vge/cell is not a suitable parameter for relating the magnitude of an effect to a defined number of PsVs. In the ECM, the PsV density is very high, enabling one cell to collect several hundred PsVs, much more than expected from the 50 vge/cell. We will point this out in the revised version.

      The claim that the HS levels are decreased in the non-cytochalasin-treated cells due to PsV-induced shedding needs to be demonstrated.

      We did not claim that PsVs induce shedding; rather, we believe they just take shed HS with them. Without PsVs, the shed HS likely remains in the ECM or is washed out very slowly.

      If HS is actually shed, staining of the cell periphery could increase with the antibody 3G10, which detects the HS neoepitope created following heparinase cleavage.

      As outlined above, we plan to test the suggested antibody 3G10. We also plan to repeat the 0 min time point (with and without PsVs, with and without CytD) to find out whether, in the absence of PsVs, the HS intensity (at 0 min) is unchanged between control and CytD.

      Reviewer #2 (Public review):

      Summary:

      Massenberg and colleagues aimed to understand how Human papillomavirus particles that bind to the extracellular matrix (ECM) transfer to the cell body for later uptake, entry, and infection. The binding to ECM is key for getting close to the virus's host cell (basal keratinocytes) after a wounding scenario for later infection in a mouse vaginal challenge model, indicating that this is an important question in the field.

      Strengths:

      The authors take on a conceptually interesting and potentially very important question to understand how initial infection occurs in vivo. The authors confirm previous work that actin-based processes contribute to virus transport to the cell body. The superresolution microscopy methods and data collection are state-of-the-art and provide an interesting new way of analysing the interaction with host cell proteins on the cell surface in certain infection scenarios. The proposed hypothesis is interesting and, if substantiated, could significantly advance the field.

      Weaknesses:

      As a study design, the authors use infection of HaCaT keratinocytes, and follow virus localisation with and without inhibition of actin polymerisation by cytochalasin D (cytoD) to analyse transfer of virions from the ECM to the cell by filopodial structures using important cellular proteins for cell entry as markers.

      First, the data is mostly descriptive besides the use of cytoD, and does not test the main claim of their model, in which virions that are still bound to heparan sulfate proteoglycans are transferred by binding to tetraspanins along filopodia to the cell body.

      The study identifies a rapid translocation step from the ECM to the cell body. We have no data that demonstrate a physical interaction between PsVs and CD151. In the model figure, we draw CD151 as part of the secondary receptor complex. We are sorry for having given the impression that PsVs would bind directly to CD151 and will rephrase the respective section.

      Second, using cytoD is a rather broad treatment that not only affects actin retrograde flow, but also virus endocytosis and further vesicular transport in cells, including exocytosis. Inhibition of myosin II, e.g., by blebbistatin, would have been a better choice as it, for instance, does not interfere with endocytosis of the virus.

      We agree, and plan to test whether blebbistatin is equally efficient in blocking the transfer.

      Third, the authors aim to study transfer from ECM to the cell body and the effects thereof. However, there are substantial, if not the majority of, viruses that bind to the cell body compared to ECM-bound viruses in close vicinity to the cells.

      We agree that in multiple cell culture systems viruses bind preferentially to the cell directly. But we respectfully disagree with the assertion that the majority of PsVs bind to the cell body of HaCaT keratinocytes. As noted above (e.g., Figure 5A, CytD, 0 min), only a small fraction of PsVs localize to the cell body, whereas the vast majority overlap with intense HS staining at the cell periphery, consistent with ECM association, as the accessibility to the basolaterally expressed HSPG is limited (see above). Based on quantitative estimation from multiple images, ECM-bound PsVs largely outnumber cell-bound particles (see above). These features make HaCaT cells a suitable in vitro model for mimicking in vivo conditions, where HPV has been proposed to bind predominantly to the basement membrane ECM rather than the cell surface (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010), which also strongly enhances infection of primary keratinocytes in vitro (Bienkowska-Haba et al., 2018).

      Thus, we believe our system appropriately models the physiologically relevant scenario of ECM-to-cell transfer, and the observed predominance of ECM binding supports the validity of our experimental focus.

      This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. As a consequence, the obtained data from time point experiments is skewed, and remains for the most part unconvincing due to the fact that the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could originate from cell-bound and ECM-transferred virions alike.

      As stated above, we observe massive binding of PsVs to the ECM, in contrast to very few PsVs that diffuse beneath the basolateral membrane of the polarized HaCaT cells and do bind directly to the cell surface (or maybe they are simply trapped between glass and basolateral membrane). PsVs are not expected to bind to the apical membrane, which is depleted of CD151 and Itga6. In other cellular systems, cells may hardly secrete ECM, are not polarized, and do not adhere so tightly to the substrate. In such cultures, where virions can easily circumvent ECM binding, the large majority of PsVs will likely bind directly to the cell surface.

      As outlined above, in order to quantify PsVs that can bind without restricted accessibility, we plan to detach HaCaT cells by EDTA from the substrate, incubate them with PsVs, and let them adhere again (please see above).

      No matter what is the outcome, the fraction of PsVs that binds directly to the cell surface does not weaken our conclusion that we have identified a very fast and efficient transfer step from the ECM to the cell body.

      Fourth, the use of fixed images in a time course series also does not allow for understanding the issue of a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin. Or, of cell spreading upon cytoD washout.

      If blebbistatin works as expected, we can safely conclude that we observe the very same process as described in Schelhaas et al., PLoS Pathogens, 2008, showing that the PsVs migrate by retrograde transport to the cell surface, and not that the cell spreads out and thereby reaches the PsVs.

      The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions, which may introduce a bias and skew the analysis.

      Our plasma membrane stain does not stain the ECM. Please see Figure 1. The stain is actually used to distinguish the cell body from the ECM area.

      Fifth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established.

      We agree that how the randomization is done is very important. Regarding the association of PsVs with CD151 and HS, we used flipped images to generate a calibration curve for correcting random background. For details, please see Supplementary Figures 3 and 5.

      For instance, if one flips an image with half of the image showing the cell body, and half of the image ECM, it is clear that association with cell membrane structures will only be significant in the original. I am rather convinced that using randomisation only on the plasma membrane ROIs will not establish any clear significance of the correlating signals.

      Figure 5D shows the PCC specifically of the cell body. In flipped images (not shown in the manuscript for clarity, but can be added) we obtain a PCC of around zero. For CytD, the flipped images always have a significantly lower PCC than the original images. In the control, the PCCs of the flipped images are significantly lower only for the 30 min and 60 min time points. The non-significance at the 0 min and 180 min time points is due to low PCCs also in the original images.
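
      The flip control discussed here can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the synthetic two-channel ROI, the horizontal flip, and all function names are assumptions chosen for illustration. It computes the Pearson correlation coefficient (PCC) between two channels within a single ROI, then recomputes it after flipping one channel, which should leave only chance-level correlation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant channel carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def flatten(img):
    """Turn a 2D image (list of rows) into a flat list of pixel values."""
    return [pixel for row in img for pixel in row]


def flip_horizontal(img):
    """Mirror each row; the intensity histogram is unchanged."""
    return [row[::-1] for row in img]


def pcc_original_and_flipped(channel_a, channel_b):
    """Return (PCC of the original pair, PCC with channel_b flipped)."""
    original = pearson(flatten(channel_a), flatten(channel_b))
    flipped = pearson(flatten(channel_a), flatten(flip_horizontal(channel_b)))
    return original, flipped


# Synthetic ROI (assumption): the marker channel partly follows the
# particle channel, plus independent noise.
rng = random.Random(0)
size = 32
psv = [[rng.random() for _ in range(size)] for _ in range(size)]
marker = [[0.7 * psv[r][c] + 0.3 * rng.random() for c in range(size)]
          for r in range(size)]

orig_pcc, flip_pcc = pcc_original_and_flipped(psv, marker)
print(f"original PCC: {orig_pcc:.2f}, flipped PCC: {flip_pcc:.2f}")
```

      Because both channels come from the same ROI, the flipped control here keeps object density comparable, which is the condition the reviewer asks for; flipping an image that is half cell body and half ECM would not satisfy it.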

      Also, there should be a higher n for the measurements.

      One n is the average of 15 cells. We realize that with n = 3 we find significant effects only if the effect is very strong or moderate with very low variance.

    1. eLife Assessment

      This valuable study outlines the mechanism by which repeated vaccination broadens the breadth of antibody responses against epitope-unmatched virus strains. The authors' mathematical model is solid and incorporates various parameters that regulate B cell activation and antibody response.

    2. Reviewer #1 (Public Review):

      In this study, Deng et al. investigate the antibody response against HA antigen following repeated vaccination with the H1N1 2009 pandemic influenza vaccine strain, using in silico modeling. The proposed model provides valuable mechanistic insights into how the broadening of the antibody response takes place upon repeated vaccination.

      Overall, the authors' model effectively explains the mechanistic principles underlying antibody responses against the viral antigens harboring epitope immunodominancy.

    3. Reviewer #2 (Public Review):

      The authors have been studying the mechanism of breadth expansion in antibody responses with repeated vaccinations using their own mathematical model. In this study, they applied this mathematical model to cohort data on anti-HA antibody responses after multiple influenza virus vaccinations and investigated the mechanism of antibody breadth expansion to diversified target viral strains.

      The manuscript is well written, and the mathematical model is well built, incorporating various parameters related to B cell activation in GC and EGC based on experimental data.

      Strengths:

      By carefully reanalyzing the published cohort data (Nunez IA et al 2017 PLoS One), they have clearly demonstrated that the repeated influenza virus vaccinations result in an expansion of the breadth to unmatched viral strains.

      Using their mathematical model, they have determined the major factors for the breadth expansion following multiple immunizations.

      Weaknesses:

      The overall concept of their model has already been published (Yang L et al 2023 Cell Reports) with a SARS-CoV-2 vaccine model, and they have applied it to influenza virus vaccine in this study, with the conclusions being largely the same.

      It is unclear how the re-evaluation of public data in the first half is related to the validation of their model in the latter part.

      Other points:

      In the original data by Nunez IA et al., HAI (the inhibitory effect of anti-HA antibodies on the binding of HA to sialic acid on erythrocytes) was used as the readout. The authors conclude that the breadth expansion with repeated vaccinations is primarily due to the activation of B cells with BCRs that recognize minor common epitopes, induced by covering up of strain-specific major epitopes by pre-existing antibodies. However, as they themselves show in Fig 1, once the sialic acid-binding region is covered, it seems difficult for another BCR to bind to this region. When the target epitope is limited like this, the effect of increasing antigen supply to DCs by pre-existing antibodies and the effect of increasing the presentation of minor epitopes appear to compete with each other. Could the authors please explain this point? In relation to this point, please explain the meaning of analysis of the entire ectodomain when the original data's readout is HAI.

      Minor point:

      The description "The purpose of this model is ...." starting at line 171 and the description "we obtain results in harmony with the clinical findings ...." starting at line 478 seem contradictory. As the authors themselves state at line 171, if the purpose of this model is not to fit the data but to demonstrate the principle, then the prudent sampling and reanalysis of the data itself seems to have less meaning.

    4. Author response:

      Reviewer #1 (Public Review):

      In this study, Deng et al. investigate the antibody response against HA antigen following repeated vaccination with the H1N1 2009 pandemic influenza vaccine strain, using in silico modeling. The proposed model provides valuable mechanistic insights into how the broadening of the antibody response takes place upon repeated vaccination.

      Overall, the authors' model effectively explains the mechanistic principles underlying antibody responses against the viral antigens harboring epitope immunodominancy.

      We thank the Reviewer for their positive and thoughtful assessment of the work. We address issues raised in the revised manuscript and in the point-by-point responses below.

      Reviewer #2 (Public Review):

      The authors have been studying the mechanism of breadth expansion in antibody responses with repeated vaccinations using their own mathematical model. In this study, they applied this mathematical model to cohort data on anti-HA antibody responses after multiple influenza virus vaccinations and investigated the mechanism of antibody breadth expansion to diversified target viral strains.

      The manuscript is well written, and the mathematical model is well built, incorporating various parameters related to B cell activation in GC and EGC based on experimental data.

      We thank the reviewer for their positive and thoughtful review and address issues raised in a revised version of the manuscript and in the point-by-point below.

      Strengths:

      By carefully reanalyzing the published cohort data (Nunez IA et al 2017 PLoS One), they have clearly demonstrated that the repeated influenza virus vaccinations result in an expansion of the breadth to unmatched viral strains.

      Using their mathematical model, they have determined the major factors for the breadth expansion following multiple immunizations.

      We thank the reviewer for pointing out the strengths of our study.

      Weaknesses

      The overall concept of their model has already been published (Yang L et al 2023 Cell Reports) with a SARS-CoV-2 vaccine model, and they have applied it to influenza virus vaccine in this study, with the conclusions being largely the same.

      It is unclear how the re-evaluation of public data in the first half is related to the validation of their model in the latter part.

      The reviewer is correct in that we build directly on our model published previously to study related phenomena for SARS-CoV-2. However, a critical advance of the work was to now ask whether antibody broadening following repeated homologous antigen exposure is a general feature of human humoral immunity. As we point out in the introduction of our manuscript, repeated exposure to the same antigen has long been assumed to predominantly boost strain-limited humoral immunity, necessitating rational design of vaccines that re-orient antibody responses to target otherwise immune-subdominant targets. Hence, antibody broadening in response to homologous SARS-CoV-2 antigen points to reconsideration of that basic premise in immunology; and if we are to now define this as a general feature of human antibody responses, then evaluation of the principle using a different vaccine protocol and antigen is necessary. Accordingly, we took advantage of the influenza vaccine space where, within the immediate years following the 2009 H1N1 pandemic, the 2009 H1N1 strain was repeatedly applied as the seasonal vaccine strain. This HA was also novel (as it was from a pandemic virus pHA), meaning that traditional back-boosting to historical strains would be limited. We then re-evaluated the longitudinal HAI data of Nunez et al. to define whether a broadening to increasingly divergent vaccine-unmatched strains is observed upon repeated exposure to pHA. This was not done before and was enabled by incorporating our amino acid relatedness parameter and our structure-based definition of the RBS patch. To then query mechanistic origins of the broadening effect, we adapted and extended our previous computational model to: (1) better reflect HA epitope diversity and overlap within the RBS patch; and (2) better reflect the influenza immunization regimens that are used clinically. The differences between the modeling done in this paper and that in Yang et al. 2023 are described in the Methods section separately. Taken together, our analyses of data in Nunez et al. and our simulations strengthen the emerging view that repeated boosting with the same antigen enables the humoral immune system to diversify immune responses because of feedback regulation which leads to enhanced antigen on FDCs, persistent GCs, and epitope masking. This, in turn, enables the immune system to generalize to recognize and respond to unseen variant antigens that harbor mutations in the immunodominant epitopes. Our results point to a new and emerging paradigm regarding booster immunizations and fundamental features of the humoral immune system.

      Other points:

      In the original data by Nunez IA et al., HAI (the inhibitory effect of anti-HA antibodies on the binding of HA to sialic acid on erythrocytes) was used as the readout. The authors conclude that the breadth expansion with repeated vaccinations is primarily due to the activation of B cells with BCRs that recognize minor common epitopes, induced by covering up of strain-specific major epitopes by pre-existing antibodies. However, as they themselves show in Fig 1, once the sialic acid-binding region is covered, it seems difficult for another BCR to bind to this region. When the target epitope is limited like this, the effect of increasing antigen supply to DCs by pre-existing antibodies and the effect of increasing the presentation of minor epitopes appear to compete with each other. Could the authors please explain this point?

      We agree that accounting for epitope overlap is important when the target is limited, as the reviewer indicates. In Figure 6C vs 6D we assess steric effects of possible spatial overlap between dominant and subdominant epitopes. Under overlapping conditions, we find evidence for a steric constraint on broadening, as predicted by the reviewer. Depending upon the degree of overlap between the epitopes and differences in germline characteristics in the B cells targeting dominant and subdominant epitopes, this effect could be compensated during subsequent shots, as described by our results (see lines 392-406).

      We also now incorporate the following sentence into our discussion (lines 448-453):

      “Epitope masking will also be constrained by the dimensions of the RBS and our simulations do report attenuation of titers against historical influenza strains when we introduce epitope overlap. Depending upon the degree of overlap between the epitopes and differences in germline characteristics in the B cells targeting dominant and subdominant epitopes, this effect could be compensated during subsequent shots.”

      In relation to this point, please explain the meaning of analysis of the entire ectodomain when the original data's readout is HAI.

      We include side-by-side full length ectodomain versus RBS patch (sialic acid binding residues + antibody epitope ring) to demonstrate relatedness differences in the readout data. But it is precisely because of the point raised by the reviewer that we focus on using the RBS patch as the relatedness values to assess antibody broadening as defined by HAI activity (see Figure 3 and S2).
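
      To make the full-length versus patch comparison concrete, the following is a minimal sketch that assumes relatedness can be approximated as simple fractional identity at a chosen set of aligned positions. The authors' actual amino acid relatedness parameter is not defined in this response and may differ; the sequences and patch indices below are invented placeholders, not HA data.

```python
def fractional_identity(seq_a, seq_b, positions=None):
    """Fraction of identical residues between two aligned sequences.

    positions: optional iterable of 0-based indices restricting the
    comparison (for example, a hypothetical receptor-binding patch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if positions is None:
        positions = range(len(seq_a))
    positions = list(positions)
    if not positions:
        raise ValueError("no positions to compare")
    matches = sum(1 for i in positions if seq_a[i] == seq_b[i])
    return matches / len(positions)


# Placeholder aligned sequences (assumption, not real HA sequences).
vaccine_strain = "MKAILVVLLYTFATANADTL"
variant_strain = "MKAILVVLLYTFTTANGDSL"

# Placeholder patch positions (assumption).
patch_positions = [10, 12, 14, 16, 18]

full_identity = fractional_identity(vaccine_strain, variant_strain)
patch_identity = fractional_identity(vaccine_strain, variant_strain,
                                     patch_positions)
print(f"full length: {full_identity:.2f}, patch only: {patch_identity:.2f}")
```

      In this toy case the substitutions cluster in the patch, so the patch-restricted value is much lower than the full-length value. That is the kind of difference a side-by-side comparison exposes when the functional readout depends on only a small region.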

      Minor point:

      The description "The purpose of this model is ...." starting at line 171 and the description "we obtain results in harmony with the clinical findings ...." starting at line 478 seem contradictory. As the authors themselves state at line 171, if the purpose of this model is not to fit the data but to demonstrate the principle, then the prudent sampling and reanalysis of the data itself seems to have less meaning.

      We respectfully disagree. As explained in the point above, our analysis of the clinical data goes beyond “reanalyzing”: it first reveals the previously unreported broadening effect across highly divergent strains following sequential immunization with homologous antigen in the influenza vaccine space. We then extended and adapted our computational model for the influenza vaccination paradigm to gain mechanistic insight into how such antibody broadening may occur. The word “harmony” was not meant to imply quantitative agreement, and we apologize if it caused confusion.

    1. eLife Assessment

      This useful paper examined the mechanism of planar cell polarity (PCP) using the Drosophila pupal wing, investigating how 'cellular level', 'molecular level' and 'tissue level' mechanisms intersect to establish PCP. This represents progress for the field, and the conclusions are mostly backed up by solid data. Whereas the manuscript is sound overall, the reviewers had remaining concerns, which can mostly be addressed by textual clarification of the concepts used in the manuscript.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      I take issue with the authors' conceptualization of what they are testing with the boundary experiment. In brief, they articulate the problem as testing the ability of a >>Fz boundary to repolarize cells that do not sense global signals and are therefore only controlled by cell-scale signaling (page 8: "Interfering with global cues also presents a challenge..."). I see this as a misconception of what their experiment really does.

      They then articulate three hypotheses for why propagation from the >>Fz boundary in the de novo polarization experiment is only a few cells. The first is that some global signal competes with the boundary. The second one suggests that local coupling of cell polarity gets established before it is challenged by the boundary effect ("expression of Fz in a neighbouring row of cells, has little effect on the polarity of neighbouring cells, as their polarity is already strongly coupled to their neighbours in the adjacent region of non-overexpression."). The third hypothesis is that the polarity within individual cells prevents more than a weak reorientation of their polarity in response to the boundary. I'll return to these ideas in a moment.

      Their conclusion is a little ambiguous: page 10 "Overall, our results show that it is hard to repolarise from a boundary of Fz overexpression in both control and de novo polarity conditions, consistent with de novo polarity being rapidly and robustly established and not easily perturbed. This provides evidence in favour of an effective cell-intrinsic polarisation mechanism." This reads as perhaps a mix of hypotheses two and three, though most of the other language in the manuscript suggests that they think it is the third one - that the intrinsic cell-scale signaling prevents much propagation: "This failure of polarization to propagate beyond 1-2 rows of cells is consistent with the presence of an effective cell-intrinsic polarization machinery, which is only weakly affected by cell-cell coupling of polarity to neighbours."

      As I see it, the conceptual flaw is that cell-scale signaling determines polarity magnitude, but has no ability to define orientation of polarization unless it gets some extrinsic information, either from a global signal or from coupling to neighbors. In this experiment, we can agree that they are not responding to any global signal as they repolarize, since the new images show they do not restore a proximal distal pattern across a large portion of the wing. But, importantly, if only cell-scale signaling alone is operating, then each cell should polarize in random orientation and independently of all neighbors; this is clearly not the case. And if some degree of orientation coupling is happening, then one should see random swirling patterns; this is also not the case, as the polarity pattern is highly stereotyped. Therefore, some unknown but reproducible signal or set of signals is defining the pattern in both the control and de novo conditions. We don't need to know the identity of those signals, but it is essential to recognize that something is doing this. One possibility is that the veins provide some signal - see Hogan et al 2011 - but as I said, the identity is not important. What the authors see is a stereotypical pattern with local correlation of orientations, indicating that both coupling and larger scale patterning information must exist.

      Now, let's think about what happens when de novo cell-scale signaling is initiated by Fz::mKate2-sfGFP expression. Both the boundary signal and the unknown patterning information are already present at the same time, so cells are polarizing in an environment in which the two patterning signals are competing with each other. All the cells in the anterior polarize, so cell-scale signaling works as expected. And the orientation of any given cell is determined by the competing inputs and the tendency to couple polarity to neighbors: coupling of cells in the first row or two orients their polarization in response to the boundary, while coupling of further cells to the unknown pattern prevents the reorientation beyond a few cells; in between, cells orient somewhere in between. In summary, what I think this experiment is telling us is the relative ability of the two competing patterning systems to spread information via coupling. The same competition is happening in the control condition, except that it starts earlier. I don't see anything in this experiment that says how cell-scale signaling makes propagation of orientation hard or not.

      If the authors wish to argue that it is cell-scale signaling that inhibits propagation from the boundary, I'd be curious to know how they explain some related observations from the literature (and I apologize for forgetting about one of these results in the last review - I know it's frustrating to have a reviewer come back with new critiques the second time around). Ma et al (2003) proposed something similar. They also wished to eliminate global signaling and look at local coupling. They similarly used >>Fz from an AP boundary (ptcGAL4 rather than hhGAL4), and to approximate removal of global input, induced large ft mutant clones. Whether you choose to believe Ft is providing a global signal or a competing parallel input doesn't matter for the purposes of this discussion - it only need be interpreted as a competing signal. Their result is quite different. Whereas Carayon and colleagues show no enhancement of the propagation in the "no-global" (de novo) condition compared to control, Ma et al showed a doubling or tripling of the distance of propagation. Furthermore, in their control condition in which global (and/or competing) signaling is intact, propagation goes at least 10 cells in the part of the wing they studied, and 20-30 cells when Ft is removed. From this experiment, would the authors still claim that propagation is "hard" due to cell-scale signaling? Dramatically, in the ft clone, propagation is vastly "easier" than in the Carayon condition. And why are these results so different? One difference is the time given to establish patterns. On the other hand, Carayon and colleagues show a plateauing of polarity magnitude at 8 hours, so it's not clear what would happen if the system could go much longer before pattern evolution is disrupted by other events around the time of hair growth. I'd argue that it all has to do with the strength of various inputs in the various locations in the wing used for these experiments.

      Even the domineering non-autonomy seen around fz and vang loss- or gain-of-function clones is often substantially more than one or two cells; that is in the context of intact global signaling (notwithstanding the authors correctly pointing out the tendency of adult hairs to coordinate polarity in the absence of a core PCP cell-scale signal). Would the authors say that cell-scale signaling is weaker there? I'd argue that the competing (in this case intact global) signal is weaker.

    3. Author response:

      (1) General Statements

      Our manuscript studies mechanisms of planar polarity establishment in vivo in the Drosophila pupal wing. Specifically we seek to understand mechanisms of ‘cell-scale signalling’ that is responsible for segregating core pathway planar polarity proteins to opposite cell edges. This is an understudied question, in part because it is difficult to address experimentally.

      We use conditional and restrictive expression tools to spatiotemporally manipulate core protein activity, combined with quantitative measurement of core protein distribution, polarity and stability. Our results provide evidence for a robust cell-scale signal, while arguing against mechanisms that depend on depletion of a limited pool of a core protein or polarised transport of core proteins on microtubules. Furthermore, we show that polarity propagation across a tissue is hard, highlighting the strong intrinsic capacity of individual cells to establish and maintain planar polarity.

      The original manuscript received three fair and thorough peer-reviews, which raised many important points. In response, we decided to embark on a full revision that attempts to answer all of the points. We have included new data to support our conclusions in Supplemental Figures 1, 2 and 5.

      Additionally in response to the reviewers we have revised the manuscript title, which is now ‘Characterisation of cell-scale signalling by the core planar polarity pathway during Drosophila wing development’.

      (2) Point-by-point description of the revisions

      We thank all of the reviewers for their thorough and thoughtful review of our manuscript. They raise many helpful points which have been extremely useful in assisting us to revise the manuscript.

      In response we have carried out a major revision of the manuscript, making numerous changes and additions to the text and also adding new experimental data. Specific changes are listed after our detailed response to each comment.

      Reviewer #1:

      Summary

      The authors use inducible Fz::mKate2-sfGFP to explore "cell-scale signaling" in PCP. They reach several conclusions. First, they conclude that cell-scale signaling does not depend on limiting pools of core components (other than Fz). Second, they conclude that cell-scale signaling does not depend on microtubule orientation, and third, they conclude that cell-scale signaling is strong relative to cell to cell coupling of polarity. 

      There are some interesting inferences that can be drawn from the manuscript, but there are also some significant challenges in interpreting the results and conclusions from the work as presented. I suggest that the authors 1) define "cell-scale signaling," as the precise meaning must be inferred, 2) reconsider some premises upon which some conclusions depend, 3) perform an essential assay validation, and 4) explain some other puzzling inconsistencies.

      Major points

      The exact meaning of cell-scale signaling is not defined, but I infer that the authors use this term to describe how what happens on one side of a cell affects another side. The remainder of my critique depends on this understanding of the intended meaning.

      As the reviewer points out, it is important that the meaning of the term ‘cell-scale signalling’ is clear to the reader and in response to their comment we have had another go at defining it explicitly in the Introduction to the manuscript.

      Specifically, we use the term ‘cell-scale signalling’ to describe possible intracellular mechanisms acting on core protein segregation to opposite cell membranes during core pathway dependent planar polarisation. For example, this could be a signal from distal complexes at one side of the cell leading to segregation of proximal complexes to the opposite cell edge, or vice versa. See also our response to Reviewer #2 regarding the distinction between ‘molecular-scale’ and ‘cell-scale’ signalling. 

      Changes to manuscript: Revised definition of ‘cell-scale signalling’ in Introduction.

      The authors state that any tissue wide directional information comes from pre-existing polarity and its modification by cell flow, such that the de novo signaling paradigm "bypasses" these events and should therefore not be responsive to any further global cues. It is my understanding that this is not a universally accepted model, and indeed, the authors' data seem to suggest otherwise. For example, the image in Fig 5B shows that de novo induction restores polarity orientation to a predominantly proximal to distal orientation. If no global cue is active, how is this orientation explained?

      We assume that the reviewer’s point is that it is not universally accepted that de novo induction after hinge contraction leads to uncoupling from global cues (rather than that it is not accepted that hinge contraction remodels radial polarity to a proximodistal pattern). We are (we believe) the only lab that has used de novo induction as a tool, and we’re not aware of any debate in the literature about whether this bypasses global cues. Nevertheless, we accept that it is hard to prove there is no influence of global cues, when the nature of those cues and the time at which they act remain unclear. Below we summarise the reasons why we believe there are no significant effects of global cues in our experiments that would influence the interpretation of our results.

      First, our reading of the literature supports a broad consensus that an early radial core planar polarity pattern is realigned by cell flow produced by hinge contraction beginning at around 16h APF (e.g. Aigouy et al., 2010; Strutt and Strutt, 2015; Aw and Devenport, 2017; Butler and Wallingford, 2017; Tan and Strutt, 2025). Taken at face value, this suggests that there are ‘radial’ cues present prior to hinge contraction, maybe coming from the wing margin – arguably these radial cues could be Ft-Ds or Wnts or both, given they are expressed in patterns consistent with such a role (notwithstanding the published evidence arguing against roles for either of these cues). It then appears that hinge contraction supersedes these cues to convert a radial pattern to a proximodistal pattern – whether the radial cues that affect the core pathway earlier remain active after hinge contraction is unclear, although both Ft-Ds and Wnts appear to maintain their ‘radial’ patterns beyond the beginning of hinge contraction (e.g. Merkel et al., 2014; Ewen-Campen et al., 2020; Yu et al., 2020).

      We think that the reviewer is proposing the presence of a proximodistal cue that is active in the proximal region of the wing that we use for our experiments shown e.g. in Fig.5, and that this cue orients core polarity here (but not elsewhere in the wing) in a time window after 18h APF. Ft-Ds and Wnts do not seem to be plausible candidates as they are still in ‘radial’ patterns. This leaves either an unknown proximodistal cue (a gradient of some unknown signalling molecule?), or possibly some ability of hinge contraction to align proximodistal polarity specifically in this wing region but not elsewhere. We cannot definitively rule out either of these possibilities, but neither do we think there is sufficient evidence to justify invoking their existence to explain our observations.

      In particular, the reason that we don’t think there is a proximodistal cue in the proximal part of the wing after 18h APF, is that work from our lab shows that induction of Fz or Stbm expression at times around or after the start of hinge contraction (i.e. >16 h APF) results in increasing levels of trichome swirling with polarity not being coordinated with the tissue axis either proximally or distally (Strutt and Strutt, 2002; Strutt and Strutt 2007). Our simplest interpretation for this is that induction at these stages fails to establish the early radial pattern of core pathway polarity and hence hinge contraction cannot reorient radial to proximodistal. If hinge contraction alone could specify proximodistal polarity in the absence of the earlier radial polarity, then we would not expect to see swirling over much of the proximal wing (where the forces from hinge contraction are strongest (Etournay et al., 2015)).

      In this manuscript, our earliest de novo experiments begin with Fz induction at 18h APF (de novo 10h), then at 20h APF (de novo 8h) and at 22h APF (de novo 6h). The image in Fig. 5B, referred to by the reviewer, is of a wing where Fz is induced de novo at 22 h APF. In these wings, as expected, the core proteins localise asymmetrically in stereotypical swirling patterns throughout the wing surface (see Fig. 2M and also Strutt and Strutt, 2002; Strutt and Strutt 2007), but – usefully for our experiments – they broadly localise along the proximal-distal axis in the region analysed in Fig. 5B. Given the strong swirling in surrounding regions when inducing at >20h APF, we feel reasonably confident in assuming that the pattern is not due to a proximodistal cue present in the proximal wing.

      We appreciate that the original manuscript did not show images including the trichome pattern in adjacent regions, so this point would not have been clear, but we now include these in Supplementary Fig. 5. We have also added a note in the legend to Fig. 5B to clarify that the proximodistal pattern seen is local to this wing region. We apologise for this oversight and the confusion caused and appreciate the feedback.

      The 6 hr condition, that has only partial polarity magnitude, is quite disordered. Do the patterns at 8 and 10 hrs become more proximally-distally oriented? It is stated that they all show swirls, but please provide adult wing images, and the corresponding orientation outputs from QuantifyPolarity to help validate the notion that the global cues are indeed bypassed by this paradigm.

      In all three ‘normal’ de novo conditions (6h, 8h and 10h), regardless of the time of induction, the polarity orientation patterns of Fz-mKate2 in pupal and adult wings are very similar in the experimentally analysed region (Fig. S5B-E). The strong local hair swirling agrees with the previous published data (Strutt and Strutt, 2002; Strutt and Strutt 2007). Overall, we don’t see any evidence that the 10h de novo induction results in more proximodistally coordinated polarity than the 8h or 6h conditions. This is consistent with our contention that there is no global cue present at these stages, which presumably would have a stronger effect when core pathway activity was induced at earlier stages.
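
      One way to put a number on "proximodistally coordinated polarity" is sketched below. This is an illustrative assumption, not the algorithm used by QuantifyPolarity or by the authors. It treats per-cell polarity orientations as axial data (an angle and the same angle plus 180° are equivalent), doubles the angles, and takes the length of the mean resultant vector, which is near 1 for well-aligned cells and near 0 for orientations spread evenly around a swirl. The angle lists are invented.

```python
import math


def axial_coordination(angles_deg):
    """Return (strength, mean_axis_deg) for axial orientation data.

    Angles are doubled so that theta and theta + 180 count as the same
    axis. strength is the mean resultant length of the doubled angles,
    and mean_axis_deg is the average axis in degrees.
    """
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    c = sum(math.cos(2 * math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(2 * math.radians(a)) for a in angles_deg) / n
    strength = math.hypot(c, s)
    mean_axis_deg = 0.5 * math.degrees(math.atan2(s, c))
    return strength, mean_axis_deg


# Invented orientations relative to the proximodistal axis (0 degrees).
aligned_cells = [-5, 0, 5, 10, -10, 2, -2]
swirling_cells = [0, 30, 60, 90, 120, 150]

aligned_strength, aligned_axis = axial_coordination(aligned_cells)
swirl_strength, _ = axial_coordination(swirling_cells)
print(f"aligned: strength={aligned_strength:.2f}, axis={aligned_axis:.1f} deg")
print(f"swirl:   strength={swirl_strength:.2f}")
```

      Under this measure, comparing induction times means comparing both the strength and the mean axis in the analysed region: a global proximodistal cue would be expected to push the strength up and the mean axis toward 0° as induction time lengthens.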

      Changes to manuscript: Added additional explanation of the ‘de novo induction’ paradigm and why we believe the resulting polarity patterns are unlikely to be influenced by any global signals in Introduction and Results section ‘Induced core protein relocalisation…’. Added quantification of polarity in the experiment region proximal to the anterior cross-vein in pupal wings (Fig.S5E-E’’’) and zoomed-out images of the surrounding region in adult wings showing that the polarity pattern does not become more proximodistal when induction time is longer, and also that there is not overall proximodistal polarity in proximal regions of the wing (Fig.S5B-D), arguing against an unknown proximodistal polarity cue at these stages of development.

      In the de novo paradigm, polarization is initiated immediately or shortly after heat shock induction. However, the results should be differently interpreted if the level of available Fz protein does not rise rapidly and then stabilize before the 6 hr time point, and instead continues to rise throughout the experiment. Western blots of the Fz::mKate2-sfGFP at time points after induction should be performed to demonstrate steady state prior to measurements. Otherwise, polarity magnitude could simply reflect the total available pool of Fz at different times after induction. Interpreting stability is complex, and could depend on the same issue, as well as the amount of recycling that may occur. Prior work from this lab using FRAP suggested that turnover occurs, and could result from recycling as well as replenishment from newly synthesized protein. 

      The reviewer raises an important point, which we agree could confound our experimental interpretations. As suggested we have now carried out western blotting and quantitation for Fz::mKate2-sfGFP levels and added these data to Fig.S1 (Fig. S1C,D). Quantified Fz is not significantly different between the three de novo polarity induction timings and not significantly different compared to constitutive Fz::mKate2-sfGFP expression (although there is a trend towards increasing Fz::mKate2-sfGFP protein levels with increasing induction times). These data are consistent with Fz::mKate2-sfGFP being at steady state in our experiments and that levels are sufficient to achieve normal polarity (as constitutive Fz::mKate2-sfGFP does so). Therefore it is unlikely that differing protein levels explain the differing polarity magnitudes at the different induction times. Interestingly, Fz::mKate2-sfGFP levels are lower than endogenous Fz levels, possibly due to lower expression or increased turnover/reduced recycling.

      Changes to manuscript: Added western blot analysis of Fz::mKate2-sfGFP expression under 10h, 8h and 6h induction conditions vs endogenous Fz expression and constitutive Fz::mKate2-sfGFP expression (Fig.S1C-D) and discussed in Results section ‘Planar polarity establishment is…’.

      From the Fig 3 results, the authors claim that limiting pools of core proteins do not explain cell-scale signaling, a result expected based on the lack of phenotypes in heterozygotes, but of course they do not test the possibility that Fz is limiting. They do note that some other contributing protein could be. 

      Previously published results from our lab (Strutt et al., 2016 Cell Reports; Supplemental Fig. S6E) show that in a heterozygous fz mutant background, Fz protein levels are not affected by halving the gene dosage when compared to wt, suggesting that Fz is most likely produced in excess and is not normally limiting, but that protein that cannot form complexes may be rapidly degraded. We have now added this information to the text.

      Changes to manuscript: Added explanation in text that Fz levels had previously been shown to not be dosage sensitive in Results section ‘Planar polarity establishment is…’ and also added a caveat to the Discussion about not directly testing Fz.

      In Fig 3, it is unclear why the authors chose to test dsh1/+ rather than dsh[null]/+. In any case, the statistically significant effect of Dsh dose reduction is puzzling, and might indicate that the other interpretation is correct. Ideally, a range including larger and smaller reductions would be tested. As is, I don't think limiting Dsh is ruled out. 

      Concerning the choice of dsh allele, we appreciate the query of the reviewer regarding use of dsh[1] instead of a null, as there might be a concern that dsh[1] would give a less strong phenotype. The answer is that over more than two decades we and others have never found any evidence that dsh[1] does not act as a ‘null’ for planar polarity in the pupal wing, and furthermore use of dsh[1] preserves function in Wg signalling – and we would prefer to rule out any phenotypic effects due to any potential cross-talk between the two pathways that might be seen using a complete null. To expand on this point, dsh[1] mutant protein is never seen at cell junctions (Axelrod 2001; Shimada et al., 2001; our own work), and by every criterion we have used, planar polarity is completely disrupted in hemizygous or homozygous mutants e.g. see quantifications of polarity in (Warrington et al., 2017 Curr Biol).

      In terms of the broader point, whether we can rule out Dsh being limiting, we were very careful to be clear that we did not see evidence for Dsh (or other core proteins) being limiting in terms of ‘rates of core pathway de novo polarisation’. When the reviewer says ‘the statistically significant effect of Dsh dose reduction is puzzling’ we believe they are referring to the data in Fig. 3J, showing a small but significantly different reduction in stable Fz in de novo 6h conditions (also seen in 8h de novo conditions, Fig. S3I). As Dsh is known to stabilise Fz in complexes (Strutt et al., 2011 Dev Cell; Warrington et al., 2017 Curr Biol), in itself this result is not wholly surprising. Nevertheless, while this shows that halving Dsh levels does modestly reduce Fz stability, it does not alter our conclusion that halving Dsh levels does not affect Fz polarisation rate under either 6h or 8h de novo conditions.

      Unfortunately, we do not have available to us a practical way of achieving consistent intermediate reductions in Dsh levels (e.g. a series of verified transgenes expressing at different levels). Levels of all the core proteins could be dialled down using transgenes, to see when the system breaks, and indeed we have previously published that lower levels of polarity are seen if Fmi levels are <<50% or if animals are transheterozygous for pk, stbm, dgo or dsh, pk, stbm, dgo simultaneously (Strutt et al., 2016 Cell Reports). However, it seems to be a trivial result that eventually the ability to polarise is lost if insufficient core proteins are present at the junctions. For this reason we have focused on a simple set of experiments reducing gene dosage singly by 50% under two de novo induction conditions, and have been careful to state our results cautiously. The assays we carried out were a great deal of work even for just the 5 heterozygous conditions tested.

      We believe that the experiments shown effectively make the point that there is no strong dosage sensitivity – and it remains our contention that if protein levels were the key to setting up cell-scale polarity, then a 50% reduction would be expected to show an effect on the rate of polarisation. We further note that as Fz::mKate2-sfGFP levels are lower than endogenous Fz levels (see above), the system might be expected to be sensitised to further dosage reductions, and despite this we failed to see an effect on rate of polarisation.

      We note that Reviewer #3 made a similar point about whether we can rule out dosage sensitivity on the basis of 50% reductions in protein level. To address the comments of both reviewers we have now added some further narrative and caveats in the text.

      In a similar vein, Reviewer #2 requested data on whether dosage reduction altered protein levels by the expected amount. We have now added further explanation/references and western blot data to address this.

      Changes to manuscript: Added more explanation of our choice of dsh[1] as an appropriate mutant allele to use in Results section ‘Planar polarity establishment is…’. Added some narrative and caveats regarding whether lowering levels more than 50% would add to our findings in the Discussion. Revised conclusions to be more cautious including altering section title to read ‘Planar polarity establishment is not highly sensitive to variation in protein levels of core complex components’.

      Also added westerns and text/references showing that for the tested proteins there is a reduction in protein levels upon removal of one gene dosage in Results section ‘Planar polarity establishment is…’ and Fig.S2.

      The data in Fig 5 are somewhat internally inconsistent, and inconsistent with the authors' interpretation. In both repolarization conditions, the authors claim that repolarization extends only to row 1, and row 1 is statistically different from non-repolarized row 1, but so too is row 3. Row 2 is not. This makes no sense, and suggests either that the statistical tests are inappropriate and/or the data is too sparse to be meaningful. 

      As we’re sure the reviewer appreciates, this was an extremely complex experiment to perform and analyse. We spent a lot of time trying to find the best way to illustrate the results (finally settling on a 2D vector representation of polarity) and how to show the paired statistical comparisons between different groups. Moreover, in the end we were only able to detect generally quite modest (statistically significant) changes in cell polarity under the experimental conditions.

      However, we note that failure to see large and consistent changes in polarity is exactly the expected result if it is hard to repolarise from a boundary – and this is of course the conclusion that we draw. Conversely, if repolarisation were easy, which was our expectation at least under de novo conditions without existing polarity, then we would have expected large and highly statistically significant changes in polarity across multiple cell rows. Hence we stand by our conclusion that ‘it is hard to repolarise from a boundary of Fz overexpression in both control and de novo polarity conditions’.

      Overall, we were trying to establish three points:

      (1) to demonstrate that repolarisation occurs from a boundary of overexpression i.e. from boundary 0 to row 0

      (2) to establish whether a wave of repolarisation occurs across rows 1, 2 and 3

      (3) to determine whether it is easier to repolarise in the de novo condition than in the control (already polarised) condition

      Taking each in turn:

      (1) To detect repolarisation from a boundary relative to the control condition, we have to compare row 0 in the repolarisation condition (Fig.5G,K) vs the control condition (Fig.5F,J). This comparison shows significant repolarisation (p=0.0014). From here on, row 0 in the repolarisation condition is our reference for repolarisation occurring.

      (2) To determine if there is a wave of repolarisation in the repolarisation condition we have to compare row 0 vs row 1 to 3 in the repolarisation condition (Fig.5K). Row 1 is not significantly different to row 0, but rows 2 and 3 are different and the vectors show obviously lower polarity than row 0. Hence no wave of repolarisation is detected over rows 1 to 3.

      (3) To determine if it is easier to repolarise in the de novo condition, our reference for establishment of a repolarisation pattern is the repolarisation condition in rows 0 to 3. So, we compare the repolarisation condition vs repolarisation in the de novo condition, row 0 vs row 0, row 1 vs row 1, row 2 vs row 2 and row 3 vs row 3 – in each case no significant difference in polarity is detected, supporting our conclusion that it is not easier to repolarise in the de novo condition.

      We agree that the variations in row 3 are puzzling, but there is no evidence that this is due to propagation of polarity from row 0, and so in terms of our three questions, it does not alter our conclusions.

      Changes to manuscript: We have extensively revised the text describing the results in Fig.5 to hopefully make the reasons for our conclusions clearer and also be more cautious in our conclusions in Results section ‘Induced core protein relocalisation…’. 

      For the related boundary intensity data in Fig 6, the authors need to describe exactly how boundaries were chosen or excluded from the analysis. Ideally, all boundaries would be classified as either medio-lateral (meaning anterior-posterior) or proximal-distal depending on angle. 

      We thank the reviewer for pointing out that this was not clear.

      All boundaries were classified following their orientation compared to the Fz over-expression boundary using hh-GAL4 expressed in the wing posterior compartment. Horizontal junctions were defined as parallel to the Fz over-expression boundary (between 0 and 45 degrees) and mediolateral junctions as junctions linking two horizontal boundaries (between 45 and 90 degrees).

      Changes to manuscript: The boundary classification detailed above has been added in the Materials and Methods.
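
      The angle rule described above can be illustrated with a short sketch. This is not the analysis code used in the study; the function names, the representation of a junction by its two endpoint coordinates, and the choice to assign a junction at exactly 45 degrees to the mediolateral class are all assumptions made for illustration.

```python
import math


def junction_angle_deg(p1, p2, boundary_angle_deg=0.0):
    """Acute angle (0 to 90 degrees) between a junction and the reference boundary.

    p1, p2: (x, y) endpoints of the junction (hypothetical representation).
    boundary_angle_deg: orientation of the Fz over-expression boundary in the image.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    angle = math.degrees(math.atan2(dy, dx)) - boundary_angle_deg
    # Junctions are undirected lines, so fold the angle into the 0-90 degree range.
    angle = abs(angle) % 180.0
    return min(angle, 180.0 - angle)


def classify_junction(p1, p2, boundary_angle_deg=0.0):
    """Label a junction 'horizontal' (under 45 degrees) or 'mediolateral' (45 to 90)."""
    # Assumption: a junction at exactly 45 degrees is counted as mediolateral.
    if junction_angle_deg(p1, p2, boundary_angle_deg) < 45.0:
        return "horizontal"
    return "mediolateral"
```

      For example, with the boundary running along the x axis, a junction from (0, 0) to (10, 1) would be labelled horizontal and one from (0, 0) to (1, 10) mediolateral.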

      If the authors believe their Fig 5 and 6 analyses, how do they explain that hairs are reoriented well beyond where the core proteins are not? This would be a dramatic finding, because as far as I know, when core proteins are polarized, prehair orientation always follows the core protein distribution. Surprisingly, the authors do not so much as comment about this. The authors should age their wings just a bit more to see whether the prehair pattern looks more like the adult hair pattern or like that predicted by their protein orientation results.

      Again the reviewer makes an interesting point, and we agree that this is something that we should have more directly addressed in the manuscript.

      There are three reasons why we might expect adult trichomes to show a different effect from the measured core protein polarity pattern seen in our experiments:

      (i) we are assaying core protein polarity at 28h APF, but trichomes emerge at >32h APF, so there is still time for polarity to propagate a bit further from the boundary. We now have added data showing that by the point of trichome initiation, the wave of polarisation extends 3-4 cell rows (Fig.S5A).

      (ii) it has long been known that a strong localisation of core proteins at a cell edge is not required for propagation of trichome polarity from a boundary. For instance, in Strutt & Strutt 2007 we show clones of cells overexpressing Fz causing propagation through pk[pk-sple] mutant tissue where there is no detectable core protein polarity. We were following up prior observations of Adler et al., 2000 in the wing and Lawrence et al., 2004 in the abdomen.

      (iii) there is evidence to suggest that the polarity of adult trichomes is locally coupled, possibly mechanically. This point is hard to prove without live imaging taking in both initial core protein localisation, the site of actin-rich trichome initiation and then the final orientation of the much larger microtubule filled trichome, and we’re not aware that such data exist. However, Wong & Adler 1993 (JCB) showed that over a number of hours trichomes become much larger and move towards the centre of the cell, presumably becoming decoupled from any core protein cue. The images in Guild … & Tilney, 2005 (MBoC)  are also interesting to look at in this regard. Finally, septate junction proteins have been implicated in local alignment of trichomes, independently of the core pathway (Venema … & Auld, 2004 Dev Biol).

      Changes to manuscript: Added new data in Fig.S5A showing where trichomes initiate under 6h de novo induction conditions, for comparison to core protein localisation and adult trichome data in Fig.5. Added some text explaining why adult trichome repolarisation might be stronger than the observed effects on core protein localisation in Discussion. 

      Minor points

      As the authors know, there is a model in the literature that suggests microtubule trafficking provides a global cue to orient PCP. The authors' repolarization data in Fig 4 make a reasonably convincing case against a role for microtubules in cell-scale signaling, but do not rule out a role as a global cue. The authors should be careful of language such as "...MTs and core proteins being oriented independently of each other" that would appear to possibly also refer to a role as a global cue. 

      Thank you for pointing out that this was not clear. We have now modified the text to hopefully address this.

      Changes to manuscript: Text updated in Results section ‘Microtubules do not provide…’.

      Significance:

      There are two negative conclusions and one positive conclusion made by the authors. Provided the above points are addressed, the negative conclusions, that core proteins are not limiting and that microtubules are not involved in cell-scale signaling are solid. The positive conclusion is more nebulous - the authors say that cell-scale signaling is strong relative to cell-cell signaling - but how strong is strong? Strong relative to their prior expectations? I'm not sure how to interpret such a conclusion. Overall, we learn something from these results, though it fails to reveal anything about mechanism. These results will be of some interest to those studying PCP.

      The reviewer raises an interesting point, which is how do you compare the strength of two different processes, even if both processes affect the same outcome (in this case cell polarity). Repolarisation from a boundary has not been carefully studied at the level of core protein localisation in any previous study to our knowledge – this is one of the important novel aspects of this study. Hence there is not a baseline for defining strong repolarisation. Similarly, there has been no investigation of the nature of ‘cell-scale signalling’. This was a considerable challenge for us in writing the manuscript, and we have done our best to find appropriate language that hopefully conveys our message adequately. Minimally our work may provide a baseline for helping to define the ‘strengths’ of these processes in future studies.

      One of our main points is that we can generate an artificial boundary of Fz expression, where Fz levels are at least several fold higher than in the neighbouring cell (e.g. compare Fig.4N’ and O’) and only two rows of cells show a significant change in polarity relative to controls. Even when the tissue next to the overexpression domain is still in the process of generating polarity (de novo condition) then the boundary has little effect on polarity in neighbouring cell rows. This was a result that surprised us, and we tried to convey that by using language to suggest cell-scale signalling was stronger than cell-cell signalling i.e. stronger in terms of the ability to define the final direction of polarity.

      Changes to manuscript: In the revised manuscript we have reviewed our use of language and now avoid saying ‘strong’ but instead use terms such as ‘effective’ and ‘robust’ in e.g. Results section ‘Induced core protein relocalisation…’, the Discussion and we have also changed the title of the manuscript to avoid claiming a ‘strong’ signal.

      Reviewer #2:

      Overview

      This paper aims to dissect the relative importance of the various cues that establish PCP in the wing disc of Drosophila, which remains a prominent and relevant model for PCP. The authors suggest that one must consider cues at three scales (molecular, cell and tissue) and specifically design tests for the importance of cell-level cues, which they call non-local cell scale signalling. They develop clever experimental approaches that allow them to track complex stability and also to induce polarity at experimentally defined times. In a first set of experiments, they restore PCP after the global cues have disappeared (de novo polarisation) and conclude from the results that another (cell scale) cue must exist. In another set of experiments, they show that de novo repolarization is robust to the dosage of various components of core PCP, leading them to conclude that there must be an underlying cell scale polarity, which, apparently, has nothing to do with microtubule or cell shape polarity. They then describe nice evidence that de novo polarisation is relatively short range both in a polarised and unpolarised field. They conclude that there is a strong cell-intrinsic polarity that remains to be characterised.

      Critique

      The experiments described in this paper are of high quality with a sophisticated level of design and analysis. However, there needs to be some recalibration of the extent of the conclusions that can be drawn (see below). Moreover, a limitation of this paper is that, despite the quality of their data, they cannot give a molecular hint about the nature of their proposed cell-scale signal. Below are two key points that the authors may want to clarify.

      (1) The first set of repolarisation experiments is performed after the global cell rearrangements that have been shown to act as a global signal. However, this approach does not exclude the possible contribution of an unknown diffusible global signal.

      A similar point was raised by Reviewer 1. For the convenience of this reviewer, we’ll summarise the arguments against such an unknown cue again below. More broadly, both reviewers asking a similar question indicates that we have failed to lay out the evidence in sufficient detail. In our defence, we have used the same ‘de novo’ paradigm in three previous publications (Strutt and Strutt 2002, 2007; Brittle et al 2022) without attracting (overt) controversy. We have now added text to the Introduction and Results that goes into more detail, as well as more experimental evidence (Fig.S5).

      Firstly, it is worth noting that the global cues acting in the wing are poorly understood, with mostly negative evidence against particular cues accruing in recent years. This makes it a hard subject to succinctly discuss. Secondly, we accept that it is hard to prove there is no influence of global cues, when the nature of those cues and the time at which they act remain unclear. Below we summarise the reasons why we believe there are no significant effects of global cues in our experiments that would influence the interpretation of our results.

      First, our reading of the literature supports a broad consensus that an early radial core planar polarity pattern is realigned by cell flow produced by hinge contraction beginning at around 16h APF (e.g. Aigouy et al., 2010; Strutt and Strutt, 2015; Aw and Devenport, 2017; Butler and Wallingford, 2017; Tan and Strutt, 2025). Taken at face value, this suggests that there are ‘radial’ cues present prior to hinge contraction, maybe coming from the wing margin – arguably these radial cues could be Ft-Ds or Wnts or both, given they are expressed in patterns consistent with such a role (notwithstanding the published evidence arguing against roles for either of these cues). It then appears that hinge contraction supersedes these cues to convert a radial pattern to a proximodistal pattern – whether the radial cues that affect the core pathway earlier remain active after hinge contraction is unclear, although both Ft-Ds and Wnts appear to maintain their ‘radial’ patterns beyond the beginning of hinge contraction (e.g. Merkel et al., 2014; Ewen-Campen et al., 2020; Yu et al., 2020).

      We think that the reviewers are proposing the presence of a proximodistal cue that is active in the proximal region of the wing that we use for our experiments shown e.g. in Fig.5, and that this cue orients core polarity here (but not elsewhere in the wing) in a time window after 18h APF. Ft-Ds and Wnts do not seem to be plausible candidates as they are still in ‘radial’ patterns. This leaves either an unknown proximodistal cue (a gradient of some unknown signalling molecule?), or possibly some ability of hinge contraction to align proximodistal polarity specifically in this wing region but not elsewhere. We cannot definitively rule out either of these possibilities, but neither do we think there is sufficient evidence to justify invoking their existence to explain our observations.

      In particular, the reason that we don’t think there is a proximodistal cue in the proximal part of the wing after 18h APF, is that work from our lab shows that induction of Fz or Stbm expression at times around or after the start of hinge contraction (i.e. >16 h APF) results in increasing levels of trichome swirling with polarity not being coordinated with the tissue axis either proximally or distally (Strutt and Strutt, 2002; Strutt and Strutt 2007). Our simplest interpretation of this is that induction at these stages fails to result in the early radial pattern of core pathway polarity being established and hence a failure of hinge contraction to reorient radial to proximodistal. If hinge contraction alone could specify proximodistal polarity in the absence of the earlier radial polarity, then we would not expect to see swirling over much of the proximal wing (where the forces from hinge contraction are strongest, Etournay et al., 2015).

      In this manuscript, our earliest de novo experiments begin at 18h APF (de novo 10h), then at 20h APF (de novo 8h) and at 22h APF (de novo 6h). The image in Fig. 5B referred to by Reviewer 1, is of a wing where Fz is induced de novo at 22 h APF. In these wings, as expected, the core proteins localise asymmetrically in stereotypical swirling patterns throughout the wing surface (see Fig. 2M and also Strutt and Strutt, 2002; Strutt and Strutt 2007), but – usefully for our experiments – they broadly localise along the proximal-distal axis in the region analysed in Fig. 5B. Given the strong swirling in surrounding regions when inducing at >20h APF, we feel reasonably confident in assuming that the pattern is not due to a proximodistal cue present in the proximal wing. We appreciate that the original manuscript did not show images including the trichome pattern in adjacent regions, so this point would not have been clear, but we now include these in Supplementary Fig.S5. We have also added a note in the legend to Fig. 5B to clarify that the proximodistal pattern seen is local to this wing region.

      Changes to manuscript: Text extended in Introduction and Results to better explain why we believe the de novo conditions that we use most likely result in a polarity pattern that is not significantly influenced by ‘global cues’. Now show zoomed-out images of the surrounding region around the experiment region proximal to the anterior cross-vein region in adult wings, showing that the polarity pattern does not become more proximodistal when induction time is longer, and also that there is not overall proximodistal polarity in proximal regions of the wing, arguing against an unknown proximodistal polarity cue at these stages of development (Fig.S5B-E’’’).

      (2) The putative non-local cell scale signal must be more precisely defined (maybe also given a better name). It is not clear to me that one can separate cell-scale from molecular-scale signal.

      Local signals can redistribute within a cell (or membrane) so local signals are also cell-scale. Without a clear definition, it is difficult to interpret the results of the gene dosage experiments. The link between gene dosage and cell-scale signal is not rigorously stated. Related to this, the concluding statement of the introduction is too cryptic.

      We thank the reviewer for raising this, as again a similar comment was made by Reviewer 1, so we are clearly falling short in defining the term. We have now had another attempt in the Introduction.

      To more specifically answer the point made by the reviewer regarding molecular vs cellular, we are essentially being guided here by the prior computational modelling work, as at the biological level the details are still being worked out. A specific class of previous models only allowed ‘signals’ between core proteins to act ‘locally’, meaning within a cell junction, and within the models there was no explicit mechanism by which proteins on other junctions could ‘detect’ the polarity of a neighbouring junction (e.g. Amonlirdviman et al., 2005; Le Garrec et al., 2006; Fischer et al., 2013). Other models implicitly or explicitly encode a mechanism by which cell junctions can be influenced by the polarity of other junctions (e.g. Meinhardt, 2007; Burak and Shraiman, 2009; Abley et al., 2013; Shadkhoo and Mani, 2019), for instance by diffusion of a factor produced by localisation of particular planar polarity proteins.

      We agree with the reviewer that a cell-scale signal will depend on ‘molecules’ and thus could be called ‘molecular-scale’, but here by ‘molecular-scale’ we mean signals that act at the range of the sizes of molecules i.e. nanometers, rather than cell-scale signals that act at the size of cells i.e. micrometers. A caveat to our definition is that we implicitly include interactions that occur locally on cell junctions (<1 µm range) within ‘molecular-scale’, but this is a shorter range than ‘cellular-scale’ which requires signals acting over the diameter of a cell (3-5 µm). Nevertheless, we think the concept of ‘molecular-scale’ vs ‘cell-scale’ is a helpful one in this context, and have attempted to address the issue through a more careful definition of the terms.

      Changes to manuscript: Text revised in Introduction and legend to Fig.1 to more carefully define ‘cell-scale signalling’ and to distinguish it from ‘molecular-scale signalling’. Final sentence of Introduction also altered so we no longer cryptically speculate on the nature of the cell-scale signal but leave this to the Discussion.

      Minor comments. 

      Some of the (clever) genetic manipulation may need more details in the text. For example:

      - Need to specify if the hs-flp approach induces expression throughout the tissue.

      We apologise for the lack of clarity. In all the experiments, the hs-FLP transgene is present in all cells, and heat-shock results in ubiquitous expression. 

      Changes to manuscript: We have clarified this in the Results and Materials and Methods.

      - Need to specify in the text that in the unpolarised condition the tissue is both dsh and fz mutant.

      The reviewer is of course correct and we have updated this point in the text. The full genotype for the unpolarised condition is: w dsh[1] hsFLP22/y;; Act>>fz-mKate2sfGFP, fz[P21]/fz[P21] (see Table S1). So this line is mutant for dsh and fz with induced expression of Fz-mKate2sfGFP. 

      Changes to manuscript: We have clarified this in the relevant part of the Results.

      - Need to specify in the text that the experiment illustrated in Fig 5 is with hh-gal4. 

      As noted by the reviewer, we continued to use the same hh-GAL4 repolarisation paradigm as in Fig.4 and this info was in the legend to Fig.5. However, we agree it is helpful to be explicit about this in the main text.

      Changes to manuscript: We have added this to this section of the Results.

      - Need to address a possible shortcoming of the hh experiment, that the AP boundary is a region of high tension.

      It is true that the AP boundary is under high tension in the wing disc (e.g. Landsberg et al., 2009). But we are not aware of any evidence that this higher tension persists into the pupal wing. In separate studies we have labelled for Myosin II in pupal wings (Trinidad et al 2025 Curr Biol; Tan & Strutt 2025 Nature Comms), and as far as we have noticed have not seen preferentially higher levels on the AP boundary. We think if tension were higher, the cell boundaries would appear straighter than in surrounding cells (as seen in the wing disc) and this is not evident in our images.

      - Need to dispel the possibility that there is no residual polarisation (e.g. of other components) in fz1 mutant (I assume this is the case).

      We use the null allele fz[P21] throughout this work, and we and others have consistently reported a complete loss of polarisation of other core proteins or downstream components in this background. The caveat to this is that core proteins that persist at cell junctions always appear at least slightly punctate in mutant backgrounds for other core proteins, and so any automated detection algorithm will always find evidence of individual cell polarity above a baseline level of uniform distribution. Hence we tend to use lack of local coordination of polarity (variance of cell polarity angle) as an additional measure of loss of polarisation, alongside direct measures of average cell polarity. (We discuss this in the QuantifyPolarity manuscript Tan et al 2021 e.g. Fig.S6).

      Changes to manuscript: We now include in the Materials and Methods section ‘Fly genetics…’ a much more extensive explanation of the evidence for specific mutant alleles being ‘null’ for planar polarity function (including dsh1 as raised by Reviewer 1), specifically that they result in no detectable planar polarisation of either other core proteins or downstream effectors, and added appropriate references.
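
The 'variance of cell polarity angle' measure mentioned above can be illustrated with a minimal sketch. This is not the QuantifyPolarity implementation. It assumes that polarity angles are axial (an angle and the same angle plus 180° describe the same axis) and uses the standard angle-doubling form of circular variance. The function name `axial_circular_variance` is hypothetical.

```python
import cmath
import math


def axial_circular_variance(angles_deg):
    """Circular variance of axial angles given in degrees.

    Returns a value between 0 (all cells share one polarity axis)
    and 1 (no local coordination of polarity axes).
    Assumption: angles are axial, so each angle is doubled before averaging.
    """
    if not angles_deg:
        raise ValueError("at least one angle is required")
    # Doubling maps theta and theta + 180 degrees onto the same unit vector.
    resultant = sum(
        cmath.exp(2j * math.radians(a)) for a in angles_deg
    ) / len(angles_deg)
    # The length of the mean vector is 1 for perfect alignment.
    return 1.0 - abs(resultant)
```

Under these assumptions, a tissue with punctate but uncoordinated core protein localisation would give non-zero polarity magnitudes per cell but a variance close to 1, which is the distinction drawn in the response above.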

      - Need to provide evidence that 50% gene dosage commensurately affects protein level. 

      This is a good suggestion. In the case of Stbm, we have already published a western blot showing that a reduction in gene dosage results in reduced protein levels (Strutt et al 2016, Fig.S6). We have now performed western blots to quantify protein levels upon reduction of fmi, pk and dgo levels (we actually used EGFP-dgo for the latter, as we don’t have antibodies that can detect endogenous Dgo on western blots).

      Changes to manuscript: When presenting the dosage reduction experiments, we now refer back to Strutt et al., 2016 explicitly for Stbm, and have added western blot data for Fmi, Pk and EGFP-Dgo in new Fig.S2.

      - I am surprised that the relationship with microtubule polarity was never investigated. Is this true? 

      We agree this is a point that needed further clarification, as Reviewer 1 made a related point regarding the two possible roles for microtubules, one being as a mediator of a global cue upstream of the core pathway, and the second (which we investigate in this manuscript) as a mediator of a cell-scale signal downstream of the core pathway.

      Both the Uemura and Axelrod groups have published on potential upstream function as a global cue mediator in the Drosophila wing (e.g. Shimada et al., 2006; Harumoto et al., 2010; Matis et al., 2014).

      Both groups have also looked at whether core pathway components could affect orientation of microtubules (Harumoto et al., 2010; Olofsson et al., 2014; Sharp and Axelrod 2016). Notably Harumoto et al., 2010 observed that in 24h APF wings, loss of Fz or Stbm did not alter microtubule polarity from a proximodistal orientation, consistent with the microtubules aligning along the long cell axis in the absence of other cues. However, this did not rule out an instructive effect of Fz or Stbm on microtubule polarity during core pathway cell-scale signalling. The Axelrod lab manuscripts saw interesting effects of Pk protein isoforms on microtubule polarity, albeit not throughout the entire wing, which hinted at a potential role in cell-scale signalling. Taken together this prior work was the motivation for our directed experiments to specifically test whether the core pathway might generate cell-scale polarity by instructing microtubule polarity.

      Changes to manuscript: We have revised the Results section ‘Microtubules do not…’ to make a clearer distinction regarding possible ‘upstream’ and ‘downstream’ roles of microtubules in Drosophila core pathway planar polarity and the motivation for our experiments investigating the latter.

      - The authors suggest that polarity does not propagate as a wave. And yet the range measured in adult is longer than in the pupal wing. Explain. 

      Again an excellent point, also made by Reviewer 1, which we have now addressed explicitly in the manuscript. For the convenience of this reviewer, we lay out the reasons why we think the propagation of polarity seen in the adult is further than seen for core protein localisation.

      There are three reasons why we might expect adult trichomes to show a different effect from the measured core protein polarity pattern seen in our experiments:

      (i) we are assaying core protein polarity at 28h APF, but trichomes emerge at >32h APF, so there is still time for polarity to propagate a bit further from the boundary. We now have added data showing that by the point of trichome initiation, the wave of polarisation extends 3-4 cell rows (Fig.S5A).  

      (ii) it has long been known that a strong localisation of core proteins at a cell edge is not required for polarisation of trichome polarity from a boundary. For instance, in Strutt & Strutt 2007 we show clones of cells overexpressing Fz causing propagation through pk[pk-sple] mutant tissue where there is no detectable core protein polarity. We were following up prior observations of Adler et al 2000 in the wing and Lawrence et al 2004 in the abdomen.

      (iii) there is evidence to suggest that the polarity of adult trichomes is locally coupled, possibly mechanically. This point is hard to prove without live imaging taking in both initial core protein localisation, the site of actin-rich trichome initiation and then the final orientation of the much larger microtubule filled trichome, and we’re not aware that such data exist. However, Wong & Adler 1993 (JCB) showed that over a number of hours trichomes become much larger and move towards the centre of the cell, presumably becoming decoupled from any core protein cue. The images in Guild … & Tilney, 2005 (MBoC)  are also interesting to look at in this regard. Finally, septate junction proteins have been implicated in local alignment of trichomes, independently of the core pathway (Venema … & Auld, 2004 Dev Biol).

      Changes to manuscript: Added new data in Fig.S5A showing where trichomes initiate under 6h de novo induction conditions, for comparison to core protein localisation and adult trichome data in Fig.5. Added some text explaining why adult trichome repolarisation might be stronger than the observed effects on core protein localisation in Discussion. 

      - The discussion states that the cell-intrinsic system remains to be fully characterised, implying that it has been partially characterised. What do we know about it? 

      As the reviewer probably realises, we were attempting to side-step a long speculative discussion about the various hints and ideas in the literature by grouping them under the umbrella of ‘remaining to be fully characterised’. We would argue that this current manuscript is the first to attempt to systematically investigate the nature of ‘cell-scale signalling’. The lack of prior work is probably due to two factors: (i) pioneering theoretical work showed that a sufficiently strong global signal coupled with ‘local’ (i.e. confined to one cell junction) protein interactions was sufficient to polarise cells without the need to invoke the existence of a cell-scale signal; (ii) there is no easy way to identify cell-scale signals, as their loss results in loss of polarity, which will also occur if other (i.e. more locally acting) core pathway functions are compromised.

      The main investigation of the potential for cell-scale signalling has been another set of theory studies (Burak and Shraiman 2009; Abley et al., 2013; Shadkhoo and Mani 2019) which have considered the possibility of diffusible signals. In our present work we have further considered the possibility of a ‘depletion’ model, based on the pioneering theory work of Hans Meinhardt, and as discussed above the possibility that microtubules could mediate a cell-scale signal.

      Changes to manuscript: We have revised the Discussion to hopefully be clearer about the current state of knowledge.

      Reviewer #3:

      The manuscript by Carayon and Strutt addresses the role of cell-scale signaling during the establishment of planar cell polarity (PCP) in the Drosophila pupal wing. The authors induce locally the expression of a tagged core PCP protein, Frizzled, and observe and analyze the de novo establishment of planar cell polarity. Using this system, the authors show that PCP can be established within several hours, that PCP is robust towards variation in core PCP protein levels, that PCP proteins do not orient microtubules, and that PCP is robust towards 'extrinsic' repolarization. The authors conclude that the polarization at the cell-scale is strongly intrinsic and only weakly affected by the polarity of neighboring cells. 

      Major comments

      The data are clearly presented and the manuscript is well written. The conclusions are well supported by the data. 

      (1) The authors use a system to de novo establish PCP, which has the advantage of excluding global cues orienting PCP and thus to focus on the cell-intrinsic mechanisms. At the same time, the system has the limitation that it is unclear to what extent de novo PCP establishment reflects 'normal' cell scale PCP establishment, in particular because the Gal4/UAS expression system that is used to induce Fz expression will likely result in much higher Fz levels compared with the endogenous levels. The authors should briefly discuss this limitation. 

      We apologise if this wasn’t clear. We only used GAL4/UAS overexpression when we were generating an artificial boundary of Fz expression with hh-GAL4 to induce repolarisation. The de novo induction system involves Fz::mKate2-sfGFP being expressed directly under an Act5C promoter without use of GAL4/UAS. In response to a comment from Reviewer 1 we have now carried out western blot analysis which shows that Fz::mKate2-sfGFP levels under Act5C are actually lower than endogenous Fz levels. As we achieve normal levels of polarity, similar to what we measure in wild-type conditions using QuantifyPolarity, we therefore assume that Fz levels are not limiting under these conditions. However, we note that lower than normal levels of Fz might sensitise the system to perturbation, which in fact would be advantageous in our study, as it might for instance have been expected to more readily reveal dosage sensitivity of other components.

      Changes to manuscript: We now describe the levels of expression achieved using the de novo induction system (Fig.S1C-D) and discuss possible consequences in the relevant Results sections and Discussion.

      (2) Fig. 3. The authors use heterozygous mutant backgrounds to test the robustness of de novo PCP establishment towards (partial) depletion in core PCP proteins. The authors conclude that de novo polarization is 'extremely robust to variation in protein level'. Since the authors (presumably) lowered protein levels by 50%, this conclusion appears to be somewhat overstated. The authors should tune down their conclusion. 

      Reviewer 1 makes a similar point about whether we can argue that the lack of sensitivity to a 50% reduction in protein levels actually rules out the depletion model. To address the comments of both reviewers we have now added some further narrative and caveats in the text.

      We nevertheless believe that the experiments shown effectively make the point that there is no strong dosage sensitivity – and it remains our contention that if protein levels were the key to setting up cell-scale polarity, then a 50% reduction would be expected to show an effect on the rate of polarisation. We further note that as Fz::mKate2-sfGFP levels are lower than endogenous Fz levels, the system might be expected to be sensitised to further dosage reductions, and despite this we fail to see an effect on rate of polarisation.

      In a similar vein, Reviewer 2 requested data on whether dosage reduction altered protein levels by the expected amount. We have now added further explanation/references and western blot data to address this.

      Changes to manuscript: Added some narrative and caveats in the Discussion regarding whether lowering levels by more than 50% would add to our findings. Revised conclusions to be more cautious, including altering the section title to read ‘Planar polarity establishment is not highly sensitive to variation in protein levels of core complex components’.

      Also added westerns and text/references showing that for the tested proteins there is a reduction in protein levels upon removal of one gene dosage in Results section ‘Planar polarity establishment is…’ and Fig.S2.

      Minor comments 

      (1) Page 3. The authors mention and reference that they used the PCA method to quantify cell polarity magnification and magnitude. It would help the unfamiliar reader, if the authors would briefly describe the principle of this method. 

      Changes to manuscript: More details have been added in Materials & Methods.

      Significance:

      The manuscript contributes to our understanding of how planar cell polarity is established. It extends previous work by the authors (Strutt and Strutt, 2002,2007) that already showed that induction of core PCP pathway activity by itself is sufficient to induce de novo PCP. This manuscript further explores the underlying mechanisms. The authors test whether de novo PCP establishment depends on an 'inhibitory signal', as previously postulated (Meinhardt, 2007), but do not find evidence. They also test whether core PCP proteins help to orient microtubules (which could enhance cell intrinsic polarization of core PCP proteins), but, again, do not find evidence, corroborating previous work (Harumoto et al, 2010). The most significant finding of this manuscript, perhaps, is the observation that local de novo PCP establishment does not propagate far through the tissue. A limitation of the study is that the mechanisms establishing intrinsic cell scale polarity remain unknown. The work will likely be of interest to specialists in the field of PCP.

    1. eLife Assessment

      This important study by Wu et al presents convincing data on bacterial cell organization, demonstrating that the two structures that account for bacterial motility - the chemotaxis complex and the flagella - colocalize to the same pole in Pseudomonas aeruginosa cells, and exposing the regulation underlying their spatial organization and functioning. This manuscript will be of interest to cell biologists, primarily those studying bacteria.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Wu et al presents interesting data on bacterial cell organization, a field that is progressing now, mainly due to the advances in microscopy. Based mainly on fluorescence microscopy images, the authors aim to demonstrate that the two structures that account for bacterial motility, the chemotaxis complex and the flagella, colocalize to the same pole in Pseudomonas aeruginosa cells and to expose the regulation underlying their spatial organization and functioning.

      Comments on revisions:

      The authors have addressed all major and minor points that I raised in a satisfying way during the revision process. The work can now be regarded as complete: the assumptions were clarified, the results are convincing, the conclusions are justified, and the novelty has been made clear. This manuscript will be of interest to cell biologists, mainly those studying bacteria, but not only.

    3. Reviewer #2 (Public review):

      Summary:

      Here, the authors studied the molecular mechanisms by which the chemoreceptor cluster and flagella motor of Pseudomonas aeruginosa (PA) are spatially organized in the cell. They argue that FlhF is involved in localizing the receptors and motor to the cell pole, but a separate mechanism colocalizes them. Finally, the authors argue that the functional reason for this colocalization is to insulate chemotactic signaling from other signaling pathways, such as cyclic-di-GMP signaling.

      Strengths:

      The experiments and data are high quality. It is clear that the motor and receptors co-localize, and that elevated CheY levels lead to elevated c-di-GMP. The signaling crosstalk argument is plausible.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigated the assembly and polar localization of the chemosensory cluster in P. aeruginosa. They discovered that a certain protein (FlhF) is required for the polar localization of the chemosensory cluster while core motor structures are necessary for the assembly of the cluster. They found that flagella and chemosensory clusters always co-localize in the cell; either at the cell pole in wild type cells or randomly-located in the cell in FlhF mutant cells. They hypothesize that this co-localization is required to keep the level of another protein (CheY-P), which controls motor switching, at low levels as the presence of high-levels of this protein (if the flagella and chemosensory clusters were not co-localized) is associated with high-levels of c-di-GMP and cell aggregations.

      Strengths:

      The manuscript is clearly-written and straightforward. The authors applied multiple techniques to study the bacterial motility system including fluorescence light microscopy and gene editing. In general, the work enhances our understanding of the subtlety of interaction between the chemosensory cluster and the flagellar motor to regulate cell motility. This work will be of interest to bacteriologists and cell biologists in general.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      The study by Wu et al presents interesting data on bacterial cell organization, a field that is progressing now, mainly due to the advances in microscopy. Based mainly on fluorescence microscopy images, the authors aim to demonstrate that the two structures that account for bacterial motility, the chemotaxis complex and the flagella, colocalize to the same pole in Pseudomonas aeruginosa cells and to expose the regulation underlying their spatial organization and functioning.

      Comments on revisions:

      The authors have addressed all major and minor points that I raised in a satisfying way during the revision process. The work can now be regarded as complete, the assumptions were clarified, the results are convincing, the conclusions are justified, and the novelty has been made clear.

      This manuscript will be of interest to cell biologists, mainly those studying bacteria, but not only.

      Reviewer #2 (Public review):

      Summary:

      Here, the authors studied the molecular mechanisms by which the chemoreceptor cluster and flagella motor of Pseudomonas aeruginosa (PA) are spatially organized in the cell. They argue that FlhF is involved in localizing the receptors-motor to the cell pole, and even without FlhF, the two are colocalized. Finally, the authors argue that the functional reason for this colocalization is to insulate chemotactic signaling from other signaling pathways, such as cyclic-di-GMP signaling.

      Strength:

      The experiments and data are high quality. It is clear that the motor and receptors co-localize, and that elevated CheY levels lead to elevated c-di-GMP.

      Weakness:

      The explanation for the functional importance of receptor-motor colocalization is plausible but is still not conclusively demonstrated. Colocalization might reduce CheY levels throughout the cell in order to reduce cross-talk with c-di-GMP. This would mean that if physiologically-relevant levels of CheYp near the pole were present throughout the cell, c-di-GMP levels would be elevated to a point that is problematic for the cell. Clearly demonstrating this seems challenging.

      We acknowledge that directly proving the necessity of colocalization to prevent problematic c-di-GMP elevation is experimentally challenging, as it would require creating a system where CheY-P is artificially distributed throughout the cell at physiologically relevant concentrations while maintaining normal chemotaxis function.

      However, our data provide several lines of evidence supporting this model. First, we show that CheY overexpression leads to substantial c-di-GMP elevation (71.8% increase) and cell aggregation, demonstrating that elevated CheY levels can indeed cause problematic cross-pathway interference. Second, previous work has shown that CheY-P levels near the pole are an order of magnitude higher than in the rest of the cell (ref. 46). If this elevated CheY-P concentration near the pole were present throughout the cell, our data suggest that c-di-GMP levels would be elevated sufficiently to cause cell aggregation (Fig. 4A), thereby disabling normal motility and chemotaxis. Third, the dose-dependent relationship between CheY concentration and aggregation phenotype supports the idea that precise spatial regulation of CheY levels is functionally important for avoiding cross-pathway interference.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the assembly and polar localization of the chemosensory cluster in P. aeruginosa. They discovered that a certain protein (FlhF) is required for the polar localization of the chemosensory cluster while a fully-assembled motor is necessary for the assembly of the cluster. They found that flagella and chemosensory clusters always co-localize in the cell; either at the cell pole in wild type cells or randomly-located in the cell in FlhF mutant cells. They hypothesize that this co-localization is required to keep the level of another protein (CheY-P), which controls motor switching, at low levels as the presence of high-levels of this protein (if the flagella and chemosensory clusters were not co-localized) is associated with high-levels of c-di-GMP and cell aggregations.

      Strengths:

      The manuscript is clearly written and straightforward. The authors applied multiple techniques to study the bacterial motility system including fluorescence light microscopy and gene editing. In general, the work enhances our understanding of the subtlety of interaction between the chemosensory cluster and the flagellar motor to regulate cell motility.

      Weaknesses:

      The major weakness for me in this paper is that the authors never discussed how flagellar gene expression is controlled in P. aeruginosa. For example, in E. coli there is a transcriptional hierarchy for the flagellar genes (early, middle, and late genes, see Chilcott and Hughes, 2000). Similarly, Campylobacter and Helicobacter have a different regulatory cascade for their flagellar genes (See Lertsethtakarn, Ottemann, and Hendrixson, 2011). How does the expression of flagellar genes in P. aeruginosa compare to other species? How many classes are there for these genes? Is there a hierarchy in their expression and how does this affect the results of the FliF and FliG mutants? In other words, if FliF and FliG are in class I (as in E. coli) then their absence might affect the expression of other later flagellar genes in subsequent classes (i.e., chemosensory genes). Also, in both FliF and FliG mutants no assembly intermediates of the flagellar motor are present in the cell as FliG is required for the assembly of FliF (see Hiroyuki Terashima et al. 2020, Kaplan et al. 2019, Kaplan et al. 2022). It could be argued that when the motor is not assembled then this will affect the expression of the other genes (e.g., those of the chemosensory cluster) which might play a role in the decreased level of chemosensory clusters the authors find in these mutants.

      We thank the reviewer for the valuable suggestions. In the revised manuscript, we have further elaborated on the regulatory control of flagellar gene expression in P. aeruginosa (see our response to comment #4).

      Comments on revisions:

      I believe the authors have performed additional experiments that improved their manuscript and they have answered many of my comments and those of the other reviewers. I am supportive of publishing this manuscript, but I still find the following points that are not clear to me (probably I am misunderstanding some points; the authors can clarify).

      (1) In response to reviewer 1, the authors say that they "analyzed and categorized the distribution of the chemotaxis complex in both wild-type and flhF mutant strains into three patterns: precise-polar, near-polar, and mid-cell localization." I can see what they mean by polar and mid-cell, but near-polar sounds a bit elusive? Can they provide examples of this stage and mention how accurately they can identify it? Also, do the pie charts they show in Figure S4 really show "significant alterations"? There is a difference between 98% and 85% as they mention in their response to reviewer 1, but I am not sure that this is significant? Probably they can explain/change the language in the text? Also, the number of cells they counted for the FlhF mutant is more than double that of the other strains (WT and FlhF FliF mutant)?

      We thank the reviewer for the valuable suggestions. To clarify, we divided the intracellular area along the cell's long axis into three domains: the two ends each representing 10% of the length as the precise-polar domain, the central 50% as the mid-cell domain, and the remaining regions between these as the near-polar domain. The localization pattern of the chemotaxis complex was assigned based on the position of the fluorescence intensity centroid within these domains.
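
The partitioning rule described above can be sketched in a few lines. This is an illustrative sketch, not the authors' analysis code. The function name `classify_localization`, its parameters, and the treatment of centroids lying exactly on a domain boundary are assumptions introduced here.

```python
def classify_localization(centroid_pos, cell_length,
                          polar_frac=0.10, mid_frac=0.50):
    """Assign a localization pattern from the centroid position.

    centroid_pos: position of the fluorescence intensity centroid along
                  the long axis, measured from one pole (same units as
                  cell_length).
    polar_frac:   fraction of the length at each end counted as
                  precise-polar (10% in the response above).
    mid_frac:     central fraction of the length counted as mid-cell
                  (50% in the response above).
    """
    if cell_length <= 0:
        raise ValueError("cell_length must be positive")
    rel = centroid_pos / cell_length
    if not 0.0 <= rel <= 1.0:
        raise ValueError("centroid must lie within the cell")
    # Distance to the nearer pole: 0 at a pole, 0.5 at the cell centre.
    dist_to_pole = min(rel, 1.0 - rel)
    if dist_to_pole <= polar_frac:
        return "precise-polar"
    # The central mid_frac of the cell starts (1 - mid_frac) / 2 from each pole.
    if dist_to_pole >= (1.0 - mid_frac) / 2.0:
        return "mid-cell"
    return "near-polar"
```

With the stated fractions, the near-polar domain covers the stretch between 10% and 25% of the cell length from either pole.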

      Regarding the significance of the changes, you are correct to question our language. When flhF was knocked out, the proportion of chemotaxis complexes with precise-polar distribution decreased from 98% to 85%, a reduction of 13 percentage points. While this represents a measurable shift in localization pattern, describing this as "significant alterations" was probably imprecise. We have revised this language to more accurately reflect the magnitude of the change (lines 169-177).

      For the cell counting, we increased the sample size for the flhF mutant because this strain exhibited the appearance of mid-cell localization (approximately 5% of cells), which was not observed in wild-type or flhF fliF double mutant strains. To accurately quantify this rare phenotype and ensure statistical reliability, we analyzed more cells for this particular strain. This explains why the flhF mutant dataset contains approximately double the number of cells compared to the other strains.

      We have redrawn Figure S4 to include a clear schematic diagram of the cell partitioning method and provided representative examples of each localization pattern (precise-polar, near-polar, and mid-cell) to better illustrate how we distinguished between these categories.

      (2) One thing that also confused me is the following: One point that the authors stress is that FlhF localizes both the flagellum and the chemoreceptors to the pole. However, if I look at Figure 2B, the flagellum and the chemoreceptors still co-localize together (although not at the pole). If FlhF was responsible for co-localizing both of them to the pole, then wouldn't one expect them to be randomly localized in this mutant and by that I mean that they do not co-localize but that each of them (the flagellum and the chemoreceptors) are located in a different random location of the cell (not co-localized). The fact that they are still co-localized together in this mutant could also be interpreted by, for example, that FlhF localizes the flagellum to the pole and another mechanism localizes the chemoreceptors to the flagellum, hence, they still co-localize in this mutant because the chemoreceptors follow the flagellum by another mechanism to wherever it goes?

      Thank you for this insightful observation. You are correct that our current experimental results do not definitively establish that FlhF directly localizes both the flagellum and chemoreceptors to the pole independently. The persistent colocalization of flagella and chemoreceptors in the ΔflhF mutant, even when both are mislocalized away from the pole, actually suggests a more complex regulatory mechanism than we initially proposed.

      This observation highlights an important distinction between polar targeting and colocalization maintenance. Our data suggest that FlhF influences the polar targeting of the flagellum-chemoreceptor assembly, but the colocalization itself appears to be governed by a different mechanism that operates independently of FlhF. This could involve direct protein-protein interactions between flagellar and chemotaxis components, or shared assembly machinery that we have yet to identify.

      To better reflect this interpretation, we have revised the subsection title (line 150). We have also modified the relevant discussion (line 180) to more accurately describe FlhF’s role in polar targeting rather than claiming it directly controls chemoreceptor localization.

      (3) In the response to reviewers, the authors mention "suggesting that the assembly of the receptor complex is likely influenced mainly by the C-ring and MS-ring structures rather than by the P ring". However, in the article, they still write "The complete assembly of the motor serves as a partial prerequisite for the assembly of the chemotaxis complex, and its assembly site is also regulated by the polar anchor protein FlhF" despite their FlgI results, which are not in accordance with this statement? Also, as I mentioned in my previous report, in FliG and FliF mutants the motor does not assemble (see Hiroyuki Terashima et al. 2020, and Kaplan et al., 2022).

      We thank the reviewer for the suggestions and acknowledge the contradictions in our original text. You are correct that in ΔfliF and ΔfliG mutants, the flagellar motor does not assemble, while the P ring (FlgI) functions as a bushing for the peptidoglycan layer and its absence does not prevent motor assembly.

      Our ΔflgI results, which showed normal chemotaxis complex assembly similar to wild-type, clearly demonstrate that the P ring is not required for chemoreceptor complex formation. This contradicts our original statement that "complete assembly of the motor serves as a partial prerequisite for the assembly of the chemotaxis complex."

      We have corrected this inconsistency by: 1) Revising the subsection title (line 186) to more accurately reflect that core motor structures, rather than complete motor assembly, influence chemoreceptor complex formation. 2) Modifying sentences in the introduction (lines 97-98) to better align with our experimental findings.

      (4) The authors have said in their response to my point "and currently, there is no evidence that FliA activity is influenced by proteins like FliG". I just want to clarify what I meant in my previous report: In E. coli, FliA binds to FlgM, and when the hook is assembled FlgM is secreted outside the cell allowing FliA to trigger the transcription of class III genes, which include the chemosensory genes (see Figure 5 in Beeby et al, 2020 in FEMS Microbiology, and Chilcott and Hughes, 2000). This implies that if the hook is not built, then late genes (including the chemoreceptors) should not be present. However, in Kaplan et al., 2019, the authors imaged a FliF mutant in Shewanella oneidensis (Figure S3) and still saw that chemoreceptors are present (I believe the authors must highlight this). This suggests that species such as Shewanella and Pseudomonas have a different assembly process than that of E. coli, and although the authors say that in the text, I believe they still can refine this part more in the spirit of what I wrote here.

      We thank the reviewer for the important clarification regarding the differences in transcriptional regulation among bacterial species. We agree that the observation of chemoreceptors in Shewanella oneidensis ΔfliF mutants (Kaplan et al., 2019) represents a significant deviation from the well-characterized E. coli model and merits stronger emphasis. In response, we have expanded the discussion to more clearly highlight the critical distinctions in the transcriptional regulatory circuits governing flagellar and chemoreceptor biogenesis between E. coli and species such as Shewanella oneidensis and Pseudomonas aeruginosa (lines 351-363).

      I do not like to ask for additional experiments in the second round of review, so for me if the authors modify the text to tackle these points and allow for probable alternative explanations/ highlight gaps/ modify language used for some claims, then that is fine with me.

      Reviewer #2 (Recommendations for the authors):

      It is plausible that colocalization reduces CheY levels throughout the cell in order to reduce cross-talk with c-di-GMP. This would mean that if physiologically-relevant levels of CheYp near the pole were present throughout the cell, c-di-GMP levels would be elevated to a point that is problematic for the cell. Clearly demonstrating this seems challenging.

      We acknowledge that directly proving the necessity of colocalization to prevent problematic c-di-GMP elevation is experimentally challenging, as it would require creating a system where CheY-P is artificially distributed throughout the cell at physiologically relevant concentrations while maintaining normal chemotaxis function.

      However, our data provide several lines of evidence supporting this model. First, we show that CheY overexpression leads to substantial c-di-GMP elevation (71.8% increase) and cell aggregation, demonstrating that elevated CheY levels can indeed cause problematic cross-pathway interference. Second, previous work has shown that CheY-P levels near the pole are an order of magnitude higher than in the rest of the cell (ref. 46). If this elevated CheY-P concentration near the pole were present throughout the cell, our data suggest that c-di-GMP levels would be elevated sufficiently to cause cell aggregation (Fig. 4A), thereby disabling normal motility and chemotaxis. Third, the dose-dependent relationship between CheY concentration and aggregation phenotype supports the idea that precise spatial regulation of CheY levels is functionally important for avoiding cross-pathway interference.

    1. eLife Assessment

      This important computational study investigates homeostatic plasticity mechanisms that neurons may employ to achieve and maintain stable target activity patterns. The work extends previous analyses of calcium-dependent homeostatic mechanisms based on ion channel density by considering activity-dependent shifts in channel activation and inactivation properties that operate on faster and potentially variable timescales. The model simulations provide solid evidence for the potential functional importance of these mechanisms.

    2. Reviewer #1 (Public review):

      This revision of the computational study by Mondal et al addresses several issues that I raised in the previous round of reviews and, as such, is greatly improved. The manuscript is more readable, its findings are more clearly described, and both the introduction and the discussion section are tighter and more to the point. And thank you for addressing the three timescales of half activation/inactivation parameters. It makes the mechanism clearer.

      Some issues remain that I bring up below.

      Comment:

      I still have a bone to pick with the claim that "activity-dependent changes in channel voltage-dependence alone are insufficient to attain bursting". As I mentioned in my previous comment, this is also the case for the gmax values (channel density). If you choose the gmax's to be in a reasonable range, then the statement above simply cannot be true. And if, in contrast, you choose the activation/inactivation parameters to be unreasonable, then no set of gmax's can produce proper activity. So I remain baffled as to what exactly the point is that the authors are trying to make.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Mondal and co-authors present the development of a computational model of homeostatic plasticity incorporating activity-dependent regulation of gating properties (activation, inactivation) of ion channels. The authors show that, similar to what has been observed for activity-dependent regulation of ion channel conductances, implementing activity-dependent regulation of voltage sensitivity participates in the achievement of a target phenotype (bursting or spiking). The results however suggest that activity-dependent regulation of voltage sensitivity is not sufficient to allow this and needs to be associated with the regulation of ion channel conductances in order to reliably reach target phenotype. Although the implementation of this biologically relevant phenomenon is undeniably relevant, a few important questions are left unanswered.

      Strengths:

      (1) Implementing activity-dependent regulation of gating properties of ion channels is biologically relevant.

      (2) The modeling work appears to be well performed and provides results that are consistent with previous work performed by the same group.

      Weaknesses:

      (1) The main question not addressed in the paper is the relative efficiency and/or participation of voltage-dependence regulation compared to channel conductance in achieving the expected pattern of activity. Is voltage-dependence contributing 50% or 10%? Although this is a difficult question to answer (and it might even be difficult to provide a number), it is important to determine whether channel conductance regulation remains the main parameter allowing the achievement of a precise pattern of activity (or its recovery after perturbation).

      (2) Another related question is whether the speed of recovery is significantly modified by implementing voltage-dependence regulation (it seems to be the case looking at Figure 3). More generally, I believe it would be important to give insights into the overall benefit of implementing voltage-dependence regulation, beyond its rather obvious biological relevance.

      (3) Along the same line, the conclusion about how voltage-dependence regulation and channel conductance regulation interact to provide the neuron with the expected activity pattern (summarized and illustrated in Figure 6) is rather qualitative. Consistent with my previous comments, one would expect some quantitative answers to this question, rather than an illustration that approximately places a solution in parameter space.

    4. Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement changes in ion channel conductance to support homeostatic plasticity. While it is well established that the voltage-dependent properties of ion channels influence neuronal excitability, their potential role in homeostatic regulation, alongside conductance changes, has remained largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage dependence can interact with conductance plasticity to enable neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. Notably, the timescale of these voltage-dependent shifts influences the final steady-state configuration of the model, shaping both channel parameters and activity features such as burst period and duration. A major conclusion of the study is that altering this timescale can seamlessly modulate a neuron's intrinsic properties, which the authors suggest may be a mechanism for adaptation to perturbations.

      While this conclusion is largely well-supported, additional analyses could help clarify its scope. For instance, the effects of timescale alterations are clearly demonstrated when the model transitions from an initial state that does not meet the target activity pattern to a new stable state. However, Fig. 6 and the accompanying discussion appear to suggest that changing the timescale alone is sufficient to shift neuronal activity more generally. It would be helpful to clarify that this effect primarily applies during periods of adaptation, such as neurodevelopment or in response to perturbations, and not necessarily once the system has reached a stable, steady state. As currently presented, the simulations do not test whether modifying the timescale can influence activity after the model has stabilized. In such conditions, changes in timescale are unlikely to affect network dynamics unless they somehow alter the stability of the solution, which is not shown here. That said, it seems plausible that real neurons experience ongoing small perturbations which, in conjunction with changes in timescale, could allow gradual shifts toward new solutions. This possibility is not discussed but could be a fruitful direction for future work.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Major comments:

      (1) The main issue that I have with this study is the lack of exploration of "why" the model produces the results it does. Considering this is a model, it should be possible to find out why the three timescales of half-act/inact parameter modifications lead to different sets of results. Without this, it is simply an exploratory exercise. (The model does this, but we do not know the mechanism.) Perhaps this is enough as an interesting finding, but it remains unconvincing and (clearly) does not have the impact of describing a potential mechanism that could be potentially explored experimentally.

      This is now addressed in a new section in Results (“Potential Mechanism”):

      “To explore why the properties of the resulting bursters depend on the timescale of half-(in)activation adjustments, we examined what happens when SP1 is assembled under different half-(in)activation timescales: (1) fast, (2) intermediate (matching the timescale of ion channel density changes), and (3) infinitely slow (i.e., effectively turned off). The effects of these timescales can be seen by comparing the zoomed-in views of the SP1 activity profiles under each condition (Figure 4).

      When half-(in)activations are fast, the time evolution of the quantity that tracks how far the activity pattern is from its targets (see Methods) shows an abrupt jump as the model searches for a voltage-dependence configuration that meets the calcium targets (Figure 4A). As this happens, the channel densities are slightly altered, and the process repeats. Slowing the half-(in)activation alterations reduces these abrupt fluctuations (Figure 4B). Making the alterations infinitely slow effectively removes half-(in)activation changes altogether, leaving the system reliant solely on slower alterations in maximal conductances (Figure 4C). Because each timescale of half-(in)activation alteration produces a different channel repertoire at each time step, different timescales lead the model along different paths through the space of activity profiles and intrinsic properties. Ultimately, this results in distinct final activity patterns, all of which are consistent with the Ca²⁺ targets [22].”
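      The path-dependence described in this passage can be illustrated with a deliberately minimal toy sketch. It is not the authors' model: the product "activity" `g * w`, the function `settle`, and all parameter values are hypothetical choices made only for illustration. Two variables, a conductance-like `g` and a voltage-dependence-like `w`, are driven by the same error signal but with different time constants. Every run ends on the target, yet the final `(g, w)` pair depends on the ratio of the two timescales.

```python
def settle(tau_g, tau_w, target=4.0, g=1.0, w=1.0, dt=1e-3, steps=50_000):
    """Euler-integrate two regulators that share one error signal.

    Hypothetical toy: 'activity' is simply g * w, and both variables
    move in proportion to (target - activity), each with its own
    time constant. tau_w = inf switches the w regulator off.
    """
    for _ in range(steps):
        error = target - g * w       # distance of activity from its target
        g += dt * error / tau_g      # conductance-like variable
        w += dt * error / tau_w      # voltage-dependence-like variable
    return g, w


# Same starting point, same target, three timescales for w.
results = {
    "fast": settle(tau_g=1.0, tau_w=0.1),
    "matched": settle(tau_g=1.0, tau_w=1.0),
    "off": settle(tau_g=1.0, tau_w=float("inf")),
}

for name, (g, w) in results.items():
    print(f"{name:8s} g={g:.3f}  w={w:.3f}  activity={g * w:.3f}")
```

      In this sketch all three runs satisfy the same target, but they stop at different points on the set of solutions, because the two variables accumulate change in a fixed ratio set by their time constants. Whether the full conductance-based model behaves this simply is not something the toy can establish.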

      (2) A related issue is the use of bootstrapping to do statistics for a family of models, especially when the question is in fact the width of the distribution of output attributes. I don't buy this. One can run enough models to find say N number of models within a tight range (say 2% cycle period) and the same N number within a loose range (say 20%) and compare the statistics within the two groups with the same N.

      We appreciate the reviewer’s skepticism regarding our statistical approach with the “Group of 5” and “Group of 20.” These groups arose from historical aspects of our analysis, and this analysis does not directly advance the main point, namely that changes in the timescale of channel voltage-dependence alterations impact the properties of the bursters to which the homeostatic mechanism converges. Therefore, we removed the references to the Group of 5 and now focus on how the Group of 20 responds to variations in the timescale of voltage-dependence alterations.

      (3) The third issue is that many of the results that are presented (but not the main one) are completely expected. If one starts with gmax values that would never work (say all of them 0), then it doesn't matter how much one moves the act/inact curves one probably won't get the desired activity. Alternately, if one starts with gmax values that are known to work and randomizes the act/inact midpoints, then the expectation would be that it converges to something that works. This is Figure 1 B and C, no surprise. But it should work the other way around too. If one starts with random act/inact curves that would never work and fixes those, then why would one expect any set of gmax values would produce the desired response? I can easily imagine setting the half-act/inact values to values that never produce any activity with any gmax.

      We appreciate this observation and agree that it highlights a limitation of our initial condition sampling. Our claim that the half-(in)activation mechanism is subordinate to the maximal conductance mechanism is not intended as a general statement. Rather, we make this observation only within the specific range of initial conditions we explored. Within this restricted set, we found that the conductance mechanism was sufficient for successful assembly, while the half-(in)activation mechanism alone was not. We have revised the manuscript to limit the claim.

      “The results shown in Figure 1A require activity-dependent regulation of the maximal conductances. When activity-dependent regulation of the maximal conductances was turned off, the model failed to assemble SP1 into a burster (Figure 1B). This was also seen in the other 19 Starting Parameters (SP2-SP20) [22].”

      (4) A potential response to my previous criticism would be that you put reasonable constraints on gmax's or half-act/inact values or tie the half-act to half-inact. But those are simply arbitrary ad hoc decisions made to make the model work, much like the L8-norm used to amplify some errors. There is absolutely no reason to believe this is tied to the biology of the system.

      Here the reviewer highlights that model choices (e.g., constraints on maximal conductance and half-(in)activation, use of the L8 norm) are not necessarily justified by biology. A discussion of the constraints on maximal conductance and half-(in)activation is in the Model Assumptions section at the end of Methods. The Methods also contains a longer discussion of the use of the L8 norm:

      “To compute this match score, we adapted a formulation from Alonso et al (2023), who originally used a root-mean-square (RMS), or L2, norm to combine the sensor mismatches. In that approach, each of the three sensor errors is divided by its allowable tolerance to produce a normalized error. These normalized errors are then squared, summed, and square-rooted to produce a single scalar score that reflects how well the model matches the target activity pattern.

      In our version, we instead used an L8 norm, which raises each normalized error to the 8th power before summing and taking the 1/8th root. This formulation emphasizes large deviations in any one sensor, making it easier to pinpoint which feature of the activity is limiting convergence. By amplifying outlier mismatches, this approach provided a clearer view of which sensor was driving model mismatch, helping us both interpret failure modes and tune the model’s sensitivity by adjusting the tolerances for individual sensor errors.

      Although the L8 norm emphasizes large deviations more strongly than the L2 norm, the choice of norm does not fundamentally alter which models can converge: a model that performs well under one norm can also be made to perform well under another by adjusting the allowable tolerances. The biophysical mechanisms by which neurons detect deviations from target activity and convert them into changes in ion channel properties are still not well understood. Given this uncertainty, and the fact that using different norms ultimately shouldn’t affect the convergence of a given model, the use of different norms to combine sensor errors is consistent with the broader basic premise of the model: that intrinsic homeostatic regulation is calcium mediated [22].”
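      The difference between the two norms described above can be sketched numerically. This is a minimal illustration under stated assumptions, not the manuscript's code: the function name `match_score`, the three error values, and the tolerances are all hypothetical. Only the recipe (divide each error by its tolerance, raise to the power p, sum, take the 1/p-th root) comes from the quoted Methods text.

```python
def match_score(errors, tolerances, p):
    """Combine sensor errors into one scalar using an L^p norm.

    Each error is first divided by its allowable tolerance, then the
    normalized errors are raised to the power p, summed, and the
    1/p-th root is taken (p=2 and p=8 are the cases discussed above).
    """
    normalized = [abs(e) / tol for e, tol in zip(errors, tolerances)]
    return sum(x ** p for x in normalized) ** (1.0 / p)


# Hypothetical example: two sensors within tolerance, one far outside it.
errors = [0.05, 0.05, 0.20]
tolerances = [0.10, 0.10, 0.10]

l2 = match_score(errors, tolerances, p=2)
l8 = match_score(errors, tolerances, p=8)
worst = max(abs(e) / t for e, t in zip(errors, tolerances))

print(f"L2 score: {l2:.4f}")
print(f"L8 score: {l8:.4f}")
print(f"largest normalized error: {worst:.4f}")
```

      With these made-up numbers the L8 score sits almost exactly on the largest normalized error, while the L2 score also carries a visible contribution from the two small errors. That is the sense in which the higher-order norm makes the limiting sensor easier to read off.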

      (5) The discussion of this manuscript is at once too long and not adequate. It goes into excruciating detail about things that are simply not explored in this study, such as phosphorylation mechanisms, justification of model assumptions of how these alterations occur, or even the biological relevance. (The whole model is an oversimplification - lack of anatomical structure, three calcium sensors, arbitrary assumptions, and how parameter bounds are implemented.) Lengthy justifications for why channel density & half-act/inact of all currents are obeying the same time constant are answering a question that no one asked. It is a simplified model to make an important point. The authors should make these parts concise and to the point. More importantly, the authors should discuss the mechanism through which these differences may arise. Even if it is not clear, they should speculate.

      We agree. The long discussion of Model Assumptions and of potential biological mechanisms that implement alterations in channel voltage-dependence obscured the main message. The former has been relocated to the Methods section, and the latter has been shortened. A discussion of a potential mechanism is now included in the Results (Figure 4).

      (6) There should be some justification or discussion of the arbitrary assumptions made in the model/methods. I understand some of this is to resolve issues that had come up in previous iterations of this approach and in fact the Alonso et al, 2023 paper was mainly to deal with these issues. However, some level of explanation is needed, especially when assumptions are made simply because of the intuition of the modeler rather than the existence of a biological constraint or any other objective measure.

      A discussion of Model Assumptions is included in the Methods.

      Reviewer #2 (Public review):

      Summary:

      In this study, Mondal and co-authors present the development of a computational model of homeostatic plasticity incorporating activity-dependent regulation of gating properties (activation, inactivation) of ion channels. The authors show that, similar to what has been observed for activity-dependent regulation of ion channel conductances, implementing activity-dependent regulation of voltage sensitivity participates in the achievement of a target phenotype (bursting or spiking). The results however suggest that activity-dependent regulation of voltage sensitivity is not sufficient to allow this and needs to be associated with the regulation of ion channel conductances in order to reliably reach the target phenotype. Although the implementation of this biologically relevant phenomenon is undeniably relevant, the main conclusions of the paper and the insights brought by this computational work are difficult to grasp.

      Strengths:

      (1) Implementing activity-dependent regulation of gating properties of ion channels is biologically relevant.

      (2) The modeling work appears to be well performed and provides results that are consistent with previous work performed by the same group.

      Weaknesses:

      (1) The writing is rather confusing, and the state of the art explaining the need for the study is unclear.

      We reorganized the manuscript to make its focus clearer.

      Introduction: We clarified our explanation of the state-of-the-art. Briefly, prior work on activity-dependent homeostasis has focused on regulating ion channel density. Neurons have also been documented to homeostatically regulate channel voltage-dependence. However, the consequences of channel voltage-dependence alterations for homeostatic regulation remain underexplored. To study this, we extend a computational model of activity-dependent homeostasis, originally developed to alter only channel density, so that it also alters channel voltage-dependence.

      Results: We reorganized this section to underscore the main point: that the timescale of half-(in)activation alterations influences the intrinsic properties and activity patterns targeted by a homeostatic mechanism. Figures 1A and 1B were retained to provide context—Figure 1A illustrates how activity can emerge from random initial conditions, while Figure 1B suggests that in these simulations, modulation of half-(in)activation played a specific limited role. Figure 2 builds on Figure 1A by summarizing how intrinsic properties and activity characteristics vary across a population of 20 bursters. Figure 3 then demonstrates that despite playing this specific limited role, altering the timescale of half-(in)activation in these simulations significantly impacted the intrinsic properties and activity characteristics of the bursters targeted by the homeostatic mechanism. Figure 4 supports this by offering a possible mechanistic explanation. Finally, Figure 5 reinforces the central message by showing how the same population responds to perturbation when the timescale of half-(in)activation alterations is varied—essentially extending the analysis of Figure 3 to a perturbed regime.

      Discussion: The Discussion now concentrates more specifically on how the timescale of half-(in)activation alterations shapes the bursters targeted by the homeostatic mechanism. Extended content on model assumptions has been moved to Methods. The discussion of biological pathways that implement changes in channel voltage-dependence has been shortened to avoid distracting from the main message.

      Methods: Aside from moving model assumptions here, we removed discussion of the “Group of 5” and explained in more detail why we chose the L8 norm.

      (2) The main outcomes and conclusions of the study are difficult to grasp. What is predicted or explained by this new version of homeostatic regulation of neuronal activity?

      Our message is general: the timescale of half-(in)activation alterations influences the intrinsic properties and activity characteristics of bursters targeted by a homeostatic mechanism. As such, the implications are general. Their value lies in circumscribing a conceptual framework from which experimentalists may devise and test new hypotheses. We do not aim to predict or explain any specific phenomenon in this work. To address this concern, the Discussion highlights two potential implications of our findings, one for neuronal development and another for pathologies that may arise from disruptions to homeostatic processes:

      “One application for the simulations involving the self-assembly of activity may be to model the initial phases of neural development, when a neuron transitions from having little or no electrical activity to possessing it (Baccaglini & Spitzer 1977). As shown in Figure 6, the timescale of (in)activation curve alterations defines a neuron's activity characteristics and intrinsic properties. As such, neurons may actively adjust these timescales to achieve a specific electrical activity aligned with a developmental phase’s activity targets. Indeed, developmental phases are marked by changes in ion channel density and voltage-dependence, leading to distinct electrical activity at each stage (Baccaglini & Spitzer 1977, Gao & Ziskind-Conhaim 1998, Goldberg et al 2011, Hunsberger & Mynlieff 2020, McCormick & Prince 1987, Moody & Bosma 2005, O'Leary et al 2014, Picken Bahrey & Moody 2003).

      Additionally, our results show that activity-dependent regulation of channel voltage-dependence can play a critical role in restoring neuronal activity during perturbations (Figure 5). Specifically, the presence and timing of half-(in)activation modulation influenced whether the model neuron could successfully return to its target activity pattern. Many model neurons only achieved recovery when a half-(in)activation mechanism was present. Moreover, the speed of this modulation shaped recovery outcomes in nuanced ways: some model neurons reached their targets only when voltage-dependence was adjusted rapidly, while others did so only when these changes occurred slowly. These observations all suggest that impairments in a neuron’s ability to modulate the voltage-dependence of its channels may lead to disruptions in activity-dependent homeostasis. This may have implications for conditions such as addiction (Kourrich et al 2015) and Alzheimer’s disease (Styr & Slutsky 2018), where disruptions in homeostatic processes are thought to contribute to pathogenesis.”

      Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement activity-dependent changes in ion channel conductance to support homeostatic plasticity. While changes in the voltage-dependent properties of ion channels are known to modulate neuronal excitability, their role as a homeostatic plasticity mechanism interacting with channel conductance has been largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage-dependent properties can interact with plasticity in channel conductance to allow neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. These results also show that the rate of channel voltage-dependent shifts can influence steady-state parameters reached as the model stabilizes into a stable intrinsic bursting state. That is, the rate of these modifications shapes the range of channel conductances and half-(in)activation parameters as well as activity characteristics such as burst period and duration. A major conclusion of the study is that altering the timescale of channel voltage dependence can seamlessly shift a neuron's activity characteristics, a mechanism that the authors argue may be employed by neurons to adapt to perturbations. While the study's conclusions are mostly well-supported, additional analyses and simulations are needed.

      (1) A main conclusion of this study is that the speed at which (in)activation dynamics change determines the range of possible electrical patterns. The authors propose that neurons may dynamically regulate the timescale of these changes (a) to achieve alterations in electrical activity patterns, for example, to preserve the relative phase of neuronal firing in a rhythmic network, and (b) to adapt to perturbations. The results presented in Figure 4 clearly demonstrate that the timescale of (in)activation modifications impacts the range of activity patterns generated by the model as it transitions from an initial state of no activity to a final steady-state intrinsic burster. This may have important implications for neuronal development, as discussed by the authors.

      However, the authors also argue that the model neuron's dynamics, such as period and burst duration, could be dynamically modified by altering the timescale of (in)activation changes (Figure 6 and related text). The simulations presented here, however, do not test whether modifications in this timescale can shift the model's activity features once it reaches steady state. In fact, it is unlikely that this would be the case since, at steady state, calcium targets are already satisfied. It is likely, however, as the authors suggest, that the rate at which (in)activation dynamics change may be important for neuronal adaptation to perturbations, such as changes in temperature or extracellular potassium. Yet, the results presented here do not examine how modifying this timescale influences the model's response to perturbations. Adding simulations to characterize how alterations in the rate of (in)activation dynamics affect the model's response to perturbations, such as transiently elevated extracellular potassium (Figure 5), would strengthen this conclusion.

      The reviewer suggests that our core message — namely, that the timescale of half-(in)activation alterations influences the intrinsic properties and activity patterns targeted by a homeostatic mechanism — should also hold during perturbations. We agree that this extension strengthens the central message and have incorporated it into the subsection of the Results (“Half-(in)activation Alterations Contribute to Activity Homeostasis”) and Figure 5.

      (2) Another key argument in this study is that small, coordinated changes in channel (in)activation contribute to shaping neuronal activity patterns, but that these subtle effects may be obscured when averaging across a population of neurons. This may be the case; however, the results presented don't clearly demonstrate this point. This point would be strengthened by identifying correlations, if they exist, between (in)activation curves, conductance, and the resulting bursting patterns of the models for the simulations presented in Figure 2 and Figure 4, for example. Alternatively, or additionally, relationships between (in)activation curves could be probed by perturbing individual (in)activation curves and quantifying how the other model parameters compensate, which could clearly illustrate this point.

      In part of the Discussion, we noted that small, coordinated shifts in half-(in)activation curves could be obscured when averaging across a population of neurons. Our intention was not to present this as a primary result, but to highlight an emergent consequence of the model: that distinct initial maximal conductances may converge to activity targets via different small shifts in half-(in)activation, making such changes difficult to detect at the population level. However, we did not systematically examine correlations between (in)activation parameters, conductances, and activity features, nor how these correlations might vary with the timescale of (in)activation modulation. While this observation is consistent with model behavior, it does not directly advance the study’s main point — that the timescale of half-(in)activation modulation influences the types of bursting patterns that satisfy the activity target. To keep the focus clear, we have removed this remark from the Discussion, though we agree that a more detailed analysis of these correlations may offer a fruitful direction for future work.

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) Page 5: remove "an" from "achieve a given an activity..."

      The sentence containing this error has been removed.

      (2) Page 7, bottom of page. Explain what prespecifying means here. This requires a conceptual explanation, even if the equations are given in the methods. Was one working ad hoc model built from which the three sensor values were chosen? What was this model and how was it benchmarked? The sensors are never shown in any figure, but presumably they have different kinetics. What is meant by "average value"? What was the window of averaging and why?

      The intention of this passage was to provide a broad overview of the homeostatic mechanism, with the rationale for using sensor “averages” as homeostatic targets explained in detail in the Methods. We have replaced the word “average” with “target” to maintain this focus.

      (3) Page 9: add "the" in "electrical activity of the neuron as [the] model seeks...".

      Done

      (4) Page 9: say briefly what alpha is before using it. Also, please be consistent in either using the symbol for alpha or spelling it out across the manuscript and the figures.

      Done

      (5) Page 10: the paragraph "In general, ..." is confusing although it becomes clear later on what this is all about. Please rewrite and expand this to clarify some points. For instance, the word "degenerate" is first used here and it is unclear in what sense these models are degenerate. Then it is unclear why the first 5 models were chosen and then 15 more added. What was the point of doing this? What is the intent? Set this up properly before saying that you just did it. This also would clarify the weird terminology used later on of Group of 20 vs. Group of 5. The 20 and 5 are arbitrary. Say what the purpose is. Finally, is the "mean" at the very end the same 416 ms? If not, what do you mean by "the mean"? In fact, I find these 2% and 20% to be imprecise substitutes of (say) two distinct values of CV which are an order of magnitude different. Is that the intent?

      This comment refers to a passage that was removed during revision.

      (6) Page 10: this may be clear to you, but it took me a while to understand that in Figure 1C, you took the working model at the end of 1A, fixed the gmax values and randomized just the half-act/inact values to run it. Perhaps rewrite this to clarify?

      This comment refers to a figure that was removed during revision.

      (7) Page 13: why do channel densities not change much after the perturbation?

      This comment refers to a figure that has since been reworked during revision. In particular, we only study what happens during perturbation. This question is interesting and is the subject of ongoing work.

      Reviewer #2 (Recommendations for the authors):

      The article should be carefully corrected, because the current quality of writing might obscure the interest of the study. Particular attention should be paid to the state-of-the-art section and to the discussion, but even the writing of the results should be carefully reworked. The current state of the article makes it very difficult to understand the motivation behind the study but also what the main result provided by this work is.

      The Introduction, Results, and Discussion have been reworked to build on the central premise of the work: the timescale of half-(in)activation alterations influences the intrinsic properties and activity patterns targeted by the neuron’s homeostatic mechanism. These changes are detailed in Public Comment #1.

      Reviewer #3 (Recommendations for the authors):

      The manuscript presents an interesting computational study exploring how activity-dependent regulation of (in)activation dynamics interacts with conductance plasticity to shape neuronal activity patterns. While the study provides valuable insights, some aspects would benefit from clarification, further analyses, and/or additional simulations to strengthen the conclusions. Below, I outline concerns and comments related to specific details of the model and results presentation that were not included in the public review.

      (1) The results presented in Figure 5 show that adaptation occurs in both channel conductances and (in)activation dynamics; however, the changes in conductance remain relatively permanent after the model recovers from the transient elevation in extracellular potassium. It therefore seems likely that the model would recover bursting more quickly in response to a subsequent exposure to simulated elevated extracellular potassium since large modifications in the slowly changing conductances would not be required. If this is the case, it could provide a plausible mechanism for adaptation to repeated high-potassium exposure, as demonstrated experimentally in Cancer borealis by this group (PMID: 36060056).

      This is an astute observation and the subject of our present follow-up investigation.

      (2) In the text relating to Figure 5, it is argued that the resulting shifts in (in)activation curves may be conceptualized as alterations in window currents. It would be helpful to illustrate this by plotting and comparing changes in window currents of these channels alongside the changes in their (in)activation curves.

      This comment refers to a passage that was removed during revision.

      (3) Some discussion of the role these homeostatic mechanisms may play when the neuron is synaptically integrated into a rhythmically active network could be informative. Surely, phasic and tonic inputs to the neuron would alter its conductance and voltage-dependent properties. Therefore, the model's parameters in an intact network could be very different from those in the synaptically isolated case.

      This is an excellent point. We agree that synaptic context, particularly tonic and phasic inputs, would likely influence a neuron’s conductances and voltage-dependent properties, potentially leading to different homeostatic outcomes than in the isolated case. While our current study focuses on synaptically isolated neurons, the Marder lab has considered how homeostatically stabilized neurons might interact in network settings. For example, O'Leary et al. (2014) present an example network of three such neurons operating under homeostatic regulation. However, systematically exploring this question remains a challenge. We are currently developing ideas to study this in the context of a simplified half-center oscillator model, where network-level dynamics can be more tractably analyzed.

      (4) Why are the transitions of alpha typically so abrupt, essentially either 1 or 0? Similarly, what happens in the model when there are transient transitions from what appears to be a steady-state alpha that abruptly shifts from 0 to 1 or 1 to 0? For example, what is occurring in Figure 1A at ~150s and ~180s when alpha jumps between 1 and 0, or in Figure 1B when the model transiently jumps up from 0 to 1 at ~400s and ~830s? In Figure 1A, does the bursting pattern change at all after ~250s, or is it identical to the pattern at c?

      This is addressed in the revision (Lines 141 – 150).

      (5) Are the final steady-state parameters of the 25 (sic) models consistent with experimental observations?

      This is difficult to assess, because it is hard to design an experiment that would do what the reviewer is suggesting.

      (6) Why isn't gL allowed to change dynamically? This seems like the most straightforward way to allow a neuron to adjust its excitability (aside from tonic synaptic inputs).

      Passive currents could, in principle, be subject to homeostatic regulation. However, our study focused on active intrinsic currents. This focus stems from earlier investigations, which showed that active currents are dynamically regulated during homeostasis; see, for instance, Turrigiano et al. (1995) and Desai et al. (1999).

      Alonso LM, Rue MCP, Marder E. 2023. Gating of homeostatic regulation of intrinsic excitability produces cryptic long-term storage of prior perturbations. Proc Natl Acad Sci U S A 120: e2222016120

      Baccaglini PI, Spitzer NC. 1977. Developmental changes in the inward current of the action potential of Rohon-Beard neurones. J Physiol 271: 93-117

      Desai NS, Rutherford LC, Turrigiano GG. 1999. Plasticity in the intrinsic excitability of cortical pyramidal neurons. Nature Neuroscience 2: 515-20

      Gao BX, Ziskind-Conhaim L. 1998. Development of ionic currents underlying changes in action potential waveforms in rat spinal motoneurons. J Neurophysiol 80: 3047-61

      Goldberg EM, Jeong HY, Kruglikov I, Tremblay R, Lazarenko RM, Rudy B. 2011. Rapid developmental maturation of neocortical FS cell intrinsic excitability. Cereb Cortex 21: 666-82

      Hunsberger MS, Mynlieff M. 2020. BK potassium currents contribute differently to action potential waveform and firing rate as rat hippocampal neurons mature in the first postnatal week. J Neurophysiol 124: 703-14

      Kourrich S, Calu DJ, Bonci A. 2015. Intrinsic plasticity: an emerging player in addiction. Nature Reviews Neuroscience 16: 173-84

      McCormick DA, Prince DA. 1987. Post-natal development of electrophysiological properties of rat cerebral cortical pyramidal neurones. J Physiol 393: 743-62

      Moody WJ, Bosma MM. 2005. Ion channel development, spontaneous activity, and activity-dependent development in nerve and muscle cells. Physiol Rev 85: 883-941

      O'Leary T, Williams AH, Franci A, Marder E. 2014. Cell types, network homeostasis, and pathological compensation from a biologically plausible ion channel expression model. Neuron 82: 809-21

      Picken Bahrey HL, Moody WJ. 2003. Early development of voltage-gated ion currents and firing properties in neurons of the mouse cerebral cortex. J Neurophysiol 89: 1761-73

      Styr B, Slutsky I. 2018. Imbalance between firing homeostasis and synaptic plasticity drives early-phase Alzheimer’s disease. Nature Neuroscience 21: 463-73

      Turrigiano G, LeMasson G, Marder E. 1995. Selective regulation of current densities underlies spontaneous changes in the activity of cultured neurons. J Neurosci 15: 3640-52

    1. eLife Assessment

      This valuable study demonstrates that D1- and D2-striatal neurons receive distinct cortical inputs, offering key insights into corticostriatal function. For instance, in the context of striatal-dependent learning, this distinction is highly informative for interpreting synaptic physiology data, particularly when inputs to one neuron subtype may change independently of the other. The strength of the evidence is solid, with anatomical and electrophysiological findings aligning well with results from optogenetic and behavioral studies.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Klug et al. investigated the pathway specificity of corticostriatal projections, focusing on two cortical regions. Using a G-deleted rabies system in D1-Cre and A2a-Cre mice to retrogradely deliver channelrhodopsin to cortical inputs, the authors found that M1 and MCC inputs to direct and indirect pathway spiny projection neurons (SPNs) are both partially segregated and asymmetrically overlapping. In general, corticostriatal inputs that target indirect pathway SPNs are likely to also target direct pathway SPNs, while inputs targeting direct pathway SPNs are less likely to also target indirect pathway SPNs. Such asymmetric overlap of corticostriatal inputs has important implications for how the cortex itself may determine striatal output. Indeed, the authors provide behavioral evidence that optogenetic activation of M1 or MCC cortical neurons that send axons to either direct or indirect pathway SPNs can have opposite effects on locomotion and different effects on action sequence execution. The conclusions of this study add to our understanding of how cortical activity may influence striatal output and offer important new clues about basal ganglia function.

      The conceptual conclusions of the manuscript are supported by the data, but the details of the magnitude of afferent overlap and causal role of asymmetric corticostriatal inputs on some behavioral outcomes may be a bit overstated given technical limitations of the experiments.

      For example, after virally labeling either direct pathway (D1) or indirect pathway (D2) SPNs to optogenetically tag pathway-specific cortical inputs, the authors report that a much larger number of "non-starter" D2-SPNs from D2-SPN labeled mice responded to optogenetic stimulation in slices than "non-starter" D1 SPNs from D1-SPN labeled mice did. Without knowing the relative number of D1 or D2 SPN starters used to label cortical inputs, it is difficult to interpret the exact meaning of the lower number of responsive D2-SPNs in D1 labeled mice (where only ~63% of D1-SPNs themselves respond) compared to the relatively higher number of responsive D1-SPNs (and D2-SPNs) in D2 labeled mice. While relative differences in connectivity certainly suggest that some amount of asymmetric overlap of inputs exists, differences in infection efficiency and ensuing differences in detection sensitivity in slice experiments make determining the degree of asymmetry problematic.

      It is also unclear if retrograde labeling of D1-SPN- vs D2-SPN- targeting afferents labels the same densities of cortical neurons. This gets to the point of specificity in some of the behavioral experiments. If the target-based labeling strategies used to introduce channelrhodopsin into specific SPN afferents label significantly different numbers of cortical neurons, might the difference in the relative numbers of optogenetically activated cortical neurons itself lead to behavioral differences?

    3. Reviewer #2 (Public review):

      Summary:

      Klug et al. use monosynaptic rabies tracing of inputs to D1- vs D2-SPNs in the striatum to study how separate populations of cortical neurons project to D1- and D2-SPNs. They use rabies to express ChR2, then patch D1- or D2-SPNs to measure synaptic input. They report that cortical neurons labeled as D1-SPN-projecting preferentially project to D1-SPNs over D2-SPNs. In contrast, cortical neurons labeled as D2-SPN-projecting project equally to D1- and D2-SPNs. They go on to conduct pathway-specific behavioral stimulation experiments. They compare direct optogenetic stimulation of D1- or D2-SPNs to stimulation of MCC inputs to DMS and M1 inputs to DLS. In three different behavioral assays (open field, intra-cranial self-stimulation, and a fixed ratio 8 task), they show that stimulating MCC or M1 cortical inputs to D1-SPNs is similar to D1-SPN stimulation, but that stimulating MCC or M1 cortical inputs to D2-SPNs does not recapitulate the effects of D2-SPN stimulation (presumably because both D1- and D2-SPNs are being activated by these cortical inputs).

      Strengths:

      Showing these same effects in three distinct behaviors is strong. Overall, the functional verification of the consequences of the anatomy is very nice to see. It is a good choice to patch only from mCherry-negative non-starter cells in the striatum. This study adds to our understanding of the logic of corticostriatal connections, suggesting a previously unappreciated structure.

      Weaknesses:

      One limitation is that all inputs to SPNs are expressing ChR2, so they cannot distinguish between different cortical subregions during patching experiments. Their results could arise because the same innervation patterns are repeated in many cortical subregions or because some subregions have preferential D1-SPN input while others do not. There are also some caveats with respect to the efficacy of rabies tracing. Although they only patch non-starter cells in the striatum, only 63% of D1-SPNs receive input from D1-SPN-projecting cortical neurons. It's hard to say whether this is "high" or "low," but one question is how far from the starter cell region they are patching. Without this spatial indication of where the cells that are being patched are relative to the starter population, it is difficult to interpret if the cells being patched are receiving cortical inputs from the same neurons that are projecting to the starter population. The authors indicate they are patching from mCherry-negative neurons within the region of the mCherry-positive neurons, but since the mCherry population will include both true starter cells and monosynaptically connected cells, this is not perfectly precise. Convergence of cortical inputs onto SPNs may vary with distance from the starter cell region quite dramatically, as other mapping studies of corticostriatal inputs have shown specialized local input regions can be defined based on cortical input patterns (Hintiryan et al., Nat Neurosci, 2016; Hunnicutt et al., eLife, 2016; Peters et al., Nature, 2021). A caveat for the optogenetic behavioral experiments is that they did not include fluorophore-only controls, although a different control (with light delivered in M1) is provided in Supplementary Figure 3. Another point of confusion is that other studies (Cui et al., J Neurosci, 2021) have reported that stimulation of D1-SPNs in DLS inhibits rather than promotes movement. This study may have given different results due to subtly different experimental parameters, including fiber optic placement and numerical aperture (NA).

    4. Reviewer #3 (Public review):

      Review of resubmission: The authors provided a response to the reviews from myself and other reviewers. While some points were made satisfactorily, particularly in clarification of the innervation of cortex to striatum and the effects of input stimulation, many of my points remain unaddressed. In several cases, the authors chose to explain their rationale rather than address the issues at hand. A number of these issues (in fact, the majority) could be addressed simply by toning down the confidence in conclusions, so it was disappointing to see that the authors by and large did not do this. I repeat my concerns below and note whether I find them to have been satisfactorily addressed or not.

      In the manuscript by Klug and colleagues, the investigators use a rabies virus-based methodology to explore potential differences in connectivity from cortical inputs to the dorsal striatum. They report that the connectivity from cortical inputs onto D1 and D2 MSNs differs in terms of their projections onto the opposing cell type, and use these data to infer that there are differences in cross-talk between cortical cells that project to D1 vs. D2 MSNs. Overall, this manuscript adds to the overall body of work indicating that there are differential functions of different striatal pathways which likely arise at least in part by differences in connectivity that have been difficult to resolve due to difficulty in isolating pathways within striatal connectivity, and several interesting and provocative observations were reported. Several different methodologies are used, with partially convergent results, to support their main points.

      However, I have significant technical concerns about the manuscript as presented that make it difficult for me to interpret the results of the experiments. My comments are below.

      Major:

      There is generally a large caveat to the rabies studies performed here, which is that both TVA and the ChR2-expressing rabies virus have the same fluorophore. It is thus essentially impossible to determine how many starter cells there are, what the efficiency of tracing is, and which part of the striatum is being sampled in any given experiment. This is a major caveat given the spatial topography of the cortico-striatal projections. Furthermore, the authors make a point in the introduction about previous studies not having explored absolute numbers of inputs, yet this is not at all controlled in this study. It could be that their rabies virus simply replicates better in D1-MSNs than D2-MSNs. No quantifications are done, and these possibilities do not appear to have been considered. Without a greater standardization of the rabies experiments across conditions, it is difficult to interpret the results.

      This is still an issue. The authors point out why they chose various vectors. I can understand why the authors chose the fluorophores etc. that they did, yet the issues I raised previously are still valid. The discussion should mention that this is a potential issue. It does not necessarily invalidate results, but it is an issue. Furthermore, it is possible (in all systems) that rabies replicates better/more efficiently in some cells than others. This is one possible interpretation that has not really been explored in any study. I don't suggest the authors attempt to do that, but it should be raised as a potential interpretation. If the rabies results could mean several different things, the authors owe it to the readership to state all possible interpretations of data.

      The authors claim using a few current clamp optical stimulation experiments that the cortical cells are healthy, but this result was far from comprehensive. For example, membrane resistance, capacitance, general excitability curves, etc are not reported. In Figure S2, some of the conditions look quite different (e.g., S2B, input D2-record D2, the method used yields quite different results that the authors write off as not different). Furthermore, these experiments do not consider the likely sickness and death that occurs in starter cells, as has been reported elsewhere. Health of cells in the circuit is overall a substantial concern that alone could invalidate a large portion, if not all, of the behavioral results. This is a major confound given those neurons are thought to play critical roles in the behaviors being studied. This is a major reason why first-generation rabies viruses have not been used in combination with behavior, but this significant caveat does not appear to have been considered, and controls e.g., uninfected animals, infected with AAV helpers, etc, were not included.

      This issue remains unaddressed. I did not request clarity about experimental design, but rather, raised issues about the potential effects of toxicity. I believe this to be a valid concern that needs to be discussed in the manuscript, especially given what look visually like potential differences in S2.

      The overall purity (e.g., EnvA pseudotyping efficiency) of the RABV prep is not shown. If there was a virus that was not well EnvA-pseudotyped and thus could directly infect cortical (or other) inputs, it would degrade specificity.

      This issue has not been addressed. Viral strain is irrelevant. The quality of the specific preparations used is what matters.

      While most of the study focuses on the cortical inputs, in slice recordings, inputs from the thalamus are not considered, yet likely contribute to the observed results. Related to this, in in vivo optogenetic experiments, technically, if the thalamic or other inputs to the dorsal striatum project to the cortex, their method will not only target cortical neurons but also terminals of other excitatory inputs. If this cannot be ruled out, the statement that the authors are able to selectively activate the cortical inputs to one or the other population should be toned down.

      The authors added text to the discussion to address this point. While it largely does what is intended, based on the one study cited, I disagree with the authors' conclusions that it is "clear" that potential contamination from other sites does not play a role. The simplest interpretation is the one the authors state, and there is some supporting evidence to back up that assertion, but to me that falls short of making the point "clear" that there are no other interpretations.

      The statements about specificity of connectivity are not well founded. It may be that in the specific case where they are assessing outside of the area of injections, their conclusions may hold (e.g., excitatory inputs onto D2s have more inputs onto D1s than vice versa). However, how this relates to the actual site of injection is not clear. At face value, if such a connectivity exists, it would suggest that D1-MSNs receive substantially more overall excitatory inputs than D2s. It is thus possible that this observation would not hold over other spatial intervals. This was not explored and thus the conclusions are over-generalized. e.g., the distance from the area of red cells in the striatum to recordings was not quantified, what constituted a high level of cortical labeling was not quantified, etc. Without more rigorous quantification of what was being done, it is difficult to interpret the results.

      Again, the goal here would be to make a statement about this in the discussion to clarify limitations of the study. I don't expect the authors to re-do all of these experiments, but since they are discussing the corticostriatal circuits, which have multiple subdomains, this remains a relevant point. It has not been addressed.

      The results in Figure 3 are not well controlled. The authors show contrasting effects of optogenetic stimulation of D1-MSNs and D2-MSNs in the DMS and DLS, results which are largely consistent with the canon of basal ganglia function. However, when stimulating cortical inputs, stimulating the inputs to D1-MSNs gives the expected results (increased locomotion) while stimulating putative inputs to D2-MSNs had no effect. This is not the same as showing a decrease in locomotion; a null effect here is not possible to interpret.

      I think that the caveat of showing no clear effects of stimulating inputs to D2-MSNs should be pointed out. Yes, I understand that the viruses appeared to express etc., but again it remains possible that the results are driven by a lack of e.g., sufficient ChR2 expression. Short of a full quantification of the number of cells expressing ChR2 and of the overlap between fiber placement and ChR2 expression (which I don't suggest), this remains a possibility and should be pointed out.

      In the light of their circuit model, the result showing that inputs to D2-MSNs drive ICSS is confusing. How can the authors account for the fact that these cells are not locomotor-activating, stimulation of their putative downstream cells (D2-MSNs) does not drive ICSS, yet the cortical inputs drive ICSS? Is the idea that these inputs somehow also drive D1s? If this is the case, how do D2s get activated, if all of the cortical inputs tested net activate D1s and not D2s? Same with the results in Figure 4 - the inputs and putative downstream cells do not have the same effects. Given potential caveats of differences in viral efficiency, spatial location of injections, and cellular toxicity, I cannot interpret these experiments.

      The explanation the authors provide in their rebuttal makes sense, however this should be included in the discussion of the manuscript, as it is interesting and relevant.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of the molecular basis by which early symmetry-breaking events connect to subsequent cell fate specification in preimplantation mammalian embryos. The evidence supporting the conclusions is compelling, with advanced image-based assays and microinjection-based functional tests. The work will be of broad interest to cell and developmental biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This work starts with the observation that embryo polarization is asynchronous starting at the early 8-cell stage, with early polarizing cells being biased towards producing the trophectoderm (TE) lineage. They further found that reduced CARM1 activity and upregulation of its substrate BAF155 promote early polarization and TE specification. This evidence connects to the previous finding that Carm1 heterogeneity at the 4-cell stage guides later cell lineages: blastomeres with higher Carm1 expression are biased towards the ICM lineage. Thus, this work provides a link between asymmetries at the 4-cell stage and polarization at the 8-cell stage, providing a cohesive explanation regarding the first lineage allocation in mouse embryos.

      Strengths:

      In addition to what has been put in the summary, the advanced 3D image-based analysis has found that early polarization is associated with a change in blastomere geometry, specifically in the ratio of the long axis to the short axis. This is considered a new observation that has not previously been reported.

      Weaknesses:

      Although the microinjection-based method for overexpressing or deleting proteins has been shown to be effective in early embryo settings and has been widely used, it may not fully represent the in vivo situation in some cases, compared to other strategies such as the use of knock-in mice.

      This is a minor weakness and has been discussed by the author in the revised manuscript.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work starts with the observation that embryo polarization is asynchronous starting at the early 8-cell stage, with early polarizing cells being biased towards producing the trophectoderm (TE) lineage. They further found that reduced CARM1 activity and upregulation of its substrate BAF155 promote early polarization and TE specification. This evidence connects to the previous finding that Carm1 heterogeneity at the 4-cell stage guides later cell lineages: blastomeres with higher Carm1 expression are biased towards the ICM lineage. Thus, this work provides a link between asymmetries at the 4-cell stage and polarization at the 8-cell stage, providing a cohesive explanation regarding the first lineage allocation in mouse embryos.

      Strengths:

      In addition to what has been put in the summary, the advanced 3D image-based analysis has found that early polarization is associated with a change in blastomere geometry, specifically in the ratio of the long axis to the short axis. This is considered a new observation that has not previously been reported.

      Weaknesses:

      Although the microinjection-based method for overexpressing or deleting proteins has been shown to be effective in early embryo settings and has been widely used, it may not fully represent the in vivo situation in some cases, compared to other strategies such as the use of knock-in mice. This is a minor weakness; it would be good to include some sentences in the discussion on the potential caveats.

      We thank the reviewer for their insightful summary of our work and their assessment of the novelty of our research. We agree with the reviewer that microinjection-based methods, whilst standard and widely used in the field, have their weaknesses. In this study, we have primarily used microinjection of previously tested and known constructs, which may help mitigate these concerns, and have referenced numerous studies in which these constructs have been used and tested. Nevertheless, we are aware of this drawback and have previously tried to address it in other research using novel artificial intelligence techniques (Shen and Lamba et al., 2022, cited in the manuscript), and this continues to be an active area of investigation for us.

      Reviewer #2 (Public review):

      Summary:

      In this study, Lamba and colleagues suggest a molecular mechanism to explain cell heterogeneity in cell specification during pre-implantation development. They show that embryo polarization is asynchronous. They propose that reduced CARM1 activity and upregulation of its substrate BAF155 promote early polarization and trophectoderm specification.

      Strengths:

      The authors use appropriate and validated methodology to address their scientific questions. They also report excellent live imaging. Most of the data are accompanied by careful quantifications.

      Weaknesses:

      I think this manuscript requires more quantification, a larger number of embryos in the evaluations, and a clear statement of the number of embryos evaluated per experiment.

      We thank the reviewer for these thoughtful comments on our work, their kind assessment of the strength of our research, and their notes on the weaknesses. We have replied to their points raised below.

      Here are some points:

      (1) It should be clearly stated in all figure legends and in the text how many cells from how many embryos were analyzed.

      We appreciate this suggestion to provide detailed quantification for every experiment in the paper and to state the numbers of embryos (for whole-embryo-level experiments) or blastomeres used for statistical tests and displayed in the graphs.

      (2) I think that the number of embryos is sometimes too low. These are mouse embryos, easily accessible, and the methods used are well established in this lab, so the authors should make an effort to have at least 10/15 embryos per experiment. For example "In agreement with this, hybridization chain reaction (HCR) RNA fluorescence in situ hybridization of early 8-cell stage embryos revealed that the number of CDX2 mRNA puncta was higher in polarized blastomeres with a PARD6-positive apical domain than in unpolarized blastomeres, for 5 out of 6 embryos with EP cells (Figure 3A, B)", or the data for Figure 4, where we know how many cells but not how many embryos.

      We appreciate the reviewer’s comment regarding the number of embryos used in the hybridization chain reaction (HCR) experiment. We agree that increasing the number of embryos could, in principle, add further statistical power. However, both first authors have since left the lab to begin postdoctoral training or to join a company, and it is not feasible for us to generate additional embryos at this stage.

      Importantly, we believe the number of embryos included in the current manuscript is sufficient to support our conclusions, especially when considered in the context of the broader experimental design, the timing of the study, and our ethical commitment to minimizing animal use.

      Notably, the initial HCR experiment targeting Cdx2 mRNA served as a key indication that prompted further investigation of CDX2 at the protein level. These follow-up experiments were conducted with increased numbers of embryos and/or cells and are presented in Figure 3 and the associated supplementary figures (we now have 124 cells (including 23 EP cells) from 16 embryos), thereby strengthening and confirming the conclusion suggested by the HCR data.

      (3) It would be useful to see in Figure 4 an example of asymmetric cell division as done for symmetric cell division in panel 4B. This could really help the reader to understand how the authors assessed this.

      We used live imaging to track cell division patterns. Cells expressing RFP-tagged polarity proteins were observed during division to identify the resulting daughter cells. Immediately after cytokinesis, we assessed the polarity status of each daughter cell. If both daughter cells were polarized, the division was classified as symmetric; if only one was polarized, it was classified as asymmetric.

      Author response image 1.

      8-cell stage embryos expressing Ezrin-RFP (fire colour) were imaged during the 8-16 cell stage division. Top panel: arrows indicate a symmetric cell division in which the polarity domain was partitioned into both daughter cells. Bottom panel: an asymmetric division in which the polarity domain was inherited by only one of the two daughter cells.
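
      The classification rule described in this response can be written as a short decision function. The following is a minimal sketch, not the authors' analysis code: the function name and the example inputs are hypothetical, and the "unclassified" outcome for two unpolarized daughters is an assumption, since the response does not say how that case was handled.

```python
def classify_division(daughter_a_polarized: bool, daughter_b_polarized: bool) -> str:
    """Classify a division from the polarity status of its two daughter cells."""
    if daughter_a_polarized and daughter_b_polarized:
        # Both daughters inherit the polarity domain.
        return "symmetric"
    if daughter_a_polarized or daughter_b_polarized:
        # Exactly one daughter inherits the polarity domain.
        return "asymmetric"
    # Assumption: this case is not covered by the rule stated in the response.
    return "unclassified"


# Hypothetical polarity scores for three divisions, as (daughter A, daughter B).
divisions = [(True, True), (True, False), (False, True)]
labels = [classify_division(a, b) for a, b in divisions]
print(labels)
```

      Scoring each division with an explicit rule of this kind makes it straightforward to report how many divisions fell into each class per embryo.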

      (4) Figure 5C there is a big disproportion of the number of EP and LP identified. Could the authors increase the number of embryos quantified and see if they can increase EP numbers?

      We thank the reviewer for this comment and want to clarify an important detail: EP cells are a phenomenon with an average cellular frequency of less than 10%, as compared to LP cells (the other 90%). Therefore, when investigating natural embryo development without bias or exclusion, there will likely be an imbalance in the number of EP and LP cells, as is the case for Figure 5C. In this case, morphological differences and clear statistical significance were seen between the shape of EP and LP cells within the cells quantified, and we therefore decided not to expend further mice for this particular experiment. We agree, however, that in most cases additional embryos would help strengthen our conclusions further.

      (5) Could the authors give more details about how they mount the embryos for live imaging? With agarose or another technique? In which dishes? Overlaid with how much medium and oil? This could help other labs that want to replicate the live imaging in their labs. Also, was it a z-stack analysis? If yes, how many um per stack? Ideally, if they also know the laser power used (at least a range) it would be extremely useful.

      We thank the reviewer for this comment and have provided additional detail here and in the Methods section. For live imaging of our embryos, we used glass-bottom 35 mm dishes. We fixed a small cut square of nylon mesh (5 mm to 1 cm in width and height) onto the centre of the dish using silicon; the mesh served as a grid (diameter of approximately 150 micrometres) for deposition of embryos. After the silicon had dried (overnight) and the dish had been washed with water, the grid was overlaid with a 100 microlitre drop of KSOM and then covered with mineral oil until the KSOM drop was submerged. After incubation under live-imaging conditions, single embryos were deposited in each ‘well’ of the grid before the dish was placed in the microscope, which was equilibrated at the correct temperature and CO2.

    1. eLife Assessment

      This manuscript applies state-of-the-art techniques to define the cellular composition of the dorsal vagal complex in two rodent species (mice and rats). The result is a fundamental resource that substantially advances our understanding of the dorsal vagal complex's role in the regulation of feeding and metabolism while also highlighting key differences between species. The analyses of single-cell profiling experiments in the manuscript provide compelling insight into the cellular architecture of the dorsal vagal complex, with potential implications for obesity therapeutics.

    2. Reviewer #1 (Public review):

      Summary:

      This paper is using state-of-the-art techniques to define the cellular composition and its complexity in two rodent species (mice and rats). The study is built on available datasets but extends those in a way that future research will be facilitated. The study will be of high impact for the study of metabolic control.

      Strengths:

      After revision, the paper is much improved. I have no further comments.

    3. Reviewer #2 (Public review):

      In this manuscript, Hes et al. present a comprehensive multi-species atlas of the dorsal vagal complex (DVC) using single-nucleus RNA sequencing, identifying over 180,000 cells and 123 cell types across five levels of granularity in mice and rats. Intriguingly, the analysis uncovered previously uncharacterized cell populations, including Kcnj3-expressing astrocytes, neurons co-expressing Th and Cck, and a population of leptin receptor-expressing neurons in the rat area postrema, which also express the progenitor marker Pdgfra. These findings suggest species-specific differences in appetite regulation. This study provides a valuable resource for investigating the intricate cellular landscape of the DVC and its role in metabolic control, with potential implications for refining obesity treatments targeting this hindbrain region.

      In line with previous work published by the PI, the topic is of clear scientific relevance, and the data presented in this manuscript are both novel and compelling. Additionally, the manuscript is well-structured, and the conclusions are robust and supported by the data. Overall, this study significantly enhances our understanding of the DVC and sheds light on key differences between rats and mice.

      I have reviewed the revised manuscript and am pleased to confirm that the authors have addressed my previous comments and concerns.

    4. Author response:

      The following is the authors’ response to the original reviews

      We thank the expert reviewers for their careful consideration of our manuscript and the feedback to help us strengthen our work. Please find a response to each reviewer’s comments below. We have included the original text from the reviewer in unbolded text and our response, immediately below, in bold text for clarity. 

      Reviewer #1:

      (1) Appetite is controlled, not regulated; please reword throughout.

      The reviewer raises a valid point that we have misused the word “regulate” in certain instances and that “control” would be a more accurate term. We have made adjustments throughout the manuscript.

      (2) One minor point that would further strengthen the data is a more distinct analysis of receptors that are characteristic of the different populations of neuronal and non-neuronal cells; this part could be improved. 

      We thank the reviewer for this suggestion, as we had not directly compared metabolically relevant peptides/receptors between the mouse and rat DVC. We have included the expression of selected receptors and neuropeptides in mouse and rat neuronal cells as a new supplementary figure (see Figure S13). There are some interesting insights from these data, including the relatively broad expression of Lepr in the rat compared with the mouse and the absence of proglucagon-expressing neurons within the rat DVC.

      Reviewer #2:

      (1) In some of the graphs, the label AP/NTS is used, but DVC would be more appropriate.

      We have reviewed the figures and legends to ensure appropriate use of DVC. We thank the reviewer for bringing this oversight to our attention.  

      (2) Line 124, p7 - Sprague Dawley RATS

      We have changed the text to “Sprague Dawley rats” 

      (3) Line 132, p7 - The phrase "were provided with given access to food" needs grammatical correction.

      We agree the text was poorly written. The sentence has been corrected to: “Wild-type Sprague Dawley rats (Charles River) were provided with ad libitum access to food (Purina Lab Diet 5001) and water in temperature-controlled (22°C) rooms on a 12-hour light-dark cycle with daily health checks.” We have also reviewed the entire manuscript and made additional amendments where necessary.

      (4) Page 15 - Mention that GFAP is a marker for astrocytes. Additionally, correct the typo "gfrap".

      We have corrected the misspelling of “Gfap” within the text. We appreciate the reviewer’s comment that there is value in communicating to the nonexpert reader that GFAP is a marker for astrocytes, however, as our data and that from other snRNA-Seq studies show that Gfap mRNA only labels a subset of astrocytes, our preference is to refrain from stating this. Our data suggests the sole use of Gfap as an astrocyte marker will not reflect the true astrocyte population.  

      (5) Line 432, p15 - What was the rationale for selecting clusters 23, 26, and 27?

      We chose to perform sub-clustering on these clusters because they displayed multiple cell identities when surveyed for the 473 marker genes described in Methods 2.6. To separate these identities, we increased the granularity within these clusters by sub-clustering.

      (6) Line 533, p18 - only 5 out of 34 neurons express GFRAL, which makes the language used a little bit misleading. As per the comment above, I would specify that only a subset (X%) of neurons express GFRAL, and apply the same approach for other markers.

      We thank the reviewer for raising this point. We agree the text, as written, was an oversimplification. We adjusted the text as recommended to state that a subset (~15%) of these neurons expresses detectable Gfral mRNA, and that this is likely an underrepresentation due to the challenges in detecting lowly expressed transcripts such as Gfral.

      (7) Line 547, p18 - This statement appears to refer to rat data specifically, rather than rodent data in general.

      The text has been corrected. 

      (8) Section 3.6 - The discussion on meal-related transcriptional programs in the murine DVC does not mention Figure S10A and B.

      We thank the reviewer for the observation. It is true that we do not discuss this figure. Figure S10 shows the integration of samples in treeArches, a necessary step to build the hierarchy in Python so that the learning algorithm uses only genes that are related to identity and not treatment. We obtained the same overlap of samples when we used R to assign identities. This figure demonstrates that our integration was successful because it considers only genes that are not treatment-related to establish identities, that is, genes expressed by cells regardless of their response to any treatment. For the meal-related analysis, we were interested in the genes that are changed by treatment, which is why the analysis differed. We have included a sentence in the methods to clarify this point: "This sample integration was done to ensure that inter-sample variations were removed for the cell identity steps."

      (9) Page 5, citation 10 - the author cited a clinical trial for glucagon and GLP-1 receptor dual agonist survodutide for "DVC neurons' role in appetite and energy balance stems from their role as therapeutic targets for obesity". A more appropriate citation (such as a review) would be preferable.

      We appreciate the suggestion by the reviewer. We have updated our references to reflect a recent manuscript from the Alhadeff group which demonstrates that the DVC acts as the target of GLP1-based therapies. We have also included a review as suggested (10.1038/s42255-022-00606-9).

      (10) Line 52, p5 - a citation of obesity is needed, as the current ref only pertains to cancer cachexia.

      We have included a reference for obesity.  

      (11) In the discussion, it would be valuable to elaborate on the potential significance of DVC-specific glial cells (perhaps at the end of the second paragraph?).

      We thank the reviewer for this suggestion. Our discovery of a DVC-specific astrocyte transcriptional profile was underrepresented within the discussion. We have attempted to expand this discussion on the suspected roles for these DVC-specific astrocytes. Much of this discussion is based on the distinct localization pattern of Gfap mRNA in the DVC (see Image on Allen Brain ISH) which shows dense signal at the boundary of the AP and NTS. As astrocytes have well established roles in maintaining BBB integrity, it is our speculation that this is a major role of these cells. However, functional studies will be critical to assess the roles of these astrocytes in DVC biology.  

      (12) Line 683, p22 - Consider adding PMID: 38987598 which describes the dissociable GLP-1R circuits.

      We appreciate this recommendation – we have included this reference.  

      (13) The authors suggest that a possible explanation for the discrepancy between snRNA-Seq and in situ hybridization data is that Agrp and Hcrt mRNA reads in snRNA-Seq overwhelmingly mapped to non-coding regions. To what extent could this limitation affect other genes included in the current analyzed 10x datasets?

      As shown by Pool and colleagues (https://doi.org/10.1038/s41592-023-02003-w), including intronic reads improves sensitivity and more accurately reflects endogenous gene expression. Therefore, including intronic reads is considered more of a strength than a limitation and is now the default in platforms such as CellRanger. When intronic reads are included for mapping snRNA-Seq data, we would advise corroborating snRNA-Seq findings with published literature or with detection of coding mRNA or protein. In our case, the detection of hypothalamic neuropeptides in the snRNA-Seq data could not be verified by in situ hybridization using probes that detect exons. Therefore, the fact that Hcrt and Agrp have only intronic reads suggests a regulatory (reviewed in https://doi.org/10.3389/fgene.2018.00672) rather than a coding role in the DVC.

      (14) Given the manuscript's focus on feeding and metabolism, I believe a more detailed description and comparison of the transcription profile of known receptors, neurotransmitters, and neuropeptides involved in food intake and energy homeostasis between mice and rats would add value. Adding a curated list of key genes related to feeding regulation would be particularly informative.

      A similar request was made by reviewer #1. Please see the full response above. Briefly, we have performed additional analysis of the mouse and rat DVC data and included this data as an additional supplemental figure (Figure S13).  

      (15) Line 479-482, p17 - It would be helpful if the authors could quantify (e.g., number and/or percentage) the extent of TH and CCK co-expression.

      We have amended the text of the manuscript to include quantification of Cck and Th colocalization.  According to our snRNA-seq data, out of the 764 Th-expressing neurons, 80 coexpress Cck in the mouse (~10%). The Cck-expressing cells are more numerous, 3,821 in total.  
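
      The percentage quoted above follows directly from the reported counts. The sketch below is only a worked check of that arithmetic: the helper name is hypothetical, and the second ratio assumes that the 80 co-expressing neurons are counted among the 3,821 Cck-expressing cells, which the response does not state explicitly.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


th_neurons = 764    # Th-expressing neurons (count given in the response)
th_and_cck = 80     # Th-expressing neurons that also express Cck
cck_cells = 3821    # Cck-expressing cells (count given in the response)

# Share of Th neurons that co-express Cck (the ~10% quoted in the response).
print(f"Th neurons co-expressing Cck: {percent(th_and_cck, th_neurons):.1f}%")

# Assumption: the 80 co-expressing neurons are a subset of the 3,821 Cck cells.
print(f"Cck cells co-expressing Th: {percent(th_and_cck, cck_cells):.1f}%")
```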

      (16) The number of animals used differs significantly between species, which the authors acknowledge as a limitation in the discussion. Since the authors took advantage of previously published mouse data sets (Ludwig and Dowsett data sets), I wonder if the authors could compare/integrate any rat data set currently available in rats as well to partially address the sample size disparity.

      We agree with the reviewer that our rat database is considerably smaller than our mouse database, making comparisons between rat and mouse DVC challenging. We attempted to increase the size of our rat DVC atlas by incorporating publicly available rat DVC snRNA-Seq data (Reiner et al 2022). However, we found several issues with the quality of this data, including low UMIs/cell and gene #/cell. For these reasons, we decided against merging these two datasets. So while relatively small, our rat DVC atlas uses high quality data and serves as a valuable starting point. By introducing TreeArches as a method to relatively easily incorporate new snRNA-Seq data into our own, it is our hope that future studies will do so and thus expand the rat DVC atlas we have built.

      (17) In the Materials and Methods section, LiCl is mentioned as one of the treatment conditions; however, very little corresponding data are presented or discussed. Please include these results and elaborate on the rationale for selecting LiCl over other anorectic compounds.

      The reviewer is correct: some of the tissues used in this study were from animals treated with LiCl prior to euthanasia. Our intent was to contrast the transcriptional effects induced by LiCl (an anorectic agent with aversive properties) with refeeding (a naturally rewarding and satiating stimulus). However, upon analyzing the data, we found very few transcriptional changes induced by LiCl. It is unclear to us whether this was a technical failure in the experiment, and so we did not elaborate on the results.

      Reviewer #3 (Recommendations for the authors):

      (1) The use of both sexes is indicated in the discussion, but methods and results do not address sex distribution in the investigated groups. Also, the groups could be more clearly described, e.g., the size of the 2 hour refeeding mouse group varies from n=10 to n=5.

      We have clarified the text, in line with the reviewer’s suggestion. There were two cohorts of fasted/refed mice (n=5 each), which is why the manuscript methods state n=10. The fasted-only group, which was not refed before euthanasia, is a separate group (n=5).

      (2) Page 20, the last sentence needs to be reworded.

      We thank the reviewer for this recommendation. The text has been amended to improve clarity of the sentence. 

      (3) Page 22, lines 691-692 - this sentence needs to be reworded.

      We thank the reviewer for this comment. The offending sentences have been amended.  

      (4) While the authors find transcriptional changes in all neuronal and non-neuronal cell types, which is interesting, the verification of known transcriptional changes (e.g., cFos) is unaddressed. cFos is a common gene upregulated with refeeding that was surprisingly not investigated, even though this should be a strong marker of proper meal-induced neuronal activation in the DMV. This is a missed opportunity either to verify the data set or to highlight important limitations if that had been attempted without success.

      This is a highly salient point made by the reviewer. Including Fos expression serves as an internal validation of our refeeding condition and the absence of Fos mRNA levels from the original manuscript was an oversight on our part. As shown in our volcano plot, between ad libitum fed and refed mice, there are two significantly Fos-associated genes upregulated in the refed group. Therefore, we are confident that the snRNA-Seq analysis accurately captured rapid changes in response to refeeding in the DVC. Only genes differentially expressed (log2 Fold-change >0.5 per group) were considered in the analysis. NS= non-significant.

      Author response image 1.

      (5) The focus on transmitter classification is highlighted, but surprisingly, the well-accepted distinction of GABAergic neurons by Slc32a1 was not used, instead, Gad1 and Gad2 were used as GABAergic markers. While this may be proper for the DMV, given numerous findings that Gad1/2 are not proper markers for GABAergic neurons and often co-expressed in glutamatergic populations, this confound should have been addressed to make a case if and why they would be proper markers in the DMV.

      The reviewer raises an important point. Indeed, there are discrepancies in expression between the Gad1/2 genes and the Slc32a1 gene in other data sets. To analyze this within our data set, we examined the mainly GABAergic magnaclass 1 (see Slc32a1 UMAP plot below). In magnaclass 1, only 5% and 3% of all neurons express Slc32a1 without Gad1 or without Gad2, respectively. In line with the reviewer’s comment, we found that 54% of neurons express either Gad1 or Gad2 but had no detectable Slc32a1. While our failure to detect more cells that co-express Slc32a1 and Gad genes may be partially due to the low expression of Slc32a1, it is also very likely that the DVC, like other brain regions, contains neurons that express the Gad enzymes without co-expression of Slc32a1.

      This was very much the case with the GLP1 cell cluster, which we identified as the population which had the highest co-expression of excitatory and inhibitory markers. When we refined this analysis to look at expression of excitatory markers with Slc32a1 (and not other inhibitory genes), there was a marked reduction in the proportion of GLP1 neurons meeting this criterion. We find this is mainly due to the GLP1 cells expressing Gad2 (see plots below). We still find that there are some GLP1-expressing neurons that express excitatory markers and Slc32a1 and that the GLP1 neurons have a higher proportion of these co-expressing cells than other cell types.  

      We have extended our results section to reflect this and thank the reviewer for recommending this analysis.  
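
      The marker comparison described above amounts to assigning each neuron to a category based on which transcripts are detected and then reporting the fraction of neurons in each category. The following is a minimal sketch under stated assumptions, not the pipeline used in the manuscript: the toy counts are invented for illustration, the function name is hypothetical, and a gene is treated as "detected" when its count is above zero.

```python
from collections import Counter


def marker_category(counts: dict) -> str:
    """Assign a neuron to a category from its Slc32a1, Gad1 and Gad2 counts."""
    # Assumption: any count above zero is treated as detectable expression.
    has_slc32a1 = counts.get("Slc32a1", 0) > 0
    has_gad = counts.get("Gad1", 0) > 0 or counts.get("Gad2", 0) > 0
    if has_slc32a1 and has_gad:
        return "Slc32a1 + Gad"
    if has_slc32a1:
        return "Slc32a1 only"
    if has_gad:
        return "Gad only"
    return "neither"


# Invented toy data: one dictionary of transcript counts per neuron.
cells = [
    {"Slc32a1": 2, "Gad1": 1, "Gad2": 0},
    {"Slc32a1": 0, "Gad1": 0, "Gad2": 3},
    {"Slc32a1": 0, "Gad1": 4, "Gad2": 1},
    {"Slc32a1": 1, "Gad1": 0, "Gad2": 0},
    {"Slc32a1": 0, "Gad1": 0, "Gad2": 0},
]

# Tally the categories and convert the tallies to fractions of all neurons.
tally = Counter(marker_category(cell) for cell in cells)
fractions = {category: n / len(cells) for category, n in tally.items()}
print(fractions)
```

      Because lowly expressed transcripts such as Slc32a1 can drop out of snRNA-Seq data, fractions computed with a detection threshold of this kind are best read as lower bounds on true co-expression.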

      Author response image 2.

      Slc32a1 expression across all neurons.  

      Author response image 3.

      Proportion of neurons in all cell identities expressing glutamatergic markers alone (dark green), Slc32a1 alone (light green), both glutamatergic markers and Slc32a1 (purple), or neither Slc32a1 nor glutamatergic markers (grey).

      Author response image 4.

      Balloon plot of Slc32a1, Gad1 and Gad2 across cell types. The GLP1-expressing neurons express Gad2 but minimal Slc32a1.  

      (6) The Pdgfra IHC as verification is great, but images are not very convincing in distinguishing the 2 (mouse) or 3 (rat) classes of cells. Why not compare Pdgfra and HuC/D co-localization by IHC and snRNAseq data (using the genes for HuC/D) in the mouse and in the rat? That would also clarify how specific HuC/D is for DMV neurons, or if it may also be expressed in non-neuronal populations.

      In agreement with the suggestion by the reviewer, we reanalyzed the snRNA-Seq data to identify the extent of the co-expression of HuC/HuD (i.e. the Elavl3 and Elavl4 genes, respectively) in Pdgfra-expressing neurons. The gene expression of the 34 rat neurons belonging to this group is shown in the following heatmap, in which each column represents one neuron. As shown, most neurons co-express Pdgfra and either the HuC or the HuD gene. In addition, we show the UMAP plots of the rat neurons showing expression of the same genes regardless of the neuronal identity assigned. The Pdgfra neurons are visible in darker blue in the last UMAP plot. It is important to note that HuD is a more specific neuronal marker, as shown in the table with the average expression of the Elavl3/4 genes, since HuC is also expressed by glial cells, especially OPCs and oligodendrocytes. As the HuC/D antibody detects both proteins, this complicates the interpretation of the immunofluorescent staining. While the snRNA-Seq data suggest these Pdgfra-expressing cells are indeed neurons (albeit a rare population), we aim to confirm this in separate studies.

      Author response image 5.

      Author response image 6.

      Average expression (log-normalized counts) of HuC/D by layer 1 cell identity in the rat cells:

      Author response table 1.

      (7) The importance of sub-clustering for clusters 23, 26, and 27 is not immediately clear. Does this have any relevance to the mouse vs. rat data? Or fed, fast, refeeding data sets? Or is it just to show the depth that can be achieved?

      We appreciate that our justification was not clear within the manuscript. We have clarified our rationale below but briefly, in each case distinct transcriptional profiles were observed, and we pursued this by performing sub-clustering.   

      Cluster 23 was sub-clustered because it was found to contain both pre-myelinating oligodendrocytes and a subset of myelinating oligodendrocytes. To label them efficiently in R rather than cell by cell, the subclusters showing pre-myelinating oligodendrocyte markers were labeled as such in the dataset. The remaining cells were labeled as mature oligodendrocytes.

      A similar approach was taken for cluster 27 which contained pericytes, endothelial and smooth muscle cells (Figure S5).

      In the case of cluster 26, marker mapping revealed two subclusters of fibroblasts, so the cluster was sub-clustered in R so that each group could be labeled with its own identity. In each case, therefore, sub-clustering served as an aid for labeling the different identities found through marker mapping (Table S5) in the first clustering round.

      All labels were transferred from mouse to rat data using treeArches, including those resulting from the sub-clustering of these clusters. Because this was done to establish identity, it should not be relevant for treatment analyses (e.g. fasted, refed) since they are built from markers that don't change by conditions but remain as identity markers. Indeed, our dataset has an even distribution of these subclusters among samples.

    1. eLife Assessment

      Cryptovaranoides, an end-Triassic animal (just over 200 Ma old), was originally described as a possibly anguimorph squamate, i.e., more closely related to snakes and some extant lizards than to other extant lizards, making Squamata much older than previously thought and providing a new calibration date inside it. Following a rebuttal and a defense, this fourth important contribution to the debate makes a meticulous and solid argument that Cryptovaranoides is not a squamate. However, further comparisons to potentially closely related animals would greatly benefit this study, and parts of the text require clarification.

    2. Reviewer #1 (Public review):

      In the Late Triassic and Early Jurassic (around 230 to 180 Ma ago), southern Wales and adjacent parts of England were a karst landscape. The caves and crevices accumulated remains of small vertebrates. These fossil-rich fissure fills are being exposed in limestone quarrying. In 2022 (reference 13 of the article), a partial articulated skeleton and numerous isolated bones from one fissure fill of end-Triassic age (just over 200 Ma) were named Cryptovaranoides microlanius and described as the oldest known squamate - the oldest known animal, by some 20 to 30 Ma, that is more closely related to snakes and some extant lizards than to other extant lizards. This would have considerable consequences for our understanding of the evolution of squamates and their closest relatives, especially for their speed and absolute timing, and was supported in the same paper by phylogenetic analyses based on different datasets.

      In 2023, the present authors published a rebuttal (reference 18) to the 2022 paper, challenging anatomical interpretations and the irreproducible referral of some of the isolated bones to Cryptovaranoides. Modifying the datasets accordingly, they found Cryptovaranoides outside Squamata and presented evidence that it is far outside. In 2024 (reference 19), the original authors defended most of their original interpretation and presented some new data, some of it from newly referred isolated bones. The present article discusses anatomical features and the referral of isolated bones in more detail, documents some clear misinterpretations, argues against the widespread but not justifiable practice of referring isolated bones to the same species as long as there is merely no known evidence to the contrary, further argues against comparing newly recognized fossils to lists of diagnostic characters from the literature as opposed to performing phylogenetic analyses and interpreting the results, and finds Cryptovaranoides outside Squamata again.

      Although a few of the character discussions and the discussion of at least one of the isolated bones can probably still be improved (and two characters are addressed twice), I see no sign that the discussion is going in circles or otherwise becoming unproductive. I can even imagine that the present contribution will end it.

    3. Reviewer #2 (Public review):

      Congratulations on this thorough manuscript on the phylogenetic affinities of Cryptovaranoides. Recent interpretations of this taxon, and perhaps some others, have greatly changed the field's understanding of reptile origins - for better and (likely) for worse.

      This manuscript offers a careful review of the features used to place Cryptovaranoides within Squamata and adequately demonstrates that this interpretation is misguided, and therefore reconciles morphological and molecular data, which is an important contribution to the field of paleontology. The presence of any crown squamate in the Permian or Triassic should be met with skepticism, the same sort of skepticism provided in this manuscript.

      I have outlined some comments addressing some weaknesses that I believe will further elevate the scientific quality of the work. A brief, fresh read‑through to refine a few phrases, particularly where the discussion references Whiteside et al. could also give the paper an even more collegial tone.

      This manuscript can be largely improved by additional discussion and figures, where applicable. When I first read this manuscript, I was a bit surprised at how little discussion there was concerning both non-lepidosauromorph lepidosaurs as well as stem-reptiles more broadly. This paper makes it extremely clear that Cryptovaranoides is not a squamate, but would greatly benefit in explaining why many of the characters either suggested by former studies to be squamate in nature or were optimized as such in phylogenetic analyses are rather widespread plesiomorphies present in crownward sauropsids such as millerettids, younginids, or tangasaurids. I suggest citing this work where applicable and building some of the discussion for a greatly improved manuscript. In sum:

      (1) The discussion of stem-reptiles should be improved. Nearly all of the supposed squamate features in Cryptovaranoides are present in various stem-reptile groups. I've noted a few, but this would be a fairly quick addition to this work. If this manuscript incorporates this advice, I believe arguments regarding the affinities of Cryptovaranoides (at least within Squamata) will be finished, and this manuscript will be better off for it.

      (2) I was also surprised at how little discussion there was here of putative stem-squamates or lepidosauromorphs more broadly. A few targeted comparisons could really benefit the manuscript. It is currently unclear why Cryptovaranoides could not be a stem-lepidosaur, although I know that the lepidosaur total-group in these manuscripts lacks character sampling due to their scarcity.

      (3) This manuscript can be improved by additional figures, such as the slice data of the humerus. The poor quality of the scan data for Cryptovaranoides is stated several times in this paper, yet the scan data is often used as evidence for the presence or absence of often minute features without discussion, leaving doubts as to what condition is true. Otherwise, several sections can be rephrased to acknowledge uncertainty, and probably change some character scorings to '?' in other studies.

    4. Reviewer #3 (Public review):

      Summary:

      The study provides an interesting contribution to our understanding of Cryptovaranoides relationships, which is a matter of intensive debate among researchers. My main concerns are in regard to the wording of some statements, but generally, the discussion and data are well prepared. I would recommend moderate revisions.

      Strengths:

      (1) Detailed analysis of the discussed characters.

      (2) Illustrations of some comparative materials.

      Weaknesses:

      Some parts of the manuscript require clarification and rewording.

      One of the main points of criticism of Whiteside et al. is using characters for phylogenetic considerations that are not included in the phylogenetic analyses therein. The authors call it a "non-trivial substantive methodological flaw" (page 19, line 531). I would step down from such a statement for the reasons listed below:

      (1) Comparative anatomy is not about making phylogenetic analyses. Comparative anatomy is about comparing different taxa in search of characters that are unique and characters that are shared between taxa. This creates an opportunity to assess the level of similarity between the taxa and create preliminary hypotheses about homology. Therefore, comparative anatomy can provide some phylogenetic inferences. That does not mean that tests of congruence are not needed. Such comparisons are the first step that allows creating phylogenetic matrices for analysis, which is the next step of phylogenetic inference. That does not mean that all the papers with new morphological comparisons should end with a new or expanded phylogenetic matrix. Instead, such papers serve as a rationale for future papers that focus on building phylogenetic matrices.

      (2) Phylogenetic matrices are never complete, both in terms of morphological disparity and taxonomic diversity. I don't know if it is even possible to have a complete one, but at least we can say that we are far from that. Criticising a work that did not include all the possibly relevant characters in the phylogenetic analysis is simply unfair. The authors should know that creating/expanding a phylogenetic matrix is a never-ending work, beyond the scope of any paper presenting a new fossil.

      (3) Each additional taxon has the possibility of inducing a rethinking of characters. That includes new characters, new character states, character state reordering, etc. As I said above, it is usually beyond the scope of a paper with a new fossil to accommodate that into the phylogenetic matrix, as it requires not only scoring the newly described taxon but also many that are already scored. Since the digitalization of fossils is still rare, it requires a lot of collection visits that are costly in terms of time.

      (4) If I were to search for a true flaw in the Whiteside et al. paper, I would check if there is a confirmation bias. The mentioned paper should not only search for characters that support Cryptovaranoides affinities with Anguimorpha but also characters that deny that. I am not sure if Whiteside et al. did such an exercise. Anyway, the test of congruence would not solve this issue because by adding only characters that support one hypothesis, we are biasing the results of such a test.

      To sum up, there is nothing wrong with proposing some hypotheses about character homology between different taxa that can be tested in future papers that will include a test of congruence. Lack of such a test makes the whole argumentation weaker in Whiteside et al., but not unacceptable, as the manuscript might suggest. My advice is to step down from such strong statements like "methodological flaw" and "empirical problems" and replace them with "limitations", which I think better describes the situation.

    1. eLife Assessment

      This revised manuscript provides fundamental findings on how the mouse barrel cortex connects to the dorsolateral striatum, uncovering that inputs from discrete whisker cortical columns are convergent and SPN-specific, but topographically organized at the population level. The evidence supporting this claim is compelling, demonstrating that SPNs uniquely integrate sparse input from variable stretches across the barrel cortex. The study would be of interest to basal ganglia and sensory-motor integration researchers.

    2. Reviewer #1 (Public review):

      Summary:

      By applying the laser scanning photostimulation (LSPS) approach to a novel slice preparation, the authors aimed to study the degree of convergence and divergence of cortical inputs to individual striatal projection neurons (SPNs).

      Strengths:

      The experiments were well-designed and conducted, and data analysis was thorough. The manuscript was well written and related work in the literature was properly discussed. This work has the potential to advance our understanding of how sensory inputs are integrated into the striatal circuits.

    3. Reviewer #2 (Public review):

      Summary:

      How corticostriatal synaptic connectivity gives rise to SPN encoding of sensory information is an important and currently unanswered question. The authors utilize a clever slice preparation in combination with electrophysiology and glutamate uncaging to dissect the synaptic connectivity between barrel cortex and individual striatal SPNs. In addition to mapping connectivity across major anatomical axes and cortical layers, the authors provide data showing that SPNs uniquely integrate sparse input from variable stretches across barrel cortex.

      Strengths:

      The methodology shows impressive rigor and the data robustly support the authors' conclusions. Overall, the manuscript addresses its core question, provides valuable insights into corticostriatal architecture, and is a welcome addition to the field.

    4. Reviewer #3 (Public review):

      Summary:

      The authors explored how individual dorsolateral striatum (DLS) spiny projection neurons (SPNs) receive functional input from whisker-related cortical columns. The authors developed and validated a novel slice preparation and method to which they applied rigorous functional mapping and thorough analysis. They found that individual SPNs were driven by sparse, scattered cortical clusters. Interestingly, while the cortical input fields of nearby SPNs had some degree of overlap, connectivity per SPN was largely distinct. Despite sparse, heterogeneous connectivity, topographical organization was identified. The authors lastly compared direct (D1) vs. indirect (D2) pathway cells, concluding that overall connectivity patterns were the same, but D1 cells received stronger input from L6 and D2 cells from L2/3. The paper thoughtfully addresses the question of whether barrel cortex broadly or selectively innervates SPNs. Their results indicate selective input that is loosely topographic. Their work deepens the understanding of how whisker-related somatosensory signals can drive striatal neurons.

      Strengths:

      Overall this is a carefully conducted study, and the major claims are well-supported. The use of a novel ex vivo slice prep that keeps relevant corticostriatal projections intact allows for careful mapping of the barrel cortex to dorsolateral striatum SPNs. Careful reporting of both columnar and layer position, as well as postsynaptic SPN type (D1 or D2) allows the authors to uncover novel details about how the dorsolateral striatum represents whisker-related sensory information.

      Weaknesses:

      Most technical weaknesses have now been addressed in the text.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work focuses on the connection strength of the corticostriatal projections, without considering the involvement of synaptic plasticity in sensory integration.

      Thank you for raising this point. Indeed, sensory integration is a complex process with a multitude of factors beyond connectivity patterns and synaptic strength. In addition, it is true that both connectivity levels and synaptic strength can be modified by plasticity. 

      We modified our conclusion as follows, line 354: 

      “Since the inputs to a single SPN represent only a limited subset of whisker columns, a complete representation of whiskers could emerge at the population level, with each SPN’s representation complementing those of its neighbors (Fig. 7). These observations raise the hypothesis of a selective or competitive process underlying the formation of corticostriatal synapses. The degree of input convergence onto SPNs could be modulated by plasticity, potentially enabling experience-driven reconfiguration of S1 corticostriatal coupling.”

      Reviewer #2 (Public review):

      A few minor changes to the figures and text could be made to improve clarity.

      We thank you for having taken the time to indicate where changes could benefit the paper. We followed your recommendations. 

      Reviewer #3 (Public review):

      (1) Several factors may contribute to an underestimation of barrel cortex inputs to SPNs (and thus an overestimate of the input heterogeneity among SPNs). First, by virtue of the experiments being performed in an acute slice prep, it is probable that portions of recorded SPN dendritic trees have been dissected (in an operationally consistent anatomical orientation). If afferents happen to systematically target the rostral/caudal projections of SPN dendritic fields, these inputs could be missed. Similarly, the dendritic locations of presynaptic cortical inputs remain unknown (e.g., do some inputs preferentially target distal vs proximal dendritic positions?). As synaptic connectivity was inferred from somatic recordings, it's likely that inputs targeting the proximal dendritic arbor are the ones most efficiently detected. Mapping the dendritic organization of synapses is beyond the scope of this work, but these points could be broached in the text.

      Thank you for this analysis. The positions of S1 spines have been mapped on the SPN dendritic arbor by the Margolis group (B.D. Sanabria et al., eNeuro 2024; 10.1523/ENEURO.0503-23.2023). They observed that about 80 % of S1 spines were on dendrites, with a specific distribution that was, on average, rather close to the soma. In that study, S1 spines did not exhibit a distribution that would systematically hinder their detection in a slice. Nevertheless, the position in the dendritic arbor where an S1 input is received does affect its detection in somatic recordings. We modified the discussion as follows, line 275:

      “The LSPS combined with glutamate uncaging mapped projections contained in the slice, intact from the presynaptic cell bodies to the SPN dendrites. Some cortical inputs targeting distal SPN dendrites may have gone undetected, either due to attenuation of synaptic events recorded at the soma or because distal dendritic branches were lost during slice preparation. Indeed, about 80 % of S1 synaptic contacts are distributed along dendrites (Sanabria et al., 2024). However, synapses located distally are proportionally rare (Sanabria et al., 2024), and our estimates suggest that the loss of S1 input was minimal (see Methods). More significantly, our mapping only included projections from neuronal somata located within the S1 barrel field in the slice: projections from cortical columns outside the slice were not stimulated. For this reason, our study characterized connectivity patterns rather than the full extent of connectivity with the barrel cortex.”

      We explain our estimation of truncated S1 contacts in the Methods, line 434:

      “To estimate the loss of S1 synaptic contacts caused by slice preparation, we modeled the SPN dendritic field as a sphere centered on the soma. S1 synapses were at 80 % distributed radially along dendrites, according to the specific distribution described by Sanabria et al. (2024). The simulation also incorporated the known distribution of SPN dendritic length as a function of distance from the soma (Gertler et al., 2008). Finally, it assumed that synapse placement was isotropic, with equal probability in all directions from the soma. Truncation was simulated by removing a spherical cap at one pole of the sphere, reflecting the depth of our recordings (beyond 80 μm). Based on this simulation, the loss of S1 inputs was < 10 %.”
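
      The geometry of this truncation estimate can be sketched in a few lines. This is a minimal illustration, not the authors' simulation: the radial bins and weights are hypothetical placeholders (the distributions from Sanabria et al., 2024 and Gertler et al., 2008 are not reproduced here), so the number it returns is not comparable to the reported < 10 %. Only the 80 μm cut depth and the isotropy assumption come from the text above. For an isotropic direction, the cosine of the polar angle is uniform on [-1, 1], so a synapse at radius r is lost beyond a cut plane at distance d from the soma with probability (1 - d/r)/2 when r > d, and is never lost otherwise.

```python
import random


def cap_fraction(radius_um, depth_um):
    """Fraction of a spherical shell of the given radius lying beyond a cut
    plane located depth_um from the soma (isotropic placement assumed)."""
    if radius_um <= depth_um:
        return 0.0
    return 0.5 * (1.0 - depth_um / radius_um)


def expected_loss(radii_um, weights, depth_um):
    """Weighted mean of cap_fraction over a radial synapse distribution."""
    total = sum(weights)
    return sum(w * cap_fraction(r, depth_um)
               for r, w in zip(radii_um, weights)) / total


def monte_carlo_loss(radii_um, weights, depth_um, n=100_000, seed=0):
    """Same estimate obtained by sampling synapse positions explicitly."""
    rng = random.Random(seed)
    sampled_radii = rng.choices(radii_um, weights=weights, k=n)
    lost = 0
    for r in sampled_radii:
        z = r * rng.uniform(-1.0, 1.0)  # height of the synapse along the cut axis
        if z > depth_um:
            lost += 1
    return lost / n


# Hypothetical radial bins (um) and relative synapse counts, for illustration only.
RADII_UM = [40.0, 160.0]
WEIGHTS = [3.0, 1.0]
CUT_DEPTH_UM = 80.0  # recording depth quoted in the Methods excerpt

analytic = expected_loss(RADII_UM, WEIGHTS, CUT_DEPTH_UM)
sampled = monte_carlo_loss(RADII_UM, WEIGHTS, CUT_DEPTH_UM)
```

      The closed-form and sampled estimates should agree to within sampling error; substituting the published radial distributions for the placeholder bins is what would be needed to approach the authors' figure.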

      (2) In general, how specific (or generalizable) is the observed SPN-specific convergence of cortical barrel cortex projections in the dorsolateral striatum? In other words, does a similar cortical stimulation protocol targeted to a non-barrel sensory (or motor) cortex region produce similar SPN-specific innervation patterns in the dorsolateral striatum?

      This is an interesting question that could be addressed using the LSPS approach in areas for which ex vivo preparations have been designed to maintain the integrity of the corticostriatal projections, such as A1, M1 and S2.  

      We included this point in the discussion, line 299: 

      “The speckled connectivity pattern of individual SPNs, arising from the abundant and diffuse cortical innervation in the DLS, suggests that somatosensory corticostriatal synapses are established through a selective and/or competitive process. It is important to determine whether this sparse innervation of SPNs by S1 is a characteristic shared with other projections. In particular, it will be interesting to test this hypothesis on the auditory projections targeting the posterior striatum, where neurons exhibit clear tone frequency selectivity (Guo et al., 2018).”

      (3) In general, some of the figure legends are extremely brief, making many details difficult to infer. Similarly, some statistical analyses were either not carried out or not consistently reported.

      We thank you for having taken the time to indicate where changes could benefit the paper. We have followed your recommendations. 

      Reviewer #1 (Recommendations for the authors):

      A few limitations should be discussed in the manuscript:

      (1) The manuscript should mention that most corticostriatal synapses are formed at the dendritic spines of the SPNs, not their cell bodies. This is particularly important regarding the analysis and interpretation of the data in Figure 4.

      Thank you for this comment. This characteristic is important with regard to a limitation of electrophysiological recordings. This is now discussed:

      Line 275:

      “The LSPS combined with glutamate uncaging mapped projections contained in the slice, intact from the presynaptic cell bodies to the SPN dendrites. Some cortical inputs targeting distal SPN dendrites may have gone undetected, either due to attenuation of synaptic events recorded at the soma or because distal dendritic branches were lost during slice preparation. Indeed, about 80 % of S1 synaptic contacts are distributed along dendrites (Sanabria et al., 2024). However, synapses located distally are proportionally rare (Sanabria et al., 2024), and our estimates suggest that the loss of S1 input was minimal (see Methods).”

      Line 313:

      “[...], we found that overlaps between the connectivity maps of SPNs were rare and, when present, involved only a small fraction of the connected sites. This indicates that neighboring SPNs predominantly integrated distinct inputs from the barrel cortex, although it is possible that overlapping inputs received in distal dendrites were not all detected.”

      (1) SPNs show up- and down-states in vivo, which were not mimicked by the present study since all cells were held at - 80 mV (Line 364) and recorded at room temperature (Line 368). It should be discussed how the conclusion of the present work may be affected by the up/down states of SPNs in vivo.

      Thank you for raising this point. Indeed, our experimental conditions were not designed to capture the effects of network oscillatory activity. Instead, LSPS conditions were optimized to reveal monosynaptic connectivity between neurons in S1 and their postsynaptic targets. These optimizations include the use of a high concentration of extracellular divalents (4 mM Ca²⁺ and Mg²⁺) to generate robust yet moderate and spatially-restricted stimulations of cortical cells and reliable neurotransmitter release (Shepherd, Pologruto and Svoboda, Neuron 2003; 10.1016/s0896-6273(03)00152-1; in our study, see Fig. 1D and Suppl Fig. 2). Investigating the pre- and postsynaptic modulations of the corticostriatal coupling by up- and down-states would require specific conditions.

      The conclusion now acknowledges that functional connectivity is subject to plasticity in general, line 358:

      “The degree of input convergence onto SPNs could be modulated by plasticity, potentially enabling experience-driven reconfiguration of S1 corticostriatal coupling.”

      (2) In addition to population-level integration (Line 337), sensory integration is likely to involve synaptic plasticity (like via NMDARs), which was not studied in the present work.

      Thank you for raising this point. Indeed, we agree that sensory integration is a complex process with a multitude of factors beyond connectivity patterns and synaptic strength. We also agree that both connectivity levels and synaptic strength can be modified by plasticity. 

      We modified our conclusion as follows, line 354:

      “Since the inputs to a single SPN represent only a limited subset of whisker columns, a complete representation of whiskers could emerge at the population level, with each SPN’s representation complementing those of its neighbors (Fig. 7). These observations raise the hypothesis of a selective or competitive process underlying the formation of corticostriatal synapses. The degree of input convergence onto SPNs could be modulated by plasticity, potentially enabling experience-driven reconfiguration of S1 corticostriatal coupling.”

      (3) The potential corticostriatal connectivity may be underestimated due to loss of axonal branches during slice resection, and this might contribute to the conclusion of "sparse connectivity". Have the authors considered performing LSPS studies within the striatum (i.e., stimulating ChR2-expressing cortical axon terminals), and could this experiment consolidate the conclusion of the present work?

      We appreciate the suggestion to employ Subcellular Channelrhodopsin-2-Assisted Circuit Mapping (sCRACM) to study the density of S1 spines on the SPN dendritic arbor. If ChR2 is broadly expressed in S1, this approach would likely increase spine detection, as spines contacted by presynaptic neurons located inside and outside the slice would now be activated. If ChR2 expression could be restricted to the whisker columns present in our preparation, enhanced detection could still occur, but in this case, it would reflect the activation of spines contacted by specific ChR2⁺ axonal branches that exit and re-enter the slice to form synapses on the recorded SPN. The anatomy of corticostriatal axonal arbors suggests that such convoluted axonal trajectories could be relatively rare (T. Zheng and C.J. Wilson, J Neurophysiol. 2001; 10.1152/jn.00519.2001; M. Lévesque et al., Brain Res. 1996; 10.1016/0006-8993(95)01333-4).

      Moreover, it is important to remember that sCRACM does not generate connectivity maps between two structures, but maps of spines on dendritic arbors (Petreanu L.T. et al., Nature 2009; 10.1038/nature07709). Precise localization of presynaptic cell bodies was key for the present study, as it enabled distinguishing between different connectivity patterns and between different degrees of convergence of inputs from adjacent S1 cortical columns present in the slice (schematized in Fig. 1). Distinguishing these inputs using the stimulation of axon terminals would require expressing a distinct opsin in each whisker column (or each cortical layer, depending on the axis of investigation). This is an exciting perspective, but to our knowledge the technology is not yet available.

      To emphasize our reasons for using LSPS, we revised the final paragraph of the Introduction, line 69: 

      “LSPS enabled precise mapping of corticostriatal functional connectivity by identifying cortical sites where stimulation evoked synaptic currents in the recorded SPNs, thereby localizing the cell bodies of their presynaptic neurons. This approach allowed us to determine both the cortical column and layer of origin within the barrel field in the slice for each SPN input.”

      Reviewer #2 (Recommendations for the authors):

      (1)  Figure 2F: SPN and cortical regions - both are shown in green. The distinction between the two would be clearer if SPNs were made a different color.

      Done

      (2)  Figure 2H: Based on their data, the authors conclude that since EPSCs in SPNs had small amplitudes (~40pA), only one or a few presynaptic cortical neurons (< 5) were activated by uncaging. It is not clear how this number was estimated. Either this statement should be qualified with data or citations provided to support it.

      Thank you for noticing this. We modified this part as follows, line 105:

      “Based on known amplitudes of spontaneous and miniature EPSCs in SPNs (10-20 pA on average; Kreitzer and Malenka, 2007; Cepeda et al., 2008; Dehorter et al., 2011; Peixoto et al., 2016), this finding is consistent with the presence of only one or a few presynaptic cells (≤ 5) at each connected site of the map.”
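
      The arithmetic behind this bound can be made explicit. The sketch below assumes that evoked EPSCs sum linearly and that each presynaptic cell contributes one unitary event of 10-20 pA; both are simplifying assumptions, and the function name is hypothetical. The ~40 pA evoked amplitude is the value quoted in the reviewer's comment.

```python
def presynaptic_cell_range(evoked_pa, unitary_min_pa, unitary_max_pa):
    """Return (fewest, most) presynaptic cells compatible with an evoked EPSC,
    assuming linear summation of unitary events."""
    fewest = evoked_pa / unitary_max_pa  # every input at the largest unitary size
    most = evoked_pa / unitary_min_pa    # every input at the smallest unitary size
    return fewest, most


fewest, most = presynaptic_cell_range(40.0, 10.0, 20.0)
print(f"{fewest:.0f} to {most:.0f} presynaptic cells")
```

      Under these assumptions the range is 2 to 4 cells, which falls within the ≤ 5 stated in the revised text.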

      (3) Figure 2I: The top graph is difficult to understand without already seeing the lower plot. Moving it below or to the side would help the reader follow the data more easily.

      Done

      (4) Figure 3D: In Line 162, the authors state, "Furthermore, SPNs receiving input from a single column were often located near others receiving input from multiple ones (Figure 3D), reinforcing that the low functional connectivity with barrel columns in the slice was genuine in these cases." However, Figure 3D does not show spatial information about SPNs relative to each other. This data should be added or the statement adjusted to reflect what is shown in the panel.

      Corrected as follows, line 167:

      “Furthermore, SPNs receiving input from a single column were often located in slices where other cells received input from multiple ones (Fig. 3D), reinforcing that the low functional connectivity with barrel columns in the slice was genuine in these cases.”

      (5) Figure 3F: Are the authors attempting to show how cluster number, cluster width, and connectivity gaps contribute to input field width? If so, this could be clarified by flipping the x- and y-axes so that the input field width is the y-axis in each case. Additionally, the difference between black and white points should be stated (or, if there is no difference, made to be the same). The significance of the dotted red line vs. the solid red lines should also be stated in the figure legend.

      These plots illustrate how cluster number, cluster width, and ratio of connectivity gaps over total length vary as a function of input field width. As expected, wider input fields contain more clusters (top). However, the overall density of connected sites does not increase with input field width, as indicated by a higher ratio of connectivity gaps over total length (bottom).

      This suggests the presence of a mechanism that regulates the connectivity level of individual SPNs (mentioned in the discussion). We prefer this orientation because the flipped one makes a cluttered panel due to different X axis labels. Symbols and lines were corrected. The correlation coefficients and statistics are now indicated in the panels and in the legend.
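
      The quantities discussed in this response (cluster number, input field width, and the ratio of connectivity gaps to total length) can be computed from a collapsed connectivity map along the following lines. This is a sketch under stated assumptions, not the authors' analysis code: a cluster is taken to be a contiguous run of connected sites, widths are counted in stimulation sites rather than µm, and the example map and function names are hypothetical.

```python
def collapse_columns(conn_map):
    """Collapse a 2D map (rows = cortical depth, columns = horizontal position)
    along the vertical axis: a column counts as connected if any site in it is."""
    return [any(col) for col in zip(*conn_map)]


def input_field_stats(profile):
    """Return cluster count, input field width, and gap ratio for a 1D profile.

    The input field spans the first to the last connected site; gaps are the
    unconnected sites inside that span."""
    connected = [i for i, hit in enumerate(profile) if hit]
    if not connected:
        return {"clusters": 0, "field_width": 0, "gap_ratio": 0.0}
    field = profile[connected[0]:connected[-1] + 1]
    # A cluster starts wherever a connected site follows an unconnected one.
    clusters = sum(1 for i, hit in enumerate(field)
                   if hit and (i == 0 or not field[i - 1]))
    gap_sites = sum(1 for hit in field if not hit)
    return {
        "clusters": clusters,
        "field_width": len(field),
        "gap_ratio": gap_sites / len(field),
    }


# Hypothetical map: 1 marks a site where stimulation evoked an EPSC.
example_map = [
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
]
stats = input_field_stats(collapse_columns(example_map))
```

      For the example map, the collapsed profile has two clusters separated by a two-site gap inside a six-site input field, so one third of the field is gap.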

      (6) Figure 3H: The schematic is very useful for highlighting the core conclusions and is greatly appreciated. The pie charts are a bit hard to see and could be replaced with the percentages stated simply as text within the figure. It would also help to label the panel as "Summary," so readers can quickly identify its purpose.

      Done

      (7) Figures 4B-D: To clarify the overall percentage, the maximum for the y-axis should be set to 100% in each panel.

      Done

      Reviewer #3 (Recommendations for the authors):

      (1) Though mostly minor, several sentences/statements in the manuscript are confusing or overstated. For example:

      a. Lines 62-63: "Studies have found that inputs received by D1 SPNs were stronger than those received by D2 SPNs" is a broad statement that should be qualified.

      We changed this sentence to:

      “Electrophysiological studies have found that inputs received by D1 SPNs were stronger than those received by D2 SPNs, both in vivo and ex vivo (Reig and Silberberg, 2014 ; Filipović et al., 2019 ; Kress et al., 2013 ; Parker et al., 2016).”

      b. Lines 118-119: "EPSCs evoked with stimulations in L2/3 to L5b had similar amplitudes (Figure 2H), suggesting that L5a dominated these other layers thanks to a greater connectivity with SPNs principally." Here, the word "connectivity" is vague and could easily be misunderstood. Connectivity could refer to the amplitude of corticostriatal EPSCs, which the authors stated are not different between L2/3-L5b. Presumably, connectivity here refers to % of connected SPNs, but for the sake of clarity, the authors should be more explicit, e.g., "...L5a dominated the other layers because a larger fraction of SPNs received connections from L5a, rather than because L5a synapses were stronger."

      We changed the sentence to (line 122):

      “EPSCs evoked with stimulations in L2/3 to L5b had similar amplitudes (Fig. 2H), suggesting that L5a dominance over these other layers is primarily due to a higher likelihood of SPNs being connected to it, rather than to stronger synaptic inputs.”

      c. In the Figure 4 legend, (A) says "Four example slices with 2 to 4 recordings. Same as in Figure 2A." Did the authors mean Figure 3A?

      Done

      d. Line 184: Should Figure 4B, C actually be Figure 4D?

      Done

      (2) Line 32: typo in Sippy et al. reference.

      Done

      (3) In Figure 2I, the label "dSPN" is confusing, as in the literature, dSPN often refers to the direct pathway SPN.

      Done

      (4) The y-axes in Figure 3C should be better labeled/explained.

      Fig.3C. Median (red) and 25-75th percentiles (box) of cluster width and spacing, expressed in µm (left Y axis) and number of cortical columns (right Y axis). Labels have been changed in the figure.

      (5)  Lines 150-152: "...45 % of the input fields with several clusters produced no synaptic response upon stimulation." This wording is confusing. It can be inferred that the authors mean "no synaptic response in the gaps between clusters." However, their phrasing omits this crucial detail and reads as though those input fields produce no response at all.

      We changed this sentence to (line 154):

      “Strikingly, regions lacking evoked synaptic responses (i.e., connectivity gaps) made up an average of 45 % of the length of input fields with multiple clusters (maps collapsed along the vertical axis; Fig. 3F, bottom).”

      (6)  Lines 184-186: "DLS SPNs could receive inputs from the same domain in the barrel cortex and yet have patterns of cortical innervation without or little redundancy." This should be rephrased to "with little to no redundancy."

      Done

      (7)  Lines 186-187: "They support a connectivity model in which synaptic connections on each SPNs..." should be revised to "connections to each SPN...".

      Done

    1. eLife Assessment

      In this manuscript, the authors describe a software package for automatic differentiation of action potentials generated by excitatory and inhibitory neurons, acquired using high-density microelectrode arrays. The work is valuable as it offers a tool with the potential to automatically identify these neuron types in vitro. It is solid, as it provides a tool to identify putative excitatory and inhibitory neurons on high-density electrode arrays, which can be used in conjunction with other existing spike sorting pipelines.

    2. Reviewer #1 (Public review):

      Summary:

      The authors note that while many software packages exist for spike sorting, these do not automatically differentiate with known accuracy between excitatory and inhibitory neurons. Moreover, most existing spike sorting packages are for in vivo use, where the majority of electrodes are separated from each other by several hundred microns or more. There is a need for spike sorting packages that can take advantage of high-density electrode arrays where all electrodes are within a few tens of microns from other electrodes. Here, the authors offer such a software package with SpikeMAP, and they validate its performance in identifying parvalbumin interneurons that were optogenetically stimulated.

      Strengths:

      The main strength of this work is that the authors use ground truth measures to show that SpikeMAP can take features of spike shapes to correctly identify known parvalbumin interneurons against a background of other neuron types. They use spike width and peak to peak distance as the key features for distinguishing between neuron types, a method that has been around for many years (Barthó, Peter, et al. "Characterization of neocortical principal cells and interneurons by network interactions and extracellular features." Journal of neurophysiology 92.1 (2004): 600-608.), but whose performance has not been validated in the context of high-density electrode arrays.
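
      As an illustration of the kind of waveform feature mentioned here, a trough-to-peak duration can be measured as sketched below. This is a generic example and not SpikeMAP's implementation: the function name, the sample waveform, and the 20 kHz sampling rate are all hypothetical, and no classification threshold is implied.

```python
def trough_to_peak_ms(waveform, sampling_rate_hz):
    """Time from the most negative sample (trough) to the largest sample
    that follows it (peak), in milliseconds."""
    trough = min(range(len(waveform)), key=waveform.__getitem__)
    after = waveform[trough:]
    peak = trough + max(range(len(after)), key=after.__getitem__)
    return 1000.0 * (peak - trough) / sampling_rate_hz


# Hypothetical mean waveform (arbitrary units) sampled at a hypothetical 20 kHz.
example_waveform = [0, -1, -5, -2, 1, 3, 2, 0]
width_ms = trough_to_peak_ms(example_waveform, 20_000)
```

      In the example, the trough is at sample 2 and the following peak at sample 5, so the feature spans three samples.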

      Another strength of this approach is that it is automated - a necessity if your electrode array has 4096 electrodes. Hand-sorting or even checking such a large number of channels is something even the cruellest advisor would not wish upon a graduate student. With such large channel counts, it is essential to have automated methods that are known to work accurately. Hence, the combination of validation and automation is an important advance.

      A nice feature of this work is that with high-density electrode arrays, the spike waveforms appear on multiple nearby electrodes simultaneously. And since spike amplitudes fall off with distance, this allows triangulation of neuron locations within the regular electrode array. Thus, spike correlations between neuron types, or within neuron types, can be plotted as a function of distance. While SpikeMAP is not the first to do this (Peyrache, Adrien, et al. "Spatiotemporal dynamics of neocortical excitation and inhibition during human sleep." Proceedings of the national academy of sciences 109.5 (2012): 1731-1736.), it is a welcome capability of this package.
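
      One simple way to turn amplitude fall-off into a position estimate is an amplitude-weighted centroid of the electrodes that recorded the spike. The sketch below illustrates that general idea only; it is an assumption that this resembles what the package does, and the function name, electrode coordinates, and amplitudes are hypothetical.

```python
def estimate_location(electrode_xy, amplitudes):
    """Amplitude-weighted centroid of electrode positions.

    electrode_xy: list of (x, y) positions in um.
    amplitudes: spike amplitude on each electrode; absolute values are used
    as weights, so larger deflections pull the estimate closer."""
    weights = [abs(a) for a in amplitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("all amplitudes are zero")
    x = sum(w * ex for w, (ex, _) in zip(weights, electrode_xy)) / total
    y = sum(w * ey for w, (_, ey) in zip(weights, electrode_xy)) / total
    return x, y


# Four neighbouring electrodes on a hypothetical 40 um grid.
electrodes = [(0.0, 0.0), (40.0, 0.0), (0.0, 40.0), (40.0, 40.0)]
spike_amplitudes = [-30.0, -30.0, -10.0, -10.0]
location = estimate_location(electrodes, spike_amplitudes)
```

      Here the two bottom electrodes record the largest deflections, so the estimate is centred horizontally and pulled toward the bottom row. A centroid is biased toward the middle of the electrodes included, which is one reason model-based fits of amplitude against distance are sometimes preferred.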

      It is also good that the code for this package is open-source, allowing a community of people (I expect in vitro labs will especially want to use this) to use the code and further improve it.

      Weaknesses:

      As this code was developed for use with a 4096-electrode array, it is important to be aware of double counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas: First, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code. Second, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      Appraisal:

      This work addresses the need for an automated spike sorting software package for high density electrode arrays. Although no spike sorting software is flawless, the package presented here, SpikeMAP, has been validated on PV interneurons, inspiring a degree of confidence. This is a good start, and further validation on other neuron types could increase that confidence. Groups doing in vitro experiments, where 4096 electrode arrays are more common, could find this system particularly helpful.

      Comments on revised version:

      I appreciate the dialogue that has occurred over this submission. I have seen how the authors have taken into account the issues that I have raised, as well as those brought up by reviewer 2. I am satisfied that the paper has improved and is now a novel and useful contribution in the area of spike sorting.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, entitled "SpikeMAP: An unsupervised spike sorting pipeline for cortical excitatory and inhibitory neurons in high-density multielectrode arrays with ground-truth validation", the authors present spikeMAP, a pipeline for the analysis of large-scale recordings of in vitro cortical activity. According to the authors, spikeMAP not only allows for the detection of spikes produced by single neurons (spike sorting), but also allows for the reliable distinction between genetically determined cell types by utilizing viral and optogenetic strategies as ground-truth validation. While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. This is why I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      Strengths:

      The GT recordings with optogenetic activation of the cells, based on the opsins is interesting and might provide useful data to quantify how good spike sorting pipelines are, in vitro, to discriminate between excitatory and inhibitory neurons. Such an approach can be quite complementary with artificially generated ground truth.

      Weaknesses:

      The global workflow of spikeMAP, described in Figure 1, seems to be very similar to the one of [Hilgen et al, 2020, 10.1016/j.celrep.2017.02.038.]. Therefore, the first question is what is the rationale of reinventing the wheel, and not using tools that are doing something very similar (as mentioned by the authors themselves). I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters. This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, w.r.t. spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see other comments after).

      Regarding the putative location of the spikes, it has been shown that center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias for finding positions within the boundaries of the electrodes, while some other methods such as monopolar triangulation or grid-based convolution might have better performances. Can the authors comment on the choice of Center of Mass as a unique way to triangulate the sources?

      Still in Figure 1, I am not sure to really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What's special with the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii. In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and not really matching state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. Does this mean that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting, because, since it is a blind source separation technique, one cannot determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is ok, there is no guarantee that one cannot find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is it really what should be expected? Based on the properties of the extracellular media, shouldn't we expect a power law for the decay of the amplitude? It is strange that up to 100um away from the soma, the max amplitude only dropped from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot that for all neurons recorded, in a normed manner V/max(V) as function of distances, to see what the curve looks like.

      In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What are the quality criteria that are used to keep these cells? Did the authors exclude some cells from the analysis, and if yes, what are the quality criteria that are used to keep cells? If no criteria are used (because none is mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)

      Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs, ... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than those of Excitatory ones, while they should be in theory.

      For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518]

      Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data. I guess the manuscript could be enhanced by turning the data into an open access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      While there is no doubt that GT data as the ones recorded here by the authors are the most interesting data from a validation point of view, the pretty low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rates patterns for excitatory and inhibitory cells, and thus the authors could test how good they are in discriminating the two subtypes

      Comments on revised version:

      While I must thank the authors for their answers, I still think that they missed an important one and only partially answered some of my concerns.

      I truly think that SpikeMAP would benefit from a comparison with a state-of-the-art spike sorting pipeline, for example Kilosort. The authors said that they made the sorter modular enough such that only the E/I classification step can be compared. I think this would be worth it, just to be sure that SpikeMAP spike sorting, which might be simpler than other recent solutions (with template matching), is not missing some cells, and thus degrading the E/I classification performance. I know that such a comparison is not straightforward, because there is no clear ground truth, but I would still need to be convinced that the sorting pipeline brings something on its own. While there is no doubt that the E/I classification layer can be interesting, especially given the recordings shared by the authors, I'm still a bit puzzled by the sorting step. Thus, the comparison could perhaps be shown as a table or a figure, even a supplementary one. Or the authors could try to generate fake GT data with MEArec for example, with putative E/I cells (discriminated via waveforms and firing rates) and show on such (oversimplified) data that SpikeMAP is performing similarly to modern spike sorters. Otherwise, this is a bit hard to judge...

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      As this code was developed for use with a 4096 electrode array, it is important to be aware of double-counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas. Firstly, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code.

      Thank you for this comment. We have added a routine to SpikeMAP to remove highly correlated spikes detected within a given spatial radius of each other. The following was added to the main text (line 149):

      “As an additional verification step, SpikeMAP allows the computation of spike-count correlations between putative neurons located within a user-defined radius. Signals that exceed a defined threshold of correlation can be rejected as they likely reflect the same underlying cell.”
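
      The verification step quoted above can be illustrated with a minimal Python sketch. It is not the SpikeMAP routine: the function name, the 50 µm radius, and the 0.9 correlation threshold are hypothetical choices made for illustration, and the inputs are assumed to be per-unit locations and binned spike counts.

```python
import numpy as np

def find_duplicate_units(positions, spike_counts, radius_um=50.0, max_corr=0.9):
    """Flag units that likely duplicate an earlier unit.

    positions    : (n_units, 2) putative unit locations in micrometres
    spike_counts : (n_units, n_bins) binned spike counts per unit
    radius_um, max_corr : hypothetical defaults, not values from the manuscript
    """
    positions = np.asarray(positions, dtype=float)
    counts = np.asarray(spike_counts, dtype=float)
    # Pairwise Pearson correlation of spike counts (rows are units).
    corr = np.corrcoef(counts)
    # Pairwise Euclidean distance between putative locations.
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    rejected = set()
    n_units = len(positions)
    for i in range(n_units):
        if i in rejected:
            continue  # a rejected unit cannot reject others
        for j in range(i + 1, n_units):
            close = dist[i, j] <= radius_um
            similar = corr[i, j] > max_corr
            if j not in rejected and close and similar:
                rejected.add(j)  # keep the earlier unit, drop the later one
    return sorted(rejected)
```

      In this sketch the earlier unit of each correlated pair is kept; a real pipeline might instead keep the unit with the larger amplitude.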

      Secondly, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      We have added a routine to SpikeMAP that computes population spike rates to verify stationarity over time. We have also added a routine to identify putative bursting neurons through a Hartigan statistical dip test applied to the inter-spike distribution of individual cells.

      We added the following (line 204):

      “Further, SpikeMAP contains a routine to perform a Hartigan statistical dip test on the inter-spike distribution of individual cells to detect putative bursting neurons.”

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      We have added the following (line 326):

      “future work could include different inhibitory interneurons such as somatostatin (SOM) and vasoactive intestinal polypeptide (VIP) neurons to improve the classification of inhibitory cell types. Another avenue could involve applying SpikeMAP on artificially generated spike data (Buccino & Einevoll 2021; Laquitaine et al., 2024).”

      Reviewer #2 (Public review)

      Summary:

      While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      Thank you for your insightful comment. A full comparison between SpikeMAP and related methods is provided in Table 1. As can be seen, SpikeMAP is the only method listed that performs E/I sorting on large-scale multielectrode arrays. Nonetheless, several aspects of SpikeMAP included in the spike sorting pipeline do overlap with existing methods, as these constitute necessary steps prior to performing E/I identification. These steps are not novel to the current work, nor do they constitute rigid options that cannot be substituted by the user. Rather, we aim to offer SpikeMAP users the option to combine E/I identification with preliminary steps performed either through our software or through another package of their choosing. For instance, preliminary spike sorting could be done through Kilosort before importing the spike data into SpikeMAP for E/I identification. To allow greater flexibility, we have now modularized our suite so that E/I identification can be performed as a stand-alone module. We have clarified the text accordingly (line 317):

      “While SpikeMAP is the only known method to enable the identification of putative excitatory and inhibitory neurons on high-density multielectrode arrays (Table 1), several aspects of SpikeMAP included in the spike sorting pipeline (Figure 1) overlap with existing methods, as these constitute required steps prior to performing E/I identification. To enable users the ability to integrate SpikeMAP with existing toolboxes, we provide a modularized suite of protocols so that E/I identification can be performed separately from preliminary spike sorting steps. In this way, a user could carry out spike sorting through Kilosort or another package before importing their data to SpikeMAP for E/I identification.”

      Weaknesses:

      (1) The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of Hilgen et al. 2020 (10.1016/j.celrep.2017.02.038). Therefore, the first question is what is the rationale of reinventing the wheel, and not using tools that are doing something very similar (as mentioned by the authors themselves). I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters.

      The paper by Hilgen et al. is reported in Table 1. As seen, while this paper employs optogenetics, it does not target inhibitory (e.g., PV) cells. We have added the following clarification (line 82):

      “Despite evidence showing differences in action potential kinetics for distinct cell-types as well as the use of optogenetics (Hilgen et al., 2017), there exists no large-scale validation efforts, to our knowledge, showing that extracellular waveforms can be used to reliably distinguish cell-types.”

      This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, with reference to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see other comments after).

      We thank the reviewer for this comment, and have amended the title as follows:

      “SpikeMAP: An unsupervised pipeline for the identification of cortical excitatory and inhibitory neurons in high-density multielectrode arrays with ground-truth validation”

      (2) Regarding the putative location of the spikes, it has been shown that the center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias for finding positions within the boundaries of the electrodes, while some other methods, such as monopolar triangulation or grid-based convolution, might have better performances. Can the authors comment on the choice of the Center of Mass as a unique way to triangulate the sources?

      We agree with the reviewer that the center-of-mass algorithm carries limitations that are addressed by other methods. To address this issue, we have included two additional protocols in SpikeMAP to perform monopolar triangulation and grid-based convolution, offering additional options for users of the package. The text has been clarified as follows (line 429):

      “In addition to center-of-mass triangulation, SpikeMAP includes protocols to perform monopolar triangulation and grid-based convolution, offering additional options to estimate putative soma locations based on waveform amplitudes.”
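
      For readers unfamiliar with the center-of-mass estimate discussed here, the following Python sketch shows the basic computation under simple assumptions: amplitudes are taken as absolute peak values on each electrode and used directly as weights. The function name and inputs are hypothetical and do not reproduce the SpikeMAP code.

```python
import numpy as np

def center_of_mass(electrode_xy, amplitudes):
    """Amplitude-weighted average of electrode coordinates.

    electrode_xy : (n_electrodes, 2) electrode positions
    amplitudes   : (n_electrodes,) peak spike amplitude seen on each electrode
    """
    xy = np.asarray(electrode_xy, dtype=float)
    # Use absolute amplitudes so negative-going spikes still give positive weights.
    weights = np.abs(np.asarray(amplitudes, dtype=float))
    # Weighted mean: sum_i w_i * xy_i / sum_i w_i
    return (weights[:, None] * xy).sum(axis=0) / weights.sum()
```

      Because the weights are non-negative, the estimate is a convex combination of electrode positions and therefore cannot fall outside the area spanned by the electrodes used, which is the bias the reviewer points out.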

      (3) Still in Figure 1, I am not sure I really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What is special about the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      We clarified the text as follows (line 183):

      “While we found that a resolution of 90 kHz provided a reasonable estimate of spike waveforms, this value can be adjusted as a parameter in SpikeMAP.”
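
      As an illustration of what upsampling an 18 kHz waveform to 90 kHz involves, here is a minimal Python sketch that uses a cubic spline from SciPy. The choice of a cubic spline, the function name, and the default rates as parameters are assumptions for illustration; the manuscript's exact interpolation settings are not stated in this response.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def upsample_waveform(waveform, fs_in=18_000.0, fs_out=90_000.0):
    """Resample one spike waveform onto a finer time grid with a cubic spline.

    fs_in and fs_out are adjustable; 18 kHz and 90 kHz are the values
    discussed in the review, used here only as defaults.
    """
    waveform = np.asarray(waveform, dtype=float)
    t_in = np.arange(waveform.size) / fs_in
    # Number of output samples covering the same time span as the input.
    n_out = int(round((waveform.size - 1) * fs_out / fs_in)) + 1
    t_out = np.arange(n_out) / fs_out
    spline = CubicSpline(t_in, waveform)  # passes through every input sample
    return t_out, spline(t_out)
```

      The spline adds no new information beyond the original samples; it only gives a smoother curve on which features such as peak times and widths can be measured at finer time steps.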

      (4) Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii.

      We apologize for the rendering issues in the Figures that occurred during conversion into PDF format. We have now ensured that all figures are properly displayed.

      In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once, and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower-dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      We agree with the reviewer that it would be useful to have the option of performing PCA on several channels at once, since spikes can occur at several channels at the same time. We have now added a routine to SpikeMAP that allows users to define a radius around individual channels prior to performing PCA. The text was clarified as follows (line 131):

      “The SpikeMAP suite also offers a routine to select a radius around individual channels in order to enter groups of adjacent channels in PCA.”
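
      A minimal Python sketch of PCA over a local group of channels is given below. It assumes spike snippets are stored as an array of shape (spikes, channels, samples) and that channel coordinates are known; the function name and array layout are hypothetical, not the SpikeMAP interface.

```python
import numpy as np

def neighbourhood_pca(waveforms, channel_xy, center_channel, radius_um, n_components=3):
    """Project multi-channel spike snippets onto their leading principal components.

    waveforms  : (n_spikes, n_channels, n_samples) spike snippets
    channel_xy : (n_channels, 2) channel positions in micrometres
    Returns the indices of the channels used and the (n_spikes, n_components) scores.
    """
    w = np.asarray(waveforms, dtype=float)
    xy = np.asarray(channel_xy, dtype=float)
    # Channels whose distance to the central channel is within the radius.
    near = np.flatnonzero(np.linalg.norm(xy - xy[center_channel], axis=1) <= radius_um)
    # Concatenate the selected channels into one feature vector per spike.
    features = w[:, near, :].reshape(w.shape[0], -1)
    features = features - features.mean(axis=0)  # centre before PCA
    # PCA via SVD: scores are the left singular vectors scaled by singular values.
    u, s, _ = np.linalg.svd(features, full_matrices=False)
    return near, u[:, :n_components] * s[:n_components]
```

      Setting the radius to zero keeps only the central channel, which recovers the single-channel case the reviewer asks about.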

      (5) About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. This means that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting, because, since it is a blind source separation technique, one can not pre-determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is ok, there is no guarantee that one can not find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      We clarified the text as follows (line 135):

      “In SpikeMAP, the optimal number of k-means clusters can be chosen by a Calinski-Harabasz criterion (Calinski and Harabasz, 1974) or pre-selected by the user.”
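
      The Calinski-Harabasz criterion mentioned above scores a clustering by the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom; the number of clusters with the highest score is chosen. The Python sketch below is an assumption-laden illustration: it uses a small deterministic k-means written for this example (farthest-point initialization) rather than the SpikeMAP implementation, and all function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq_dist, axis=1)
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def calinski_harabasz(X, labels):
    """(between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    ids = np.unique(labels)
    n, k = len(X), len(ids)
    overall = X.mean(axis=0)
    between = within = 0.0
    for j in ids:
        members = X[labels == j]
        center = members.mean(axis=0)
        between += len(members) * np.sum((center - overall) ** 2)
        within += np.sum((members - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))

def choose_k(X, k_values=range(2, 7)):
    """Return the k with the highest Calinski-Harabasz score, plus all scores."""
    X = np.asarray(X, dtype=float)
    scores = {k: calinski_harabasz(X, kmeans(X, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

      The criterion is undefined for a single cluster, so this approach cannot report that an electrode records only one unit; that case would need a separate check.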

      (6) I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is it really what should be expected? Based on the properties of the extracellular media, shouldn't we expect a power law for the decay of the amplitude? This is strange that up to 100um away from the soma, the max amplitude only dropped from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot that for all neurons recorded, in a normed manner V/max(V) as function of distances, to see what the curve looks like.

      We added Supplemental Figure 1 showing the drop in voltage over all putative somas (N=1,950) of one recording, after excluding somas with an increase in voltage away from the electrode peak and computing normed values V/max(V). We see a distribution of slopes as well as intercepts across somas, showing some variability across recording sites. As the reviewer suggests, it is possible that a power law describes these data better than a linear function, and this would need to be investigated further by quantitatively comparing the fit of these functions.
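
      One simple way to carry out such a comparison is sketched below in Python. It is only an illustration under stated assumptions: both models have two free parameters, the power law is fitted by linear regression in log-log space, and the two are compared by their sum of squared errors on the normed amplitudes. The function name is hypothetical, and distances and amplitudes must be strictly positive.

```python
import numpy as np

def compare_decay_fits(distance_um, amplitude):
    """Fit V/max(V) against distance with a line and with a power law.

    Linear model    : v = intercept + slope * d
    Power-law model : v = c * d**(-p), fitted as log v = log c - p * log d
    Returns the sum of squared errors of each model and the fitted exponent p.
    """
    d = np.asarray(distance_um, dtype=float)
    v = np.asarray(amplitude, dtype=float)
    v = v / v.max()  # normed amplitude, as suggested by the reviewer

    slope, intercept = np.polyfit(d, v, 1)
    sse_linear = np.sum((v - (slope * d + intercept)) ** 2)

    neg_p, log_c = np.polyfit(np.log(d), np.log(v), 1)
    sse_power = np.sum((v - np.exp(log_c) * d ** neg_p) ** 2)

    return {"linear_sse": sse_linear, "power_sse": sse_power, "exponent": -neg_p}
```

      Fitting in log-log space weights small amplitudes more heavily than a direct least-squares fit would, so a nonlinear fit or a likelihood-based comparison may be preferable for noisy data.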

      (7) In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What are the quality criteria that are used to keep these cells? Did the authors exclude some cells from the analysis, and if yes, what are the quality criteria that are used to keep cells? If no criteria are used (because none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      The reviewer is correct to point out that a number of stringent criteria were employed to exclude some putative cells. We now outline these criteria directly in the text (line 161):

      “ At different steps in the process, conditions for rejecting spikes can be tailored by applying: (1) a stringent threshold to filtered voltages; (2) a minimal cut-off on the signal-to-noise ratio of voltages (see Supplemental Figure 2); (3) an LDA for cluster separability; (4) a minimal spike rate to putative neurons; (5) a Hartigan statistical dip test to detect spike bursting; (6) a decrease in voltage away from putative somas; and (7) a maximum spike-count correlation for nearby channels. Together, these criteria allow SpikeMAP users the ability to precisely control parameters relevant to automated spike sorting.”

      Further, we provide SNRs of individual channels (Supplemental Figure 2), and added to the SpikeMAP software the ability to apply a minimal criterion based on SNR.

      (8) Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than Excitatory ones, whilst they should be in theory.

      We have added figures showing the distribution of E and I firing rates across a population of N=1,950 putative cells (Supplemental Figure 3). Firing rates of inhibitory neurons are marginally higher than excitatory neurons, and both E and I follow an approximately exponential distribution of rates.

      The reviewer may be right that there are more I neurons at the borders in Figure 3B. Because injections were done in medial prefrontal cortex, this may reflect an experimental artefact related to a high probability of activating I neurons in locations where the opsin was activated. We added a sentence to the text to clarify this point (line 201):

      “It is possible that the spatial location of putative I cells reflects the site of injection of the opsin in medial prefrontal cortex.”

      (9) For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].

      The reviewer is correct to point out that the spike-sorting portion of our pipeline shares similarities with related approaches. Other aspects, however, are unique to SpikeMAP. We have clarified the text accordingly:

      “In sum, SpikeMAP provides an end-to-end pipeline to perform spike-sorting on high-density multielectrode arrays. Some elements of this pipeline are similar to related approaches (Table 1), including the use of voltage filtering, PCA, and k-means clustering. Other elements are novel, including the use of spline interpolation, LDA, and the ability to identify putative excitatory and inhibitory cells.”

      (10) Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      Again, we apologize for the rendering issues in the Figures that occurred during conversion into PDF format. We have now ensured that all figures are properly displayed.

      (11) I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data. I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      Details of the open access data are now provided in Supplemental Table 1. We also clarified Figure 5B:

      “Quantification of change in firing rate following optogenetic stimulation. Average firing rates are taken over four recordings obtained from 3 mice.”

      (12) While there is no doubt that GT data such as those recorded here by the authors are the most interesting data from a validation point of view, the rather low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rate patterns for excitatory and inhibitory cells, and thus the present authors could test how well their method discriminates the two subtypes.

      We agree with the reviewer that it would be worthwhile for future work to apply SpikeMAP to artificially generated spike trains, and have added the following (line 328):

      “Another avenue could involve applying SpikeMAP on artificially generated spike data (Buccino & Einevoll 2021; Laquitaine et al., 2024).”

      Reviewer #1 (Recommendations for the authors):

      (1) Line 154 seems to include a parenthetical expression left over from editing: "sensitive to noise (contamination? Better than noise?) generated by the signal of proximal units." See also line 186: "use (reliance?) of light-sensitive" and line 245: "In the absence of synaptic blockers (right?)," and line 270: "the size of the data prevents manual intervention (curation?)." Check carefully for all parentheses like that, which should be removed.

      Thank you for pointing this out. We have revised the text and removed parenthetical expressions left over from editing.

      (2) In lines 285-286, you state that: "k-mean clustering of spike waveform properties best differentiated the two principal classes of cells..." But I could not find where you compared k-means clustering to other methods. I think you just argued that k-means seemed to work well, but not better than, another method. If that is so, then you should probably rephrase those lines.

      The reviewer is correct that direct comparisons are not performed here, hence we removed this sentence.

      (3) Methods section, E/I classification, lines 396-405: You give us figures on what fraction was E and I (PV subtype) (94.75% and 5.25%), but there is more that you could have said. First of all, what is the expected fraction of parvalbumin-sensitive interneurons in the cortex - is it near 5%?

      We clarified the text as follows (line 444): “This number is close to the expected percentage of PV interneurons in cortex (4-6%) (Markram et al. 2004).”

      Second, how would these percentages change if you altered the threshold from 3 s.d. to something lower, like 2 s.d.? Giving us some idea of how the threshold affects the fraction of PV interneurons could give us an idea of whether this method agrees with our expectations or not.

      While SpikeMAP offers the flexibility to set the voltage threshold manually, we opted for a stringent threshold to demonstrate the capabilities of the software. As seen in Figure 2D, at 2 and 3 s.d., the signal is largely accounted for by Gaussian noise, while deviation from noise arises around 4 s.d. We clarified the text as follows (line 120):

      “At a threshold of -3σ, the signal could be largely accounted for by Gaussian noise, while a separation between signal and noise began around a threshold of -4σ.”
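
      As a rough check on why lower thresholds are dominated by noise, the fraction of samples expected to fall below a negative threshold can be computed under the assumption of purely Gaussian noise. This is a minimal sketch and is not taken from SpikeMAP; the function names, the 20 kHz sampling rate, and the 1 s duration are hypothetical values chosen only for illustration.

```python
import math

def gaussian_lower_tail(k: float) -> float:
    """Probability that a standard-normal sample falls below -k s.d."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def expected_noise_samples(k: float, sampling_rate_hz: float, duration_s: float) -> float:
    """Expected number of samples below -k s.d. if the trace were pure Gaussian noise."""
    return gaussian_lower_tail(k) * sampling_rate_hz * duration_s

for k in (2, 3, 4):
    # Hypothetical recording parameters (20 kHz, 1 s), for illustration only.
    print(k, gaussian_lower_tail(k), expected_noise_samples(k, 20_000, 1.0))
```

      Under these assumptions, the noise-only tail shrinks by more than an order of magnitude between 3 and 4 s.d., which is consistent with the separation described in the quoted text. Consecutive samples in a filtered trace are correlated, so these counts are not numbers of distinct threshold-crossing events.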

      Third, did the inhibitory neurons identified by this optogenetic method also have narrow spike widths at half amplitude? Could you do a scatterplot of all the spike widths and inter-peak distances that had color-coded dots for E and I based on your optogenetic method?

      We have added a scatterplot (Supplemental Figure 5).

      (4) Can you compare your methods with others now widely in use, like, for example, SpyKING CIRCUS or Kilosort? You do that in Table 1 in terms of features, but not in terms of performance. For example, you could have applied Kilosort4 to your data from the 4096 electrode array and seen how often it sorted the same neurons that SpikeMAP did. I realize this could not give you a comparison of how many were E/I, but it could tell you how close your numbers of neurons agreed with their numbers. Were your numbers within 5% of each other? This would be helpful for groups who are already using Kilosort4.

      As mentioned earlier, packages listed in Table 1 do not provide an identification of putative E/I neurons on high-density electrode arrays. To facilitate the integration of SpikeMAP with other spike sorting packages, our suite now provides a stand-alone module to perform E/I identification. This is now mentioned in the text (see earlier comment).

      Reviewer #2 (Recommendations for the authors):

      I would encourage the authors to decide what the paper is about. Is it about a new sorting method? If so, more tests/benchmarks are needed to explain the pros and cons of the pipeline, and the Methods need to be expanded. Or is it about the new data for ground truth validation? If so, then maybe explain in more detail what the data are, how many slices/mice/cells, ... Maybe also consider making the data available online as an open dataset.

      We agree with the reviewer that the paper is best framed around ground truth validation of E/I identification. We now specify how many slices/mice/cells etc. (see Supplemental Table 1) and make the data available online as an open-access dataset.

    1. eLife Assessment

      This is a valuable computational study of odor responses in the early olfactory system of insects and vertebrates. The study addresses the question of how information about odor concentration is encoded by second-order neurons in the invertebrate and vertebrate olfactory system; it offers insights into the transformation of neural signals from receptors to second-order neurons. While reanalysis of published data presents solid evidence supporting compression of concentration information, incomplete analysis is provided to resolve how this observation could be reconciled with the need to preserve information about changes in stimulus intensity. This work will be of interest to neuroscientists studying sensory processing broadly and olfaction specifically.

    2. Reviewer #1 (Public review):

      Summary

      This article is about the neural representation of odors in the early olfactory system of insects, fish, and rodents. Specifically, it regards the transformation that occurs between the olfactory sensory cells and the second-order neurons (projection neurons in insects, mitral/tufted cells in vertebrates). The central question is how the nervous system can encode both the identity of an odor and its concentration over many log units. The authors reanalyze data from experimental studies of odor responses in primary and secondary neurons, and test a range of computational models as to whether they match the observed transformation. They focus on two aspects of the second-order neuron response to odor concentration: the average activity across all neurons varies only a little with odor concentration, and different neurons have concentration-response curves with different shapes. They conclude that a model of divisive normalization can account for these effects, whereas two alternative models fail the test. A second observation is that tufted cells in the rodent system seem to undergo less normalization than mitral cells, and some reasons for this difference are proposed.

      Strengths:

      (1) The work compares different models for normalization, rather than simply reporting success with one.

      (2) The analysis is applied to very diverse species, potentially revealing a common principle of olfactory processing.

      Weaknesses:

      (1) It is unclear that animals actually have a need to represent odor concentration over many log units in support of olfactory behaviors.

      (2) The stimuli used in the chosen experiments, and the measure of neural response, are only weakly related to any ecological need, e.g., during odor tracking.

      (3) Some of the comparisons between receptors and second-order neurons also compare across evolutionarily distant insect species that may not use the same coding principles.

      (4) The analysis ignores the dynamics of odor responses, which figure prominently in previous answers to the question of identity/intensity coding.

      (5) There is considerable prior consensus in the literature on the importance of normalization from primary to secondary neurons.

      Elaboration of my comments:

      (1) Motivation

      The article starts from the premise that animals need to know the absolute concentration of an odor over many log units, but the need for this isn't obvious. The introduction cites an analogy to vision and audition. These are cases where we know for a fact that the absolute intensity of the stimulus is not relevant. Instead, sensory perception relies on processing small differences in intensity across space or time. And to maintain that sensitivity to small differences, the system discards the stimulus baseline. Humans are notoriously bad at judging the absolute light level. That information gets discarded even before light reaches the retina, namely through contraction of the pupil. Similarly, it seems plausible that a behavior like olfactory tracking relies on sensing small gradients across time (when weaving back and forth across the track) or space (across nostrils). It is important that the system function over many log units of concentration (e.g., far and close to a source) but not that it accurately represents what that current concentration is [see e.g., Wachowiak et al, 2025 Recalibrating Olfactory Neuroscience..].

      Still, many experiments in olfactory research have delivered square pulses of odor at concentrations spanning many log units, rather than the sorts of stimuli an animal might encounter during tracking. Even within that framework, though, it doesn't seem mysterious anymore how odor identity and odor concentration are represented differently. For example, Stopfer et al 2003 showed that the population response of locust PNs traces a dynamic trajectory. Trajectories for a given odor form a manifold, within which trajectories for different concentrations are distinct by their excursions on the manifold. To see this, one must recognize that the PN responds to an odor pulse with a time-varying firing rate, that different PNs have different dynamics, and that the dynamics can change with concentration. This is also well recognized in the mammalian systems. Much has been written about the topic of dynamic coding of identity and intensity - see the reviews of Laurent (2002) and Uchida (2014).

      (2) Conceptual

      Given the above comments on the dynamics of odor responses in first- and second-order neurons, it seems insufficient to capture the response of a neuron with a single number. Even if one somehow had to use a single number, the mean firing rate during the odor pulse may not be the best choice. For example, the rodent mitral cells fire in rhythm with the animal's sniffing cycle, and certain odors will just shift the phase of the rhythm without changing the total number of spikes (see e.g., Fantana et al, 2008). During olfactory search or tracking, the sub-second movements of the animal in the odor landscape get superposed on the sniffing cycle. Given all this, it seems unlikely that the total number of spikes from a neuron in a 4-second period is going to be a relevant variable for neural processing downstream.

      Much of the analysis focuses on the mean activity of the entire population. Why is this an interesting quantity? Apparently, the mean stays similar because some neurons increase and others decrease their firing rate. It would be more revealing, perhaps, to show the distribution of firing rates at different concentrations and see how that distribution is predicted by different models of normalization. This could provide a stronger test than just the mean.

      The question "if concentration information is discarded in second-order neurons, which exclusively transmit odor information to the rest of the brain, how does the brain support olfactory behaviors, such as tracking and navigation?" is really not an open question anymore. For example, reference 23 reports in the abstract that "Odorant concentration had no systematic effect on spike counts, indicating that rate cannot encode intensity. Instead, odor intensity can be encoded by temporal features of the population response. We found a subpopulation of rapid, largely concentration-invariant responses was followed by another population of responses whose latencies systematically decreased at higher concentrations."

      (3) Methods

      It would be useful to state early in the manuscript what kinds of stimuli are being considered and how the response of a neuron is summarized by one number. There are many alternative ways to treat both stimuli and responses.

      "The change in response across consecutive concentration levels may not be robust due to experimental noise and the somewhat limited range of concentrations sampled": Yes, a number of the curves just look like "no response". It would help the reader to show some examples of raw data, e.g. the time course of one neuron's firing rate to 4 concentrations, and for the authors to illustrate how they compress those responses into single numbers.

      "We then calculated the angle between these two slopes for each neuron and plotted a polar histogram of these angles." The methods suggest that this angle is the arctan of the ratio of the two slopes in the response curve. A ratio of 2 would result from a slope change from 0.0001 to 0.0002 (i.e., virtually no change in slope) or from 1 to 2 (a huge change). Those are completely different response curves. Is it reasonable to lump them into the same bin of the polar plot? This seems an unusual way to illustrate the diversity of response curve shapes.

      The Drosophila OSN data are passed through normalization models and then compared to locust PN data. This seems dangerous, as flies and locusts are separated by about 300 M years of evolution, and we don't know that fly PNs act like locust PNs. Their antennal lobe anatomy differs in many ways, as does the olfactory physiology. To draw any conclusions about a change in neural representation, it would be preferable to have OSN and PN data from the same species.

      (4) Models of normalization

      One conclusion is that divisive normalization could account for some of the change in responses from receptors to 2nd order neurons. This seems to be well appreciated already [e.g., Olsen 2010, Papadopoulou 2011, minireview in Hong & Wilson 2013].

      Another claim is that subtractive normalization cannot perform that function. What model was used for subtractive normalization is unclear (there is an error in the Methods). It would be interesting if there were a categorical difference between divisive and subtractive normalization.

      Looking closer at the divisive normalization model, it really has two components: (a) the "lateral inhibition" by which a neuron gets suppressed if other neurons fire (here scaled by the parameter k), and (b) a nonlinear sigmoid transformation (determined by the parameters n and sigma). Both lateral inhibition and nonlinearity are known to contribute to decorrelation in a neural population (e.g., Pitkow 2012). The "intraglomerular gain control" contains only the nonlinearity. The "subtractive normalization" we don't know. But if one wanted to put divisive and subtractive inhibition on the same footing, one should add a sigmoid nonlinearity in both cases.

      The response models could be made more realistic in other ways. For example, in both locusts and fish, the 2nd order neurons get inputs from multiple receptor types; presumably, that will affect their response functions. Also, lateral inhibition can take quite different forms. In locusts, the inhibitory neurons seem to collect from many glomeruli. But in rats, the inhibition by short axon cells may originate from just a few sparse glomeruli, and those might be different for every mitral cell (Fantana 2008).

      (5) Tufted cells

      There are questions raised by the following statements: "traded-off energy for faster and finer concentration discrimination" and "an additional type of second-order neuron (tufted cells) that has evolved in land vertebrates and that outperforms mitral cells in concentration encoding" and later "These results suggest a trade-off between concentration decoding and normalization processes, which prevent saturation and reduce energy consumption." Are the tufted cells inferior to the mitral cells in any respect? Do they suffer from saturation at high concentration? And do they then fail in their postulated role for odor tracking? If not, then what was the evolutionary driver for normalization in the mitral cell pathway? Certainly not lower energy consumption (50,000 mitral cells = 1% of rod photoreceptors, each of which consumes way more energy than a mitral cell).

    3. Reviewer #2 (Public review):

      Summary:

      The main goal of this study is to examine how information about odor concentration is encoded by second-order neurons in the invertebrate and vertebrate olfactory system. In many animal models, the overall mean firing rates across the second-order neurons appear to be relatively flat or near constant with increasing odor intensity. While such compression of concentration information could aid in achieving concentration invariant recognition of odor identity, how this observation could be reconciled with the need to preserve information about the changes in stimulus intensity is a major focus of the study. The authors show that second-order neurons have 'diverse' dose-response curves and that the combinations of neurons activated (particularly the rank-order) differ with concentration. Further, they argue that a single circuit-level computation, termed 'divisive normalization,' where the individual neural response is normalized by the total activity across all neurons, could help explain the coding properties of neurons at this stage of processing in all model organisms examined. They present approaches to read out the concentration information using spike rates or timing-based approaches. Finally, the authors reveal that tufted cells in the mouse olfactory bulb provide an exception to this coding approach and encode concentration information with a monotonic increase in firing rates.

      Strengths:

      (1) Comparative analysis of odor intensity coding across four different species, revealing the common features in encoding stimulus-driven features, is highly valuable.

      (2) Showing how mitral and tufted cells differ in encoding odor intensity is potentially very important to the field.

      (3) How to preserve concentration information while compressing the same with divisive normalization is also a novel and important problem in the field of sensory coding.

      Weaknesses:

      (1) The encoding problem:

      The main premise that divisive normalization generates this diversity of dose-response curves in the second-order neurons is a little problematic. The authors acknowledge this as part of their analysis in Figure 3.

      "Therefore, divisive normalization mostly does not alter the relative contribution (rank order) of each neuron in the ensemble." (Page 4, last paragraph, lines 6-8).

      The analysis in this figure indicates that divisive normalization does what it is supposed to do, i.e., it compresses concentration information and does not alter the rank order of neurons or the combinatorial patterns. Changes in the combinations of neurons activated with intensity arise directly from the fact that the first-order neurons did not have monotonic responses with odor intensity (i.e., crossovers). This, and not divisive normalization, was the necessary condition for changes in the combinatorial code.

      There seems to be a confusion/urge to attribute all coding properties found in the second-order neurons to 'divisive normalization.' If the input from sensory neurons is monotonic (i.e., no crossovers), then divisive normalization did not change the rank order, and the same combinations of neurons are activated in a similar fashion (same vector direction or combinatorial profile) to encode for different odor intensities. Concentration invariance is achieved, and concentration information is lost. However, when the first-order neurons are non-monotonic (i.e., with crossovers), that causes the second-order neurons to have different rank orders with different concentrations. Divisive normalization compresses information about concentrations, and rank-order differences preserve information about the odor concentration. Does this not mean that the non-monotonicity of sensory neuron response is vital for robustly maintaining information about odor concentration?
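
      The argument above can be illustrated with a minimal sketch. It assumes a generic divisive normalization of the form r^n / (sigma^n + r^n + (k * sum of r)^n); the parameter names k, n, and sigma follow those mentioned elsewhere in these reviews, but the exact equation used in the manuscript may differ, and all response values below are hypothetical.

```python
def divisive_normalization(column, k=0.1, n=1.5, sigma=1.0):
    """Normalize one population response vector (all neurons at one concentration).

    Assumed generic form, not necessarily the manuscript's equation.
    """
    pooled = k * sum(column)  # shared suppressive term, identical for every neuron
    return [r**n / (sigma**n + r**n + pooled**n) for r in column]

def rank_order(column):
    """Neuron indices sorted from weakest to strongest response."""
    return sorted(range(len(column)), key=lambda i: column[i])

# Hypothetical first-order responses: one list per concentration (low to high),
# one entry per neuron.
monotonic_inputs = [[1.0, 0.5, 0.2], [2.0, 1.0, 0.4], [4.0, 2.0, 0.8]]
crossover_inputs = [[1.0, 0.5, 0.2], [2.0, 2.5, 0.4], [3.0, 6.0, 0.8]]

for name, inputs in (("no crossover", monotonic_inputs), ("crossover", crossover_inputs)):
    for column in inputs:
        normalized = divisive_normalization(column)
        # Input and output rank orders are printed side by side for comparison.
        print(name, rank_order(column), rank_order(normalized))
```

      Because the pooled term is shared by all neurons at a given concentration, the normalized response in this sketch is a strictly increasing function of each neuron's own input, so the output rank order copies the input rank order. Any change in rank order across concentrations therefore has to be present in the inputs already.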

      Naturally, the question that arises is whether many of the important features of the second-order neuron's response simply seem to follow the input. Or is my understanding of the figures and the write-up flawed, and are there more ways in which divisive normalization contributes to reshaping the second-order neural response? This must be clarified.

      Lastly, the tufted cells in the mouse OB are also driven by this sensory input with crossovers. How does the OB circuit convert the input with crossovers into one that is monotonic with concentration? I think that is an important question that this computational effort could clarify.

      (2) The decoding problem.

      The way the decoding results and analysis are presented does not add a lot of information to what has already been presented. For example, based on the differences in rank-order with concentration, I would expect the combinatorial code to be different. Hence, a very simple classifier based on cosine or correlation distance would work well. However, since divisive normalization (DN) is applied, I would expect a simple classification scheme that uses the Euclidean distance metric to work equally as well after DN. Is this the case?

      Leave-one-trial/sample-out seems too conservative. How robust are the combinatorial patterns across trials? Would just one or two training trials suffice for creating templates for robust classification? Based on my prior experience (https://elifesciences.org/reviewed-preprints/89330), I do expect that the combinatorial patterns would be more robust to adaptation and hence also allow robust recognition of odor intensity across repeated encounters.
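
      The kind of simple template classifier referred to above could look like the following minimal sketch. It is not the decoder used in the manuscript; the templates and the test trial are hypothetical population vectors, and the function names are chosen only for illustration.

```python
import math

def cosine_distance(x, t):
    """1 - cosine similarity; sensitive to the pattern's direction, not its magnitude."""
    dot = sum(a * b for a, b in zip(x, t))
    return 1.0 - dot / (math.hypot(*x) * math.hypot(*t))

def euclidean_distance(x, t):
    """Straight-line distance; sensitive to both direction and magnitude."""
    return math.dist(x, t)

def classify(x, templates, distance):
    """Index of the template closest to x under the given distance."""
    return min(range(len(templates)), key=lambda i: distance(x, templates[i]))

# Hypothetical templates whose rank order differs between two concentrations.
reordered = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
trial = [2.9, 1.1, 2.0]
print(classify(trial, reordered, cosine_distance),
      classify(trial, reordered, euclidean_distance))

# Hypothetical templates that differ only by a scale factor (same rank order).
low, high = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(cosine_distance(high, low), euclidean_distance(high, low))
```

      In this toy case both metrics pick the same template when the rank order differs, whereas templates that differ only in scale are separated by the Euclidean distance but not by the cosine distance.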

      Lastly, in the simulated data, since the affinity of the first-order sensory neurons to odorants is expected to be constant across concentration, and "Jaccard similarity between the sets of highest-affinity neurons for each pair of concentration levels was > 0.96," why would the rank-order change across concentration? DN should not alter the rank order.
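
      One point worth separating here is that Jaccard similarity compares sets and is blind to ordering within them. The following minimal sketch, using hypothetical response values and hypothetical helper names, shows a case in which the top-responding set is identical at two concentrations while the order within the set differs.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def top_k(responses, k):
    """Indices of the k largest responses, strongest first."""
    return sorted(range(len(responses)), key=lambda i: responses[i], reverse=True)[:k]

# Hypothetical responses of five neurons at a low and a high concentration.
low = [0.9, 0.1, 0.7, 0.8, 0.2]
high = [1.5, 0.3, 1.9, 1.7, 0.4]

top_low = top_k(low, 3)
top_high = top_k(high, 3)
print(top_low, top_high, jaccard(top_low, top_high))
```

      A high Jaccard value therefore does not by itself establish that rank order is preserved, so the two statements would need to be checked separately in the simulated data.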

      If the set of early responders does change, how will the decoder need to change, and what precise predictions can be made that can be tested experimentally? The lack of exploration of this aspect of the results seems like a missed opportunity.

      (3) Analysis of existing data.

      I had a couple of issues related to the presentation and analysis of prior results.

      i) Based on the methods, for Figures 1 and 2, it appears the responses across time, trials, and odorants were averaged to get a single data point per neuron for each concentration. Would this averaging not severely dilute trends in the data? The one that particularly concerns me is the averaging across different odorants. If you do odor-by-odor analysis, is the flattening of second-order neural responses still observable? Because some odorants activate more globally and some locally, I would expect a wide variety of dose-response relationships that vary with odor identity (more compressed in second-order neurons, of course). It would be good to show some representative neural responses and show how the extracted values for each neuron are a faithful/good representation of its response variation across intensities.

      ii) A lot of neurons seem to have responses that flatline close to zero (both firing rate and dF/F in Figure 1). Are these responsive neurons? The mean dF/F also seems to hover not significantly above zero. Hence, I was wondering if the number of such neurons is significantly reducing the trend in the data.

      iii) I did not fully understand the need to show the increase in the odor response across concentrations as a polar plot. I see potential issues with the same. For example, the following dose-response trend at four intensities (C4 being the highest concentration and C1 the lowest): response at C3 > response at C1 and response at C4 > response at C2. But response at C3 < response at C2. Hence, it will be in the top right segment of the polar plot. However, the responses are not monotonic with concentrations. So, I am not convinced that the polar plot is the right way to characterize the dose-response curves. Just my 2 cents.
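
      The example in point iii) can be made concrete with a minimal sketch. It assumes, as the example implies, that the two quantities entering the polar plot are the changes C3 - C1 and C4 - C2; this is an inference from the example rather than a confirmed description of the manuscript's method, and the response values are hypothetical.

```python
import math

def polar_angle_deg(responses):
    """Angle of the point (C3 - C1, C4 - C2), in degrees (assumed construction)."""
    c1, c2, c3, c4 = responses
    return math.degrees(math.atan2(c4 - c2, c3 - c1))

def is_monotonic_increasing(responses):
    """True only if the response never decreases from one concentration to the next."""
    return all(b >= a for a, b in zip(responses, responses[1:]))

# Hypothetical responses at C1..C4: C3 > C1 and C4 > C2, but C3 < C2.
example = [1.0, 3.0, 2.0, 4.0]
print(polar_angle_deg(example), is_monotonic_increasing(example))
```

      Under this assumed construction the example lands in the top right quadrant of the polar plot even though the dose-response curve is not monotonic.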

      (4) Simulated vs. Actual data.

      In many analyses, simulated data were used (Figures 3 and 4). However, there is no comparison of how well the simulated data fit the experimental data. For example, the Simulated 1st order neuron in Figure 3D does not show a change in rank-order for the first-order neuron. In Figure 3E, temporal response patterns in second-order neurons look unrealistic. Some objective comparison of simulated and experimental data would help bolster confidence in these results.

    4. Reviewer #3 (Public review):

      Summary:

      In their study, Shen et al. examine how first- and second-order neurons of early olfactory circuits among invertebrates and vertebrates alike respond to and encode odor identity and concentration. Previously published electrophysiological and imaging data are re-analyzed and complemented with computational simulations. The authors explore multiple potential circuit computations by which odor concentration-dependent increases in first-order neuron responses transform into concentration-invariant responses on average across the second-order neuron population, and report that divisive normalization exceeds subtractive normalization and intraglomerular gain control in accounting for this transformation. The authors then explore how either rate- or timing-based schemes in third-order neurons may decode odor identity and concentration information from such concentration-invariant mean responses across the second-order neuron population. Finally, the results of their study of second-order neurons (invertebrate projection neurons and vertebrate mitral cells) are contrasted with the concentration-variant responses of second-order projection tufted cells in mammals. Overall, through a combination of neural data re-analysis, computational simulation, and conceptual theory, this study provides important new understanding of how aspects of sensory information are encoded through the actions of distinct components of early olfactory circuits.

      Strengths:

      Consideration of multiple evolutionarily disparate olfactory circuits, as well as re-analysis of previously published neural data sets combined with novel simulations guided by those sets, lends considerable robustness to some key findings of this study. In particular, the finding that divisive normalization - with direct inspiration from established circuit components in the form of glomerular layer short-axon cells - accounts more thoroughly for the average concentration invariance of second-order olfactory neurons at a population level than other forms of normalization is compelling. Likewise, demonstration of the required 'crossover' of first-order neuron concentration sensitivity for divisive normalization to achieve such flattening of concentration variance across the second-order population is notable, with simulations providing important insight into experimentally observed patterns of first-order neuron responses. Limited clarity in other aspects of the study, in particular related to the consideration of neural response latencies and enumerated below, tempers the overall strength of the study.

      Weaknesses:

      (1) While the authors focus on concentration-dependent increases in first-order neuron activity, reflecting the majority of observed responses, recent work from the Imai group shows that odorants can also lead to direct first-order neuron inhibition (i.e., reduction in spontaneous activity), and within this subset, increasing odorant concentration tends to increase the degree of inhibition. Some discussion of these findings and how they may complement divisive normalization to contribute to the diverse second-order neuron concentration-dependence would be of interest and help expand the context of the current results.

      (2) Related to the above point, odorant-evoked inhibition of second-order neurons is widespread in mammalian mitral cells and significantly contributes to the flattened concentration-dependence of mitral cells at the population level. Such responses are clearly seen in Figure 1D. Some discussion of how odorant-evoked mitral cell inhibition may complement divisive normalization, and likewise relate to comparatively lower levels of odorant-evoked inhibition among tufted cells, would further expand the context of the current results. Toward this end, replication of analyses in Figures 1D and E following exclusion of mitral cell inhibitory responses would provide insight into the contribution of such inhibition to the flattening of the mitral cell population concentration dependence.

      (3) The idea of concentration-dependent crossover responses across the first-order population being required for divisive normalization to generate individually diverse concentration response functions across the second-order population is notable. The intuition of the crossover responses is that first-order neurons that respond most sensitively to any particular odorant (i.e., at the lowest concentration) respond with overall lower activity at higher concentrations than other first-order neurons less sensitively tuned to the odorant. Whether this is a consistent, generalizable property of odorant binding and first-order neuron responsiveness is not addressed by the authors, however. Biologically, one mechanism that may support such crossover events is intraglomerular presynaptic/feedback inhibition, which would be expected to increase with increasing first-order neuron activation such that the most-sensitively responding first-order neurons would also recruit the strongest inhibition as concentration increases, enabling other first-order neurons to begin to respond more strongly. Discussion of this and/or other biological mechanisms (e.g., first-order neuron depolarization block) supporting such crossover responses would strengthen these results.

      (4) It is unclear to what degree the latency analysis considered in Figures 4D-H works with the overall framework of divisive normalization, which in Figure 3 we see depends on first-order neuron crossover in concentration response functions. Figure 4D suggests that all first-order neurons respond with the same response amplitude (R in eq. 3), even though this is supposed to be pulled from a distribution. It's possible that Figure 4D is plotting normalized response functions to highlight the difference in latency, but this is not clear from the plot or caption. If response amplitudes are all the same, and the response curves are, as plotted in Figure 4D, identical except for their time to half-max, then it seems somewhat trivial that the resulting second-order neuron activation will follow the same latency ranking, regardless of whether divisive normalization exists or not. However, there is some small jitter in these rankings across concentrations (Figure 4G), suggesting there is some randomness to the simulations. It would be helpful if this were clarified (e.g., by showing a non-normalized Figure 4D, with different response amplitudes), and more broadly, it would be extremely helpful in evaluating the latency coding within the broader framework proposed if the authors clarified whether the simulated first-order neuron response timecourses, when factoring in potentially different amplitudes (R) and averaging across the entire response window, reproduces the concentration response crossovers observed experimentally. In summary, in the present manuscript, it remains unclear if concentration crossovers are captured in the latency simulations, and if not, the authors do not clearly address what impact such variation in response amplitudes across concentrations may have on the latency results. 
      It is further unclear to what degree divisive normalization is necessary for the second-order neurons to establish and maintain their latency ranks across concentrations, or to exhibit concentration-dependent changes in latency.

      (5) How the authors get from Figure 4G to 4H is not clear. Figure 4G shows second-order neuron response latencies across all concentrations, with ordering based on their sorted latency to low concentration. This shows that very few neurons appear to change latency ranks going from low to high concentration, with a change in rank appearing as any deviation in a monotonically increasing trend. Focusing on the high concentration points, there appear to be 2 latency ranks switched in the first 10 responding neurons (reflecting the 1 downward dip in the points around neuron 8), rather than the 7 stated in the text. Across the first 50 responding neurons, I see only ~14 potential switches (reflecting the ~7 downward dips in the points around neurons 8, 20, 32, 33, 41, 44, 50), rather than the 32 stated in the text. It is possible that the unaccounted rank changes reflect fairly minute differences in latencies that are not visible in the plot in Figure 4G. This may be clarified by plotting each neuron's latency at low concentration vs. high concentration (i.e., similar to Figure 4H, but plotting absolute latency, not latency rank) to allow assessment of the absolute changes. If such minute differences are not driving latency rank changes in Fig. 4G, then a trend much closer to the unity line would be expected in Figure 4H. Instead, however, there are many massive deviations from unity, even within the first 50 responding neurons plotted in Figure 4G. These deviations include a jump in latency rank from 2 at low concentration to ~48 at high concentration. Such a jump is simply not seen in Figure 4G.

      (6) In the text, the authors state that "Odor identity can be encoded by the set of highest-affinity neurons (which remains invariant across concentrations)." Presumably, this is a restatement of the primacy model and refers to invariance in latency rank (since the authors have not shown that the highest-affinity neurons have invariant response amplitudes across concentration). To what degree this statement holds given the results in Figure 4H, however, which appear to show that some neurons with the earliest latency rank at low concentration jump to much later latency ranks at high concentration, remains unclear. Such changes in latency rank for only a few of the first responding neurons may be negligible for classifying odor identity among a small handful of odorants, but not among 1-2 orders of magnitude more odors, which may feasibly occur in a natural setting. Collectively, these issues with the execution and presentation of the latency analysis make it unclear how robust the latency results are.

      (7) Analysis in Figures 4A-C shows that concentration can be decoded from first-order neurons, second-order neurons, or first-order neurons with divisive normalization imposed (i.e., simulating second-order responses). This does not say that divisive normalization is necessary to encode concentration, however. Therefore, for the authors to say that divisive normalization is "a potential mechanism for generating odor-specific subsets of second-order neurons whose combinatorial activity or whose response latencies represent concentration information" seems too strong a conclusion. Divisive normalization is not generating the concentration information, since that can be decoded just as well from the first-order neurons. Rather, divisive normalization can account for the different population patterns in concentration response functions between first- and second-order neurons without discarding concentration-dependent information.

      (8) Performing the same polar histogram analysis of tufted vs. mitral cell concentration response functions (Figure 5B) provides a compelling new visualization of how these two cell types differ in their concentration variance. The projected importance of tufted cells to navigation, emerging directly through the inverse relationship between average concentration and distance (Figure 5C), is not surprising, and is largely a conceptual analysis rather than new quantitative analysis per se, but nevertheless, this is an important point to make. Another important consideration absent from this section, however, is whether and how divisive normalization may impact tufted cell activity. Previous work from the authors, as well as from Schoppa, Shipley, and Westbrook labs, has compellingly demonstrated that a major circuit mediating divisive normalization of mitral cells (GABA/DAergic short-axon cells) directly targets external tufted cells, and is thus very likely to also influence projection tufted cells. Such analysis would additionally provide substantially more justification for the Discussion statement "we analyzed an additional type of second-order neuron (tufted cells)", which at present instead reflects fairly minimal analysis.

    5. Author response:

      (1) Explore the temporal component of neural responses (instead of collapsing responses to a single number, i.e., the average response over 4s), and determine which of the three models can recapitulate the observed dynamics.

      (2) Expand the polar plot visualization to show all three slopes (changes in responses across all three successive concentrations) instead of only two slopes.

      (3) Attempt to collect and analyze, from published papers, data of: (a) first-order neuron responses to odors to determine the role of first-order inhibition towards generating non-monotonic responses, and (b) PN responses in Drosophila to properly compare with corresponding first-order neuron responses.

      (4) Further discuss: (a) why the brain may need to encode absolute concentration, (b) the distinction between non-monotonic responses and cross-over responses, and (c) potential limitations of the primacy model.

      (5) Expand the divisive normalization model by evaluating different values of k and R, and study the effects of divisive normalization on tufted cells.

      (6) Add discussion of other potential inhibitory mechanisms that could contribute towards the observed effects.

      Reviewer #1:

      The article starts from the premise that animals need to know the absolute concentration of an odor over many log units, but the need for this isn't obvious. The introduction cites an analogy to vision and audition. These are cases where we know for a fact that the absolute intensity of the stimulus is not relevant. Instead, sensory perception relies on processing small differences in intensity across space or time. And to maintain that sensitivity to small differences, the system discards the stimulus baseline. Humans are notoriously bad at judging the absolute light level. That information gets discarded even before light reaches the retina, namely through contraction of the pupil. Similarly, it seems plausible that a behavior like olfactory tracking relies on sensing small gradients across time (when weaving back and forth across the track) or space (across nostrils). It is important that the system function over many log units of concentration (e.g., far and close to a source) but not that it accurately represents what that current concentration is [see e.g., Wachowiak et al, 2025 Recalibrating Olfactory Neuroscience..].

      We thank the Reviewer for the insightful input and agree that gradients across time and space are important for various olfactory behaviors, such as tracking. At the same time, we think that absolute concentration is also needed for two reasons. First, in order to extract changes in concentration, the absolute concentration needs to be normalized out; i.e., change needs to be encoded with respect to some baseline, which is what divisive normalization computes. Second, while it is true that representing the exact number of odor molecules present is not important, this number directly relates to distance from the odor source, which does provide ethological value (e.g., is the tiger 100 m or 1000 m away?). Indeed, our decoding experiments focused on discriminating relative, rather than absolute, concentrations by classifying between each pair of concentrations (i.e., relative distances), which is effectively an assessment of the gradient. In our revision, we will make all of these points clearer.

      Still, many experiments in olfactory research have delivered square pulses of odor at concentrations spanning many log units, rather than the sorts of stimuli an animal might encounter during tracking. Even within that framework, though, it doesn't seem mysterious anymore how odor identity and odor concentration are represented differently. For example, Stopfer et al 2003 showed that the population response of locust PNs traces a dynamic trajectory. Trajectories for a given odor form a manifold, within which trajectories for different concentrations are distinct by their excursions on the manifold. To see this, one must recognize that the PN responds to an odor pulse with a time-varying firing rate, that different PNs have different dynamics, and that the dynamics can change with concentration. This is also well recognized in the mammalian systems. Much has been written about the topic of dynamic coding of identity and intensity - see the reviews of Laurent (2002) and Uchida (2014).

      Given the above comments on the dynamics of odor responses in first- and second-order neurons, it seems insufficient to capture the response of a neuron with a single number. Even if one somehow had to use a single number, the mean firing rate during the odor pulse may not be the best choice. For example, the rodent mitral cells fire in rhythm with the animal's sniffing cycle, and certain odors will just shift the phase of the rhythm without changing the total number of spikes (see e.g., Fantana et al, 2008). During olfactory search or tracking, the sub-second movements of the animal in the odor landscape get superposed on the sniffing cycle. Given all this, it seems unlikely that the total number of spikes from a neuron in a 4-second period is going to be a relevant variable for neural processing downstream.

      To our knowledge, it is not well understood how downstream brain regions read out mitral cell responses to guide olfactory behavior. The olfactory bulb projects to more than a dozen brain regions, and different regions could decode signals in different ways. We focused on the mean response because it is a simple, natural construct.

      The datasets we analyzed may not include all relevant timing information; for example, the mouse data is from calcium imaging studies that did not track sniff timing. Nonetheless, we plan to address this comment within our framework by binning time into smaller windows (e.g., 0-0.2s, 0.2-0.4s, etc.) and repeating our analysis for each of these windows. Specifically, we will determine how well each normalization method recapitulates the statistics of the population responses in each window, beyond simply assessing the population mean.
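
      A minimal sketch of the windowed analysis described above, assuming a uniformly sampled response trace; the trace values, sampling interval, and function name are hypothetical and only illustrate the binning step, not the actual pipeline:

```python
def windowed_means(trace, dt, window):
    """Average a sampled response trace within consecutive time windows.

    trace  : list of response samples (hypothetical dF/F or firing-rate values)
    dt     : sampling interval in seconds (assumed uniform)
    window : window width in seconds (e.g., 0.2)
    """
    per_window = round(window / dt)  # number of samples per window
    means = []
    # Step through the trace one full window at a time; drop any partial tail.
    for start in range(0, len(trace) - per_window + 1, per_window):
        chunk = trace[start:start + per_window]
        means.append(sum(chunk) / per_window)
    return means

# Hypothetical 0.8 s trace sampled every 0.1 s, binned into 0.2 s windows.
trace = [0.0, 0.2, 0.6, 1.0, 0.8, 0.4, 0.2, 0.0]
print(windowed_means(trace, dt=0.1, window=0.2))
```

      Each window then yields its own population statistics, so the same comparison across normalization models can be repeated per window rather than once for the 4 s average.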

      Much of the analysis focuses on the mean activity of the entire population. Why is this an interesting quantity? Apparently, the mean stays similar because some neurons increase and others decrease their firing rate. It would be more revealing, perhaps, to show the distribution of firing rates at different concentrations and see how that distribution is predicted by different models of normalization. This could provide a stronger test than just the mean.

      We agree that mean activity is only one measure to summarize a rich data set and will perform the suggested analysis.

      The question "if concentration information is discarded in second-order neurons, which exclusively transmit odor information to the rest of the brain, how does the brain support olfactory behaviors, such as tracking and navigation?" is really not an open question anymore. For example, reference 23 reports in the abstract that "Odorant concentration had no systematic effect on spike counts, indicating that rate cannot encode intensity. Instead, odor intensity can be encoded by temporal features of the population response. We found a subpopulation of rapid, largely concentration-invariant responses was followed by another population of responses whose latencies systematically decreased at higher concentrations."

      Primacy coding does provide one plausible mechanism to decode concentration. Our manuscript demonstrated how such a code could emerge in second-order neurons with the help of divisive normalization, though it does require maintaining at least partial rank invariance across concentrations, which may not be robust. We also showed how concentration could be decoded via spike rates, even if average rates are constant, which provides an alternative hypothesis to that of ref 23.

      Further, ref 23 only considers the piriform cortex, which, as mentioned above, is one of many targets of the olfactory bulb, and it remains unclear which decoding mechanisms each of these targets uses. In addition, work from the authors of ref 23 found multiple potential decoding strategies in the piriform cortex itself, including changes in firing rate (see Fig. 2E of ref. 23 - Bolding & Franks, 2017; as well as Fig. 4 in Roland et al., 2017).

      It would be useful to state early in the manuscript what kinds of stimuli are being considered and how the response of a neuron is summarized by one number. There are many alternative ways to treat both stimuli and responses.

      We will add this explanation to the manuscript.

      "The change in response across consecutive concentration levels may not be robust due to experimental noise and the somewhat limited range of concentrations sampled": Yes, a number of the curves just look like "no response". It would help the reader to show some examples of raw data, e.g. the time course of one neuron's firing rate to 4 concentrations, and for the authors to illustrate how they compress those responses into single numbers.

      We agree and will add this information to the manuscript.

      "We then calculated the angle between these two slopes for each neuron and plotted a polar histogram of these angles." The methods suggest that this angle is the arctan of the ratio of the two slopes in the response curve. A ratio of 2 would result from a slope change from 0.0001 to 0.0002 (i.e., virtually no change in slope) or from 1 to 2 (a huge change). Those are completely different response curves. Is it reasonable to lump them into the same bin of the polar plot? This seems an unusual way to illustrate the diversity of response curve shapes.

      We agree that the two changes in the reviewer’s example will be categorized in the same quadrant in our analysis. We did not focus on the absolute changes because our analysis covers many log ratios of concentrations. Instead, we focused on the relative shapes of the concentration response curves, and more specifically, the direction of the change (i.e., the sign of the slope). We will better motivate this style of analysis in the revision. Moreover, in response to comments by Reviewer 2, we will compare response shapes between all three successive levels of concentration changes, as opposed to only two levels.
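
      The Reviewer's example can be checked with a short sketch. It assumes the angle is computed with `atan2` on the pair of slopes (an assumption; the source only says the angle is the arctan of the ratio of the two slopes), and the function name is hypothetical:

```python
import math

def slope_angle(slope_a, slope_b):
    """Angle in degrees of the point (slope_a, slope_b).

    Assumption: atan2 is used so that the quadrant encodes the signs
    of the two slopes; the magnitude of the slopes is discarded.
    """
    return math.degrees(math.atan2(slope_b, slope_a))

# Two curves with very different absolute changes but the same slope ratio.
small = slope_angle(0.0001, 0.0002)
large = slope_angle(1.0, 2.0)
print(small, large)

# A decrease followed by an increase lands in a different quadrant.
print(slope_angle(-1.0, 1.0))
```

      The first two angles coincide because only the ratio enters the calculation, which is the Reviewer's point; the quadrant still separates curves by the direction of change.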

      The Drosophila OSN data are passed through normalization models and then compared to locust PN data. This seems dangerous, as flies and locusts are separated by about 300 M years of evolution, and we don't know that fly PNs act like locust PNs. Their antennal lobe anatomy differs in many ways, as does the olfactory physiology. To draw any conclusions about a change in neural representation, it would be preferable to have OSN and PN data from the same species.

      We are in the process of requesting PN response data in Drosophila from groups that have collected such data and will repeat the analysis once we get access to the data.

      One conclusion is that divisive normalization could account for some of the change in responses from receptors to 2nd order neurons. This seems to be well appreciated already [e.g., Olsen 2010, Papadopoulou 2011, minireview in Hong & Wilson 2013].

      While we agree that these manuscripts do study the effects of divisive normalization in insects and fish, here we show that this computation also generalizes to rodents. In addition, these previous studies do not focus on the role of divisive normalization in concentration encoding/decoding, which is our focus. We will clarify this difference in the revision.

      Another claim is that subtractive normalization cannot perform that function. What model was used for subtractive normalization is unclear (there is an error in the Methods). It would be interesting if there were a categorical difference between divisive and subtractive normalization.

      We apologize for the mistake in the subtractive normalization equation and will correct it. Thank you for catching it.

      Looking closer at the divisive normalization model, it really has two components: (a) the "lateral inhibition" by which a neuron gets suppressed if other neurons fire (here scaled by the parameter k) , and (b) a nonlinear sigmoid transformation (determined by the parameters n and sigma). Both lateral inhibition and nonlinearity are known to contribute to decorrelation in a neural population (e.g., Pitkow 2012). The "intraglomerular gain control" contains only the nonlinearity. The "subtractive normalization" we don't know. But if one wanted to put divisive and subtractive inhibition on the same footing, one should add a sigmoid nonlinearity in both cases.

      Our intent was not to place all the methods on the “same footing” but rather to isolate the two primary components of normalization methods – non-linearity and lateral inhibition – and determine which of these, and in which combination, could generate the desired effects. Divisive normalization incorporates both components, whereas intraglomerular gain control and subtractive normalization only incorporate one of these components. We will clarify this reasoning in the revision.
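
      The distinction between the two components can be sketched as follows. The functional forms and parameter values below are illustrative assumptions (a generic sigmoid with exponent n and half-saturation sigma, and a pooled term scaled by k); they are not the equations from the manuscript's Methods:

```python
def gain_control(x, n=1.5, sigma=1.0):
    """Nonlinearity only: each unit passes through a sigmoid of its own input."""
    return [xi**n / (sigma**n + xi**n) for xi in x]

def subtractive_norm(x, k=0.1):
    """Lateral inhibition only: subtract a scaled population sum, floored at 0."""
    pool = k * sum(x)
    return [max(xi - pool, 0.0) for xi in x]

def divisive_norm(x, n=1.5, sigma=1.0, k=0.1):
    """Both components: a sigmoid whose denominator includes the scaled pool."""
    pool = k * sum(x)
    return [xi**n / (sigma**n + xi**n + pool**n) for xi in x]

# Hypothetical non-negative first-order inputs for one odor at one concentration.
x = [0.5, 2.0, 8.0]
print(gain_control(x))
print(subtractive_norm(x))
print(divisive_norm(x))
```

      In this form, divisive normalization lowers every output relative to the nonlinearity alone but keeps the ordering of units within a single stimulus, since the pooled term is shared by all units.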

      The response models could be made more realistic in other ways. For example, in both locusts and fish, the 2nd order neurons get inputs from multiple receptor types; presumably, that will affect their response functions. Also, lateral inhibition can take quite different forms. In locusts, the inhibitory neurons seem to collect from many glomeruli. But in rats, the inhibition by short axon cells may originate from just a few sparse glomeruli, and those might be different for every mitral cell (Fantana 2008).

      We thank the Reviewer for the input. Instead of fixing k for all second-order neurons, we will apply different k values for different neurons. We will also systematically vary the percentage of neurons used for the divisive normalization calculation in the denominator, and determine the regime under which the effects experimentally observed are reproducible. This approach takes into account the scenario that inter-glomerular inhibitory interactions are sparse.

      There are questions raised by the following statements: "traded-off energy for faster and finer concentration discrimination" and "an additional type of second-order neuron (tufted cells) that has evolved in land vertebrates and that outperforms mitral cells in concentration encoding" and later "These results suggest a trade-off between concentration decoding and normalization processes, which prevent saturation and reduce energy consumption.". Are the tufted cells inferior to the mitral cells in any respect? Do they suffer from saturation at high concentration? And do they then fail in their postulated role for odor tracking? If not, then what was the evolutionary driver for normalization in the mitral cell pathway? Certainly not lower energy consumption (50,000 mitral cells = 1% of rod photoreceptors, each of which consumes way more energy than a mitral cell).

      The question of what mitral cells are “good for”, compared to tufted cells, remains unclear in our view. We speculate that mitral cells provide superior context-dependent processing and are better for determining stimuli-reward contingencies, but this remains far from settled experimentally.

      We believe the mitral cell pathway evolved earlier than tufted cells, since the former appear akin to projection neurons in insects. Nonetheless, we agree that differences in energy consumption are unlikely to be the primary distinguishing factor, and in the revision, we will drop this argument.

      Reviewer #2:

      The main premise that divisive normalization generates this diversity of dose-response curves in the second-order neurons is a little problematic. … The analysis in [Figure 3] indicates that divisive normalization does what it is supposed to do, i.e., compresses concentration information and does not alter the rank-order of neurons or the combinatorial patterns. Changes in the combinations of neurons activated with intensity arise directly from the fact that the first-order neurons did not have monotonic responses with odor intensity (i.e., crossovers). This was the necessary condition, and not the divisive normalization for changes in the combinatorial code. There seems to be a confusion/urge to attribute all coding properties found in the second-order neurons to 'divisive normalization.' If the input from sensory neurons is monotonic (i.e., no crossovers), then divisive normalization did not change the rank order, and the same combinations of neurons are activated in a similar fashion (same vector direction or combinatorial profile) to encode for different odor intensities. Concentration invariance is achieved, and concentration information is lost. However, when the first-order neurons are non-monotonic (i.e., with crossovers), that causes the second-order neurons to have different rank orders with different concentrations. Divisive normalization compresses information about concentrations, and rank-order differences preserve information about the odor concentration. Does this not mean that the non-monotonicity of sensory neuron response is vital for robustly maintaining information about odor concentration? Naturally, the question that arises is whether many of the important features of the second-order neuron's response simply seem to follow the input. Or is my understanding of the figures and the write-up flawed, and are there more ways in which divisive normalization contributes to reshaping the second-order neural response? This must be clarified.
      Lastly, the tufted cells in the mouse OB are also driven by this sensory input with crossovers. How does the OB circuit convert the input with crossovers into one that is monotonic with concentration? I think that is an important question that this computational effort could clarify.

      It appears that there is confusion about the definitions of “non-monotonicity” and “crossovers”. These are two independent concepts – one does not necessarily lead to the other. Non-monotonicity concerns the response of a single neuron to different concentration levels. A neuron’s response is considered non-monotonic if its response goes up then down, or down then up, across increasing concentrations. A “cross-over” is defined based on the responses of multiple neurons. A cross-over occurs when the response of one neuron is lower than that of another neuron at one concentration, but higher at a different concentration. For example, the responses of both neurons could increase monotonically with increasing concentration, but one neuron might start lower and grow faster, hence creating a cross-over. We will clarify this in the manuscript, which we believe will resolve the questions raised above.
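
      The two definitions can be made concrete with a small sketch; the response values and function names are hypothetical, chosen to reproduce the example of two monotonic neurons that nonetheless cross:

```python
def is_monotonic(responses):
    """True if one neuron's responses never switch direction across concentrations."""
    diffs = [b - a for a, b in zip(responses, responses[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

def has_crossover(r1, r2):
    """True if neuron 1 is below neuron 2 at one concentration and above at another."""
    diffs = [a - b for a, b in zip(r1, r2)]
    return any(d < 0 for d in diffs) and any(d > 0 for d in diffs)

# Hypothetical responses at four increasing concentrations.
neuron_a = [1.0, 2.0, 3.0, 4.0]  # starts higher, grows slowly
neuron_b = [0.0, 1.5, 3.5, 6.0]  # starts lower, grows faster

print(is_monotonic(neuron_a), is_monotonic(neuron_b))  # both monotonic
print(has_crossover(neuron_a, neuron_b))               # yet their ranks swap
```

      Monotonicity is a property of one curve, while a cross-over is a property of a pair of curves, so either can occur without the other.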

      The way the decoding results and analysis are presented does not add a lot of information to what has already been presented. For example, based on the differences in rank-order with concentration, I would expect the combinatorial code to be different. Hence, a very simple classifier based on cosine or correlation distance would work well. However, since divisive normalization (DN) is applied, I would expect a simple classification scheme that uses the Euclidean distance metric to work equally as well after DN. Is this the case?

      Yes, we used a simple classification scheme, logistic regression with a linear kernel, which is essentially a Euclidean distance-based classification. This scheme works better for tufted cells because their responses are more monotonic; i.e., if neurons A and B both increase their responsiveness with concentration, then Euclidean distance would be fine. But if neuron A’s response amplitude goes up and neuron B’s response goes down – as often happens for mitral cells – then Euclidean distance does not work as well. We will add intuition about this in the manuscript.

      Leave-one-trial/sample-out seems too conservative. How robust are the combinatorial patterns across trials? Would just one or two training trials suffice for creating templates for robust classification? Based on my prior experience (https://elifesciences.org/reviewed-preprints/89330), I do expect that the combinatorial patterns would be more robust to adaptation and hence also allow robust recognition of odor intensity across repeated encounters.

      As suggested, we will compute the correlation coefficient between the population responses to each odor across trials. We will repeat this analysis for both mitral and tufted cells. To determine the effect of adaptation, we will compare the correlation coefficient between the 1st and 2nd trials with that between the 1st and final trials.
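
      A minimal sketch of this trial-to-trial comparison, assuming Pearson correlation between population response vectors; the choice of Pearson correlation, the trial data, and the function name are illustrative assumptions:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length response vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical population responses (one value per neuron) on successive trials.
trials = [
    [1.0, 0.5, 0.2, 0.8],   # 1st trial
    [0.9, 0.5, 0.3, 0.7],   # 2nd trial
    [0.5, 0.4, 0.3, 0.45],  # final trial
]

early = pearson(trials[0], trials[1])   # 1st vs 2nd trial
late = pearson(trials[0], trials[-1])   # 1st vs final trial
print(early, late)
```

      Because correlation ignores overall scaling, a pattern whose amplitude shrinks with adaptation but keeps its shape would still score close to 1.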

      Lastly, in the simulated data, since the affinity of the first-order sensory neurons to odorants is expected to be constant across concentration, and "Jaccard similarity between the sets of highest-affinity neurons for each pair of concentration levels was > 0.96," why would the rank-order change across concentration? DN should not alter the rank order.

      We agree that divisive normalization should not alter the rank order, but the rank order may change in first-order neurons, which carries through to second-order neurons. This confusion may be related to the one mentioned above re: cross-overs vs non-monotonicity. Moreover, in the simulated data (Fig. 4D-H), the Jaccard similarity was calculated based on only the 50 neurons with the highest affinity, not the entire population of neurons. As shown in Fig. 4H, most of the rank-order change happens in the remaining 150 neurons.
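
      A toy illustration of why a high Jaccard similarity on a top-k set is compatible with rank-order changes. The latencies, the value of k, and the function names are hypothetical, and the sets are formed here from earliest latencies purely for illustration:

```python
def earliest_k(latencies, k):
    """Indices of the k neurons with the shortest latencies."""
    order = sorted(range(len(latencies)), key=lambda i: latencies[i])
    return set(order[:k])

def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    return len(a & b) / len(a | b)

# Hypothetical latencies (ms) for six neurons at two concentrations.
low = [10, 20, 30, 40, 50, 60]
high = [9, 5, 7, 30, 20, 25]

print(jaccard(earliest_k(low, 3), earliest_k(high, 3)))
```

      Here the same three neurons lead at both concentrations, so the set similarity is maximal, even though the order within the set and among the remaining neurons differs between concentrations.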

      Note that in response to a comment by Reviewer 3, we will change the presentation of Fig. 4H in the revision.

      If the set of early responders does change, how will the decoder need to change, and what precise predictions can be made that can be tested experimentally? The lack of exploration of this aspect of the results seems like a missed opportunity.

      In the Discussion, we wrote about how downstream circuits will need to learn which set of neurons are to be associated with each distinct concentration level. We will expand upon this point and include experimentally testable predictions.

      Based on the methods, for Figures 1 and 2, it appears the responses across time, trials, and odorants were averaged to get a single data point per neuron for each concentration. Would this averaging not severely dilute trends in the data? The one that particularly concerns me is the averaging across different odorants. If you do odor-by-odor analysis, is the flattening of second-order neural responses still observable? Because some odorants activate more globally and some locally, I would expect a wide variety of dose-response relationships that vary with odor identity (more compressed in second-order neurons, of course). It would be good to show some representative neural responses and show how the extracted values for each neuron are a faithful/good representation of its response variation across intensities.

      It appears there is some confusion here; we will clarify in the text and figure captions that we did not average across different odors in our analysis. We will also add figure panels showing some representative neural responses as suggested by the Reviewer.

      A lot of neurons seem to have responses that flat line closer to zero (both firing rate and dF/F in Figure 1). Are these responsive neurons? The mean dF/F also seems to hover not significantly above zero. Hence, I was wondering if the number of neurons is reducing the trend in the data significantly.

      Yes, if a neuron responds to at least one concentration level in at least 50% of the trials, it is considered responsive. So it is possible that some neurons respond to one concentration level and otherwise flatline near zero. We will highlight a few example neurons to visualize this scenario.
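
      The criterion can be sketched as below, assuming each trial has already been labeled as a response or not; how a single-trial response is detected is not stated here, so the boolean flags and the function name are hypothetical:

```python
def is_responsive(trial_flags, min_fraction=0.5):
    """Apply the responsiveness criterion to one neuron.

    trial_flags[c][t] is True if the neuron responded on trial t at
    concentration c (the per-trial detection rule is an assumption).
    The neuron counts as responsive if, for at least one concentration,
    the fraction of responding trials reaches min_fraction.
    """
    return any(sum(flags) / len(flags) >= min_fraction for flags in trial_flags)

# Hypothetical neuron: responds reliably only at the highest concentration.
selective = [
    [False, False, False, False],
    [False, True, False, False],
    [False, False, False, False],
    [True, True, False, True],
]
print(is_responsive(selective))
```

      A neuron like this one passes the criterion while staying near zero at three of the four concentrations.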

      I did not fully understand the need to show the increase in the odor response across concentrations as a polar plot. I see potential issues with the same. For example, the following dose-response trend at four intensities (C4 being the highest concentration and C1 the lowest): response at C3 > response at C1 and response at C4 > response at C2. But response at C3 < response at C2. Hence, it will be in the top right segment of the polar plot. However, the responses are not monotonic with concentrations. So, I am not convinced that the polar plot is the right way to characterize the dose-response curves. Just my 2 cents.

      Your 2 cents are valuable! Thank you for raising this point. Instead of computing two slopes (C1-C3 and C2-C4), we will expand our analysis to include all three slopes (C1-C2, C2-C3, C3-C4). Consequently, there are 2^3 = 8 different response shapes, and we will list them and quantify the fraction of the responses that fall into each shape category.
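
      A minimal sketch of the three-slope categorization, assuming four concentrations per curve; the response values and function names are hypothetical, and counting a zero slope as non-increasing is an arbitrary choice made here for illustration:

```python
from collections import Counter

def shape(responses):
    """Sign pattern of the three successive slopes C1-C2, C2-C3, C3-C4.

    "+" marks an increase; "-" marks a decrease (ties are counted as "-",
    which is an assumption of this sketch).
    """
    return "".join("+" if b - a > 0 else "-"
                   for a, b in zip(responses, responses[1:]))

def shape_fractions(curves):
    """Fraction of curves that fall into each observed shape category."""
    counts = Counter(shape(c) for c in curves)
    return {s: n / len(curves) for s, n in counts.items()}

# Hypothetical concentration response curves (C1..C4).
curves = [[1, 2, 3, 4], [1, 3, 2, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
print(shape_fractions(curves))
```

      The Reviewer's example (C3 > C1, C4 > C2, but C3 < C2) maps to the pattern "+-+", which is now separated from the monotonic pattern "+++".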

      In many analyses, simulated data were used (Figures 3 and 4). However, there is no comparison of how well the simulated data fit the experimental data. For example, the Simulated 1st order neuron in Figure 3D does not show a change in rank-order for the first-order neuron. In Figure 3E, temporal response patterns in second-order neurons look unrealistic. Some objective comparison of simulated and experimental data would help bolster confidence in these results.

      We believe the Reviewer is referring to Figs. 4D and 4E, since Fig. 3D does not show a first-order neuron simulation, and there is no Fig. 3E. In Fig. 4D there is no change of rank order because the simulation is for a single odor and a single concentration level, whereas a change of rank order (i.e., a cross-over), as we define it, occurs between concentration levels. We will clarify this in the manuscript.

      Reviewer #3:

      While the authors focus on concentration-dependent increases in first-order neuron activity, reflecting the majority of observed responses, recent work from the Imai group shows that odorants can also lead to direct first-order neuron inhibition (i.e., reduction in spontaneous activity), and within this subset, increasing odorant concentration tends to increase the degree of inhibition. Some discussion of these findings and how they may complement divisive normalization to contribute to the diverse second-order neuron concentration-dependence would be of interest and help expand the context of the current results.

      We thank the Reviewer for the suggestion. We will request datasets of first-order neuron responses from the groups who acquired them. We will analyze this data to determine the role of inhibition or antagonistic binding and quantify what percentage of first-order neurons respond less strongly with larger concentrations.

      Related to the above point, odorant-evoked inhibition of second-order neurons is widespread in mammalian mitral cells and significantly contributes to the flattened concentration-dependence of mitral cells at the population level. Such responses are clearly seen in Figure 1D. Some discussion of how odorant-evoked mitral cell inhibition may complement divisive normalization, and likewise relate to comparatively lower levels of odorant-evoked inhibition among tufted cells, would further expand the context of the current results. Toward this end, replication of analyses in Figures 1D and E following exclusion of mitral cell inhibitory responses would provide insight into the contribution of such inhibition to the flattening of the mitral cell population concentration dependence.

      We will perform the analysis suggested, specifically, we will set the negative mitral cell responses to 0 and assess whether the population mean remains flat.
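
      A minimal sketch of this check is shown below, assuming responses are stored as one list per cell with one value per concentration. The toy values, the function names, and the use of the max-minus-min range as a flatness measure are illustrative assumptions, not the authors' analysis.

```python
def population_mean(responses):
    """Mean across cells at each concentration; responses[cell][concentration]."""
    n_cells = len(responses)
    n_conc = len(responses[0])
    return [sum(cell[c] for cell in responses) / n_cells for c in range(n_conc)]


def rectify(responses):
    """Set negative (inhibitory) responses to 0, leaving other values unchanged."""
    return [[max(0.0, r) for r in cell] for cell in responses]


def mean_range(responses):
    """Range of the population mean across concentrations (0 means perfectly flat)."""
    means = population_mean(responses)
    return max(means) - min(means)


# Hypothetical toy data: three cells, four concentrations.
toy = [
    [0.2, 0.4, 0.6, 0.8],      # excited, increasing with concentration
    [-0.1, -0.2, -0.4, -0.6],  # inhibited, increasingly so with concentration
    [0.1, 0.1, 0.1, 0.1],      # concentration-invariant
]
raw_range = mean_range(toy)
rectified_range = mean_range(rectify(toy))
print(raw_range, rectified_range)
```

      In this toy case the population mean is flatter before rectification than after, which is the pattern one would expect if inhibitory responses contribute to the flattening.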

      The idea of concentration-dependent crossover responses across the first-order population being required for divisive normalization to generate individually diverse concentration response functions across the second-order population is notable. The intuition of the crossover responses is that first-order neurons that respond most sensitively to any particular odorant (i.e., at the lowest concentration) respond with overall lower activity at higher concentrations than other first-order neurons less sensitively tuned to the odorant. Whether this is a consistent, generalizable property of odorant binding and first-order neuron responsiveness is not addressed by the authors, however. Biologically, one mechanism that may support such crossover events is intraglomerular presynaptic/feedback inhibition, which would be expected to increase with increasing first-order neuron activation such that the most-sensitively responding first-order neurons would also recruit the strongest inhibition as concentration increases, enabling other first-order neurons to begin to respond more strongly. Discussion of this and/or other biological mechanisms (e.g., first-order neuron depolarization block) supporting such crossover responses would strengthen these results.

      We thank the reviewer for providing additional mechanisms to consider. As suggested, we will add discussion of these alternatives to divisive normalization.

      It is unclear to what degree the latency analysis considered in Figures 4D-H works with the overall framework of divisive normalization, which in Figure 3 we see depends on first-order neuron crossover in concentration response functions. Figure 4D suggests that all first-order neurons respond with the same response amplitude (R in eq. 3), even though this is supposed to be pulled from a distribution. It's possible that Figure 4D is plotting normalized response functions to highlight the difference in latency, but this is not clear from the plot or caption. If response amplitudes are all the same, and the response curves are, as plotted in Figure 4D, identical except for their time to half-max, then it seems somewhat trivial that the resulting second-order neuron activation will follow the same latency ranking, regardless of whether divisive normalization exists or not. However, there is some small jitter in these rankings across concentrations (Figure 4G), suggesting there is some randomness to the simulations. It would be helpful if this were clarified (e.g., by showing a non-normalized Figure 4D, with different response amplitudes), and more broadly, it would be extremely helpful in evaluating the latency coding within the broader framework proposed if the authors clarified whether the simulated first-order neuron response timecourses, when factoring in potentially different amplitudes (R) and averaging across the entire response window, reproduces the concentration response crossovers observed experimentally. In summary, in the present manuscript, it remains unclear if concentration crossovers are captured in the latency simulations, and if not, the authors do not clearly address what impact such variation in response amplitudes across concentrations may have on the latency results. 

      It is further unclear to what degree divisive normalization is necessary for the second-order neurons to establish and maintain their latency ranks across concentrations, or to exhibit concentration-dependent changes in latency.

      As suggested by the Reviewer, we will add another simulation scenario where the response amplitudes (R) are different for different neurons. For each concentration, we will then average each neuron’s response across the entire response window and determine if the simulation reproduces the cross-overs as observed experimentally.

      How the authors get from Figure 4G to 4H is not clear. Figure 4G shows second-order neuron response latencies across all latencies, with ordering based on their sorted latency to low concentration. This shows that very few neurons appear to change latency ranks going from low to high concentration, with a change in rank appearing as any deviation in a monotonically increasing trend. Focusing on the high concentration points, there appear to be 2 latency ranks switched in the first 10 responding neurons (reflecting the 1 downward dip in the points around neuron 8), rather than the 7 stated in the text. Across the first 50 responding neurons, I see only ~14 potential switches (reflecting the ~7 downward dips in the points around neurons 8, 20, 32, 33, 41, 44, 50), rather than the 32 stated in the text. It is possible that the unaccounted rank changes reflect fairly minute differences in latencies that are not visible in the plot in Figure 4G. This may be clarified by plotting each neuron's latency at low concentration vs. high concentration (i.e., similar to Figure 4H, but plotting absolute latency, not latency rank) to allow assessment of the absolute changes. If such minute differences are not driving latency rank changes in Fig. 4G, then a trend much closer to the unity line would be expected in Figure 4H. Instead, however, there are many massive deviations from unity, even within the first 50 responding neurons plotted in Figure 4G. These deviations include a jump in latency rank from 2 at low concentration to ~48 at high concentration. Such a jump is simply not seen in Figure 4G.

      We apologize that Fig. 4H was a poor choice for visualization. What is plotted in Fig. 4H is the sorted identity of neurons under low and high concentrations, and points on the y=x line indicate that the two corresponding neurons have the same rank under the two concentrations. We will replace this panel with a more intuitive visualization in which the x and y axes are the ranks of each neuron at the two concentrations, so that deviation from the y=x line indicates how much a neuron's rank differs between the two concentrations.
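
      The rank-versus-rank comparison can be sketched as follows. This is a minimal illustration with hypothetical latencies; the function name and the tie-breaking rule (by neuron index) are assumptions for the example.

```python
def latency_ranks(latencies):
    """Rank neurons by latency (0 = earliest). Ties are broken by neuron index."""
    order = sorted(range(len(latencies)), key=lambda i: (latencies[i], i))
    ranks = [0] * len(latencies)
    for rank, neuron in enumerate(order):
        ranks[neuron] = rank
    return ranks


# Hypothetical latencies (arbitrary units) for five neurons at two concentrations.
low_conc = [10.0, 12.0, 15.0, 20.0, 30.0]
high_conc = [5.0, 7.0, 6.0, 9.0, 8.0]

rank_low = latency_ranks(low_conc)
rank_high = latency_ranks(high_conc)

# Deviation from the y = x line: 0 means the neuron keeps its rank.
rank_shift = [h - l for l, h in zip(rank_low, rank_high)]
for neuron, (l, h, d) in enumerate(zip(rank_low, rank_high, rank_shift)):
    print(neuron, l, h, d)
```

      Plotting `rank_low` against `rank_high` gives one point per neuron, so a large jump in rank appears directly as a large distance from the diagonal.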

      In the text, the authors state that "Odor identity can be encoded by the set of highest-affinity neurons (which remains invariant across concentrations)." Presumably, this is a restatement of the primacy model and refers to invariance in latency rank (since the authors have not shown that the highest-affinity neurons have invariant response amplitudes across concentration). To what degree this statement holds given the results in Figure 4H, however, which appear to show that some neurons with the earliest latency rank at low concentration jump to much later latency ranks at high concentration, remains unclear. Such changes in latency rank for only a few of the first responding neurons may be negligible for classifying odor identity among a small handful of odorants, but not among 1-2 orders of magnitude more odors, which may feasibly occur in a natural setting. Collectively, these issues with the execution and presentation of the latency analysis make it unclear how robust the latency results are.

      The original primacy model states that the latency of a neuron decreases with increasing concentration, while the ranks of neurons remain unaltered. Our results, on the other hand, suggest that the ranks do at least partially change across concentrations. This leads to two possible decoding mechanisms. First, if the top K responding neurons remain invariant across concentrations (even if their individual ranks change within the top K), then the brain could learn to associate a population of K neurons with a response latency; lower response latency means higher concentration. Second, if the top K responding neurons do not remain invariant across concentrations, then the brain would need to learn to associate a different set of neurons with each concentration level. The latter imposes additional constraints on the robustness of the primacy model and the corresponding read-out mechanism. We will include more discussion of these possibilities in the revision.
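
      The two scenarios can be distinguished by checking how much the set of the K earliest-responding neurons overlaps between concentrations. The sketch below is illustrative only: the latencies are hypothetical, and the overlap fraction is one assumed way to quantify invariance of the top-K set.

```python
def top_k_set(latencies, k):
    """Indices of the k neurons with the shortest latencies (ties broken by index)."""
    order = sorted(range(len(latencies)), key=lambda i: (latencies[i], i))
    return set(order[:k])


def top_k_overlap(latencies_a, latencies_b, k):
    """Fraction of the top-k set shared between two conditions (1.0 = invariant set)."""
    shared = top_k_set(latencies_a, k) & top_k_set(latencies_b, k)
    return len(shared) / k


# Hypothetical latencies for six neurons.
low_conc = [10.0, 12.0, 15.0, 20.0, 30.0, 40.0]
# Scenario 1: ranks shuffle within the top 3, but the set itself is unchanged.
high_same_set = [7.0, 5.0, 6.0, 9.0, 8.0, 20.0]
# Scenario 2: neuron 0 drops out of the top 3 and neuron 3 enters.
high_new_set = [9.0, 5.0, 6.0, 7.0, 8.0, 20.0]

overlap_same = top_k_overlap(low_conc, high_same_set, k=3)
overlap_new = top_k_overlap(low_conc, high_new_set, k=3)
print(overlap_same, overlap_new)
```

      An overlap close to 1 across concentrations would be consistent with the first decoding mechanism, while a lower overlap would point to the second.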

      Analysis in Figures 4A-C shows that concentration can be decoded from first-order neurons, second-order neurons, or first-order neurons with divisive normalization imposed (i.e., simulating second-order responses). This does not say that divisive normalization is necessary to encode concentration, however. Therefore, for the authors to say that divisive normalization is "a potential mechanism for generating odor-specific subsets of second-order neurons whose combinatorial activity or whose response latencies represent concentration information" seems too strong a conclusion. Divisive normalization is not generating the concentration information, since that can be decoded just as well from the first-order neurons. Rather, divisive normalization can account for the different population patterns in concentration response functions between first- and second-order neurons without discarding concentration-dependent information.

      We agree that the word “generating” is faulty. We thank the reviewer for their more precise wording, which we will adopt.

      Performing the same polar histogram analysis of tufted vs. mitral cell concentration response functions (Figure 5B) provides a compelling new visualization of how these two cell types differ in their concentration variance. The projected importance of tufted cells to navigation, emerging directly through the inverse relationship between average concentration and distance (Figure 5C), is not surprising, and is largely a conceptual analysis rather than new quantitative analysis per se, but nevertheless, this is an important point to make. Another important consideration absent from this section, however, is whether and how divisive normalization may impact tufted cell activity. Previous work from the authors, as well as from Schoppa, Shipley, and Westbrook labs, has compellingly demonstrated that a major circuit mediating divisive normalization of mitral cells (GABA/DAergic short-axon cells) directly targets external tufted cells, and is thus very likely to also influence projection tufted cells. Such analysis would additionally provide substantially more justification for the Discussion statement "we analyzed an additional type of second-order neuron (tufted cells)", which at present instead reflects fairly minimal analysis.

      We agree that tufted cells are subject to divisive normalization as well, albeit probably to a lesser degree than mitral cells. To determine the effect of this, we will vary the strength of divisive normalization (and the sparseness of interglomerular interactions) and determine whether there is a regime where the response features of tufted cells match those observed experimentally.
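
      To illustrate what varying the normalization strength means, the sketch below uses a generic textbook form of divisive normalization in which each input is divided by a constant plus a scaled population mean. This form, the parameter names `strength` and `sigma`, and the toy inputs are assumptions for illustration; they are not the model or parameters used in the manuscript, and sparseness of interglomerular interactions is not modeled here.

```python
def divisive_normalization(inputs, strength, sigma=1.0):
    """Generic form: out_i = in_i / (sigma + strength * mean(inputs))."""
    pool = sum(inputs) / len(inputs)
    return [x / (sigma + strength * pool) for x in inputs]


# Hypothetical first-order inputs: one list per concentration, three glomeruli each.
# Glomerulus 0 is the most sensitive at low concentration but is overtaken later.
first_order = [
    [1.0, 0.2, 0.1],
    [2.0, 1.0, 0.5],
    [2.5, 3.0, 2.0],
    [2.6, 5.0, 6.0],
]


def response_curve(glomerulus, strength):
    """Output of one glomerulus across concentrations for a given strength."""
    return [divisive_normalization(inputs, strength)[glomerulus] for inputs in first_order]


weak = response_curve(0, strength=0.0)    # no normalization: follows the input
strong = response_curve(0, strength=2.0)  # strong normalization
print(weak)
print(strong)
```

      In this toy example the same input produces a monotonically increasing curve without normalization and a non-monotonic curve with strong normalization, so sweeping `strength` moves a cell between the two regimes.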

    1. eLife Assessment

      This study reports important negative results by showing that genetic removal of the RNA-binding protein PTBP1 in astrocytes is not sufficient to induce their conversion into neurons, challenging prior claims in the field. It also provides a systematic and insightful analysis of the role of PTBP1 in regulating astrocyte-specific splicing. The evidence is convincing, as the experiments are technically robust, rigorously controlled, and supported by both imaging and transcriptomic analyses.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. used a conditional knockout mouse model to re-examine the role of the RNA-binding protein PTBP1 in the transdifferentiation of astroglial cells into neurons. Several earlier studies reported that PTBP1 knockdown can efficiently induce the transdifferentiation of rodent glial cells into neurons, suggesting potential therapeutic applications for neurodegenerative diseases. However, these findings have been contested by subsequent studies, which in turn have been challenged by more recent publications. In their current work, Zhang et al. deleted exon 2 of the Ptbp1 gene using an astrocyte-specific, tamoxifen-inducible Cre line and investigated, using fluorescence imaging and bulk and single-cell RNA-sequencing, whether this manipulation promotes the transdifferentiation of astrocytes into neurons across various brain regions. The data strongly indicate that genetic ablation of PTBP1 is not sufficient to drive efficient conversion of astrocytes into neurons. Interestingly, while PTBP1 loss alters splicing patterns in numerous genes, these changes do not shift the astroglial transcriptome toward a neuronal profile.

      Strengths:

      Although this is not the first report of PTBP1 ablation in mouse astrocytes in vivo, this study utilizes a distinct knockout strategy and provides novel insights into PTBP1-regulated splicing events in astrocytes. The manuscript is well written, and the experiments are technically sound and properly controlled. I believe this study will be of considerable interest to a broad readership.

      Weaknesses:

      (1) The primary point that needs to be addressed is a better understanding of the effect of exon 2 deletion on PTBP1 expression. Figure 4D shows successful deletion of exon 2 in knockout astrocytes. However, assuming that the coverage plots are CPM-normalized, the overall PTBP1 mRNA expression level appears unchanged. Figure 6A further supports this observation. This is surprising, as one would expect that the loss of exon 2 would shift the open reading frame and trigger nonsense-mediated decay of the PTBP1 transcript. Given this uncertainty, the authors should confirm the successful elimination of PTBP1 protein in cKO astrocytes using an orthogonal approach, such as Western blotting, in addition to immunofluorescence. They should also discuss possible reasons why PTBP1 mRNA abundance is not detectably affected by the frameshift.

      (2) The authors should analyze PTBP1 expression in WT and cKO substantia nigra samples shown in Figure 3 or justify why this analysis is not necessary.

      (3) Lines 236-238 and Figure 4E: The authors report an enrichment of CU-rich sequences near PTBP1-regulated exons. To better compare this with previous studies on position-specific splicing regulation by PTBP1, it would be helpful to assess whether the position of such motifs differs between PTBP1-activated and PTBP1-repressed exons.

      (4) The analyses in Figure 5 and its supplement strongly suggest that the splicing changes in PTBP1-depleted astrocytes are distinct from those occurring during neuronal differentiation. However, the authors should ensure that these comparisons are not confounded by transcriptome-wide differences in gene expression levels between astrocytes and developing neurons. One way to address this concern would be to compare the new PTBP1 cKO data with publicly available RNA-seq datasets of astrocytes induced to transdifferentiate into neurons using proneural transcription factors (e.g., PMID: 38956165).

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhang and colleagues describes a study that investigated whether the deletion of PTBP1 in adult astrocytes in mice led to an astrocyte-to-neuron conversion. The study revisited the hypothesis that reduced PTBP1 expression reprogrammed astrocytes to neurons. More than 10 studies have been published on this subject, with contradicting results. Half of the studies supported the hypothesis while the other half did not. The question being addressed is an important one because if the hypothesis is correct, it can lead to exciting therapeutic applications for treating neurodegenerative diseases such as Parkinson's disease.

      In this study, Zhang and colleagues conducted a conditional mouse knockout study to address the question. They used the Cre-LoxP system to specifically delete PTBP1 in adult astrocytes. Through a series of carefully controlled experiments, including cell lineage tracing, the authors found no evidence for the astrocyte-to-neuron conversion.

      The authors then carried out a key experiment that none of the previous studies on the subject did: investigating alternative splicing pattern changes in PTBP1-depleted cells using RNA-seq analysis. The idea is to compare the splicing pattern change caused by PTBP1 deletion in astrocytes to what occurs during neurodevelopment. This is an important experiment that will help illuminate whether the astrocyte-to-neuron transition occurred in the system. The result was consistent with that of the cell staining experiments: no significant transition was detected.

      These experiments demonstrate that, in this experimental setting, PTBP1 deletion in adult astrocytes did not convert the cells to neurons.

      Strengths:

      This is a well-designed, elegantly conducted, and clearly described study that addresses an important question. The conclusions provide important information to the field.

      To this reviewer, this study provided convincing and solid experimental evidence to support the authors' conclusions.

      Weaknesses:

      The Discussion in this manuscript is short and can be expanded. Can the authors speculate what led to the contradictory results in the published studies? The current study, in combination with the study published in Cell in 2021 by Wang and colleagues, suggests that the observed difference is not caused by the difference between knockdown and knockout. Is it possible that other glial cell types are responsible for the transition? If so, what cells? Oligodendrocytes?

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zhang et al. used a conditional knockout mouse model to re-examine the role of the RNA-binding protein PTBP1 in the transdifferentiation of astroglial cells into neurons. Several earlier studies reported that PTBP1 knockdown can efficiently induce the transdifferentiation of rodent glial cells into neurons, suggesting potential therapeutic applications for neurodegenerative diseases. However, these findings have been contested by subsequent studies, which in turn have been challenged by more recent publications. In their current work, Zhang et al. deleted exon 2 of the Ptbp1 gene using an astrocyte-specific, tamoxifen-inducible Cre line and investigated, using fluorescence imaging and bulk and single-cell RNA-sequencing, whether this manipulation promotes the transdifferentiation of astrocytes into neurons across various brain regions. The data strongly indicate that genetic ablation of PTBP1 is not sufficient to drive efficient conversion of astrocytes into neurons. Interestingly, while PTBP1 loss alters splicing patterns in numerous genes, these changes do not shift the astroglial transcriptome toward a neuronal profile.

      Strengths:

      Although this is not the first report of PTBP1 ablation in mouse astrocytes in vivo, this study utilizes a distinct knockout strategy and provides novel insights into PTBP1-regulated splicing events in astrocytes. The manuscript is well written, and the experiments are technically sound and properly controlled. I believe this study will be of considerable interest to a broad readership.

      Weaknesses:

      (1) The primary point that needs to be addressed is a better understanding of the effect of exon 2 deletion on PTBP1 expression. Figure 4D shows successful deletion of exon 2 in knockout astrocytes. However, assuming that the coverage plots are CPM-normalized, the overall PTBP1 mRNA expression level appears unchanged. Figure 6A further supports this observation. This is surprising, as one would expect that the loss of exon 2 would shift the open reading frame and trigger nonsense-mediated decay of the PTBP1 transcript. Given this uncertainty, the authors should confirm the successful elimination of PTBP1 protein in cKO astrocytes using an orthogonal approach, such as Western blotting, in addition to immunofluorescence. They should also discuss possible reasons why PTBP1 mRNA abundance is not detectably affected by the frameshift.

      We thank the reviewer for raising this important point. Indeed, the deletion of exon 2 introduces a frameshift that is predicted to disrupt the PTBP1 open reading frame and trigger nonsense-mediated decay (NMD). While our CPM-normalized coverage plots (Figure 4D) and gene-level expression analysis (Figure 6A) suggest that PTBP1 mRNA levels remain largely unchanged in cKO astrocytes, we acknowledge that this observation is counterintuitive and merits further clarification.

      We suspect that the process of brain tissue dissociation and FACS sorting for bulk or single-cell RNA-seq may enrich for nuclear material and thus dilute the NMD signal, since NMD occurs in the cytoplasm. Alternatively, the transcripts (like those of other genes) may escape NMD through unknown mechanisms. Although a frameshift is a strong indicator for triggering NMD, it does not guarantee that NMD will occur in every case. We will include this discussion in the revised manuscript to provide additional context for the apparent discrepancy between mRNA abundance and protein loss.

      Regarding the validation of PTBP1 protein depletion in cKO astrocytes by Western blotting, we acknowledge that orthogonal approaches to confirm PTBP1 elimination would address the uncertainty around the effect of exon 2 deletion on PTBP1 expression. However, the low cell yield of cKO astrocytes makes it difficult to obtain sufficient material for immunoblot detection of PTBP1 depletion: on average, 3-5 adult animals per genotype are needed for each biological replicate. Our characterization of this Ptbp1 deletion allele in other contexts shows the loss of full-length PTBP1 protein in ESCs and NPCs by Western blotting. Furthermore, germline homozygous mutant mice do not survive beyond embryonic day 6, supporting the conclusion that it is a loss-of-function allele.

      (2) The authors should analyze PTBP1 expression in WT and cKO substantia nigra samples shown in Figure 3 or justify why this analysis is not necessary.

      We thank the reviewer for raising this important question. We used Aldh1l1-CreERT2, which is designed to be active in all astrocytes throughout the mouse brain. Although we have systematically verified PTBP1 elimination in different mouse brain regions (cortex and striatum) at multiple time points (from 4 to 12 weeks after tamoxifen administration), we agree that it remains important to demonstrate that the observed lack of astrocyte-to-neuron conversion is indeed associated with sufficient PTBP1 depletion. We will analyze PTBP1 expression in the substantia nigra, as we did in the cortex and striatum.

      (3) Lines 236-238 and Figure 4E: The authors report an enrichment of CU-rich sequences near PTBP1-regulated exons. To better compare this with previous studies on position-specific splicing regulation by PTBP1, it would be helpful to assess whether the position of such motifs differs between PTBP1-activated and PTBP1-repressed exons.

      We thank the reviewer for this insightful comment. We agree that assessing the positional distribution of CU-rich motifs between PTBP1-activated and PTBP1-repressed exons would provide valuable insight into the position-specific regulatory mechanisms of PTBP1. In response, we will perform separate motif enrichment analyses for PTBP1-activated and PTBP1-repressed exons and examine whether their positional patterns differ. This will help clarify whether these exons are differentially regulated by PTBP1 through distinct motif positioning in mature astrocytes.

      (4) The analyses in Figure 5 and its supplement strongly suggest that the splicing changes in PTBP1-depleted astrocytes are distinct from those occurring during neuronal differentiation. However, the authors should ensure that these comparisons are not confounded by transcriptome-wide differences in gene expression levels between astrocytes and developing neurons. One way to address this concern would be to compare the new PTBP1 cKO data with publicly available RNA-seq datasets of astrocytes induced to transdifferentiate into neurons using proneural transcription factors (e.g., PMID: 38956165).

      We would like to express our gratitude for the thoughtful feedback. We agree that transcriptome-wide differences in gene expression between astrocytes and developing neurons could confound the interpretation of splicing differences. To address this concern, we will incorporate publicly available RNA-seq datasets from studies in which astrocytes are reprogrammed into neurons using proneural transcription factors (PMID: 38956165).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhang and colleagues describes a study that investigated whether the deletion of PTBP1 in adult astrocytes in mice led to an astrocyte-to-neuron conversion. The study revisited the hypothesis that reduced PTBP1 expression reprogrammed astrocytes to neurons. More than 10 studies have been published on this subject, with contradicting results. Half of the studies supported the hypothesis while the other half did not. The question being addressed is an important one because if the hypothesis is correct, it can lead to exciting therapeutic applications for treating neurodegenerative diseases such as Parkinson's disease.

      In this study, Zhang and colleagues conducted a conditional mouse knockout study to address the question. They used the Cre-LoxP system to specifically delete PTBP1 in adult astrocytes. Through a series of carefully controlled experiments, including cell lineage tracing, the authors found no evidence for the astrocyte-to-neuron conversion.

      The authors then carried out a key experiment that none of the previous studies on the subject did: investigating alternative splicing pattern changes in PTBP1-depleted cells using RNA-seq analysis. The idea is to compare the splicing pattern change caused by PTBP1 deletion in astrocytes to what occurs during neurodevelopment. This is an important experiment that will help illuminate whether the astrocyte-to-neuron transition occurred in the system. The result was consistent with that of the cell staining experiments: no significant transition was detected.

      These experiments demonstrate that, in this experimental setting, PTBP1 deletion in adult astrocytes did not convert the cells to neurons.

      Strengths:

      This is a well-designed, elegantly conducted, and clearly described study that addresses an important question. The conclusions provide important information to the field.

      To this reviewer, this study provided convincing and solid experimental evidence to support the authors' conclusions.

      Weaknesses:

      The Discussion in this manuscript is short and can be expanded. Can the authors speculate what led to the contradictory results in the published studies? The current study, in combination with the study published in Cell in 2021 by Wang and colleagues, suggests that the observed difference is not caused by the difference between knockdown and knockout. Is it possible that other glial cell types are responsible for the transition? If so, what cells? Oligodendrocytes?

      We are grateful for the reviewer’s careful reading and valuable suggestions, which will help us improve the manuscript. We will expand the Discussion. The contradictory results in the previously published studies could be due to the limited stringency and neuronal leakage of the astrocyte-specific GFAP promoter that some investigators used. Other possibilities include an alternative cell of origin, increased neuronal resilience, or combinations of as yet unidentified factors.

    1. Reviewer #1 (Public review):

      Summary:

      This paper presents maRQup, a Python pipeline for automating the quantitative analysis of preclinical cancer immunotherapy experiments using bioluminescent imaging in mice. maRQup processes images to quantify tumor burden over time and across anatomical regions, enabling large-scale analysis of over 1,000 mice. The study uses this tool to compare different CAR-T cell constructs and doses, identifying differences in initial tumor control and relapse rates, particularly noting that CD19.CD28 CAR-T cells show faster initial killing but higher relapse compared to CD19.4-1BB CAR-T cells. Furthermore, maRQup facilitates the spatiotemporal analysis of tumor dynamics, revealing differences in growth patterns based on anatomical location, such as the snout exhibiting more resistance to treatment than bone marrow.

      Strengths:

      (1) The maRQup pipeline enables the automatic processing of a large dataset of over 1,000 mice, providing investigators with a rapid and efficient method for analyzing extensive bioluminescent tumor image data.

      (2) Through image processing steps like tail removal and vertical scaling, maRQup normalizes mouse dimensions to facilitate the alignment of anatomical regions across images. This process enables the reliable demarcation of nine distinct anatomical regions within each mouse image, serving as a basis for spatiotemporal analysis of tumor burden within these consistent regions by quantifying average radiance per pixel.

      Weaknesses:

      (1) While the pipeline aims to standardize images for regional assessment, the reliance on scaling primarily along the vertical axis after tail removal may introduce limitations to the quantitative robustness of the anatomically defined regions. This approach does not account for potential non-linear growth across dimensions in animals of different ages or sizes, which could result in relative stretching or shrinking of subjects compared to an average reference.

      (2) Furthermore, despite excluding severely slanted images, the pipeline does not fully normalize for variations in animal pose during image acquisition (e.g., tucked body, leaning). This pose variability not only impacts the precise relative positioning of internal anatomical regions, potentially making their definition based on relative image coordinates more qualitative than truly quantitative for precise regional analysis, but it also means that the bioluminescent light signal from the tumor will not propagate equally to the camera, as photons will travel differentially through the tissue. This differing light path through tissues due to variable positioning can introduce large variability in the measured radiance that was not accounted for in the analysis algorithm. Achieving more robust anatomical and quantitative normalization might require methods that control animal posture using a rigid structure during imaging.

    2. Reviewer #2 (Public review):

      Summary:

      The authors developed a method that automatically processes bioluminescent tumor images for quantitative analysis and used it to describe the spatiotemporal distribution of tumor cells in response to CD19-targeting CAR-T cells comprising CD28 or 4-1BB costimulatory domains. The conclusion highlights the dependence of tumor decay and relapse on the number of injected cells, the type of cells, and the initial growth rate of tumors (where "initial" is measured from the first day of therapy). The authors also performed a spatiotemporal analysis of tumor response to CAR T therapy in different regions of the mouse body in a model of acute lymphoblastic leukemia (ALL).

      Strengths:

      The analysis is based on a large number of images and accounts for many variables. The results of the analysis largely support their claims that the kinetics of tumor decay and relapse are dependent on the CAR T co-stimulatory domain and number of cells injected and tumor growth rates.

      Weaknesses:

      The study does not specify how (a) differences in mouse positioning (and whether misaligned mice were excluded) and (b) tumor spread at the start of therapy influenced the data. The study does not take into account the potential heterogeneity of CAR T cells in terms of CAR expression or T cell immunophenotype (differentiation, exhaustion, fitness, etc.).

    3. Reviewer #3 (Public review):

      Summary:

      The paper "The 1000+ mouse project: large-scale spatiotemporal parametrization and modeling of preclinical cancer immunotherapies" is focused on developing a novel methodology for automatic processing of bioluminescence imaging data. It provides quantitative and statistically robust insights into preclinical experiments that will contribute to optimizing cell-based therapies. There is an enormous demand for such methods and approaches that enable the spatiotemporal evaluation of cell monitoring in large cohorts of experimental animals.

      Strengths:

      The manuscript is generally well written, and the experiments are scientifically sound. The conclusions reflect the soundness of experimental data. This approach seems to be quite innovative and promising to improve the statistical accuracy of BLI data quantification.

      This methodology can be used as a universal quantification tool for BLI data for in vivo assessment of adoptively transferred cells due to the versatility of the technology.

      Weaknesses:

      No weaknesses were identified by this Reviewer.

    4. eLife Assessment

      The authors developed a fundamental computational method, which is intended to automatically process bioluminescence imaging-derived tumour images across anatomical regions and over time. This allows quantitative analysis of such data, and the authors applied it to describe the spatiotemporal distribution of tumour cells in response to CD19-targeted CAR-T cells that contained either CD28 or 4-1BB costimulatory domains. Some operational limitations were identified, which relate to the pipeline's reliance on predefined regions of interest instead of aligning signal sites with anatomical information, scaling, and not taking animal pose into account. Overall, the authors provide compelling evidence for the functionality of their computational approach towards automated analysis of bioluminescence imaging data, while applying it to a current topic of wide interest in cell therapy research.

    1. eLife Assessment

      This fundamental work provides solid evidence that advances our understanding of the physical mechanisms underlying bacterial cell division by examining the role of membrane tension and FtsZ condensation in sequential stages of division. The effect of accDA overexpression on membrane tension was carefully characterized. To further enhance rigor, the authors could consider examining orthogonal perturbations to membrane tension, addressing membrane tension vs. fluidity, and addressing the ability of FtsZ to bend membranes in cells.

    2. Reviewer #1 (Public review):

      In this study, Ramirez-Diaz and coworkers address an important and lingering question in the bacterial cell division field, i.e., whether FtsZ polymers bend the cell membrane inwards, using an elegant and innovative approach. The key cell division protein FtsZ is a homolog of tubulin and forms curved polymers in the presence of GTP. It has long been hypothesized that this curvature provides the force to bend the cell membrane inwards, thereby triggering septal synthesis. Several in vitro studies have shown that purified FtsZ, when attached to the membrane, can indeed deform artificial membranes. However, other studies favor the view that only septal peptidoglycan synthesis drives cell division. Ramirez-Diaz has tried to address the membrane deformation theory in vivo by developing a mutant that synthesizes extra lipids. In this way, the membrane tension is lowered, which would facilitate cell division if deformation of the cell membrane by curved FtsZ polymers is a crucial step in cell division. Surprisingly, they showed that this mutant overcomes the cell division block in a sepF ezrA double mutant. In addition, they carefully characterize the membrane characteristics of the mutant and the effect on FtsZ ring formation. With this work, they have set up a very useful model system to study the role of the cell membrane in cell division, and also a new tool to better study the function of the cell division proteins EzrA and SepF. Overall, this is a very important study for the bacterial cell division field with interesting findings and ideas.

      Nevertheless, the authors jump to a conclusion that I cannot yet share. The main issue I have is that they focus on membrane tension, yet what they seem to modulate is membrane fluidity. Both are clearly related but not the same. I think that it is important to extensively address this issue in the manuscript. They (also) use Laurdan generalized polarization as an indication of membrane tension (Figure 1F), but this method is primarily used in the literature to measure membrane fluidity. In addition, they explain the occurrence of strong local fluorescent membrane signals as the occurrence of double membranes (Figure S1D), whereas others have shown that such fluorescent hot spots can, in theory, also be formed by local accumulation of fluid lipids (PMID: 24603761). The reason why it is so important to distinguish fluidity from tension is that for the attachment of FtsZ polymers, the cell makes use of anchor proteins like FtsA that contain an amphipathic alpha helix, which inserts into the inner leaflet of the lipid bilayer. Importantly, this insertion only works when the fatty acids can be "pushed apart", and this is stimulated by unsaturated and short-chain fatty acids that make the membrane more fluid (PMID: 12676941). If a membrane is "more fluid", then it can more easily accommodate an amphipathic helix. Thus, the production of extra membrane material may increase the fluidity of the cell membrane, as the Laurdan GP measurements indicated, which can then facilitate the attachment of FtsA, including the attached FtsZ polymers, to the membrane. In other words, what the authors have observed may not be a stimulation of Z-ring formation due to lowering membrane tension, but rather because of stimulated binding of FtsZ polymers to the cell membrane. It might be that the attachment of the late cell division proteins to the Z-ring, which are all transmembrane proteins, is also facilitated in a more fluid lipid environment. The authors have not excluded the latter (by using a mutant depleted for one of the late cell division proteins).

      Finally, the authors performed EM studies to measure septa thickness, and surprisingly, they did not seem to observe deformed septa in a sepF-ezrA double mutant, when overexpressing accDA, while it has been shown before that the absence of SepF leads to strongly deformed septa. Since this finding nuances the mode of action of SepF polymers, it should be discussed.

      In conclusion, this is an important and interesting study, but it seems crucial for the interpretation of the findings to include a clear discussion on membrane fluidity and its consequences.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Ramirez-Diaz and colleagues set out to examine key physical mechanisms of bacterial cell division, using the Gram-positive model Bacillus subtilis. Specifically, they investigate the hypothesis that condensation of polymers of the master regulator of division FtsZ can deform membranes to initiate division, but that this is limited by membrane tension. They test this by modulating both membrane tension and FtsZ condensation genetically. To modulate membrane tension, they overexpress accDA to increase the rate of phospholipid synthesis and increase the "hidden membrane reservoir", thereby decreasing membrane tension. To modulate FtsZ condensation, they deplete the bundling protein EzrA in a background lacking a second bundling protein, SepF. They confirm the effects of accDA overexpression on membrane tension using two different sensors before assessing the relationship between membrane tension, FtsZ condensation, and division. They demonstrate that cells with excess membrane (reduced membrane tension) can divide with reduced bundling protein abundance, suggesting that FtsZ condensation driven by ZBPs normally serves to overcome membrane tension to initiate division. In addition, they find an inverse relationship between membrane tension and FtsZ ring constriction rate, but no effect of membrane tension on FtsZ treadmilling. Estimation of physical parameters leads them to conclude that very small membrane fluctuations are sufficient to initiate division in unperturbed cells and that the membrane contributes only ~0.1% of the total surface tension strength, maintaining cell shape.

      Strengths:

      The highly quantitative approach of this work is a strength, as is the rigorous assessment of membrane tension with multiple sensors. The model proposed is largely consistent with existing data and provides a mechanism for further study and validation. The study tackles a major outstanding question in bacterial cell biology, and provides a potential mechanism for a key step in replication with broad implications in other organisms.

      Weaknesses:

      The authors only use one method (overexpression of accDA) to perturb membrane tension, which could influence division in unanticipated ways (e.g., metabolic adaptations and/or activation of signaling pathways). The proposed model for initiation of division posits that FtsZ condensation bends membranes, which is supported by in vitro evidence, but there is no in vivo evidence that FtsZ condensation can bend membranes in cells. It remains possible that the function of FtsZ condensation is to localize sufficient cell wall synthetic activity to build peptidoglycan that rectifies membrane fluctuations.

    1. eLife Assessment

      This important study presents the rational redesign and engineering of interleukin-7. The data from the integrated approach of using computational, biophysical, and cellular experiments are convincing, but this study can further benefit from more quantitative analyses and structural data. This paper is broadly relevant to those studying immunomodulation using biologics.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript describes the use of computational tools to design a mimetic of the interleukin-7 (IL-7) cytokine with superior stability and receptor binding activity compared to the naturally occurring molecule. The authors focused their engineering efforts on the loop regions to preserve receptor interfaces while remediating structural irregularities that destabilize the protein. They demonstrated the enhanced thermostability, production yield, and bioactivity of the resulting molecule through biophysical and functional studies. Overall, the manuscript is well written, novel, and of high interest to the fields of molecular engineering, immunology, biophysics, and protein therapeutic design. The experimental methodologies used are convincing; however, the article would benefit from more quantitative comparisons of bioactivity through titrations.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents the computational design and experimental validation of Neo-7, an engineered variant of interleukin-7 (IL-7) with improved folding efficiency, expression yield, and therapeutic activity. The authors employed a rational protein design approach using Rosetta loop remodeling to reconnect IL-7's functional helices through shorter, more efficient loops, resulting in a protein with superior stability and binding affinity compared to wild-type IL-7. The work demonstrates promising translational potential for cancer immunotherapy applications.

      Strengths:

      (1) The integration of Rosetta loop remodeling with AlphaFold validation represents an established computational pipeline for rational protein design. The iterative refinement process, using both single-sequence and multimer AlphaFold predictions, is methodologically sound.

      (2) The authors provide thorough characterization across multiple platforms (yeast display, bacterial expression, mammalian cell expression) and assays (binding kinetics, thermostability, bioactivity), strengthening the robustness of their findings.

      (3) The identification of the critical helix 1 kink stabilized by disulfide bonding and its recreation through G4C/L96C mutations demonstrates deep structural understanding and successful problem-solving.

      (4) The MC38 tumor model results show clear therapeutic advantages of Neo-7 variants, with compelling immune profiling data supporting CD8+ T cell-mediated anti-tumor mechanisms.

      (5) The transcriptomic profiling provides valuable mechanistic insights into T cell activation states and suggests reduced exhaustion markers, which are clinically relevant.

      Weaknesses:

      (1) While computational predictions are extensive, the manuscript lacks experimental structural validation of the designed Neo-7 variants. The term "Structural Validation" should not be used in the header.

      (2) The authors observe slower on/off-rates for Neo-7 variants compared to wild-type IL-7. Could the authors speculate about the potential biological impacts of the slow off-rate, especially focusing on downstream signaling pathways that might be differentially affected by the altered binding kinetics of Neo-7 variants?

      (3) While computational immunogenicity prediction is provided, these methods are very limited.

    1. eLife Assessment

      This fundamental study explores a novel cellular mechanism underlying the degeneration of locus coeruleus neurons during chronic restraint stress. The evidence supporting the overexcitation of LC neurons after chronic stress is compelling. However, to fully support the broad implications for LC degeneration and Alzheimer's disease, the study would benefit from stronger causal integration and validation in age-relevant models.

    2. Reviewer #1 (Public review):

      This study aims to elucidate the mechanisms by which stress-induced α2A-adrenergic receptor (α2A-AR) internalization leads to cytosolic noradrenaline (NA) accumulation and subsequent neuronal dysfunction in the locus coeruleus (LC). While the manuscript presents an interesting but ambitious model involving calcium dynamics, GIRK channel rundown, and autocrine NA signaling, several key limitations undermine the strength of the conclusions.

      First, the revision does not include new experiments requested by reviewers to validate core aspects of the mechanism. Specifically, there is no direct measurement of cytosolic NA levels or MAO-A enzymatic activity to support the link between receptor internalization and neurochemical changes. The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence.

      Second, the behavioral analysis remains insufficient to support claims of cognitive impairment. The use of a single working memory test following an anxiety test is inadequate to verify memory dysfunction behaviors. Additional cognitive assays, such as the Morris Water Maze or Novel Object Recognition, are recommended but not performed.

      Third, concerns regarding the lack of rigor in differential MAO-A expression in fluorescence imaging were not addressed experimentally. Instead of clarifying the issue, the authors moved the figure to supplementary data without providing further evidence (e.g., an enzymatic assay or quantitative reanalysis of Western blot, or re-staining of IF for MAO-A) to support their interpretation.

      Fourth, concerns regarding TH staining remain unresolved. In Figure S7, the α2A-AR signal appears to resemble TH staining, and vice versa, raising the possibility of labeling errors. It is recommended that the authors re-examine this issue by either double-checking the raw data or repeating the immunostaining to validate the staining.

      Overall, the manuscript offers a potentially interesting framework but falls short in providing the experimental rigor necessary to establish causality. The reliance on indirect reasoning and reorganizing of existing data, rather than generating new evidence, limits the overall impact and interpretability of the study.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the mechanism by which chronic stress induces degeneration of locus coeruleus (LC) neurons. The authors demonstrate that chronic stress leads to the internalization of α2A-adrenergic receptors (α2A-ARs) on LC neurons, causing increased cytosolic noradrenaline (NA) accumulation and subsequent production of the neurotoxic metabolite DOPEGAL via monoamine oxidase A (MAO-A). The study suggests a mechanistic link between stress-induced α2A-AR internalization, disrupted autoinhibition, elevated NA metabolism, activation of asparagine endopeptidase (AEP), and Tau pathology relevant to Alzheimer's disease (AD). The conclusions of this paper are largely well-supported by the data, but some aspects of image acquisition require further examination.

      Strengths:

      This study clearly demonstrates the effects of chronic stimulation on the excitability of LC neurons using electrophysiological techniques. It also elucidates the role of α2-adrenergic receptor (α2-AR) internalization and the associated upstream and downstream signaling pathways of GIRK-1, using a range of pharmacological agents, highlighting the innovative nature of the work. Additionally, the study identifies the involvement of the MAO-A-DOPEGAL-AEP pathway in this process. The topic is timely, the proposed mechanistic pathway is compelling, and the findings have translational relevance, particularly for therapeutic strategies targeting α2A-AR internalization in neurodegenerative diseases.

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Comments on revisions:

      The authors have addressed all of the reviewers' comments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a technically impressive data set showing that repeated excitation or restraint stress internalises somatodendritic α2A adrenergic autoreceptors (α2A ARs) in locus coeruleus (LC) neurons. Loss of these receptors weakens GIRK-dependent autoinhibition, raises neuronal excitability, and is accompanied by higher MAO A, DOPEGAL, AEP, and tau N368 levels. The work combines rigorous whole-cell electrophysiology with barbadin-based trafficking assays, qPCR, Western blotting and immunohistochemistry. The final schematic is appealing and, in principle, could explain early LC hyperactivity followed by degeneration in ageing and Alzheimer's disease.

      Strengths:

      Multi-level approach - The study integrates electrophysiology, pharmacology, mRNA quantification, and protein-level analysis.

      Use of barbadin to block β-arrestin/AP-2-dependent internalisation is both technically precise and mechanistically informative.

      Well-executed electrophysiology.

      Translational relevance.

      Converges on a model that peers have discussed (scientists can only discuss models, not data!).

      Weaknesses:

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The manuscript's logical flow is challenging and hard to follow, and key arguments could be more clearly structured, particularly in transitions between mechanistic components.

      We have revised the manuscript to make the logical flow between mechanistic components easier to follow by adding descriptions of Figure S1E-J, Figure S2F-K, Figure S3A-H, Figure S4A-F, Figure S5, and Figure S6.

      (2) The causality between stress-induced α2A-AR internalization and the enhanced MAO-A remains unclear. Direct experimental evidence is needed to determine whether α2A-AR internalization itself or Ca2+ drives MAO-A activation, and how they activate MAO-A should be considered.

      We believe that the causality between stress-induced α2A-AR internalization and the enhancement of MAO-A is clearly demonstrated by our current experiments, although our explanations could be made easier to understand, especially for readers who are not experts in electrophysiology.

      Firstly, it is well established that autoinhibition in LC neurons is mediated by α2A-AR-coupled GIRK (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience). We found that spike frequency adaptation in LC neurons was also mediated by α2A-AR-coupled GIRK-I (Figure 1A-I), and that α2A-AR-coupled GIRK-I underwent [Ca<sup>2+</sup>]<sub>i</sub>-dependent rundown (Figures 2, S1, S2), leading to an abolishment of spike frequency adaptation (Figure S4). [Ca<sup>2+</sup>]<sub>i</sub>-dependent rundown of α2A-AR-coupled GIRK-I was prevented by barbadin (Figure 2G-J), which prevents the internalization of G-protein coupled receptor (GPCR) channels.

      Abolishment of spike frequency adaptation itself, i.e., “increased spike activity”, can increase [Ca<sup>2+</sup>]<sub>i</sub> because [Ca<sup>2+</sup>]<sub>i</sub> is entirely dependent on the spike activity, as shown by [Ca<sup>2+</sup>]<sub>i</sub> imaging in Figure S3.

      Thus, α2A-AR internalization can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolishment of autoinhibition or spike frequency adaptation, and a [Ca<sup>2+</sup>]<sub>i</sub> increase drives MAO-A activation as reported previously (Cao et al., 2007, BMC Neurosci). The mechanism by which Ca<sup>2+</sup> activates MAO-A is beyond the scope of the current study.

      Our study focused only on the mechanism by which chronic or severe stress can cause persistent overexcitation and how this results in LC degeneration.

      (3) The connection between α2A-AR internalization and increased cytosolic NA levels lacks direct quantification, which is necessary to validate the proposed mechanism.

      Direct quantification of the relationship between α2A-AR internalization and increased cytosolic NA levels may not be possible, and may not be strictly necessary, as explained below.

      The internalization of α2A-AR can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolishment of autoinhibition or spike frequency adaptation, and [Ca<sup>2+</sup>]<sub>i</sub> increases can facilitate autocrine NA release (Huang et al., 2007), similar to transmitter release from nerve terminals (Kaeser & Regehr, 2014, Annu Rev Physiol).

      NA released in an autocrine manner must be taken up again by NAT (NA transporter), which is firmly established (Torres et al., 2003, Nat Rev Neurosci). Re-uptake of NA by NAT is the only source of intracellular NA, and NA re-uptake by NAT should increase as the internalization of the NA binding site (α2A-AR) progresses in association with [Ca<sup>2+</sup>]<sub>i</sub> increases (see page 11, lines 334-336).

      Thus, the connection between α2A-AR internalization and increased cytosolic NA levels is logically compelling, and the quantification of such a connection may not be possible at present (see the response to the comment made by Reviewer #1 as Recommendations for the authors (2)) and is beyond the scope of our current study.

      (4) The chronic stress model needs further validation, including measurements of stress-induced physiological changes (e.g., corticosterone levels) to rule out systemic effects that may influence LC activity. Additional behavioral assays for spatial memory impairment should also be included, as a single behavioral test is insufficient to confirm memory dysfunction.

      It is well established that restraint stress (RS) increases corticosterone levels depending on the period of RS (García-Iglesias et al., 2014, Neuropharmacology), although we are not reluctant to measure the corticosterone levels. In addition, there are numerous reports that showed the increased activity of LC neurons in response to various stresses (Valentino et al., 1983; Valentino and Foote, 1988; Valentino et al., 2001; McCall et al., 2015), as described in the text (page 4, lines 96-98). Measurement of cortisol levels may not be able to rule out systemic effects of CRS on the whole brain.

      We had already performed another behavioral test, the elevated plus maze (EPM) test. By combining the two tests, it may be possible to evaluate the results of the Y-maze test more accurately by differentiating memory impairment from anxiety. However, the results obtained by these behavioral tests are only supplementary to our current aim of elucidating the cellular mechanisms for the accumulation of cytosolic free NA. Therefore, we have softened the implication of anxiety and memory impairment (page 13, lines 397-400 in the revised manuscript).

      (5) Beyond β-arrestin binding, the role of alternative internalization pathways (e.g., phosphorylation, ubiquitination) in α2A-AR desensitization should be considered, as current evidence is insufficient to establish a purely Ca<sup>2+</sup>-dependent mechanism.

      We can hardly agree with this comment. 

      It was clearly demonstrated that repeated application of NA itself did not cause desensitization of α2A-AR (Figure S1A-D), and that the blockade of β-arrestin binding by barbadin completely suppressed the Ca<sup>2+</sup>-dependent downregulation of GIRK (Figure 2G-K). These observations can clearly rule out the possible involvement of phosphorylation or ubiquitination in the desensitization.

      In addition to the barbadin experiment, immunohistochemistry and western blotting clearly demonstrated the decrease in α2A-AR expression on the cell membrane (Figure 3).

      Ca<sup>2+</sup>-dependent mechanism of the rundown of GIRK was convincingly demonstrated by a set of different protocols of voltage-clamp study, in which Ca<sup>2+</sup> influx was differentially increased. The rundown of GIRK-I was orderly potentiated or accelerated by increasing the number of positive command pulses each of which induces Ca<sup>2+</sup> influx (compare Figure S1E-J, Figure S2A-E and Figure S2F-K along with Figure 2A-F). The presence or absence of Ca<sup>2+</sup> currents and the amount of Ca<sup>2+</sup> currents determined the trend of the rundown of GIRK-I (Figures 2, S1 and S2). Because the same voltage protocol hardly caused the rundown when it did not induce Ca<sup>2+</sup> currents in the absence of TEA (Figure S1F; compare with Figure 2B), blockade of Ca<sup>2+</sup> currents by nifedipine would not be so beneficial.

      We believe the series of voltage-clamp protocols convincingly demonstrated the orderly involvement of [Ca<sup>2+</sup>]<sub>i</sub> in accelerating the rundown of GIRK-I.

      (6) NA leakage for free NA accumulation is also influenced by NAT or VMAT2. Please discuss the potential role of VMAT2 in NA accumulation within the LC in AD. 

      It has been demonstrated that reduced VMAT2 levels increase susceptibility to neuronal damage: VMAT2 heterozygote mice displayed increased vulnerability to MPTP, as evidenced by reductions in nigral dopamine cell counts (Takahashi et al., 1997, PNAS). Thus, if the activity of VMAT2 in LC neurons is impaired by chronic restraint stress, cytosolic NA levels in LC neurons would increase. We have added this discussion to the revised manuscript (page 12, lines 381-384).

      (7) Since the LC is a small brain region, proper staining is required to differentiate it from surrounding areas. Please provide a detailed explanation of the methodology used to define LC regions and how LC neurons were selected among different cell types in brain slices for whole-cell recordings.

      LC neurons were identified immunohistochemically and electrophysiologically as we previously reported (see Fig. 2 in Front. Cell. Neurosci. 16:841239. doi: 10.3389/fncel.2022.841239). We have added this explanation in the method section of the revised manuscript (page 15, lines 474-475). A delayed spiking pattern in response to depolarizing pulses (Figure S10 in the revised manuscript) applied at a hyperpolarized membrane potential was commonly observed in LC neurons in many studies (Masuko et al., 1986; van den Pol et al., 2002; Wagner-Altendorf et al., 2019).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      In our study, the normalized relative value of AEP-mediated tau cleavage (Tau N368) was much higher in CRS mice than in non-stress wild-type mice. It is not possible to compare AEP-mediated tau cleavage between our non-stress wild-type mice and those observed in a previous study (Zhang et al., 2014, Nat Med), because band intensity is largely dependent on the exposure time and its numerical value is a normalized relative value. In view of such differences, our apparent band expression might have been intensified to detect small changes.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      GIRK rundown was almost saturated after 3-day RS and remained the same in 5-day RS mice (Fig. 4A-G), which is consistent with the downregulation of α2A-AR and GIRK1 expression by 3-day RS (Fig. 3C, F and G; Fig. 4J and K). However, we examined the protein levels of MAO-A, pro/active-AEP and Tau N368 only in 5-day RS mice without examining in 3-day RS mice. This is because we considered the possibility that a high [Ca<sup>2+</sup>]<sub>i</sub> condition may have to be sustained for some period of time to induce changes in MAO-A, AEP and Tau N368, and therefore 3-day RS may be insufficient to induce such changes. We have added this in the revised manuscript (page 17, lines 521-525).

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      Please see our response to comment (2).

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Thank you for your suggestion. We have revised accordingly.

      Reviewer #3 (Public review):

      Weaknesses:

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain. Below, I outline the key points that should be addressed to make the model convincing.

      Please see our responses to the recommendations for the authors made by Reviewer #3.

      Reviewer #1 (Recommendations for the authors):

      (1) Improve the clarity and organization of the manuscript, ensuring smoother transitions between concepts and mechanisms.

      Please see the response to the comment raised by Reviewer #1 as Weakness

      (2) Adjust any quantifying method for cytosolic NA levels under different conditions to support the link between receptor internalization and NA accumulation.

      If a fluorescent indicator of cytosolic free NA were available, it would be possible to measure changes in cytosolic NA levels. However, at present, there appears to be no fluorescent probe that labels cytosolic NA. For example, NS521 labels both dopamine and norepinephrine inside neurosecretory vesicles (Hettie & Glass et al., 2014, Chemistry), and the BPS3 fluorescent sensor labels NA around the cell membrane by anchoring to the membrane (Mao et al., 2023, Nat Comm). Furthermore, the method reported in “A Genetically Encoded Fluorescent Sensor for Rapid and Specific In Vivo Detection of Norepinephrine” can detect NA only when α2AR is expressed. In the present study, increases in cytosolic NA levels are caused by internalization of α2AR, so cytosolic NA measurements with GRAB NE photometry may not be applicable. However, we have discussed the limited availability of fluorescent methods for directly demonstrating the increase in cytosolic NA as a limitation of our study (page 14, lines 429-436 in the revised manuscript).

      (3) Include validation of the chronic stress model with physiological and behavioral measures (e.g., corticosterone levels and another behavioral test).

      Please see the response to the comment raised by Reviewer #1 as Weakness (4).

      (4) All supplemental figures should be explicitly explained in the Results section. Specifically, clarify and describe the details of Figure S1G-K, Figure S2F-K, Figure S3A-H, Figure S4A-F, Figure S5, and Figure S6 to ensure all supplementary data are fully integrated into the main text.

      We have more explicitly and clearly described the details of Figure S1E-J, Figure S2F-K, Figure S3A-H, Figure S4A-F, Figure S5, and Figure S6 and fully integrated those explanations into the main text in the revised manuscript.

      (5) In Figure 3, the morphology of TH-positive cells differs between panels D and E. Additionally, TH is typically expressed in the cytosol, but in the provided images, it appears to be localized only to the membrane. Please clarify this discrepancy and provide a lower-magnification image to display a larger area, not one cell.

      In a confocal image, TH is not necessarily expressed homogeneously in the cytosol; it can appear in a ring-shaped pattern just inside the plasma membrane, avoiding the cell nucleus and its surrounding Golgi apparatus and endoplasmic reticulum (ER) (Henrich et al., 2018, Acta Neuropathol Commun; see Fig. 4a and 6e), especially when the number of confocal z-stack images is small. This is presumably because LC neurons are especially enriched in Golgi apparatus and ER (Groves & Wilson, 1980, J Comp Neurol).

      In Figure S7, we show a lower-magnification image of the LC and its adjacent area (mesencephalic trigeminal nucleus). The LC area contains a variety of LC neurons, including oval-shaped neurons (open arrowhead; similar to Figure 3D) and rhombus-shaped neurons (open double arrowheads; similar to Figure 3E). A much lower-magnification image of the LC neurons constituting the LC nucleus is shown in Figure 5A.

      (6) In Figure 5, the difference in MAO-A expression is not clearly visible in the fluorescence images. Enzymatic assays for AEP and MAO-A should be included to demonstrate the increased activity better.

      In the current study, we did not attempt to quantify the changes in TH, MAO-A and AEP by immunohistochemistry; instead, we quantified such changes by western blotting. The main conclusions of the current study were drawn primarily from electrophysiological experiments, on which we expended much effort. Because the relative quantification of active AEP and Tau N368 proteins by western blotting may accurately reflect changes in enzyme activity, an enzymatic assay may not be strictly required, although it would help to demonstrate AEP and MAO-A activity more directly. We have noted the need for enzymatic assays to better demonstrate AEP and MAO-A activities (page 10, lines 314-315).

      Reviewer #3 (Recommendations for the authors):

      (1) Causality across the pathway

      Each step (α2A internalisation, GIRK rundown, Ca<sup>2+</sup> rise, MAO-A/AEP upregulation) is demonstrated separately, but no experiment links them in a single preparation. Consider in vivo Ca<sup>2+</sup> or GRAB NE photometry during restraint stress while probing α2A levels with i.p. clonidine injection or optogenetic overexcitation coupled to biochemical readouts. Such integrated evidence would help move the manuscript from a correlational to a more mechanistic study.

      It is not possible to measure free cytosolic NA levels with GRAB NE photometry when α2A AR is internalized, as described above (see our response to Reviewer #1, Recommendations for the authors, comment (2)).

      (2) Pharmacology and NE concentration

      The use of 100 µM noradrenaline saturates α and β adrenergic receptors alike. Please provide ramp measurements of GIRK current in dose-response at 1-10 µM NE (blocked by atipamezole) to confirm that the rundown really reflects α2A activity rather than mixed receptor effects.

      It is true that 100 µM noradrenaline activates both α and β adrenergic receptors. However, we clearly showed that the enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole, and that the Ca<sup>2+</sup>-dependent rundown of NA-induced GIRK-I was prevented by 10 µM atipamezole. Considering the Ki values of atipamezole for α2A AR (1-3 nM) (Vacher et al., 2010, J Med Chem) and β AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), these results reflect α2A AR activity rather than β AR activity (Figure S5). Furthermore, because it is already well established that NA-induced GIRK-I is mediated by α2A AR activity in LC neurons (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience), it is not necessary to re-examine the effect of 1-10 µM NA on GIRK-I.

      (3) Calcium dependence is not yet definitive

      The rundown is induced with a TEA-enhanced pulse protocol. Blocking L-type channels with nifedipine (or using Cd<sup>2+</sup>) during this protocol should show whether Ca<sup>2+</sup> entry is necessary. Without such a control, the Ca<sup>2+</sup> link remains inferential.

      The Ca<sup>2+</sup> link was precisely demonstrated by a series of voltage-clamp experiments in which Ca<sup>2+</sup> influx was progressively potentiated by increasing the number of positive voltage pulses (Figures S1 and S2). As the number of positive voltage pulses increased, the rundown of GIRK-I was accelerated and enhanced. The relationship between the number of spikes and the Ca<sup>2+</sup> influx, detected as Ca<sup>2+</sup> transients, was well documented in Ca<sup>2+</sup> imaging experiments using fura-2 (Figure S3).

      The presence or absence of Ca<sup>2+</sup> currents and their magnitude determined the extent of GIRK-I rundown (Figs. 2, S1 and S2). The same voltage protocol hardly caused any rundown when it did not induce Ca<sup>2+</sup> currents in the absence of TEA (Fig. S1F; compare with Fig. 2B), and the series of voltage-clamp protocols convincingly demonstrated the graded involvement of [Ca<sup>2+</sup>]<sub>i</sub> in accelerating the rundown of GIRK-I. Therefore, blocking Ca<sup>2+</sup> currents with nifedipine may add little further information.

      (4) Age mismatch and disease claims

      All electrophysiology and biochemical data come from juvenile (< P30) mice, yet the conclusions stress Alzheimer-related degeneration. Key endpoints need to be replicated in adult or aged mice, or the manuscript should soften its neurodegenerative scope.

      As described in the Conclusion section, we do not emphasize Alzheimer-related degeneration, although the text might give such an impression. To avoid this misunderstanding, we have added the sentence “However, the present mechanism must be proven to be valid in adult or old mice, to validate its involvement in the pathogenesis of AD.” (page 14, lines 448-450).

      (5) Direct evidence for extracellular/cytosolic NE

      The proposed rise in reuptake NA is inferred from electrophysiology. Modern fluorescent sensors (GRAB NE, nLight) or fast scan voltammetry could quantify NE overflow and clearance during stress, directly testing the model.

      Please see our response to Reviewer #1, Recommendations for the authors, comment (2), above.

      (6) Quantitative histology

      Figure 5 presents attractive images but no numerical analysis. Please provide ROI-based fluorescence quantification (with n values) or move the images to the supplement and rely on the Western blots.

      We have moved the immunohistochemical results in Fig. 5 to the supplement, as we believe that quantification of immunohistochemical staining is not necessarily reliable.

    1. eLife Assessment

      This study examines a valuable question regarding the developmental trajectory of neural mechanisms supporting facial expression processing. Leveraging a rare intracranial EEG (iEEG) dataset including both children and adults, the authors reported that facial expression recognition mainly engaged the posterior superior temporal cortex (pSTC) among children, while both pSTC and the prefrontal cortex were engaged among adults. However, the sample size is relatively small, and the analyses appear incomplete and do not fully support the primary claims.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates how the brain processes facial expressions across development by analyzing intracranial EEG (iEEG) data from children (ages 5-10) and post-childhood individuals (ages 13-55). The researchers used a short film containing emotional facial expressions and applied AI-based models to decode brain responses to facial emotions. They found that in children, facial emotion information is represented primarily in the posterior superior temporal cortex (pSTC) - a sensory processing area - but not in the dorsolateral prefrontal cortex (DLPFC), which is involved in higher-level social cognition. In contrast, post-childhood individuals showed emotion encoding in both regions. Importantly, the complexity of emotions encoded in the pSTC increased with age, particularly for socially nuanced emotions like embarrassment, guilt, and pride. The authors claim that these findings suggest that emotion recognition matures through increasing involvement of the prefrontal cortex, supporting a developmental trajectory where top-down modulation enhances understanding of complex emotions as children grow older.

      Strengths:

      (1) The inclusion of pediatric iEEG makes this study uniquely positioned to offer high-resolution temporal and spatial insights into neural development compared to non-invasive approaches, e.g., fMRI, scalp EEG, etc.

      (2) Using a naturalistic film paradigm enhances ecological validity compared to static image tasks often used in emotion studies.

      (3) The idea of using state-of-the-art AI models to extract facial emotion features allows for high-dimensional and dynamic emotion labeling in real time.

      Weaknesses:

      The study has notable limitations that constrain the generalizability and depth of its conclusions. The sample size was very small, with only nine children included and just two having sufficient electrode coverage in the posterior superior temporal cortex (pSTC), which weakens the reliability and statistical power of the findings, especially for analyses involving age. Electrode coverage was also uneven across brain regions, with not all participants having electrodes in both the dorsolateral prefrontal cortex (DLPFC) and pSTC, and most coverage limited to the left hemisphere, hindering within-subject comparisons and limiting insights into lateralization. The developmental differences observed were based on cross-sectional comparisons rather than longitudinal data, reducing the ability to draw causal conclusions about developmental trajectories. Moreover, the analysis focused narrowly on DLPFC, neglecting other relevant prefrontal areas such as the orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC), which play key roles in emotion and social processing. Although the use of a naturalistic film stimulus enhances ecological validity, it comes at the cost of experimental control, with no behavioral confirmation of the emotions perceived by participants and uncertain model validity for complex emotional expressions in children. A non-facial music block that could have served as a control was available but not analyzed. Generalizability is further limited by the fact that all participants were neurosurgical patients, potentially with neurological conditions such as epilepsy that may influence brain responses. Additionally, the high temporal resolution of intracranial EEG was not fully utilized, as data were downsampled and averaged in 500-ms windows. Finally, the absence of behavioral measures or eye-tracking data makes it difficult to directly link neural activity to emotional understanding or determine which facial features participants attended to.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Fan et al. aim to characterize how neural representations of facial emotions evolve from childhood to adulthood. Using intracranial EEG recordings from participants aged 5 to 55, the authors assess the encoding of emotional content in high-level cortical regions. They report that while both the posterior superior temporal cortex (pSTC) and dorsolateral prefrontal cortex (DLPFC) are involved in representing facial emotions in older individuals, only the pSTC shows significant encoding in children. Moreover, the encoding of complex emotions in the pSTC appears to strengthen with age. These findings lead the authors to suggest that young children rely more on low-level sensory areas and propose a developmental shift from reliance on lower-level sensory areas in early childhood to increased top-down modulation by the prefrontal cortex as individuals mature.

      Strengths:

      (1) Rare and valuable dataset: The use of intracranial EEG recordings in a developmental sample is highly unusual and provides a unique opportunity to investigate neural dynamics with both high spatial and temporal resolution.

      (2) Developmentally relevant design: The broad age range and cross-sectional design are well-suited to explore age-related changes in neural representations.

      (3) Ecological validity: The use of naturalistic stimuli (movie clips) increases the ecological relevance of the findings.

      (4) Feature-based analysis: The authors employ AI-based tools to extract emotion-related features from naturalistic stimuli, which enables a data-driven approach to decoding neural representations of emotional content. This method allows for a more fine-grained analysis of emotion processing beyond traditional categorical labels.

      Weaknesses:

      (1) The emotional stimuli included facial expressions embedded in speech or music, making it difficult to isolate neural responses to facial emotion per se from those related to speech content or music-induced emotion.

      (2) While the authors leveraged Hume AI to extract facial expression features from the video stimuli, they did not provide any validation of the tool's accuracy or reliability in the context of their dataset. It remains unclear how well the AI-derived emotion ratings align with human perception, particularly given the complexity and variability of naturalistic stimuli. Without such validation, it is difficult to assess the interpretability and robustness of the decoding results based on these features.

      (3) Only two children had relevant pSTC coverage, severely limiting the reliability and generalizability of results.

      (4) The rationale for focusing exclusively on high-frequency activity for decoding emotion representations is not provided, nor are results from other frequency bands explored.

      (5) The hypothesis of developmental emergence of top-down prefrontal modulation is not directly tested. No connectivity or co-activation analyses are reported, and the number of participants with simultaneous coverage of pSTC and DLPFC is not specified.

      (6) The "post-childhood" group spans ages 13-55, conflating adolescence, young adulthood, and middle age. Developmental conclusions would benefit from finer age stratification.

      (7) The so-called "complex emotions" (e.g., embarrassment, pride, guilt, interest) used in the study often require contextual information, such as speech or narrative cues, for accurate interpretation, and are not typically discernible from facial expressions alone. As such, the observed age-related increase in neural encoding of these emotions may reflect not solely the maturation of facial emotion perception, but rather the development of integrative processing that combines facial, linguistic, and contextual cues. This raises the possibility that the reported effects are driven in part by language comprehension or broader social-cognitive integration, rather than by changes in facial expression processing per se.

    1. eLife Assessment

      This work presents a useful investigation of functional and structural brain changes following navigation and verbal memory training. The analyses of whole-brain structural changes are incomplete and would benefit from a more comprehensive approach to support the study's main conclusion regarding the lack of a structural whole-brain plasticity effect. However, some analyses are exhaustive and compelling in demonstrating the presence of longitudinal behavioural effects, the presence of functional activation changes, and the lack of hippocampal volume changes.

    2. Joint Public Review:

      Summary:

      This study investigates plasticity effects in brain function and structure from training in navigation and verbal memory.

      The authors used a longitudinal design with a total of 75 participants across two sites. Participants were randomised to one of three conditions: verbal memory training, navigation training, or a video control condition. The results show behavioural effects in relevant tasks following the training interventions. The central claim of the paper is that network-based measures of task-based activation are affected by the training interventions, but structural brain metrics (T2w-derived volume and diffusion-weighted imaging microstructure) are not impacted by any of the training protocols tested.

      Strengths:

      (1) This is a well-designed study which uses two training conditions, an active control, and randomisation, as appropriate. It is also notable that the authors combined data acquisition across two sites to reach the needed sample size and accounted for it in their statistical analyses quite thoroughly. In addition, I commend the authors on using pre-registration of the analysis to enhance the reproducibility of their work.

      (2) Some analyses in the paper are exhaustive and compelling in showcasing the presence of longitudinal behavioural effects, functional activation changes, and lack of hippocampal volume changes. The breadth of analysis on hippocampal volume (including hippocampal subfields) is convincing in supporting the claim regarding a lack of volumetric effect in the hippocampus.

      Weaknesses:

      (1) The rationale for the study and its relationship with previous literature is not fully clear from the paper. In particular, there is a very large literature that has already explored the longitudinal effects of different types of training on functional and structural neuroimaging. However, this literature is barely acknowledged in the Introduction, which focuses on cross-sectional studies. Studies like the one by Draganski et al. 2004 are cited but not discussed, and are clumped together with cross-sectional studies, which is confusing. As a reader, it is difficult to understand whether the study was meant to be confirmatory based on previous literature, or whether it fills a specific gap in the literature on longitudinal neuroimaging effects of training interventions.

      (2) The main claim regarding the lack of changes in brain structure seems only partially supported by the analyses provided. The limited whole-brain evidence from structural neuroimaging makes it difficult to confirm whether there is indeed no effect of training. Beyond hippocampal analyses, many whole-brain analyses of both volumetric and diffusion-weighted imaging metrics are only based on coarse ROIs (for example, 34 cortical parcellations for grey matter analyses). Although vertex-wise analyses in FreeSurfer are reported, it is unclear what metrics were examined (cortical thickness? area? volume?). Diffusion-weighted imaging seems to focus on whole-tract atlas ROIs, which can be less accurate/sensitive than tractography-defined ROIs or voxel-wise approaches.

      (3) Quality control of images is only mentioned for FA images in subject space. Given that most analyses are based on atlas ROIs, visual checks following registration are fundamental and should be described in further detail.

    1. eLife Assessment

      This important study fills a gap in our knowledge of the evolution of GPCRs in holozoans, as well as the phylogeny of associated signaling pathway components such as G proteins, GRKs, and RIC8 proteins. The evidence supporting the conclusions is compelling, with the analysis of extensive new genomic data from choanoflagellates and other non-animal holozoans. Overall, the study is thorough and well-executed. It will be a resource for researchers interested in both the comparative genomics of multicellularity and GPCR biology more broadly, especially given the importance of GPCRs as highly druggable targets.

    2. Reviewer #1 (Public review):

      Summary:

      The authors set out to compile an inventory of GPCRs and GPCR pathway component genes within the genomes of 23 choanoflagellates and other close relatives of metazoans.

      Strengths:

      The authors generated a solid phylogenetic overview of the GPCR superfamily in these species. Intriguingly, they discover novel GPCR families, novel assortments of domain combinations, and novel insights into the evolution of those groups within the Opisthokonta clade. A particular focus is laid on adhesion GPCRs, for which the authors discover many hitherto unknown subfamilies based on Hidden Markov Models of the 7TM domain sequences, which were also reflected by combinations of extracellular domains of the homologs. In addition, the authors provide bioinformatic evidence that aGPCRs of choanoflagellates also contain a GAIN domain, which is self-cleavable, thereby reflecting the most remarkable biochemical feat of aGPCRs.

      Weaknesses:

      The chosen classification scheme for aGPCRs may require reassessment and amendment by the authors in order to prevent confusion with previously issued classification attempts of this family.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to characterise the GPCR family in choanoflagellates (and other unicellular holozoans). GPCRs are the most abundant gene family in many animal genomes, playing crucial roles in a wide range of physiological processes. Although they are known to evolve rapidly, GPCRs are an ancient feature of eukaryotic biology. Identifying conserved elements across the animal-protist boundary is therefore a valuable goal, and the increasing availability of genomes from non-animal holozoans provides new opportunities to explore evolutionary patterns that were previously obscured by limited taxon sampling. This study presents a comprehensive re-examination of GPCRs in choanoflagellates, uncovering examples of differential gene retention and revealing the dynamic nature of the GPCR repertoire in this group. As GPCRs are typically involved in environmental sensing, understanding how these systems evolved may shed light on how our unicellular ancestors adapted their signalling networks in the transition to complex multicellularity.

      Strengths:

      The paper combines a broad taxonomic scope with the use of both established and recently developed tools (e.g. Foldseek, AlphaFold), enabling a deep and systematic exploration of GPCR diversity. Each family is carefully described, and the manuscript also functions as an up-to-date review of GPCR classification and evolution. Although similar attempts to understand GPCR evolution have been made over the last decade, the authors build on this foundation by identifying new families and applying improved computational methods to better predict structure and function. Notably, the presence of Rhodopsin-like GPCRs in some choanoflagellates and ichthyosporeans is intriguing, even though they do not fall within known animal subfamilies. The computational framework presented here is broadly applicable, offering a blueprint for surveying GPCR diversity in other non-model eukaryotes (and even in animal lineages), potentially revealing novel families relevant to drug discovery or helping revise our understanding of GPCR evolution beyond model systems.

      Weaknesses:

      While the study contributes several interesting observations, it does not radically revise the evolutionary history of the GPCR family. However, in an era increasingly concerned with the reproducibility of scientific findings, this is arguably a strength rather than a weakness. It is encouraging to see that previously established patterns largely hold, and that with expanded sampling and improved methods, new insights can be gained, especially at the level of specific GPCR subfamilies. Then, no functional follow-ups are provided in the model system Salpingoeca rosetta, but I am sure functional work on GPCRs in choanoflagellates is set to reveal very interesting molecular adaptations in the future.

      Comments on the latest version:

      The authors have done a good job answering my questions and suggestions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The chosen classification scheme for aGPCRs may require reassessment and amendment by the authors in order to prevent confusion with previously issued classification attempts of this family. (…) Can the authors suggest another scheme (mind to avoid the subfamily I-IX or the alternative ADGRA-G,L,V subfamily schemes of metazoan aGPCRs), and adapt their numbering throughout the text and all figures/supplementary figures/supplementary files?

      We appreciate the reviewer's comment and agree that a different nomenclature should be used for choanoflagellate aGPCRs to avoid possible confusion. We have now re-labeled the choanoflagellate aGPCR subfamilies, previously numbered from I to XIX, using alphabetical enumeration (from A to S). Changes have been made throughout the main text, in Figure 5, and in Supplementary Figures S6 and S7.

      line 10: The abbreviation 'GPCR-TKL/Ks' is not explained.

      Thank you for pointing this out. We have now revised the text to explain the abbreviation:

      “Adhesion GPCRs and a class of GPCRs fused to kinases (the GPCR-TKL/Ks) are the most abundant GPCRs in choanoflagellates.”

      line 30: "7TM domain is diagnostic for GPCRs": strange wording. Use an alternative expression.

      We changed the wording to: 

      “A conserved seven transmembrane (7TM) domain is a hallmark of GPCRs, while the wide spectrum of extracellular and intracellular domains in some GPCRs reflects the diversification of the gene family and its functions (Schiöth and Lagerström 2008).”

      line 33: In the case of rhodopsins, not the GPCR (i.e., the apoprotein) responds directly to photons, but the retinal, which isomerises upon illumination.

      We thank the reviewer for bringing this to our attention, and we have now removed mention of photons from the list of cues detected by GPCRs.

      “For example, the extracellular N-terminus and the three extracellular loops of the 7TM domain respond to a wide range of cues, including odorant molecules, peptides, amines, lipids, nucleotides, and other molecules (Yang et al. 2021).”

      line 111: What are "genome-enabled choanoflagellates"? Explain the term. As it stands, it doesn't make sense to me.

      We meant only to highlight that these two species have sequenced genomes. We have deleted the phrase “genome enabled.”

      “To assess the predictive power of our protein-detection pipeline, we then compared the new GPCR and cytosolic signaling component datasets from two choanoflagellates – Salpingoeca rosetta and Monosiga brevicollis – with previously published GPCR and downstream GPCR signaling component counts for these two species (Nordström et al. 2009a; Krishnan et al. 2012; De Mendoza et al. 2014; Krishnan et al. 2015; Lokits et al. 2018).”

      line 145: Please give a reasoning for the naming of each of the new families (e.g., RemiSens, Hidden Gold, GPCR-TLK/K, etc.) or at least the explanations of the acronyms/names early in the manuscript, even if they are discussed later in more detail.

      Thank you for identifying this as an area of confusion. While we feel that going into the rationale behind each of the names here would interrupt the flow of the manuscript, we have added a phrase encouraging readers to “hold that thought” with the hope that they can wait for the sections that specifically focus on each of these new GPCR families.

      “This left twelve new GPCR families that had not, to our knowledge, been previously detected in choanoflagellates: Rhodopsin, TMEM145, GPR180, TMEM87, GPR155, GPR157, and six additional GPCR families that appear to fall outside all previously characterized GPCR families in eukaryotes. For reasons that will be discussed further below, we have named these six new GPCR families “Rémi-Sans-Famille” (RSF), “Hidden Gold” (Hi-GOLD), GPCR-TKL/K, GPRch1, GPRch2, and GPRch3. (Fig. 1B; Table 1).”

      lines 297/298 and 2049: Rename tethered agonist "peptide" to "element". Synthetic peptides resembling the TA were used in experiments to test for the sufficiency of the TA for receptor activation, but because the naturally occurring TAs are part of the receptor protein, they are not peptides.

      Thank you for pointing this out. We have revised the text as suggested.

      line 2026: I think the letters in the acronym "CMR" are mixed up and were intended to read "CRM".

      Good catch! We have corrected the text.

      line 2048: "diagnostic" again. Change to "tell-tale", "hallmark", or another similar descriptor.

      We have corrected the text accordingly.

      line 2058: Strike "motif" in order to avoid confusion with the now obsolete term "GPS motif", which entailed the five most C-terminal β-strands of GAIN subdomain B (and thus neither the full GAIN domain nor the GPS).

      Thank you for pointing this out. We have corrected the text.

      Figure 5: Did the authors also find homologs placed in the aGPCR family based on their 7TM domain sequence but lacking a GAIN domain similar to vertebrate ADGRA/GPR123, the only aGPCR known to lack a GAIN domain (10.1016/j.tips.2013.06.002)? Irrespective of the authors' findings or non-finding on that matter, please insert a note on this in the results text.

      We thank the reviewer for bringing this interesting point to our attention. We have now added a new panel (Fig. S9A) to address the reviewer's comment. We also modified the legend of Fig. S9 to take this change into account and uploaded a new supplementary data file 20 to support Fig. S9A. Finally, we revised the main text under the section “Adhesion GPCRs” as requested:

      Lines 328-331: “While the GAIN and aGPCR 7TM domains evolved before the origin of opisthokonts (Araç et al. 2012; Krishnan et al. 2012; De Mendoza et al. 2014), we detected the fusion of these two domains into a single module (GAIN/7TM) in most, but not all, holozoan aGPCRs (Fig. 5D, Fig. S7B and S9A; Supplementary file 20; Prömel et al. 2013; Krishnan et al. 2014).”

      Reviewer #2:

      While the study contributes several interesting observations, it does not radically revise the evolutionary history of the GPCR family. However, in an era increasingly concerned with the reproducibility of scientific findings, this is arguably a strength rather than a weakness. It is encouraging to see that previously established patterns largely hold, and that with expanded sampling and improved methods, new insights can be gained, especially at the level of specific GPCR subfamilies. Then, no functional follow-ups are provided in the model system Salpingoeca rosetta, but I am sure functional work on GPCRs in choanoflagellates is set to reveal very interesting molecular adaptations in the future.

      We agree with the reviewer and anticipate that this work will provide a useful resource to motivate the future functional characterization of GPCRs in choanoflagellates, other CRMs, as well as in metazoans.

      The GPCR-TKL fusion is a particularly interesting finding, especially given the presence of such sequences in sponges. This could potentially represent a synapomorphy shared between sponges and choanoflagellates, later lost in other animals. The authors mention that BLASTP searches using the kinase domain recover the sponge GPCR-TKLs, suggesting the fusion may be ancestral. It would be useful to include phylogenetic trees of both the GPCR and TKL domains to assess this possibility. The authors might also consider examining sponge genomes released by the DTOL project to increase representation from this group.

      We agree and thank the reviewer for this suggestion. We have now added the requested phylogenetic analyses to the new Figure S17, revised the supplementary files and Methods accordingly, and commented on these results in the main text under the section “GPCR-TKL/K and GPCR-TKs”.

      Lines 579 – 589: “While no metazoan homologs were found when using the 7TM domain of choanoflagellate GPCR-TKs as queries, using the conserved tyrosine kinase domains as queries recovered GPCR-TKs in sponges but not in other metazoan lineages or other holozoans (Fig. S17E). To test whether GPCR-TKs in sponges and choanoflagellates are homologous, we performed phylogenetic analyses of their TK and 7TM domains (Fig. S17F and G). While the TK domains of GPCR-TKs from sponges and choanoflagellates formed a well-supported clade, their 7TM domains did not. These results point to a heterogeneous evolutionary history that may include domain swapping (i.e. ancestral GPCR-TKs in which the 7TM domain was replaced in either the sponge or choanoflagellate lineages) or convergent evolution, in which homologous TK domains fused with unrelated 7TM domains in the sponge and choanoflagellate lineages.”

      Added to the Method section “Sequence alignment and phylogenetic analyses”:

      Lines 913 – 933: “Phylogenetic analyses of holozoan aGPCRs, Glutamate Receptors, and Gα subunits, and the 7TM and Kinase domains from GPCR TK/TKL/Ks were performed in this study. (…) To construct the phylogenies of the Kinase domain and 7TM domain from the GPCR TK/TKL/Ks, we first built a dataset including all the GPCR TK/TKL/K sequences identified in choanoflagellates and in sponges, as well as the GPCR TKL/Ks previously published in oomycetes and amoebozoans (Van Den Hoogen et al. 2018). We extracted the 7TM domain and Kinase domain from each sequence by combining the transmembrane domain prediction tool TMHMM-2.0 and the protein domain prediction tool InterProScan with the alignment tool MAFFT (E-INS-I algorithm) on Geneious Prime v2024.07 (Supplementary Files 30 and 32). We then aligned the aGPCR, Glutamate Receptor, and GPCR TK/TKL/K 7TMs, the GPCR TK/TKL/K Kinase domains, or the full-length Gα sequences using MAFFT with the E-INS-I algorithm. The resulting alignments were then used for Maximum-likelihood and/or Bayesian inference of phylogenies (Fig. 3B, Fig. 5A, Fig. S3D, Fig. S6A, and Fig. S17F and G; Supplementary Files 5, 9, 16, 18, 31, and 33).”

      Rhodopsin-like receptors are proposed in the discussion to be potential cases of lateral gene transfer (LGT) between eukaryotes. To support or refute this hypothesis, it would be valuable to place the choanoflagellate and ichthyosporean Rhodopsins within a broader phylogeny of this family, including (a few) representatives from animals and other eukaryotes. Even if deep branching relationships remain unresolved, signs such as unusually short branches could point toward recent LGT events.

      Thank you for your suggestion. While we originally considered testing these alternative hypotheses in this manuscript by building a phylogeny, the rapid sequence evolution of the Rhodopsin family has stymied similar efforts in the past and instead motivated others to use clustering approaches like those used in our study (Hu et al. 2017; Thiel et al. 2023). Unfortunately, these types of analyses cannot be used to readily identify instances of LGT.

      Therefore, following the suggestion of the reviewer, we bit the bullet and performed phylogenetic analyses on the sequences in question. Unfortunately, these analyses were completely inconclusive, and we feel they do not warrant inclusion in the manuscript. The topologies of the sequence trees recovered were poorly supported and sensitive to most of the variables we tested – the set of rhodopsin sequences included, the multiple alignment algorithms used, and the probabilistic methods employed to infer the phylogenies. 

      Instead, we have revised the manuscript to highlight the challenge of differentiating between the different hypotheses that are consistent with the phylogenetic distribution of Rhodopsins:

      Lines 670 – 678: “Thus, while it is formally possible that Rhodopsins existed in stem choanoflagellates and were lost in most modern choanoflagellate lineages, either horizontal gene transfer or convergent evolution in the shared ancestor of S. macrocollata and S. punica are similarly plausible explanations for their presence in these species. Differentiating between these alternative evolutionary scenarios is challenging because of the rapid rate of sequence evolution within the family and the resultant loss of phylogenetic signal. Our own preliminary investigations of Rhodopsin evolution in non-metazoans were inconclusive. Therefore, ambiguities about the provenance and function of CRM Rhodopsins currently obscure the ancestry of metazoan Rhodopsins and opsins.”

      While the study surveys most available holozoan genomes, it appears that the genomes of Amoebidium spp., which are cited in the manuscript, were not included. It may not be necessary to repeat all analyses with these two species (A. appalachense and A. parasiticum), but a preliminary search indicates the presence of four candidate 7tm_1 (Rhodopsin-like) proteins in their proteomes. These may warrant closer inspection (e.g., via BLASTP against animal databases) to confirm whether they are genuine GPCRs or false positives.

      Author response image 1.

      We thank the reviewer for bringing these sequences to our attention. To be clear, we did not analyze the Amoebidium spp. genome and we can find no reference to it in our manuscript. If the reviewer had the impression that the genome was analyzed, we would be grateful to know the source of the confusion so that it can be corrected. (We did not intentionally exclude the genome; it simply was not available on the Multicell Genome database from which we retrieved the ichthyosporean genomes and transcriptomes used in this study.)

      Nevertheless, out of curiosity, we have now analyzed the sequences provided by the reviewer and summarize our findings here for the interest of the reviewer. Although the sequences were annotated as 7tm_1 (Rhodopsin-like) proteins in the original genome study, none of these sequences group with metazoan or choanoflagellate Rhodopsins in our clustering analysis; instead, we found that these putative GPCRs form a distinct cluster that only weakly resembles cAMP receptors, both on the basis of their sequence and predicted structures. 

      It is not surprising to find new GPCR clusters as new taxa are folded into the study, and these Amoebidium sequences do not add to our understanding of Rhodopsin evolution. Therefore, we have not added their analysis to the manuscript, but we hope the reviewer finds our quick analysis of interest.

      Author response image 2.

      In Figure 2, perhaps expanding the other holozoan clades would have been nice, as there are not too many species, but I understand if that's beyond the point of the manuscript, focused on choanoflagellates.

      Thank you for this comment. However, given the focus of this study, we feel that an expansion of the other holozoan clades would reduce the clarity of the figure.

      line 87 - "To this end, the 671 validated choanoflagellate GPCRs were sorted by sequence similarity, resulting in 18 clusters." Some details in the results section would be nice, or at least clear references to where this is explained in more detail. How were the extra choanoflagellate GPCRs added if they failed to be identified with quite sensitive HMM profiles?

      We apologize for the possible confusion and thank the reviewer for the suggestion; we have now added specific references to the related sections from the material and methods for interested readers.

      We believe that the "extra choanoflagellate GPCRs" mentioned by the reviewer refer to the choanoflagellate GPCRs that failed to be detected when the choanoflagellate genomes and transcriptomes were searched with the predominantly metazoan-derived GPCRHMM and HMMs from the GPCR_A Pfam clan (CL0192). We were able to recover these extra choanoflagellate GPCRs by using custom choanoflagellate-specific GPCR HMMs and by blasting the choanoflagellate GPCRs previously identified as queries against the 23 choanoflagellate proteomes. We hope that the referencing of the Methods section "Recovering additional choanoflagellate GPCRs using choanoflagellate GPCR BLAST queries and custom choanoflagellate GPCR HMMs", in lines 91 and 93, will help clarify this point.

      line 108 - Well, from the figure it seems that most eukaryotes have an 'animal-like' G protein signalling, so that's perhaps more of an eukaryotic signature than something that puts choanoflagellates and animals together.

      Excellent point! We have revised the text.

      line 132 - It is unclear what the criteria are to include these taxa as helpers for choanoflagellate classification, and not adding the other unicellular holozoans. Just some text justification could help.

      Thank you for pointing this out. We have added an explanation of the rationale to the methods — section “Clustering of the 918 validated choanoflagellate GPCRs” — and referred to it in the main text.

      New text added to methods:

      “The non-choanoflagellate sequences added to the dataset were either top blast hits recovered after searching the entire Eukprot v3 dataset (993 species) with choanoflagellate GPCRs as queries, or previously published and well-documented GPCR sequences from metazoans.”

      line 145 - These families are listed, but perhaps it would be nice to explicitly mention that they will be covered in more detail later on in the manuscript. I found myself wondering about those exotic names, until I reached the sections in the manuscript where they are explained.

      Thank you for this suggestion. We have now modified our sentence to refer to the related sections.

      “For reasons that will be discussed further below, we have named these six new GPCR families “Rémi-Sans-Famille” (RSF), “Hidden Gold” (Hi-GOLD), GPCR-TKL/K, GPRch1, GPRch2, and GPRch3 (Fig. 1B; Table 1).”

      line 199 - perhaps would be nice to explain domain architecture of validated Dictyostelium GABA-like receptors (ANF domain?).

      Thank you for your suggestion. We have now modified the sentence to mention the protein domain composition of the validated GABA-like receptor, GrlE, in Dictyostelium.

      “The Glutamate Receptors from the amoebozoan Dictyostelium discoideum, of which at least one, GrlE, binds both GABA and Glutamate presumably through its conserved ANF domain (Anjard and Loomis 2006; Taniura et al. 2006; Wu and Janetopoulos 2013), grouped separately from metazoan and CRM GPCRs in our analysis.”

      Figure S4 - Perhaps a stacked bar chart would be easier to browse than a bunch of pie charts, notoriously difficult to quantify.

      Thank you for this comment. Opinions differ on whether pie charts or bar charts are more effective in this context (including among the authors of this manuscript). However, we think Figure S4 makes a minor point that will be appreciated by only a small number of readers, and we have therefore left the data presentation as it was in the original submission.

    1. eLife Assessment

      This valuable study tested the impact of DNA methylation on CTCF binding in two cancer cell lines. Increased CTCF binding sites are enriched in gene bodies, and associate with nuclear speckles, indicating a role in increased transcription. In the revised work, the inferred association with nuclear speckles has been supported with more solid data. These results will be of interest to the epigenetics field.

    2. Reviewer #2 (Public review):

      Summary:

      CTCF is one of the most well-characterized regulators of chromatin architecture in mammals. Given that CTCF is an essential protein, understanding how its binding is regulated is a very active area of research. It has been known for decades that CTCF is sensitive to 5-cytosine DNA methylation (5meC) in certain contexts. Moreover, at genomic imprints and in certain oncogenes, 5meC-mediated CTCF antagonism has very important gene regulatory implications. A number of labs (eg, Schubeler and Stamatoyannopoulos) have assessed the impact of DNA methylation on CTCF binding, but it is important to also interrogate the effect on chromatin organization (ie, looping). Here, Roseman and colleagues used a DNMT1 inhibitor in two established human cancer lines (HCT116 [colon] and K562 [leukemia]), and performed CTCF ChIPseq and HiChIP. They showed that "reactivated" CTCF sites (that is, sites bound in the absence of 5meC) are enriched in gene bodies, participate in many looping events, and intriguingly, appear associated with nuclear speckles. This last aspect suggests that these reactivated loops might play an important role in increased gene transcription. They showed that a number of genes that are upregulated in the DNA hypomethylated state actually require CTCF binding, which is an important result.

      Strengths:

      Overall, I found the paper to be succinctly written and the data presented clearly. The relationship between CTCF binding in gene bodies and association with nuclear speckles is an interesting result. Another strong point of the paper was combining DNMT1 inhibition with CTCF degradation.

      Weaknesses:

      The most problematic aspect of the original version was the insufficient evidence for the association of "reactivated" CTCF binding sites with nuclear speckles. This has been more diligently assessed in the revised version.

      Comments on revisions:

      The authors have adequately addressed my points in this revised version.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for the constructive comments, which have improved the manuscript. In response to these comments, we have made the following major changes to the main text and reviewer response:

      (1) Added experimental and computational evidence to support the use of Cut&Tag to determine speckle location.

      (2) Performed new Transmission Electron Microscopy (TEM) experiments to visualize interchromatin granule clusters +/- speckle degradation.

      (3) Altered the text of the manuscript to remove qualitative statements and clarify effect sizes.

      (4) Performed new analyses of published whole genome bisulfite data from LIMe-Hi-C following DNMT1 inhibition to demonstrate that CpG methylation is lost at DNMT1i-specific gained CTCF sites.

      (5) Included citations for relevant literature throughout the text.

      These revisions in addition to others are described in the point-by-point response below.

      Reviewer #1 (Public review):

      Summary

      Roseman et al. use a new inhibitor of the maintenance DNA methyltransferase DNMT1 to probe the role of methylation on binding of the CTCF protein, which is known to be involved in chromatin loop formation. As previously reported, and as expected based on our knowledge that CTCF binding is methylation-sensitive, the authors find that loss of methylation leads to additional CTCF binding sites and increased loop formation. By comparing novel loops with the binding of the pre-mRNA splicing factor SON, which localizes to the nuclear speckle compartment, they propose that these reactivated loops localize near speckles. This behavior is dependent on CTCF whereas degradation of two speckle proteins does not affect CTCF binding or loop formation. The authors propose a model in which DNA methylation controls the association of genome regions with speckles via CTCF-mediated insulation.

      Strengths

      The strengths of the study are 1) the use of a new, specific DNMT1 inhibitor and 2) the observation that genes whose expression is sensitive to DNMT1 inhibition and dependent on CTCF (cluster 2) show higher association with SON than genes which are sensitive to DNMT1 inhibition but are CTCF insensitive, is in line with the authors' general model.

      Weaknesses

      There are a number of significant weaknesses that as a whole undermine many of the key conclusions, including the overall mechanistic model of a direct regulatory role of DNA methylation on CTCF-mediated speckle association of chromatin loops.

      We appreciate the reviewer’s constructive comments and address them point-by-point below.

      (1) The authors frequently make quasi-quantitative statements but do not actually provide the quantitative data, which they actually all have in hand. To give a few examples: "reactivated CTCF sites were largely methylated" (p. 4/5), "many CTCF binding motifs enriched..." (p. 5), "a large subset of reactivated peaks..." (p. 5), "increase in strength upon DNMT1 inhibition" (p. 5); "a greater total number....." (p. 7). These statements are all made based on actual numbers and the authors should mention the numbers in the text to give an impression of the extent of these changes (see below) and to clarify what the qualitative terms like "largely", "many", "large", and "increase" mean. This is an issue throughout the manuscript and not limited to the above examples.

      Related to this issue, many of the comparisons which the authors interpret to show differences in behavior seem quite minor. For example, visual inspection suggests that the difference in loop strength shown in figure 1E is something like from 0 to 0.1 for K562 cells and a little less for HCT116 cells. What is a positive control here to give a sense of whether these minor changes are relevant? Another example is on p. 7, where the authors claim that CTCF partners of reactivated peaks tend to engage in a "greater number" of looping partners, but inspection of Figure 2A shows a very minor difference from maybe 7 to 7.5 partners. While a Mann-Whitney test may call this difference significant and give a significant P value, likely due to high sample number, it is questionable that this is a biologically relevant difference.

      We have amended the text to include actual values, instead of just qualitative statements. We have also moderated our claims in the text to note where effect sizes are more modest.

      The following literature examples can serve as positive controls for the effect sizes that we might expect when perturbing CTCF. Our observed effect sizes are largely in line with these expected magnitudes.

      https://pmc.ncbi.nlm.nih.gov/articles/PMC8386078/ Fig. 2E

      https://www.cell.com/cell-reports/pdf/S2211-1247(23)01674-1.pdf Fig. 3J,K

      https://academic.oup.com/nar/article/52/18/10934/7740592 Fig. S5D (CTCF binding only).
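      The reviewer's distinction between statistical significance and effect size can be illustrated with a small sketch. It is not the analysis used in the manuscript: the data are simulated (group means of 7 and 7.5 loosely echo the reviewer's reading of Figure 2A, while the spread and sample size are assumptions), and the function name is hypothetical. It computes the Mann-Whitney U statistic, a two-sided p-value from the tie-corrected normal approximation, and the rank-biserial correlation as an effect size.

```python
import random
from statistics import NormalDist


def mann_whitney_summary(x, y):
    """Return (U for x, two-sided p via normal approximation, rank-biserial r)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering each value's original position.
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # tied values share the average 1-based rank
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = avg_rank
        t = end - start + 1
        tie_term += t**3 - t
        start = end + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for ties.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / var_u**0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Rank-biserial correlation: -1 (x always smaller) to +1 (x always larger).
    r = 2 * u1 / (n1 * n2) - 1
    return u1, p, r


# Simulated example (assumed values): a small shift with a very large sample.
random.seed(0)
other_peaks = [random.gauss(7.0, 3.0) for _ in range(20000)]
reactivated = [random.gauss(7.5, 3.0) for _ in range(20000)]
u, p, r = mann_whitney_summary(other_peaks, reactivated)
print(f"U = {u:.0f}, p = {p:.3g}, rank-biserial r = {r:.3f}")
```

      With samples this large, the p-value is driven mostly by sample size, so reporting an effect size such as r alongside it gives a better sense of magnitude.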

      (2) The data to support the central claim of localization of reactivated loops to speckles is not overly convincing. The overlap with SON Cut&Tag (figure 2F) is partial at best and although it is better with the publicly available TSA-seq data, the latter is less sensitive than Cut&Tag and more difficult to interpret. It would be helpful to validate these data with FISH experiments to directly demonstrate and measure the association of loops with speckles (see below).

      A recent publication we co-authored validated the use of speckle (SON) Cut&Run using FISH (Yu et al., NSMB 2025, doi: 10.1038/s41594-024-01465-6). This paper also supports a role of CTCF in positioning DNA near speckles. Unfortunately, the resolution of these FISH probes is in the realm of hundreds of kilobases. This was not an issue for Yu et al., as they were looking at large-scale effects of CTCF degradation on positioning near speckles. However, FISH does not provide the resolution we need to look at more localized changes over methylation-specific peak sites.

      Instead, we use Cut&Tag to look at these high-resolution changes. In Figure 3C, we show that SON localizes to DNMT1i-specific peaks only upon DNMT1 inhibition. We further demonstrate that this interaction is dependent on CTCF. In response to reviewer comments, we have now also performed spike-in normalized Cut&Tag upon acute (6 hr) SON degradation to validate that our signal is also directly dependent on SON and not merely due to a bias toward open chromatin.
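      For readers unfamiliar with spike-in normalization, the idea can be sketched as follows. This is one common convention (a scale factor inversely proportional to the number of reads mapping to the spike-in genome), not necessarily the procedure used for these experiments; the function names, the constant, and the read counts are all hypothetical.

```python
def spike_in_scale_factors(spike_in_counts, constant=10_000):
    """Map each sample to a scale factor inversely proportional to its spike-in reads.

    `constant` is arbitrary; it only keeps the factors in a convenient range.
    """
    factors = {}
    for sample, count in spike_in_counts.items():
        if count <= 0:
            raise ValueError(f"{sample}: spike-in read count must be positive")
        factors[sample] = constant / count
    return factors


def normalize_signal(raw_signal, scale_factor):
    """Multiply per-bin coverage values by the sample's scale factor."""
    return [value * scale_factor for value in raw_signal]


# Hypothetical read counts: the sample with less target signal yields
# proportionally more spike-in reads and is therefore scaled down more.
factors = spike_in_scale_factors({"DMSO": 5_000, "dTAG": 10_000})
normalized = normalize_signal([12, 30, 8], factors["dTAG"])
```

      Because the same amount of spike-in material is added to every sample, scaling by its read count allows a global loss of signal to be detected, which ordinary library-size normalization would mask.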

      Author response image 1.

      TSA-seq has been validated with FISH (Chen et al., doi: 10.1083/jcb.201807108; Alexander et al., doi: 10.1016/j.molcel.2021.03.006, Fig. 6). We include TSA-seq data where possible in our manuscript to support our claims.

      We also note that Fig 2F shows all CTCF peaks and loops, not just methylation-sensitive peaks and loops, to give a sense of the data. We apologize for any confusion and have clarified this in the figure legend.

      (3) It is not clear that the authors have indeed disrupted speckles from cells by degrading SON and SRRM2. Speckles contain a large number of proteins and considering their phase separated nature stronger evidence for their complete removal is needed. Note that the data published in ref 58 suffers from the same caveat.

      Based upon the reviewers’ feedback, we generated transmission electron microscopy (TEM) data to visualize nuclear speckles +/- degradation of SON and SRRM2 (DMSO and dTAG). We were able to detect interchromatin granule clusters (IGCs) that are representative of nuclear speckles in the DMSO condition. However, even at baseline, we observed a large degree of cell-to-cell variability in these structures. In addition, we observed potential structural changes in the distribution of heterochromatin upon speckle degradation. Consequently, we hesitate to make quantitative conclusions regarding loss of these nuclear bodies. In the interest of transparency, we have included representative raw images from both conditions for the reviewers’ consideration.

      We also note that in Ref 58 (Ilik et al., https://doi.org/10.7554/eLife.60579), the authors show diffusion of speckle client proteins RBM25, SRRM1, and PNN upon SON and SRRM2 depletion, further supporting speckle dissociation in these conditions.

      Author response image 2.

      Author response image 3.

      (4) The authors ascribe a direct regulatory role to DNA methylation in controlling the association of some CTCF-mediated loops to speckles (p. 20). However, an active regulatory role of speckle association has not been demonstrated and the observed data are equally explainable by a more parsimonious model in which DNA methylation regulates gene expression via looping and that the association with speckles is merely an indirect bystander effect of the activated genes because we know that active genes are generally associated with speckles. The proposed mechanism of a regulatory role of DNA methylation in controlling speckle association is not convincingly demonstrated by the data. As a consequence, the title of the paper is also misleading.

      While it is difficult to completely rule out indirect effects, we do not believe that the relationship between methylation-sensitive CTCF sites and speckles relies only on gene activity.

      We can partially decouple SON Cut&Tag signal from gene activation if we break down Figure 4D to look only at methylation-sensitive CTCF peaks on genes whose expression is unchanged upon DNMT1 inhibition (using thresholds from manuscript, P-adj > 0.05 and/or |log2(fold-change)| < 0.5). This analysis shows that many methylation-sensitive CTCF peaks on genes with unchanged expression still change speckle association upon DNMT1 inhibition. This result refutes the necessity of transcriptional activation to recruit speckles to CTCF.
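      The "unchanged expression" filter described above can be sketched as follows. This is an illustration only: the gene names and values are invented, the function name is hypothetical, and "and/or" is read here as a logical OR (a gene counts as unchanged if it fails either the significance or the fold-change criterion), which is an assumption about the intended rule.

```python
def is_unchanged(padj, log2fc, padj_cutoff=0.05, lfc_cutoff=0.5):
    """Return True if a gene is treated as unchanged upon DNMT1 inhibition."""
    # A missing adjusted p-value (None) is treated as not significant (assumption).
    not_significant = padj is None or padj > padj_cutoff
    small_effect = abs(log2fc) < lfc_cutoff
    return not_significant or small_effect


# Invented differential-expression results for illustration.
genes = [
    {"gene": "geneA", "padj": 0.001, "log2fc": 1.8},   # significant, large change
    {"gene": "geneB", "padj": 0.30, "log2fc": 0.9},    # not significant
    {"gene": "geneC", "padj": 0.01, "log2fc": -0.2},   # significant, small change
    {"gene": "geneD", "padj": None, "log2fc": 0.1},    # no adjusted p-value
]

unchanged = [g["gene"] for g in genes if is_unchanged(g["padj"], g["log2fc"])]
print(unchanged)
```

      Methylation-sensitive CTCF peaks would then be restricted to those overlapping genes in the `unchanged` list before comparing SON signal between conditions.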

      Author response image 4.

      We note the comparator upregulated gene set here is small (~20 genes with our stringent threshold for methylation-sensitive CTCF after 1 day DNMT1i treatment).

      However, we acknowledge that these effects cannot be completely disentangled. We previously included the statement “other features enriched near speckles, such as open chromatin, high GC content, and active gene expression, could instead contribute to increased CTCF binding and looping near speckles” in the discussion. In response to the reviewer’s comment, we have further tempered our statements on page 20/21 and also added a statement noting that DNA demethylation and gene activation cannot be fully disentangled. While we are also open to a title change, we are unsure which part of the title is problematic. 

      (5) As a minor point, the authors imply on p. 15 that ablation of speckles leads to misregulation of genes by altering transcription. This is not shown as the authors only measure RNA abundance, which may be affected by depletion of constitutive splicing factors, but not transcription. The authors would need to show direct effects on transcription.

      We agree, and we have changed this wording to say RNA abundance.

      Reviewer #2 (Public review):

      Summary:

      CTCF is one of the most well-characterized regulators of chromatin architecture in mammals. Given that CTCF is an essential protein, understanding how its binding is regulated is a very active area of research. It has been known for decades that CTCF is sensitive to 5-cytosine DNA methylation (5meC) in certain contexts. Moreover, at genomic imprints and in certain oncogenes, 5meC-mediated CTCF antagonism has very important gene regulatory implications. A number of labs (eg, Schubeler and Stamatoyannopoulos) have assessed the impact of DNA methylation on CTCF binding, but it is important to also interrogate the effect on chromatin organization (ie, looping). Here, Roseman and colleagues used a DNMT1 inhibitor in two established human cancer lines (HCT116 [colon] and K562 [leukemia]), and performed CTCF ChIPseq and HiChIP. They showed that "reactivated" CTCF sites (that is, sites bound in the absence of 5meC) are enriched in gene bodies, participate in many looping events, and intriguingly, appear associated with nuclear speckles. This last aspect suggests that these reactivated loops might play an important role in increased gene transcription. They showed that a number of genes that are upregulated in the DNA hypomethylated state actually require CTCF binding, which is an important result.

      Strengths:

      Overall, I found the paper to be succinctly written and the data presented clearly. The relationship between CTCF binding in gene bodies and association with nuclear speckles is an interesting result. Another strong point of the paper was combining DNMT1 inhibition with CTCF degradation.

      Weaknesses:

      The most problematic aspect of this paper, in my view, is the insufficient evidence for the association of "reactivated" CTCF binding sites with nuclear speckles, which needs to be more diligently demonstrated (see Major Comment). One unfortunate aspect was that this paper neglected to discuss findings from our recent paper, wherein we also performed CTCF HiChIP in a DNA methylation mutant (Monteagudo-Sanchez et al., 2024 PMID: 39180406). It is true, this is a relatively recent publication, although the BioRxiv version has been available since fall 2023. I do not wish to accuse the authors of actively disregarding our study, but I do insist that they refer to it in a revised version. Moreover, there are a number of differences between the studies such that I find them more complementary rather than overlapping. To wit, the species (mouse vs human), the cell type (pluripotent vs human cancer), the use of a CTCF degron, and the conclusions of the paper (we did not make a link with nuclear speckles). Furthermore, we used a constitutive DNMT knockout which is not viable in most cell types (HCT116 cells being an exception), and in the discussion mentioned the advantage of using degron technology:

      "With high-resolution techniques, such as HiChIP or Micro-C (119-121), a degron system can be coupled with an assessment of the cis-regulatory interactome (118). Such techniques could be adapted for DNA methylation degrons (eg, DNMT1) in differentiated cell types in order to gauge the impact of 5meC on the 3D genome."

      The authors here used a DNMT1 inhibitor, which for all intents and purposes, is akin to a DNMT1 degron, thus I was happy to see a study employ such a technique. A comparison between the findings from the two studies would strengthen the current manuscript, in addition to being more ethically responsible.

      We thank the reviewer for the helpful comments, which we address in the point-by-point response below. We sincerely apologize for this oversight in our references. We have included references to your paper in our revised manuscript. It is exciting to see these complementary results! We now include discussion of this work to contextualize the importance of methylation-sensitive CTCF sites and motivate our study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      To address the above points, the authors should:

      (1) Provide quantitative information in the text on all comparisons and justify that the small differences observed, albeit statistically significant, are biologically relevant. Inclusion of positive controls to give an indication of what types of changes can be expected would be helpful.

      We have added quantitative information to the text, as discussed in the response to public comments above.  We also provide literature evidence of expected effect sizes in that response.

      (2) Provide FISH data to a) validate the analysis of comparing looping patterns with SON Cut&Tag data as an indicator of physical association of loops with speckles and b) demonstrate by FISH increased association of some of the CTCF-dependent loops/genes (cluster 2) with speckles upon DNMT1 inhibition.

      Please see response to Reviewer 1 comment #2 above. Unfortunately, FISH will not provide the resolution we need for point a). We have confidence in our use of TSA-seq and Cut&Tag to study SON association with CTCF sites on a genome-wide scale, which would not be possible with individual FISH probes. Specifically, since the submission of our manuscript, several other researchers (Yu et al., Nat. Struct. and Mol. Biol. 2025; Gholamalamdari et al., eLife 2025) have leveraged CUT&RUN/CUT&TAG and TSA-seq to map speckle-associated chromatin and have validated these methods with orthogonal imaging-based approaches.

      (3) Demonstrate loss of speckles upon SON or SRRM2 degradation by probing for other speckle components and ideally by electron microscopy analysis, which should show loss of interchromatin granules.

      We have performed TEM in K562 cells +/- SON/SRRM2 degradation. Please see response to Reviewer 1 comment #3. Specifically, interchromatin granule clusters are visible in the TEM images of the DMSO sample (see highlighted example above), however, given the heterogeneity of these structures and potential global alterations in heterochromatin that may be occurring following speckle loss, we refrained from making quantitative conclusions from this data. We instead include the raw images above.

      (4) The authors should either perform experiments to clearly show whether loop association is transcription dependent or whether association is merely a consequence of gene activation. Alternatively, they should tone down their model ascribing a direct regulatory role of methylation in control of loop association with speckles and also discuss other models. Unless the model is more clearly demonstrated, the title of the paper should be changed to reflect the uncertainty of the central conclusion.

      Please see response to Reviewer 1 comment #4 above.

      (5) The authors should either probe directly for the effect of speckle ablation on transcription or change their wording.

      We have changed our wording to RNA abundance.

      Reviewer #2 (Recommendations for the authors):

      Major:

      ⁃ There was no DNA methylation analysis after inhibitor treatment. Ideally, genome bisulfite sequencing should be performed to show that the DNMT1i-specific CTCF binding sites are indeed unmethylated. But at the very least, a quantitative method should be employed to show the extent to which 5meC levels decrease in the presence of the DNMT1 inhibitor

      Response: We have now included analysis of genome-wide bisulfite information from LIMe-Hi-C (bisulfite Hi-C) in K562 following DNMT1 inhibition. Specifically, we leverage the CpG methylation readout and find that DNMT1i-specific CTCF sites are more methylated than non-responsive CTCF peaks at baseline. In addition, these sites show the greatest decrease in CpG methylation upon 3 days of DNMT1 inhibition. We include a figure detailing these analyses in the supplement (Fig S1E). In addition, we have added CpG methylation genome browser tracks to Fig S1D. In terms of global change, we have found that 3 days of DNMT1 inhibitor treatment leads to a reduction in methylation to about 1/4 of the level at baseline.

      I am not convinced that CUT&Tag is the proper technique to assess SON binding. CUT&Tag only works under stringent conditions (high salt), and can be a problematic assay for non-histone proteins, which bind less well to chromatin. In our experience, even strong binders such as CTCF exhibit a depleted binding profile when compared to ChIP seq data. I would need to be strongly convinced that the analysis presented in figures 2F-J and S2 D-I simply do not represent ATAC signal (ie, default Tn5 activity). For example, SON ChIP Seq, CUT&Tag in the SON degron and/or ATAC seq could be performed. What worries me is that increased chromatin accessibility would also be associated with increased looping, so they have generated artifactual results that are consistent with their model.

      As the reviewer suggested, we have now performed spike-in normalized SON Cut&Tag with DNMT1 inhibition and 6 hours of SON/SRRM2 degradation in our speckle dTAG knockin cell line. These experiments confirm that the SON Cut&Tag signal we see is SON-dependent. If the signal were truly due to artifactual binding, gained peaks would be open irrespective of speckle binding; however, we see a clear speckle dependence, as this signal is much lower if SON is degraded.

      Author response image 5.

      Moreover, in our original Cut&Tag experiments, we did not enrich detectable DNA without using the SON antibody (see the last 4 samples, which are IgG controls). This further suggests that our signal is SON-dependent.

      Author response image 6.

      Finally, we see good agreement between Cut&Tag and TSA-seq (Spearman R=0.82).  The agreement is particularly strong in the top quadrant, which is most relevant since this is where the non-zero signal is.

      Author response image 7.
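      The agreement statistic quoted above is a Spearman rank correlation, that is, a Pearson correlation computed on ranks. As a minimal sketch of how such a value could be computed for two per-bin signal tracks, the snippet below uses only the Python standard library. The function names and the example values are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-bin signals, for illustration only.
cut_and_tag = [0.0, 0.0, 1.2, 3.4, 2.8, 5.1]
tsa_seq = [0.1, 0.0, 0.9, 2.5, 3.0, 4.7]
rho = spearman(cut_and_tag, tsa_seq)
```

      Because many genomic bins carry zero signal, the tie handling matters: tied bins receive the same average rank rather than an arbitrary order.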

      Minor points

      ⁃ Why are HCT116 cells more responsive to treatment than K562 cells? This is something that could be addressed with DNA methylation analysis, for example

      K562 is a broadly hypomethylated cell line (Siegenfeld et al., 2022 https://doi.org/10.1038/s41467-022-31857-5 Fig S2A-C). Thus, there may be less dynamic range to lose methylation compared to HCT116.

      Our results are also consistent with previous results comparing DKO HCT116 and aza-treated K562 cells (Maurano 2015, http://dx.doi.org/10.1016/j.celrep.2015.07.024). They state “In K562 cells, 5-aza-CdR treatment resulted in weaker reactivation than in DKO cells…” In addition, cell-type-specific responsiveness to DNA methyltransferase KO depending upon global CpG methylation levels has also been observed in ES and EpiLC cells (Monteagudo-Sanchez et al., 2024), which we now comment on in the manuscript.

      ⁃ How many significant CTCF loops in DNMTi, compared to DMSO? It was unclear what the difference in raw totals is.

      We now include a supplemental table with the HiChIP loop information. We call similar numbers of raw loops comparing DNMT1i and DMSO, as only a small subset of loops is changing.

      ⁃ For the architectural stripes, it would be nice to see a representative example in the form of a contact plot. Is that possible to do with the hiChIP data?

      As described in our methods, we called architectural stripes using Stripenn (Yoon et al 2022) from LIMe-Hi-C data under DNMT1i conditions (Siegenfeld et al, 2022). Shown below is a representative example of a stripe in the form of a Hi-C contact map.

      Author response image 8.

      ⁃ Here 4-10x more DNMT1i-specific CTCF binding sites were observed than we saw in our study. What are thresholds? Could the thresholds for DNMT1i-specific peaks be defined more clearly? For what it's worth, we defined our DNMT KO-specific peaks as fold-change ≥ 2, adjusted P < 0.05. The scatterplots (1B) indicate a lot of "small" peaks being called "reactivated."

      We called DNMT1i-specific peaks using HOMER's getDifferentialPeaksReplicates function, with fold change > 2 and padj < 0.05. We further restricted these peaks to those that were not called in the DMSO condition.
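      The thresholds stated above can be illustrated with a short filter. This is only a sketch under stated assumptions and is not HOMER's implementation: the `Peak` class, the function names, and the example values are hypothetical, and "not called in the DMSO condition" is assumed here to mean that a peak overlaps no DMSO peak interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int          # 0-based, half-open interval [start, end)
    end: int
    fold_change: float  # DNMT1i over DMSO, linear scale
    padj: float         # adjusted p-value


def overlaps(peak, interval):
    """True if the peak shares at least one base with (chrom, start, end)."""
    chrom, start, end = interval
    return peak.chrom == chrom and peak.start < end and start < peak.end


def dnmt1i_specific(peaks, dmso_intervals, min_fc=2.0, max_padj=0.05):
    """Keep peaks with fold change > min_fc, padj < max_padj, and no DMSO overlap."""
    kept = []
    for peak in peaks:
        if peak.fold_change <= min_fc or peak.padj >= max_padj:
            continue
        # Assumption: "not called in DMSO" means no overlap with any DMSO peak.
        if any(overlaps(peak, iv) for iv in dmso_intervals):
            continue
        kept.append(peak)
    return kept


# Hypothetical example values, not data from the study.
candidates = [
    Peak("chr1", 100, 200, 3.0, 0.01),   # passes every filter
    Peak("chr1", 500, 600, 1.5, 0.01),   # fold change too small
    Peak("chr2", 100, 200, 4.0, 0.20),   # not significant
    Peak("chr3", 100, 200, 5.0, 0.001),  # overlaps a DMSO peak
]
dmso = [("chr3", 150, 250)]
specific = dnmt1i_specific(candidates, dmso)
```

      Note that the strict inequality (fold change > 2) differs slightly from the reviewer's stated cutoff (fold change ≥ 2), so peaks sitting exactly at the boundary would be classified differently under the two definitions.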

      ⁃ On this note, is "reactivated" the proper term? Reactivated with regards to what? A prior cell state? I think DNMT1i-specific is a safer descriptor.

      We chose this term based on prior literature (Maurano 2015 http://dx.doi.org/10.1016/j.celrep.2015.07.024, Spracklin 2023 https://doi.org/10.1038/s41594-022-00892-7). However, we agree it is not very clear, so we’ve altered the text to say “DNMT1i-specific”. We thank the reviewer for suggesting this improved terminology.

      ⁃ It appears there is a relatively small enrichment for CTCF peaks (of any class) in intergenic regions. How were intergenic regions defined? For us, it is virtually half of the genome. We did some enrichment of DNMT KO-specific peaks in gene bodies (our Supplemental Figure 1C), but a substantial proportion were still intergenic.

      We defined intergenic peaks using HOMER’s annotatepeaks function, with the -gtf option using Ensembl gene annotations (v104). We used the standard annotatepeaks priority order, which is TSS > TTS > CDS exons > 5’ UTR exons > 3’ UTR exons > introns > intergenic.
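      The effect of a priority order on peak annotation can be sketched as follows. This is an illustrative toy, not HOMER's code: the label strings and the function name are hypothetical, and it only shows that a peak overlapping several feature types is assigned the highest-priority one, so "intergenic" is the label of last resort.

```python
# Priority order as stated in the response, highest priority first.
PRIORITY = [
    "TSS",
    "TTS",
    "CDS exon",
    "5' UTR exon",
    "3' UTR exon",
    "intron",
    "intergenic",
]


def assign_annotation(overlapping_features):
    """Return the highest-priority label among the features a peak overlaps."""
    for label in PRIORITY:
        if label in overlapping_features:
            return label
    # A peak that overlaps no annotated feature is intergenic by definition.
    return "intergenic"


# Hypothetical peaks described by the set of feature types they overlap.
example = assign_annotation({"intron", "TSS"})
```

      Under such a scheme, the size of the intergenic class depends on how wide the TSS and TTS windows are and on which gene annotation is supplied, which is one reason intergenic fractions can differ between studies.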

      Maurano et al. 2015 (http://dx.doi.org/10.1016/j.celrep.2015.07.024) also found reduced representation of intergenic sites among demethylation-reactivated CTCF sites in their Fig S5A. We note this is not a perfect comparison because their data are displayed as a fraction of all intergenic peaks.

      ⁃ We also recently published a review on this subject: The impact of DNA methylation on CTCF-mediated 3D genome organization NSMB 2024 (PMID: 38499830) which could be cited if the authors choose.

      We have cited this relevant review.

    1. eLife Assessment

      This important study investigates changes in oscillatory activity across cortical and subcortical areas during stroke recovery in a nonhuman primate model. The authors distinguish between global and local oscillatory bursts, providing solid evidence that these two types of bursts correlate with distinct aspects of movement; additionally, they show that the likelihood of these bursts occurring follows opposing trends during recovery. The study could be further improved by accounting for inter-individual differences and by some technical improvements, such as employing more robust burst detection methods and more stringent analyses.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates beta burst dynamics in the primate motor cortex during movement and recovery from stroke. The authors differentiate between "global" beta bursts, which are synchronous across cortical and often subcortical regions, and more spatially confined "local" bursts. Global bursts are associated with reduced spiking variability, slower movements, and are more frequent after stroke, while local bursts increase during recovery and grasp execution. The study provides compelling evidence that beta bursts with different spatial and temporal characteristics may play distinct roles in motor control and recovery.

      Strengths:

      The major strength of this paper lies in its conceptual advance: the identification and characterization of distinct global and local beta bursts in the primate motor cortex. This distinction builds upon and considerably extends previous work on the heterogeneity of beta bursts. The paper is methodologically rigorous, using simultaneous cortical and subcortical recordings, detailed behavioral tracking, and thorough analyses of spike-LFP interactions. The use of stroke models and neurotypical animals provides converging evidence for the functional dissociation between burst types. The observation that local bursts increase with motor recovery and occur during grasping is particularly novel and may prove valuable for developing biomarkers of motor function.

      Weaknesses:

      There are several conceptual and methodological limitations that should be addressed. First, the burst detection method relies on an amplitude threshold (median + 1 SD), which is susceptible to false positives and variability (Langford & Wilson, 2025). The classification into global or local bursts then depends on the number of co-bursting channels, compounding the arbitrariness. Second, the imposition of a minimum of three co-bursting cortical channels may bias against the detection of truly local bursts. Third, the classification is entirely cortical; subcortical activity is considered post hoc rather than integrated into the classification, despite the key role of subcortical-cortical synchrony in motor control. Fourth, the apparent dissociation between global and local bursts raises important questions about their spatial distribution across areas like M1 and PMv, which are not thoroughly analyzed. Finally, while the authors interpret local bursts during grasping as novel, similar findings have been reported (e.g., Szul et al., 2023; Rayson et al., 2023), and a deeper discussion of these precedents would strengthen the argument.

      Impact:

      This work is likely to have a substantial impact on the field of motor systems neuroscience. The distinction between global and local beta bursts offers a promising framework for understanding the dual roles of beta in motor inhibition and sensorimotor computation. The findings are relevant not only for basic research but also for translational efforts in stroke rehabilitation and neuromodulation, particularly given the emerging interest in beta burst-based biomarkers and stimulation targets. The dataset and analytical framework will be useful to researchers investigating beta dynamics, spike-field relationships, and recovery from neural injury.

      Langford, Z.D., Wilson, C.R.E., 2025. Simulations reveal that beta burst detection may inappropriately characterize the beta band. https://doi.org/10.1101/2023.12.15.571838.

      Rayson, H., Szul, M.J., El-Khoueiry, P., Debnath, R., Gautier-Martins, M., Ferrari, P.F., Fox, N., Bonaiuto, J.J., 2023. Bursting with potential: How sensorimotor beta bursts develop from infancy to adulthood. J. Neurosci. https://doi.org/10.1523/JNEUROSCI.0886-23.2023.

      Szul, M.J., Papadopoulos, S., Alavizadeh, S., Daligaut, S., Schwartz, D., Mattout, J., Bonaiuto, J.J., 2023. Diverse beta burst waveform motifs characterize movement-related cortical dynamics. Prog. Neurobiol. 228, 102490.

    3. Reviewer #2 (Public review):

      Summary:

      The paper by Khanna et al. describes global vs local beta synchrony between a cortical premotor area (PMv) and subcortical structures during motor tasks in the non-human primate, specifically investigating the progression following M1 injury. They found that global beta synchrony between PMv and subcortical structures increased during the sub-acute phase of injury, and that global synchrony was associated with relatively slower motor movements. As recovery progressed, they report a shift from global synchrony to local synchrony and a subsequent reduction in the movement time. The authors suggest that global changes in subcortical and cortical beta synchrony may generally underpin a variety of movement disorders, including Parkinson's disease, and that shifting from global to local (or reducing global synchrony) might improve functional outcomes.

      Strengths:

      Ischemic insults and other acquired brain injuries have a significant public health impact. While there is a large body of clinical and basic science studies describing the behavioral, neurophysiological, and mechanistic outcomes of such injury, there is a significant lack of studies looking at longitudinal, behaviorally-related neurophysiological measures following cortical injury, so any information has an outsized contribution to understanding how brain injury disrupts underlying neural activity and how this may contribute to injury presentation and recovery.

      A significant percentage of pre-clinical stroke studies tend to focus on peri-infarct or other cortical structures and their role in recovery. The addition of subcortical recordings, which allows for the investigation of the role of thalamo-basal gangliar-cortical loops that may be contributing to the degree of impairment or to the recovery process, is important for the field. Here, there are longitudinal (up to 3 months post-injury) recordings in the ventral premotor area (PMv) and either the internal capsule or sensorimotor thalamus that can be synchronized with phases of behavioral recovery.

      The methods are well described and can act as a framework for assessing synchrony across other data sets with similar recording locations. Limitations in methodology, recordings, and behavior were noted.

      Weaknesses:

      A major limitation of this paper is that it is a set of case studies rather than a well-designed, well-controlled study of beta synchrony following motor cortex injury. While non-human primate neurophysiological studies are almost always limited by extremely low animal numbers, they are made up for by the fact that they can acquire significant numbers of units or channels, and in the case of normal behavior, can obtain many behavioral trials over months of individual sessions. Here, there were two NHPs used, but they had different subcortical implant locations (thalamus vs internal capsule). They had different injury outcomes, with one showing a typical recovery curve following injury while one had complications and worsening behavior before ultimately recovering. Further, there were significant differences in the ability to record at different times, with one NHP having poor recordings early in the recovery process while one had poor recordings late in the process. Due to the injury, the authors report sessions in which they were not able to record many trials (~10). Assuming that recovery after a cortical injury is an evolving process, breaking analysis into "Early" and "Late" phases reduces the interpretation of where these shifts occur relative to recovery on the task, especially given different thresholds for recovery were used between animals. Because of this, despite a careful analysis of the data and an extensive discussion, the conclusions derived are not particularly compelling. To overcome this, the authors present data from neurotypical NHPs, but with electrodes in M1 rather than PMv, doing a completely different task with no grasping component, again making accurate conclusions about the results difficult. Even with low numbers, the study would have been much stronger if there were within-animal longitudinal data prior to and after the injury on the same task, so the impact of M1 injury could be better assessed.

      It is unclear to what extent the subpial aspiration used is a stroke model. While it is much more difficult to perform a pure ischemic motor injury using electrocoagulatory methods in animal models that do not have a lissencephalic cortex, the suction ablation method that the authors use leads to different outcomes than an ischemic injury alone. For instance, in rat models, ischemic vs suction ablation leads to very different electrophysiological profiles and differences in underlying anatomical reorganization (see Carmichael and Chesselet, 2002), even if the behavioral outcomes were similar. There is a concern that the effects shown may be an artifact of the lesion model rather than informing underlying mechanisms of recovery.

      The injury model leads to seemingly mild impairments in grasp (but not reach), with rapid and complete recovery occurring within 2-3 weeks from the time of injury. Because of the rapid recovery, relating the physiological processes of recovery to beta synchronization becomes challenging to interpret. Are the global bursts the result of the loss of M1 input to subcortical structures? Are they due to the lack of M1 targets, so there is a more distributed response? Is this due to other post-injury sub-acute mechanisms? How specific is this response: is it limited to peri-infarct areas (and to what extent is the PMv electrode truly in peri-infarct cortex), or would this synchrony be seen anywhere in the sensorimotor networks? Are the local bursts present because global synchrony wanes over time as a function of post-injury homeostatic mechanisms, or is local beta synchrony increasing as new motor plans are refined and reinforced during task re-acquisition? How closely are they coupled to recovery: if it is motor plan refinement, should the shift from global to local not lag the recovery? While the study has significant limitations in design that reduce the impact of the results, it should act as a useful baseline/pilot data set on which to build a more complete picture of the role of subcortical-cortical beta synchrony following cortical injury.

    4. Reviewer #3 (Public review):

      Summary:

      Khanna et al. use a well-conceived and well-executed set of experiments and analyses primarily to document the interaction between neural oscillations in the beta range (here, 13-30 Hz) and recovery of function in an animal model of stroke. Specifically, they show that cortical "beta bursts", or short-term increases in beta power, correlate strikingly with the timeline of behavioral recovery as quantified with a reach-to-grasp task. A key distinction is made between global beta bursts (here, those that synchronize between cortical and subcortical areas) and local bursts (which appear on only a few electrodes). This distinction of global vs. local is shown to be relevant to task performance and movement speed, among other quantities of interest.

      A secondary results section explores the relationship between beta bursts and neuronal firing during the grasp portion of the behavioral task. These results are valuable to include, though mostly unsurprising, with global beta in particular associated with lower mean and variance in spike rates.

      Last, a partial recapitulation of the primary results is offered with a neurologically intact (uninjured) animal. No major contradictions are found with the primary results.

      Highlights of the Discussion section include a thoughtful review of atypical movements executed by individuals with Parkinson's disease or stroke survivors, placing the current results in an appropriate clinical context. Potential physiological mechanisms that could account for the observed results are also discussed effectively.

      Strengths:

      Overall, this is a very interesting paper. The ultimate impact will be enhanced by the authors' choice to analyze beta bursts, which remain a relatively under-explored aspect of neural coding.

      The reach-and-grasp task was also a well-considered choice; the combination of a relatively simple movement (reaching towards a target in the same location each time) and a more complex movement (a skilled object-manipulation grasp) provides an internal control of sorts for data analysis. In addition, the task's two sub-movements provide a differential in terms of their likelihood to be affected by the stroke-like injury: proximal muscles (controlling reach) are likely to be less affected by stroke, while distal muscles (controlling grasp) are highly likely to be affected. Lastly, the requirement of the task to execute an object lift maximizes its difficulty and also the potential translational impact of the results on human injury.

      The above comments about the task exemplify a strength that is more generally evident: a welcome awareness of clinical relevance, which is in evidence several times throughout the Results and Discussion.

      Weaknesses:

      The study's weaknesses are mostly minor and, for the most part, correctable.

      One concern that may not be correctable in this study: the results about the spatial extent of beta activity seem constrained by relatively poor-quality data. It seems half or more of the electrodes are marked as too noisy to provide useful data in Figure 3. If this reflects the wider reality for all analyses, as mentioned, it may not be correctable for the present study. In that case, perhaps some of the experiments or analyses can be revisited or expanded for a future study, when better electrode yields are available.

      Other concerns:

      In some places, there is a lack of clarity in the presentation of the results. This is not serious but should be addressed to aid readers' comprehension.

      Lastly, given the central role of beta oscillations within the study, it would be better for completeness to include even a brief exploration of sustained beta power (rather than bursts), and the modulation of sustained beta (or lack thereof) in the study's areas of concern: behavioral recovery, task performance, etc.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates beta burst dynamics in the primate motor cortex during movement and recovery from stroke. The authors differentiate between "global" beta bursts, which are synchronous across cortical and often subcortical regions, and more spatially confined "local" bursts. Global bursts are associated with reduced spiking variability, slower movements, and are more frequent after stroke, while local bursts increase during recovery and grasp execution. The study provides compelling evidence that beta bursts with different spatial and temporal characteristics may play distinct roles in motor control and recovery.

      We thank the reviewer for their assessment that the manuscript provides compelling evidence for distinct roles of local and global beta bursts in motor control and recovery.

      Strengths:

      The major strength of this paper lies in its conceptual advance: the identification and characterization of distinct global and local beta bursts in the primate motor cortex. This distinction builds upon and considerably extends previous work on the heterogeneity of beta bursts. The paper is methodologically rigorous, using simultaneous cortical and subcortical recordings, detailed behavioral tracking, and thorough analyses of spike-LFP interactions. The use of stroke models and neurotypical animals provides converging evidence for the functional dissociation between burst types. The observation that local bursts increase with motor recovery and occur during grasping is particularly novel and may prove valuable for developing biomarkers of motor function.

      We thank the reviewer for recognizing the strengths of this manuscript. 

      Weaknesses:

      There are several conceptual and methodological limitations that should be addressed. First, the burst detection method relies on an amplitude threshold (median + 1 SD), which is susceptible to false positives and variability (Langford & Wilson, 2025). The classification into global or local bursts then depends on the number of co-bursting channels, compounding the arbitrariness. Second, the imposition of a minimum of three co-bursting cortical channels may bias against the detection of truly local bursts. 

      We thank the reviewer for bringing up these methodological details. We plan to conduct a follow-up analysis using alternative burst detection methods to verify that the paper’s main results hold when using different burst detection methodologies. We anticipate this will improve confidence in our results. 
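      The detection scheme the reviewer describes (a per-channel amplitude threshold at median + 1 SD, followed by counting co-bursting channels) can be sketched as below. This is a simplified illustration and not the authors' pipeline: the function names, the use of the population standard deviation, and in particular the `global_channels` cutoff separating local from global bursts are assumptions made for the example. Only the median + 1 SD threshold and the minimum of three co-bursting channels come from the text.

```python
from statistics import median, pstdev


def burst_mask(envelope, n_sd=1.0):
    """Mark samples whose beta amplitude exceeds median + n_sd * SD."""
    threshold = median(envelope) + n_sd * pstdev(envelope)
    return [value > threshold for value in envelope]


def classify_samples(envelopes, min_channels=3, global_channels=6):
    """Label each time sample by the number of channels bursting together.

    envelopes: one equal-length beta amplitude envelope per cortical channel.
    min_channels: fewer co-bursting channels than this yields no burst label.
    global_channels: hypothetical cutoff at or above which a burst is "global".
    """
    masks = [burst_mask(env) for env in envelopes]
    labels = []
    for t in range(len(envelopes[0])):
        n_bursting = sum(mask[t] for mask in masks)
        if n_bursting < min_channels:
            labels.append("none")
        elif n_bursting < global_channels:
            labels.append("local")
        else:
            labels.append("global")
    return labels


# Hypothetical envelopes: three channels burst at the last sample, three stay flat.
bursting = [1.0, 1.0, 1.0, 1.0, 10.0]
flat = [1.0, 1.0, 1.0, 1.0, 1.0]
labels = classify_samples([bursting] * 3 + [flat] * 3)
```

      The sketch makes the reviewer's two concerns concrete: the threshold depends on each channel's own amplitude distribution, and any event seen on fewer than `min_channels` channels is discarded rather than labeled local.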

      Third, the classification is entirely cortical; subcortical activity is considered post hoc rather than integrated into the classification, despite the key role of subcortical-cortical synchrony in motor control. 

      We thank the reviewer for this comment. First, because the different animals had subcortical recording sites in different locations, we hesitate to use subcortical activity in the classification of bursts since we were not sure we would be identifying the same burst phenomenon (e.g. thalamo-cortical bursts vs. capsule-cortical bursts may differ). Second, we believe that having a cortical-only criterion allows the designation of local vs. global bursts to be more widely applied in preparations that only have access to cortical data (e.g. surface ECoG recordings, EEG, Utah array recordings). Thus, in this study we chose to analyze the subcortical data post hoc (after burst detection and classification) to support our “global” vs. “local” designation of burst types.

      Fourth, the apparent dissociation between global and local bursts raises important questions about their spatial distribution across areas like M1 and PMv, which are not thoroughly analyzed. 

      We thank the reviewer for this comment. In our study’s stroke animals, we chose to study PMv due to its role in compensating for damage to M1; thus, we hesitate to make any comparisons between PMv (which was recorded in stroke animals) and M1 (recorded in healthy unimpaired animals). Furthermore, the animals performed different tasks (e.g. reaching vs. reaching and grasping), which may also influence the spatial distribution. We agree that future work should certainly investigate the spatial distribution of global vs. local beta bursts across areas of sensorimotor cortex and subcortex, and that this comparison would be best done in healthy animals with both reaching and grasping behaviors.

      Finally, while the authors interpret local bursts during grasping as novel, similar findings have been reported (e.g., Szul et al., 2023; Rayson et al., 2023), and a deeper discussion of these precedents would strengthen the argument.

      Thank you for these references! We will review them and incorporate them into our discussion of our results. 

      Impact:

      This work is likely to have a substantial impact on the field of motor systems neuroscience. The distinction between global and local beta bursts offers a promising framework for understanding the dual roles of beta in motor inhibition and sensorimotor computation. The findings are relevant not only for basic research but also for translational efforts in stroke rehabilitation and neuromodulation, particularly given the emerging interest in beta burst-based biomarkers and stimulation targets. The dataset and analytical framework will be useful to researchers investigating beta dynamics, spike-field relationships, and recovery from neural injury.

      We thank the reviewers for their assessment that our work will likely have a substantial impact on the field of motor systems neuroscience. 

      Reviewer #2 (Public review):

      Summary:

      The paper by Khanna et al. describes global vs local beta synchrony between a cortical premotor area (PMv) and subcortical structures during motor tasks in the non-human primate, specifically investigating the progression following M1 injury. They found that global beta synchrony between PMv and subcortical structures increased during the sub-acute phase of injury, and that global synchrony was associated with relatively slower motor movements. As recovery progressed, they report a shift from global synchrony to local synchrony and a subsequent reduction in the movement time. The authors suggest that global changes in subcortical and cortical beta synchrony may generally underpin a variety of movement disorders, including Parkinson's disease, and that shifting from global to local (or reducing global synchrony) might improve functional outcomes.

      Strengths:

      Ischemic insults and other acquired brain injuries have a significant public health impact. While there is a large body of clinical and basic science studies describing the behavioral, neurophysiological, and mechanistic outcomes of such injury, there is a significant lack of studies looking at longitudinal, behaviorally-related neurophysiological measures following cortical injury, so any information has an outsized contribution to understanding how brain injury disrupts underlying neural activity and how this may contribute to injury presentation and recovery.

      A significant percentage of pre-clinical stroke studies tend to focus on peri-infarct or other cortical structures and their role in recovery. The addition of subcortical recordings, which allows for the investigation of the role of thalamo-basal gangliar-cortical loops that may be contributing to the degree of impairment or to the recovery process, is important for the field. Here, there are longitudinal (up to 3 months post-injury) recordings in the ventral premotor area (PMv) and either the internal capsule or sensorimotor thalamus that can be synchronized with phases of behavioral recovery.

      The methods are well described and can act as a framework for assessing synchrony across other data sets with similar recording locations. Limitations in methodology, recordings, and behavior were noted.

      We thank the reviewer for their comments on the strengths of this paper.  

      Weaknesses:

      A major limitation of this paper is that it is a set of case studies rather than a well-designed, well-controlled study of beta synchrony following motor cortex injury. While non-human primate neurophysiological studies are almost always limited by extremely low animal numbers, they are made up for by the fact that they can acquire significant numbers of units or channels, and in the case of normal behavior, can obtain many behavioral trials over months of individual sessions. Here, there were two NHPs used, but they had different subcortical implant locations (thalamus vs internal capsule). They had different injury outcomes, with one showing a typical recovery curve following injury while one had complications and worsening behavior before ultimately recovering. Further, there were significant differences in the ability to record at different times, with one NHP having poor recordings early in the recovery process while one had poor recordings late in the process. Due to the injury, the authors report sessions in which they were not able to record many trials (~10). Assuming that recovery after a cortical injury is an evolving process, breaking analysis into "Early" and "Late" phases reduces the interpretation of where these shifts occur relative to recovery on the task, especially given different thresholds for recovery were used between animals. Because of this, despite a careful analysis of the data and an extensive discussion, the conclusions derived are not particularly compelling. To overcome this, the authors present data from neurotypical NHPs, but with electrodes in M1 rather than PMv, doing a completely different task with no grasping component, again making accurate conclusions about the results difficult. Even with low numbers, the study would have been much stronger if there were within-animal longitudinal data prior to and after the injury on the same task, so the impact of M1 injury could be better assessed.

      We thank the reviewer for these comments. Below we address some of these in more detail: 

      Different subcortical implant locations: We would like to clarify that the subcortical recordings were only used to confirm that global beta bursts (as characterized by cortical recordings alone) did indeed occur on subcortical sites coincidentally with cortical sites more frequently than local beta bursts. Neither the beta burst categories nor the beta bursts themselves were influenced by the subcortical recordings.

      Different injury outcomes: There is difficulty in creating strokes that result in identical deficits across animals, as we and others have noted in previous work [1–3]. As a field, we are still understanding what factors give rise to variability in recovery curves. For example, one recent study noted that biological sex is a factor in predicting differences in recovery rates [4], and another noted that baseline white matter hyperintensities are also predictive of post-stroke recovery [5]. Overall, our methodology that creates structurally-consistent lesions can still result in very different functional outcomes depending on a variety of factors. Given this state of the field, we have done our best to match the recovery curves between our two animals, especially the initial recovery curves before Monkey H’s secondary decline.

      Differences in ability to record at different times: We note this as a strength. One concern with studies that induce stroke at the same time as implanting electrode arrays is that single-unit neuron yield is well known to be low right after array implantation and then to improve in the following weeks [6]. There is always the concern that having more units later in recovery may drive results, but in this case, since one animal showed the opposite trend, we are more confident that results are not driven by increases in unit yield. We also note that we broadly see similar unit quality metrics in the early and late stages in both animals (Fig. S7).

      Breaking continuous recovery curve into early and late: We note that this division was only made for one main analysis in the paper (Fig. 5CD): assessment of the mean and variance of single-unit firing rates. Without this split, our analyses would be underpowered and inconclusive, and we would not be able to provide any comment on how firing rates change, even coarsely, with recovery.

      Presentation of data from M1 of healthy animals doing a different task: We agree that the strongest data would be longitudinally recorded from the same animals/brain areas pre-stroke and then post-stroke. However, we also view our inclusion of separate healthy animals doing a different task as evidence that our global vs. local segregation of beta bursts generalizes beyond the reach-to-grasp task to reaching-only tasks.  

      Overall, we appreciate the reviewer pointing out these notes about our data. In some cases, we do not think these notes are concerning; in others, we acknowledge that we have done the best we can given the state of the neurophysiology stroke recovery field.

      It is unclear to what extent the subpial aspiration used is a stroke model. While it is much more difficult to perform a pure ischemic motor injury using electrocoagulatory methods in animal models that do not have a lissencephalic cortex, the suction ablation method that the authors use leads to different outcomes than an ischemic injury alone. For instance, in rat models, ischemic vs suction ablation leads to very different electrophysiological profiles and differences in underlying anatomical reorganization (see Carmichael and Chesselet, 2002), even if the behavioral outcomes were similar. There is a concern that the effects shown may be an artifact of the lesion model rather than informing underlying mechanisms of recovery.

      We thank the reviewer for bringing this up. 

      Clarification of our stroke model methodology: We wish to highlight that when we create a stroke, we first perform surface vessel occlusion. This is designed to match true ischemic injury. After a waiting period, the injured tissue is then aspirated to reduce the effects of edema and secondary mass effect in the model.

      Carmichael and Chesselet 2002: The rodent work cited did show differential effects of a suction ablation method (without any surface vessel occlusion first) versus an ischemic method. The effects observed in this work were in the first 5 days following stroke. In our case, we started recording on day 7 and examined recovery over extended periods (weeks to months). 

      Effects of acute insult on rehabilitation: From a rehabilitation perspective, it remains unclear how the acute insult affects outcomes weeks and months later. One line of evidence suggesting that the manner in which the acute insult occurs may not matter for rehabilitation is the observation that one therapeutic approach (vagus nerve stimulation) has been found to successfully improve rehabilitation outcomes in a range of injury models (intracranial hemorrhage, stroke, spinal cord injury). We agree that additional work is required in this area.

      Human stroke data show similar results: Lastly, we note that neurophysiology performed in humans with clinical strokes supports the results we see here (e.g. [7], see discussion section for full elaboration), suggesting that our stroke model methodology is similar enough to clinical stroke to yield similar results.

      The injury model leads to seemingly mild impairments in grasp (but not reach), with rapid and complete recovery occurring within 2-3 weeks from the time of injury. Because of the rapid recovery, relating the physiological processes of recovery to beta synchronization becomes challenging to interpret - Are the global bursts the result of the loss of M1 input to subcortical structures? Are they due to the lack of M1 targets, so there is a more distributed response? Is this due to other post-injury sub-acute mechanisms? How specific is this response - is it limited to peri-infarct areas (and to what extent is the PMv electrode truly in peri-infarct cortex), or would this synchrony be seen anywhere in the sensorimotor networks? Are the local bursts present because global synchrony wanes over time as a function of post-injury homeostatic mechanisms, or is local beta synchrony increasing as new motor plans are refined and reinforced during task re-acquisition? How tightly are they coupled to recovery - if it is motor plan refinement, the shift from global to local seemingly should lag the recovery?

      We think these are all wonderful questions that could be addressed in follow-up studies! 

      While the study has significant limitations in design that reduce the impact of the results, it should act as a useful baseline/pilot data set on which to build a more complete picture of the role of subcortical-cortical beta synchrony following cortical injury.

      We agree that this is a study that should be treated as a starting point for further investigation. 

      Reviewer #3 (Public review):

      Summary:

      Khanna et al. use a well-conceived and well-executed set of experiments and analyses primarily to document the interaction between neural oscillations in the beta range (here, 13-30 Hz) and recovery of function in an animal model of stroke. Specifically, they show that cortical "beta bursts", or short-term increases in beta power, correlate strikingly with the timeline of behavioral recovery as quantified with a reach-to-grasp task. A key distinction is made between global beta bursts (here, those that synchronize between cortical and subcortical areas) and local bursts (which appear on only a few electrodes). This distinction of global vs. local is shown to be relevant to task performance and movement speed, among other quantities of interest.

      A secondary results section explores the relationship between beta bursts and neuronal firing during the grasp portion of the behavioral task. These results are valuable to include, though mostly unsurprising, with global beta in particular associated with lower mean and variance in spike rates.

      Last, a partial recapitulation of the primary results is offered with a neurologically intact (uninjured) animal. No major contradictions are found with the primary results.

      Highlights of the Discussion section include a thoughtful review of atypical movements executed by individuals with Parkinson's disease or stroke survivors, placing the current results in an appropriate clinical context. Potential physiological mechanisms that could account for the observed results are also discussed effectively.

      Strengths:

      Overall, this is a very interesting paper. The ultimate impact will be enhanced by the authors' choice to analyze beta bursts, which remain a relatively under-explored aspect of neural coding.

      The reach-and-grasp task was also a well-considered choice; the combination of a relatively simple movement (reaching towards a target in the same location each time) and a more complex movement (a skilled object-manipulation grasp) provides an internal control of sorts for data analysis. In addition, the task's two sub-movements provide a differential in terms of their likelihood to be affected by the stroke-like injury: proximal muscles (controlling reach) are likely to be less affected by stroke, while distal muscles (controlling grasp) are highly likely to be affected. Lastly, the requirement of the task to execute an object lift maximizes its difficulty and also the potential translational impact of the results on human injury.

      The above comments about the task exemplify a strength that is more generally evident: a welcome awareness of clinical relevance, which is in evidence several times throughout the Results and Discussion.

      Weaknesses:

      The study's weaknesses are mostly minor and, for the most part, correctable.

      One concern that may not be correctable in this study: the results about the spatial extent of beta activity seem constrained by relatively poor-quality data. It seems half or more of the electrodes are marked as too noisy to provide useful data in Figure 3. If this reflects the wider reality for all analyses, as mentioned, it may not be correctable for the present study. In that case, perhaps some of the experiments or analyses can be revisited or expanded for a future study, when better electrode yields are available.

      We thank the reviewer for their comments. We note that we have chosen to be particularly conservative with which channels we considered noise-free and acceptable for analysis as our animals were not head-posted (see methods: “On each day, trials were manually inspected alongside camera data for any movement or chewing artifacts (note that animals were not head-posted) and were discarded from neural data analysis if there were any artifacts”). After re-visiting our analysis, we note that the data shown in Fig. 3 (spatial distribution of local bursts) is not representative from a data quality perspective – this data was from a session that had a particularly large number of channels discarded due to artifacts. We plan to correct this to show a more representative figure. 

      Other concerns:

      In some places, there is a lack of clarity in the presentation of the results. This is not serious but should be addressed to aid readers' comprehension.

      We thank the reviewer for this comment and for their numerous suggestions in the notes to the authors. We plan to address as many of these as we can to improve clarity and comprehension.  

      Lastly, given the central role of beta oscillations within the study, it would be better for completeness to include even a brief exploration of sustained beta power (rather than bursts), and the modulation of sustained beta (or lack thereof) in the study's areas of concern: behavioral recovery, task performance, etc.

      We thank the reviewer for this suggestion – we plan to include this in our revisions.  

      References cited in response to public reviewer comments: 

      (1) Ganguly, K., Khanna, P., Morecraft, R. J. & Lin, D. J. Modulation of neural co-firing to enhance network transmission and improve motor function after stroke. Neuron 110, 2363–2385 (2022).

      (2) Khanna, P. et al. Low-frequency stimulation enhances ensemble co-firing and dexterity after stroke. Cell 184, 912-930.e20 (2021).

      (3) Darling, W. G. et al. Sensorimotor Cortex Injury Effects on Recovery of Contralesional Dexterous Movements in Macaca mulatta. Exp Neurol 281, 37–52 (2016).

      (4) Bottenfield, K. R. et al. Sex differences in recovery of motor function in a rhesus monkey model of cortical injury. Biology of Sex Differences 12, 54 (2021).

      (5) Schwarz, A. et al. Association that Neuroimaging and Clinical Measures Have with Change in Arm Impairment in a Phase 3 Stroke Recovery Trial. Ann Neurol 97, 709–719 (2025).

      (6) Gulati, T. et al. Robust Neuroprosthetic Control from the Stroke Perilesional Cortex. J. Neurosci. 35, 8653–8661 (2015).

      (7) Silberstein, P. et al. Cortico-cortical coupling in Parkinson’s disease and its modulation by therapy. Brain 128, 1277–1291 (2005).

    1. eLife Assessment

      This manuscript describes solid and very interesting findings that substantially advance our understanding of a major research question on the role of Cx32 hemichannels in the Schwann cell paranode. It provides an interdisciplinary integration of imaging, in silico approaches, and functional data. This important study proposes a new mechanism with profound physiological relevance and provides new insights into glial modulation of electrical conduction in sensory/motor myelinated nerves.

    2. Reviewer #1 (Public review):

      The manuscript by Butler et al. explores a novel physiological role for connexin 32 (Cx32) hemichannels in Schwann cells at peripheral nerves. Building on the authors' prior work on CO₂-sensitive gating of connexins, this study proposes that mitochondrial CO₂ production dependent on neuronal activity promotes the opening of Cx32 hemichannels in the paranode, which in turn modulates neuronal activity by reducing conduction velocity. This hypothesis is addressed using a multifaceted approach that includes immunofluorescence microscopy, dye uptake assays, calcium imaging, computational modeling, and extracellular recordings in isolated sciatic nerves.

      Among the strengths of the study are the interdisciplinary integration of imaging, in silico approaches, and functional data. Also, this study proposes a new mechanism with profound physiological relevance. Specifically, Butler et al. provide new insights into glial modulation of electrical conduction in sensory/motor myelinated nerves.

      In the current state, the study has some limitations. The evidence linking Cx32 to the observed dye uptake and conduction velocity changes relies primarily on pharmacological inhibition with carbenoxolone, which lacks specificity. The imaging data show overlapping marker signals that preclude the anatomical distinction between nodes and paranodes. FITC uptake, while convincing to test Cx32 hemichannel gating, lacks spatial-temporal information and validation of distribution and localization to viable intracellular compartments. Moreover, while the findings are intriguing, functional proof that Cx32 regulates conduction velocity through ATP release or other downstream effects remains incomplete. Further work using targeted genetic tools, live-tissue imaging, and additional controls would strengthen the mechanistic conclusions.

      Overall, the manuscript offers compelling preliminary evidence that supports a new role for Cx32 in peripheral nerve physiology and raises important questions for future investigation.

    3. Reviewer #2 (Public review):

      Summary:

      This article aims to demonstrate that local production of CO₂ at the axonal node opens Cx32 hemichannels in the Schwann cell paranode, and that CO₂ diffuses through the AQP1 channel to reach Cx32 and trigger its opening. The authors also present evidence supporting a physiological role for this regulatory mechanism. They propose that CO₂-dependent Cx32 activation mediates activity-dependent Ca²⁺ influx into the paranode, and by increasing the leak current across the myelin sheath, it contributes to a slowing of action potential conduction velocity.

      The study presents a very interesting and novel mechanism for the physiological regulation of Cx32 hemichannels. The findings are relevant to the field, and the methods and results are of good quality, with some improvements in interpretation and explanation required, and some minor experimental suggestions.

      Strengths:

      The article is solid in terms of the novelty of the findings and relevance for the physiology of myelinated axons. In addition, it is of major interest for the Connexin field because it explores a physiological way to open Cx32 hemichannels. The experiments are well elaborated, and most of them are sufficient for the main points described by the authors. The finding that nervous activity will trigger the mechanism of hemichannel opening by CO₂ is probably the most relevant biological mechanism derived from this article.

      Weaknesses:

      Throughout the manuscript, the authors interpret their findings as if the described mechanism specifically occurs in the node and paranode regions. However, there is no direct evidence identifying the precise site of CO₂ production or the activation site of Cx32 hemichannels. Therefore, statements such as the one in the title ("activity-dependent CO₂ production in the axonal node opens Cx32 in the Schwann cell paranode") should be reconsidered or removed, as they may be misleading and are not essential to the interpretation of the data. In addition, the participation of aquaporin AQP1 as the main conduit for CO₂ diffusion through the plasma membrane could have another interpretation.

    4. Author response:

      Reviewer #1 (Public review): 

      The manuscript by Butler et al. explores a novel physiological role for connexin 32 (Cx32) hemichannels in Schwann cells at peripheral nerves. Building on the authors' prior work on CO₂-sensitive gating of connexins, this study proposes that mitochondrial CO₂ production dependent on neuronal activity promotes the opening of Cx32 hemichannels in the paranode, which in turn modulates neuronal activity by reducing conduction velocity. This hypothesis is addressed using a multifaceted approach that includes immunofluorescence microscopy, dye uptake assays, calcium imaging, computational modeling, and extracellular recordings in isolated sciatic nerves. 

      Among the strengths of the study are the interdisciplinary integration of imaging, in silico approaches, and functional data. Also, this study proposes a new mechanism with profound physiological relevance. Specifically, Butler et al. provide new insights into glial modulation of electrical conduction in sensory/motor myelinated nerves. 

      In the current state, the study has some limitations. The evidence linking Cx32 to the observed dye uptake and conduction velocity changes relies primarily on pharmacological inhibition with carbenoxolone, which lacks specificity. The imaging data show overlapping marker signals that preclude the anatomical distinction between nodes and paranodes. FITC uptake, while convincing to test Cx32 hemichannel gating, lacks spatial-temporal information and validation of distribution and localization to viable intracellular compartments. Moreover, while the findings are intriguing, functional proof that Cx32 regulates conduction velocity through ATP release or other downstream effects remains incomplete. Further work using targeted genetic tools, live-tissue imaging, and additional controls would strengthen the mechanistic conclusions. 

      Overall, the manuscript offers compelling preliminary evidence that supports a new role for Cx32 in peripheral nerve physiology and raises important questions for future investigation. 

      We thank the reviewer for their comments and agree that the evidence for involvement of Cx32 is indirect. We are planning to perform genetic manipulations to strengthen this link. We shall review our presentation of the morphology in terms of the node/paranode/juxtaparanode distribution and adjust accordingly. We have in the interim generated new data using GCaMP transduced into Schwann cells that provides the live-tissue imaging that the reviewer requests. This strengthens our conclusions, and we will add these data into the paper.

      Reviewer #2 (Public review): 

      Summary: 

      This article aims to demonstrate that local production of CO₂ at the axonal node opens Cx32 hemichannels in the Schwann cell paranode, and that CO₂ diffuses through the AQP1 channel to reach Cx32 and trigger its opening. The authors also present evidence supporting a physiological role for this regulatory mechanism. They propose that CO₂-dependent Cx32 activation mediates activity-dependent Ca²⁺ influx into the paranode, and by increasing the leak current across the myelin sheath, it contributes to a slowing of action potential conduction velocity. 

      The study presents a very interesting and novel mechanism for the physiological regulation of Cx32 hemichannels. The findings are relevant to the field, and the methods and results are of good quality, with some improvements in interpretation and explanation required, and some minor experimental suggestions. 

      Strengths: 

      The article is solid in terms of the novelty of the findings and relevance for the physiology of myelinated axons. In addition, it is of major interest for the Connexin field because it explores a physiological way to open Cx32 hemichannels. The experiments are well elaborated, and most of them are sufficient for the main points described by the authors. The finding that nervous activity will trigger the mechanism of hemichannel opening by CO₂ is probably the most relevant biological mechanism derived from this article.

      Weaknesses: 

      Throughout the manuscript, the authors interpret their findings as if the described mechanism specifically occurs in the node and paranode regions. However, there is no direct evidence identifying the precise site of CO₂ production or the activation site of Cx32 hemichannels. Therefore, statements such as the one in the title ("activity-dependent CO₂ production in the axonal node opens Cx32 in the Schwann cell paranode") should be reconsidered or removed, as they may be misleading and are not essential to the interpretation of the data. In addition, the participation of aquaporin AQP1 as the main conduit for CO₂ diffusion through the plasma membrane could have another interpretation.

      We thank the reviewer for their comments and agree that we do not have direct evidence for the site of CO₂ production or the site of activation of Cx32 hemichannels. This direct evidence is extremely difficult to obtain, and we therefore depend on indirect arguments. Mitochondria represent the major source of CO₂, and their distribution will therefore indicate where CO₂ is likely to be produced. We agree that this is not essential to the interpretation of the data and will adjust the text as recommended. We will add a section to the Discussion to consider this point in more detail.

    1. eLife Assessment

      The article presents important findings of a dissociation between phasic and tonic pain functions in adaptive behavior, combining immersive VR, computational modeling, skin conductance, and EEG data. The methodology used is solid. Its ecological design and sophisticated computational modeling are major strengths. The article would benefit from adding details on hypotheses, VR implementation, sample size determination, modeling, analysis, and pain specificity.

    2. Reviewer #1 (Public review):

      Summary:

      This article presents a study consisting of two experiments, which aim to dissociate and quantify the distinct motivational functions of phasic and tonic pain within a naturalistic and immersive VR setting. Specifically, the authors test two hypotheses: (i) that phasic pain acts as a punishment signal that drives avoidance learning; (ii) that tonic pain reduces motivational vigor, promoting energy conservation and recuperation. In both experiments, participants performed a free-operant foraging task, where they collected virtual pineapples to earn points.

      In Experiment 1, phasic pain was delivered as a brief electric shock to the grasping hand when picking up green pineapples. As phasic pain intensity increased, participants were less likely to choose painful fruits. A reinforcement learning model that incorporated reward, pain cost, and effort cost was able to successfully capture behavior.

      Experiment 2 combined the effects of phasic and tonic pain. Tonic pain was induced by a pressure cuff on the non-dominant arm, simulating sustained discomfort. Interestingly, tonic pain did not affect the perceived intensity or avoidance of phasic pain. However, it significantly reduced movement velocity and pineapple collection rate, interpreted as a reduction of motivational vigor. A temporal decision model incorporating vigor cost successfully captured these effects.

      Concomitant EEG recordings showed that tonic pain was associated with reduced alpha and beta power in parietal and temporal areas. Phasic pain ratings and decision values distinctively correlated with skin conductance responses.

      Overall, these findings indicate that phasic and tonic pain have distinct and dissociable motivational effects.

      Strengths:

      This is an ambitious study that provides a quantitative dissociation of the roles of phasic and tonic pain in adaptive behavior, by integrating ecological neuroscience, motivational theory, and computational modeling. The use of immersive VR combined with a free-operant foraging task offers a more ecologically valid context to study pain-related behavior compared to traditional paradigms. Furthermore, the study employs a multimodal approach by combining behavioral data, computational frameworks, physiological signals, and EEG. In particular, one of the main strengths of the study is the use of sophisticated computational modeling to capture phasic and tonic pain effects. The experiment code is available on GitHub, increasing reproducibility.

      Weaknesses:

      The main limitation of this article is that it provides insufficient detail on the VR implementation. The design of the VR environment is, at this stage, under-described. Crucial information is missing, such as the number of pineapples per block, timing precision, details on how motion is mapped to the virtual movement, etc. This aspect strongly limits the reproducibility of the experiments. A second limitation lies in the lack of clarity regarding the study hypotheses. Although two overarching hypotheses can be inferred, they are not explicitly formulated. As a result, it is unclear which analyses were merely exploratory, especially for physiological and EEG outcomes.

      In Experiment 2, the reduction in vigor during tonic pain could plausibly reflect attentional load rather than pain per se. As recognized by the authors, there is no control condition involving an innocuous salient stimulus to rule out non-specific effects of distraction. Perhaps a tonic non-painful but salient somatosensory stimulus (e.g., a strong vibrotactile stimulus applied on the same arm) could have been used as a control stimulus.

    3. Reviewer #2 (Public review):

      Summary:

      The study investigated the distinct roles of phasic and tonic pain in adaptive behavior. Phasic pain was proposed to function as a teaching signal, promoting avoidance of further injury, while tonic pain was hypothesized to support recuperative behavior by reducing motivational vigor. This hypothesis was tested using an immersive virtual reality (VR) EEG foraging task, in which participants harvested fruit in a forest environment. Some fruits triggered brief phasic pain to the grasping hand, which in turn reduced the likelihood of choosing those fruits. Concurrently, tonic pressure pain applied to the contralateral upper arm was associated with reduced action velocities. The authors employed a free-operant computational framework to quantify how phasic and tonic pain modulate motivational vigor and decision value. Importantly, model parameters were found to correlate with EEG responses, providing neurophysiological support for the hypothesized functional distinctions.

      Strengths:

      Overall, this study aims to address an important topic and is generally well written.

      Weaknesses:

      Two critical issues require clarification or justification.

      First, phasic pain was induced using electrical stimulation, which typically elicits somatosensory evoked potentials (SEPs). These responses may not reflect pain-specific processes and thus complicate interpretation. This issue bears directly on the study's conclusions, especially when discussing interactions between phasic and tonic pain. For example, tonic pain is known to reduce perceived intensity or cortical responses to phasic pain stimuli delivered elsewhere on the body - an effect not expected for SEPs elicited by electrical stimuli.

      Second, additional control experiments are necessary to rule out alternative explanations. For instance, the authors could deliver phasic pain to the contralateral arm (e.g., at 1-2 Hz), which might also reduce action velocity. Similarly, tonic pain applied to the grasping hand should be tested to disentangle hand-specific effects.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigates how phasic and tonic pain modulate behaviour in a free-operant foraging paradigm. The authors apply a computational modeling approach to the behavioural data to quantify the decision value of phasic pain, as well as the degree to which tonic pain reduces motivational vigour. EEG assessments showed, e.g., reduced signal power at alpha and beta frequencies in tonic pain conditions compared to no-tonic-pain conditions, but no association between these neural measures and motivational vigour. The authors conclude that tonic and phasic pain serve different motivational functions, with phasic pain acting as a punishment signal promoting avoidance and tonic pain reducing motivational vigour.

      Strengths:

      The experimental paradigm is highly innovative. Assessing human behaviour in a naturalistic yet highly controlled setting represents a promising approach to pain research. Notably, assessing pain magnitude implicitly, via its motivational value, offers insights about the overall pain experience that are not usually accessible via common pain ratings.

      Weaknesses:

      Despite these strengths, the manuscript would benefit significantly from more precise definitions of key concepts and an overall clearer, more coherent presentation of its main arguments. The writing, in its current form, often presents claims that are too vague or insufficiently connected with the experimental findings. Moreover, certain aspects of the computational modeling and statistical analysis appear flawed or inadequately justified.

    5. Author response:

      Reviewer #1 (Public review):

      The main limitation of this article is that it provides insufficient detail on the VR implementation. The design of the VR environment is, at this stage, under-described. Crucial information is missing, such as the number of pineapples per block, timing precision, details on how motion is mapped to the virtual movement, etc. This aspect strongly limits the reproducibility of the experiments. A second limitation lies in the lack of clarity regarding the study hypotheses. Although two overarching hypotheses can be inferred, they are not explicitly formulated. As a result, it is unclear which analyses were merely exploratory, especially for physiological and EEG outcomes.

      In Experiment 2, the reduction in vigor during tonic pain could plausibly reflect attentional load rather than pain per se. As recognized by the authors, there is no control condition involving an innocuous salient stimulus to rule out non-specific effects of distraction. Perhaps a tonic non-painful but salient somatosensory stimulus (e.g., a strong vibrotactile stimulus applied on the same arm) could have been used as a control stimulus.

      We appreciate the reviewer's comments regarding the insufficient implementation details. We hope the newly uploaded software for reproducing the experiment can improve the reader's understanding of the task. In addition to making the software available, we will expand the Methods section in the revised manuscript to include greater detail on the task description.

      The hypothesised functions of phasic and tonic pain, and their collaborative interaction, are both broad and deep topics. In the revised manuscript, we will more explicitly formulate our hypotheses and clarify the distinction between a priori predictions and exploratory analyses, particularly concerning the extent to which our evidence supports these hypotheses.

      We agree that examining the potential role of attentional load in the interaction between tonic and phasic pain is an important area for future investigation. Adding control conditions matched for attentional salience is possible, but it introduces other confounds related to the different qualities of the control stimuli (e.g. a salient vibrotactile stimulus might invigorate behaviour). More fundamentally, attentional processes are a core part of pain function and should not necessarily be viewed as a confound (i.e. pain may mediate some of its core functional effects directly through its salient, attentional nature). This view is formalised in Wall and Melzack’s classical tripartite model of pain, and it distinguishes pain from purely sensory systems such as somatosensation, vision and so on.

      Reviewer #2 (Public review):

      Two critical issues require clarification or justification. First, phasic pain was induced using electrical stimulation, which typically elicits somatosensory evoked potentials (SEPs). These responses may not reflect pain-specific processes and thus complicate interpretation. This issue bears directly on the study's conclusions, especially when discussing interactions between phasic and tonic pain. For example, tonic pain is known to reduce perceived intensity or cortical responses to phasic pain stimuli delivered elsewhere on the body - an effect not expected for SEPs elicited by electrical stimuli.

      We acknowledge the reviewer’s concern regarding the specificity of evoked potentials elicited by electrical stimulation. We agree that traditional SEPs—particularly those evoked by large surface electrodes—primarily reflect activation of non-nociceptive A-beta fibres and thus may not reliably index pain-specific processes or be modulated by tonic pain via descending nociceptive control. However, we would like to clarify that phasic pain was administered in the present study using small-diameter concentric ‘Wasp’ electrodes. These are comparable to intraepidermal electrodes shown to preferentially activate nociceptive A-delta fibres, thereby eliciting ERPs more closely associated with nociceptive processing than with mixed somatosensory input [1, 2]. Accordingly, our ERP results demonstrated a reliable increase in N1-P2 amplitude with higher phasic pain intensity, suggesting that the evoked responses captured stimulus-evoked nociceptive processing.

      We acknowledge that these ERPs may still reflect mixed sensory processing and thus may not be fully modulated by tonic pain. Previous studies have shown that ERPs elicited by nociceptive electrical stimulation can be attenuated during tonic pain using cold-water immersion in CPM paradigms [3, 4]. However, these studies typically employ passive tasks, whereas our paradigm involved continuous voluntary behaviour during sustained tonic pressure pain. This difference in task context may engage distinct modulatory systems, possibly prioritising behavioural adaptation over sensory gating.

      We will revise the manuscript to acknowledge these factors and to encourage a more nuanced interpretation of the ERP findings in light of this literature.

      Second, additional control experiments are necessary to rule out alternative explanations. For instance, the authors could deliver phasic pain to the contralateral arm (e.g., at 1-2 Hz), which might also reduce action velocity. Similarly, tonic pain applied to the grasping hand should be tested to disentangle hand-specific effects.

      We are grateful to the reviewer for this suggestion. In the current study, phasic pain was delivered to the grasping hand to generate a coherent, spatially congruent representation of virtual stimuli (painful fruit) and behavioural consequences (pain upon grasp). Delivering phasic pain stimuli to the contralateral hand would be incongruent with the task design and may alter the interpretation of the learning signal, which was central to our computational modelling framework. Similarly, tonic pain was not applied to the grasping hand to avoid interfering with motor control. Applying tonic pain to the grasping hand would make it extremely difficult for participants to effectively grasp the hand controller, thereby complicating the interpretation of behavioural and neural measures. Therefore, while we agree that such manipulations could be informative for future studies, they were not the focus of the current investigation. We will discuss these issues in the revision.

      Reviewer #3 (Public review):

      Despite these strengths, the manuscript would benefit significantly from more precise definitions of key concepts and an overall clearer, more coherent presentation of its main arguments. The writing, in its current form, often presents claims that are too vague or insufficiently connected with the experimental findings. Moreover, certain aspects of the computational modeling and statistical analysis appear flawed or inadequately justified.

      We thank the reviewer for highlighting the need for clearer definitions and a more coherent presentation. In the revised manuscript, we will refine our definitions of key concepts and improve the presentation of hypothesised functions of phasic and tonic pain. As stated previously, we will clarify the extent to which our evidence supports these hypotheses. We also appreciate the feedback on our statistical analysis and computational modelling. We will address these points and provide the necessary clarifications and justifications in the revised manuscript.

    1. eLife Assessment

      This valuable study presents a mouse gastruloid system to generate successive waves of hematopoietic progenitors that in vivo would emerge during embryonic development. Although this newly revised manuscript has addressed some of the concerns raised during the first round of review, the study is still considered incomplete, as the claims are only partially supported. In particular, the claim of definitive wave hematopoietic progenitors being produced in the gastruloids, and their engraftment after transplantation, would benefit from further validation.

    2. Reviewer #1 (Public review):

      Summary

      This manuscript describes a haemogenic gastruloid system that the authors claim recapitulates early mouse embryonic development to produce sequential waves of yolk sac and AGM-like haematopoiesis, with spatial and temporal accuracy. The model claims to reproduce mouse development to 'beyond' the E9.0 stage and apply its use to the aetiology of infant leukaemia.

      Strengths

      Gastruloid models are useful systems for studying early embryonic development, recapitulating aspects of gastrulation, anteroposterior regionalisation and somitogenesis. Gastruloid models that specifically mimic particular regions of the embryo could provide insights into how these regions form during development.

      Weaknesses

      There are a couple of major issues with this manuscript that I feel need to be addressed.

      Firstly, the authors acknowledge that the proportion of blood cells that are produced by their haemogenic gastruloid system is very low - there are fewer than 2% of either blood or endothelium produced. The authors argue, however, that this is because they have developed a hematopoietic organoid that captures much more of the essence of the developing embryo and therefore has a broader tissue representation and a more relevant spatial representation.

      In order to prosecute this argument, this reviewer needs to understand how the differentiation protocol achieves this end, i.e. what is notable about the combination of factors and other media components. Also, they need to know what the evidence is to support this claim; in other words, what are the tissues that make up the organoid, and is it truly representative of what would be expected in a developing embryo over this time? Does it pass from epiblast to primitive streak and then to cells of the germ layers? And how do haemGXs at different times map onto the developing mouse embryo?

      Secondly, the point is repeatedly made by the authors that the distinction between non-engrafting yolk sac hematopoiesis and AGM-like hematopoiesis from which repopulating HSCs first derive is not really possible without spatial cues. This is really not true. It has been shown by a number of investigators, and summarised in a recent review (Abuhantash et al 2021), that the expression of HOXA cluster genes - most prominently HOXA9 - clearly distinguishes AGM-derived from yolk sac-derived cells. In this manner, it is evident from the UMAP provided that there is no HOXA9 expressed in either endothelium or blood cells. This argues very strongly against the proposition that AGM-type hematopoiesis is generated. Indeed, given the duration of the organoid culture of only 9 days (216hrs), it would be highly unlikely that development would even reach the stage of AGM hematopoiesis (E11.5 in the mouse), even with a 1:1 concordance between embryonic time and in vitro differentiation. Finally, if there is recapitulation of the normal pattern of embryogenesis, it would be expected that there would be a prominent phase of yolk sac hematopoiesis antedating AGM-associated hematopoiesis, which should be observed in the haemGx.

      I feel that these are major conceptual points that need to be addressed in this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe the development of a new hemogenic gastruloid (hGX) system, which they claim recapitulates the sequential generation of various hematopoietic cell types. A key proposed advantage of this system is its ability to more faithfully model the spatiotemporal emergence of hematopoietic progenitors within a physiologically relevant niche, as compared to existing in vitro platforms. While the authors provide some initial characterisation and demonstrate the utility of the system in studying infant leukemia, the presented data are not fully conclusive and fall short of robustly supporting several of their key assertions.

      Strengths:

      The development of a novel in vitro system to model hematopoietic development is innovative and could potentially address important limitations of existing platforms.

      Weaknesses:

      The characterization of the hematopoietic progenitors generated by the hGX system is not fully convincing. The evidence supporting the emergence of late yolk sac (YS) progenitors, including lymphoid cells, and AGM-like pre-hematopoietic stem cells (pre-HSCs), is incomplete and relies heavily on transcriptomic profiling and a limited set of markers.

      The identification of lymphoid or pre-HSC-like populations is primarily inferred from scRNA-seq data. The lack of robust functional validation (e.g., lymphoid differentiation assays or long-term repopulation experiments) significantly weakens the manuscript's main claims.

      In the revised manuscript, the authors incorporate single-cell RNA-seq analyses indicating that their cells resemble AGM-derived endothelial-to-hematopoietic transition (EHT) populations. However, they do not test whether these cells might more closely resemble YS-derived EHT, which remains an unresolved and critical question. Additionally, the claim in line 263 that Cluster 8 (CD45⁺ cells at 192-216 h) expresses lymphoid markers is not clearly supported by the provided supplemental data (Supplemental File S1-S2).

      While the authors respond that they did not claim to generate bona fide HSCs, they do state at the end of the Introduction (lines 116-121) that their system captures AGM hematopoiesis. The current data do not support this conclusion and instead suggest that the system recapitulates the generation of multipotent lymphoid progenitors (MLPs) akin to those found in the YS.

      The engraftment data presented are not particularly convincing. It is unclear why the analysis was terminated at 8 weeks post-transplant, especially given that a minimum of 12 weeks is generally required to meaningfully assess the presence of pre-HSCs or bona fide HSCs with long-term repopulating potential.

      Given the uncertainties discussed above, the interpretability of the MNX1 overexpression study is limited.

      The authors could have more directly tested their claim of capturing multiple hematopoietic waves by performing kinetic analyses of colony-forming potential, with the expectation that more multipotent colonies would emerge at later time points. Additionally, isolating and characterizing the potential of hemogenic endothelium at different time points corresponding to the putative waves would have strengthened their conclusions. In the absence of such data, it remains unclear whether the system recapitulates sequential waves of hematopoiesis or merely reflects the progressive maturation of cells originating from a single wave.

    4. Reviewer #3 (Public review):

      The authors present a revised version of their manuscript (Ragusa et al.) describing a hemogenic gastruloid (haemGx) model, used to investigate stages of blood production in vitro and for modeling a rare type of infant leukemia. The revisions address several major concerns raised during the initial round of review, and new data have been provided that overall improve the clarity and rigour of the study. In particular, the additional flow cytometry, single-cell RNA-seq analyses, and benchmarking against in vivo datasets help, to some extent, to substantiate the claims of developmental relevance of haemGx to yolk sac (YS)- and AGM-like hematopoietic waves. Nonetheless, some issues remain, particularly regarding the claims of short-term engraftment, novelty of the model, and the extent to which AGM-like HSPC are truly captured.

      Major Points:

      (1) The authors have clarified the novelty of their haemGx protocol relative to existing gastruloid models, including the importance of the Activin A pulse and protocol extension to 216h. Flow cytometry and scRNA-seq analyses support the emergence of endothelial and hematopoietic populations with dynamic marker expression. However, direct side-by-side comparisons with previously published protocols (e.g., Rossi et al., 2022) remain limited. The claim of "spatio-temporal accuracy" should be more cautiously phrased.

      (2) The characterization of the identity of the hematopoietic waves generated in the haemGx system has been improved in the revised manuscript. Flow cytometry analysis now includes CD31/CD34 co-expression in CD41+ and CD45+ subsets, and scRNA-seq re-clustering supports two hematopoietic waves with distinct marker sets (e.g., Gata2/Myb vs. Hoxa9/Ikzf1). Projection onto multiple embryonic reference datasets (Hou et al., Zhu et al., Thambyrajah et al.) is a valuable addition. The case for YS-like EMP and AGM-like HSPC precursors is reasonably made, though further functional distinctions (e.g., lineage output differences) would strengthen the claims.

      (3) The authors have now provided additional evidence for low-level engraftment following adrenal implantation of whole haemGx. Although technically demanding, this in vivo result remains marginal and should be interpreted with caution. Crucially, this still does not demonstrate HSC-level repopulation capacity. The revised manuscript has softened the claims accordingly, now referring to "progenitor" activity rather than "pre-HSC." We agree that this adjusted claim is more suitable, though the reproducibility of this experiment is still unclear.

      (4) The MNX1 overexpression experiments are generally convincing in showing early expansion of a putative HE-to-EMP-like population and transcriptional resemblance to MNX1-r AML. However, the evidence for transformation is still solely based on in vitro data and lacks any evidence of in vivo leukaemia engraftment. The ability to perturb the system would add translational value to the haemGx platform, although future studies are needed to better define transformation dynamics and leukemogenic progression.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary 

      The authors describe a method for gastruloid formation using mouse embryonic stem cells (mESCs) to study YS and AGM-like hematopoietic differentiation. They characterise the gastruloids during nine days of differentiation using a number of techniques including flow cytometry and single-cell RNA sequencing. They compare their findings to a published data set derived from E10-11.5 mouse AGM. At d9, gastruloids were transplanted under the adrenal gland capsule of immunocompromised mice to look for the development of cells capable of engrafting the mouse bone marrow. The authors then applied the gastruloid protocol to study overexpression of Mnx1 which causes infant AML in humans.

      In the introduction, the authors define their interpretation of the different waves of hematopoiesis that occur during development. 'The subsequent wave, known as definitive, produces: first, oligopotent erythro-myeloid progenitors (EMPs) in the YS (E8-E8.5); and later myelo-lymphoid progenitors (MLPs - E9.5-E10), multipotent progenitors (MPPs - E10-E11.5), and hematopoietic stem cells (HSCs - E10.5-E11.5), in the aorta-gonad-mesonephros (AGM) region of the embryo proper.' Herein they designate the yolk sac-derived wave of EMP hematopoiesis as definitive, according to convention, although paradoxically it does not develop from intra-embryonic mesoderm or give rise to HSCs.

      Our definition of primitive and definitive waves is widely used in the field (e.g. PMID: 18204427; PMID: 28299650; PMID: 33681211). Definitive haematopoiesis, encompassing EMP, MLP, MPP and HSC, highlights their origin from haemogenic endothelium, generation of mature cells with adult characteristics from progenitors with multilineage potential and direct and indirect developmental contributions to the intra-embryonic and time-restricted generation of HSCs. 

      General comments 

      The authors make the following claims in the paper: 

      (1) The development of a protocol for hemogenic gastruloids (hGx) that recapitulates YS and AGM-like waves of blood from HE.

      (2) The protocol recapitulates both YS and EMP-MPP embryonic blood development 'with spatial and temporal accuracy'.

      (3) The protocol generates HSC precursors capable of short-term engraftment in an adrenal niche.

      (4) Overexpression of MNX1 in hGx transforms YS EMP to 'recapitulate patient transcriptional signatures'.

      (5) hGx is a model to study normal and leukaemic embryonic hematopoiesis. 

      There are major concerns with the manuscript. The statements and claims made by the authors are not supported by the data presented, data is overinterpreted, and the conclusions cannot be justified. Furthermore, the data is presented in a way that makes it difficult for the reader to follow the narrative, causing confusion. The authors have not discussed how their hGx compares to the previously published mouse embryoid body protocols used to model early development and hematopoiesis.

      Specific points

      (1) It is claimed that HGxs capture cellularity and topography of developmental blood formation. The hGx protocol described in the manuscript is a modification of a previously published gastruloid protocol (Rossi et al 2022). The rationale for the protocol modifications is not fully explained or justified. There is a lack of novelty in the presented protocol as the only modifications appear to be the inclusion of Activin A and an extension of the differentiation period from 7 to 9 days of culture. No direct comparison has been made between the two versions of gastruloid differentiation to justify the changes.

      The Reviewer paradoxically claims that the protocol is not novel and that it differs from a previous publication in at least 2 ways – the patterning pulse and the length of the protocol. Of these, the patterning pulse is key. As documented in Fig. 1S1, we cannot obtain Flk1-GFP expression in the absence of Activin A (Fig. 1S1A), and the concentration of Activin A scales activity of the Flk1 locus (Fig. 1S1B). Expression of Flk1 is a fundamental step in haemato-endothelial specification and, accordingly, we do not see CD41+ or CD45+ cells in the absence of Activin A. Furthermore, these markers also titrate with the dose of Activin A (in Fig. 1S1B).

      Also, in our hands, there is a clear time-dependent progression of marker expression, with sequential acquisition of CD41 and CD45, with the latter not detectable until 192h (Fig. 1C-D), another key difference relative to the Rossi et al (2022) protocol. We suggest, and present further evidence in this rebuttal and the revised manuscript, that the 192h-timepoint captures the onset of AGM-like haematopoiesis. We have edited the manuscript to clarify the differences and novelty in our protocol (lines 132-143) and provided a more detailed comparison with the report from Rossi et al. (2022) in the Discussion (lines 574-586).

      The inclusion of Activin A at high concentration at the beginning of differentiation would be expected to pattern endoderm rather than mesoderm. BMP signaling is required to induce Flk1+ mesoderm, even in the presence of Wnt.

      Again, we call the Reviewer’s attention to Fig. 1S1A which clearly shows that Activin A (with no BMP added) is required for induction of Flk1 expression, in the presence of Wnt. Activin A, in combination with Wnt, is used in other protocols of haemato-endothelial differentiation from pluripotent cells, with no BMP added in the same step of patterning and differentiation (PMID: 39227582; PMID: 39223325). In the latter protocol, we also call the Reviewer’s attention to the fact that a higher concentration of Activin A precludes the need for BMP4 addition. Finally, one of us has recently reported that Activin A, on its own, will induce Flk1, as well as other anterior mesodermal progenitors (https://www.biorxiv.org/content/10.1101/2025.01.11.632562v1). In addressing the Reviewer’s concerns with the dose of Activin A used, we titrated its concentration against activation of Flk1, confirming optimal Flk1-GFP expression at the 100ng/ml dose used in the manuscript. We have included this data in the manuscript in Figure 1S1B.

      FACS analysis of the hGx during differentiation is needed to demonstrate the co-expression of Flk1-GFP and lineage markers such as CD34 to indicate patterning of endothelium from Flk1+ mesoderm. The FACS plots in Fig. 1 show C-Kit expression but very little VE-cadherin which suggests that CD34 is not induced. Early endoderm expresses C-Kit, CXCR4, and Epcam, but not CD34 which could account for the lack of vascular structures within the hGx as shown in Fig. 1E.

      We were surprised by the Reviewer’s comment that there are no endothelial structures in our haemogenic gastruloids. The presence of a Flk1-GFP+ network is visible in the GFP images in Fig. 1B, from 144h onwards, and is detailed in the revised Fig. 2A, which shows overlap between Flk1-GFP and the endothelial marker CD31. In addition, our single-cell RNA-seq data, included in the manuscript, confirm the presence of endothelial cells with a developing endothelial, including arterial, programme. This is now presented in the revised Fig. 3B-D of the manuscript, which updates a representation in the original manuscript. In contrast with the Reviewer’s claims that no endothelial cells are formed, the data show that Kdr (Flk1)+ cells co-express Cdh5/VE-Cadherin and indeed Cd34, attesting to the presence of an endothelial programme. Arterial markers Efnb2, Flt1, and Dll4 are present. A full-blown programme, which also includes haemogenic markers such as Sox17, Esam, Cd44 and Mecom, is clear at early (144h) and, particularly, at late (192h) timepoints in cells sorted on detection of surface C-Kit (Fig. 3B-E in the manuscript). To address the specific point by the Reviewer, we also document co-expression of Flk1-GFP, CD34 and/or CD31 by flow cytometry (Fig. 2S1A-B in the revised manuscript).

      To summarise new and revised data in the manuscript in relation to this point:

      Immunofluorescence staining showing the Flk1-GFP-defined vascular network in Figure 1E and co-expression of endothelial marker CD31 in Figure 2A. In text: lines 159-163; 178-180.

      Flow cytometry analysis of co-expression of Flk1-GFP with CD31 and CD34 in Figure 2S1A-D, including controls. In text: 180-187.

      Real-time quantitative (q)PCR analysis showing time-dependent expression of haemato-endothelial and arterial markers in Figure 2F (specifically Dll4 and Mecom). In text: 200-209.

      An improved representation of our scRNA-seq data highlighting key haemato-endothelial markers in Figure 3B-D. In text: 268-304.

      (2) The protocol has been incompletely characterised, and the authors have not shown how they can distinguish between either wave of Yolk Sac (YS) hematopoiesis (primitive erythroid/macrophage and erythro-myeloid EMP) or between YS and intraembryonic Aorta-Gonad-Mesonephros (AGM) hematopoiesis. No evidence of germ layer specification has been presented to confirm gastruloid formation, organisation, and functional ability to mimic early development. Furthermore, differentiation of YS primitive and YS EMP stages of development in vitro should result in the efficient generation of CD34+ endothelial and hematopoietic cells. There is no flow cytometry analysis showing the kinetics of CD34 cell generation during differentiation. Benchmarking the hGx against developing mouse YS and embryo data sets would be an important verification. 

      The Reviewer is correct that we have not provided detailed characterisation of the different germ layers, as this was not the focus of the study. In that context, we were surprised by the earlier comment assuming co-expression of C-Kit, Cxcr4 and Epcam, which we did not show, while overlooking the endothelial programme reiterated above, which we have presented. Given our focus on haemato-endothelial specification, we have started the single-cell RNA-seq characterisation of the haemogenic gastruloid at 120h and have not looked specifically at earlier timepoints of embryo patterning. This said, we show the presence of neuroectodermal cells in cluster 9; on the other hand, cluster 7 includes hepatoblast-like cells, denoting endodermal specification (Supplementary File S2). However, in the absence of earlier timepoints and given the bias towards mesodermal specification, we expect that specification of ectodermal and endodermal programmes may be incomplete. 

      In respect of the contention regarding the capture of YS-like and AGM-like haematopoiesis, we had presented evidence in the original version of the manuscript that haemogenic cells generated during gastruloid differentiation, particularly at late 192h and 216h timepoints, project onto highly purified C-Kit+ CD31+ Gfi1-expressing cells from mouse AGM (PMID: 38383534), providing support for at least partial recapitulation of the corresponding developmental stage. These projections are represented in Fig. 4A, right and 4S1C of the revised manuscript. In distinguishing between YS-like and AGM-like haematopoiesis, we call the Reviewer’s attention to the replotting of the single-cell RNA-seq data already in the manuscript, which we provided in response to point 1 (Fig. 3B-D and 3S2B), which highlights an increase in Sox17, but not Sox18, expression in the 192h haemogenic endothelium, which suggests an association with AGM haematopoiesis (PMID: 20228271). A significant association of Cd44 and Procr expression with the same time-point (Fig. 3B-D in the manuscript) further supports an AGM-like endothelial-to-haematopoietic transition at the 192h timepoint. We have re-analysed the scRNA-seq data to better represent the expression of these markers in Fig. 3A-E and 3S2B. We agree that it remains challenging to identify markers exclusive to AGM haematopoiesis, which is operationally equated with generation of transplantable haematopoietic stem cells. While HSC generation is a key event characteristic of the AGM, not all AGM haematopoiesis corresponds to HSCs, an important point in evaluating the data presented in the manuscript, and one that is acknowledged by us. The main text has been edited to clarify the experiments pertaining to distinguishing AGM and YS haematopoiesis, which are detailed in lines 180-187, 200-221, 268-304, and 315-356.

      Following on the Reviewer’s comments about Cd34, we also inspected co-expression of Cd34 with Cd41 and Cd45, the latter co-expression present in, although not necessarily exclusive to, AGM haematopoiesis. Reassuringly, we observed clear co-expression with both markers (Author response image 1), in addition to a CD41+CD34- population, which likely reflects YS EMP-independent erythropoiesis. Flow cytometry analysis of co-expression of CD31 and CD34 in CD41+ and CD45+ populations at 144h and 216h timepoints has been included in Fig. 2B-D, Fig. 2S1A-D, including controls. In text: 180-187. We have earlier on in the rebuttal highlighted the fact that marker expression is responsive to the levels of Activin A used in the patterning pulse, with the 100ng/ml Activin A used in our protocol superior to 75ng/ml.

      Author response image 1.

      Association of CD34 with CD41 and CD45 expression is Activin A-responsive and supports the presence of definitive haematopoiesis. A. Flow cytometry analysis of CD34 and CD41 expression in 216h-haemogenic gastruloids; two doses of Activin A were used in the patterning pulse with CHIR99021 between 48-72h. FMO controls shown. B. Flow cytometry analysis of CD34 and CD45 at 216h in the same experimental conditions.

      Given the centrality of this point in comments by all the Reviewers, we have conducted projections of our single-cell RNA-seq data against two studies which (1) capture arterial and haemogenic specification in the para-splanchnopleura (pSP) and AGM region between E8.0 and E11 (Hou et al, PMID: 32203131), and (2) uniquely capture YS, AGM and FL progenitors and the AGM endothelial-to-haematopoietic transition (EHT) in the same scRNA-seq dataset (Zhu et al, PMID: 32392346). Focusing the analysis on the subsets of haemogenic gastruloid cells sorted as CD41+ (144h), C-Kit+ (144h and 192h) and CD45+ (192h and 216h) (now represented in Fig. 3A, and projected onto the studies in Fig. 4A), we show:

      (1) That a subset of haemato-endothelial cells from haemogenic gastruloids at 144h to 216h project onto intra-embryonic cells spanning E8.25 to E10 (revised Fig. 4A left and 4S1A). This is in agreement with our original interpretation that 216h are no later than the MPP/pre-HSC state of embryonic development, requiring further maturation to generate engrafting progenitors. We have nevertheless removed specific references to pre-HSC, and instead referred to HSPC/progenitors.

      (2) That haemogenic gastruloids contain YS-like (including EMP-like) and AGM-like haematopoietic cells (Fig. 4A centre and 4 S1B). Significantly, some of the cells, particularly CKit-sorted cells with a candidate endothelial and HE-like signature, project onto AGM pre-HE and HE, as well as IAHC. Some 144h CD41+ and 192h CD45+ cells also project onto IAHC, suggesting that YS-like and AGM-like programmes arise independently and with partial time-dependent organisation in the haemogenic gastruloid model. Later, predominantly 216h cells, have characteristics of MPP/LMPP-like cells from the FL, suggesting a progenitor wave of differentiation.

      Altogether, the data support the notion that haemogenic gastruloids capture YS and AGM haematopoiesis until E10, as suggested by us in the manuscript. This re-analysis of the scRNA-seq data, which was indeed prompted by challenging and insightful comments from the Reviewers, has been incorporated in the manuscript as described above and further listed here:

      Re-clustering and highlights of specific markers in our scRNA-seq data in Figure 3A-E. In text: 268-304.

      Projections to mouse embryo datasets in Figure 4A (Figure 4S1A-C; Supplementary File 3). In text: 315-356. 

      Single-cell RNA sequencing was used to compare hGx with mouse AGM. The authors incorrectly conclude that ' ..specification of endothelial and HE cells in hGx follows with time-dependent developmental progression into putative AGM-like HE..' And, '...HE-projected hGx cells.......expressed Gata2 but not Runx1, Myb, or Gfi1b..' Hemogenic endothelium is defined by the expression of Runx1 and Gfi1b is downstream of Runx1.

      As a hierarchy of regulation, Gata2 precedes and drives Runx1 expression at the specification of HE (PMID: 17823307; PMID: 24297996), while Runx1 drives the EHT, upstream of Gfi1b in haematopoietic clusters (PMID: 34517413). Please note that the text segment the Reviewer refers to has been removed from the manuscript, as the analysis is no longer solely focused on projection to Thambyrajah et al (2024) data, and instead gained significantly from the projections on to the Hou et al (2020) and Zhu et al (2020) studies, as detailed above.

      (3) The hGx protocol 'generates hematopoietic SC precursors capable of short-term engraftment' is not supported by the data presented. Short-term engraftment would be confirmed by flow cytometric detection of hematopoietic cells within the recipient bone marrow, spleen, thymus, and peripheral blood that expressed the BFP transgene. This analysis was not provided. PCR detection of transcripts, following an unspecified number of amplification cycles, as shown in Figure 3G (incorrectly referred to as Figure 3F in the legend) is not acceptable evidence for engraftment.

      We provide the full flow cytometry analysis of spleen engraftment in the 5 mice which received implantation of 216h-haemogenic gastruloids in the adrenal gland and were analysed at 4 weeks; an additional (control) animal received adrenal injection of PBS (Fig. 4B-D in the revised manuscript). In this experiment, the bone marrow collection was limiting, and material was prioritised for PCR (Fig. 4C and full gels in 4S2C in the revised manuscript).

      We had previously provided only representative plots of flow cytometry analysis of bone marrow and spleen, which we described as low-level engraftment and were chosen conservatively. The analysis was meant to complement the genomic DNA PCR, where detection was present in only some of the replicates tested per animal. On this note, we confirm that PCR analysis used conventional 40 cycles; the sensitivity had already been shown in the earlier version of the manuscript and is again represented in Fig. 4S2B. We argue that the low level of cytometric and molecular engraftment at 4 weeks, from haemogenic gastruloid-derived progenitors that have not progressed beyond a stage equivalent to E10 (Fig. 4A and Supplementary File 3 in the revised manuscript from scRNAseq projections), and that we have described as requiring additional maturation in vivo, is not surprising. Indeed, as previously shown and now repeated in Fig. 2B-E (controls in Fig. 2S1E-G) in the revised manuscript, no more than 7 CD45+CD144+ multipotent cells are present per haemogenic gastruloid. We are only able to implant 3 haemogenic gastruloids in the adrenal gland of each transplanted animal.

      We have rephrased Results and Discussion in lines 359-415 and 588-621, respectively, to rectify the nature of the engraftment, which we now attribute more generically to progenitors, also in light of the developmental time we could capture in the gastruloids prior to implantation.

      Transplanted hGx formed teratoma-like structures, with hematopoietic cells present at the site of transplant only analysed histologically. Indeed, the quality of the images provided does not provide convincing validation that donor-derived hematopoietic cells were present in the grafts.

      As stated in the text, the images are meant to illustrate that the haemogenic gastruloids developed in situ. Further analysis motivated by the Reviewers’ comments and indeed a subsequent experiment with analysis of engraftment at a later timepoint of 8 weeks (revised Fig. 4E and 4 S2F-G) did not show a direct correspondence between engraftment and in vivo development or expansion, although this occurs in some cases. To be clearer, the observation of donor-derived blood cells in the implanted haemogenic gastruloids would not correspond to engraftment, as we have amply demonstrated that they have generated blood cells in vitro. There is no evidence that there are remaining pluripotent cells in the haemogenic gastruloid after 9 days of differentiation, and it is therefore not clear that the structures observed are teratomas. We specifically comment on this point in the revised manuscript – lines 601-607.

      There is no justification for the authors' conclusion that '... the data suggest that 216h hGx generate AGM-like pre-HSC capable of at least short-term multilineage engraftment upon maturation...'. Indeed, this statement is in conflict with previous studies demonstrating that pre-HSCs in the dorsal aorta of the mouse embryo are immature and actually incapable of engraftment.

      We have clearly stated that we do not see haematopoietic engraftment through transplantation of dissociated haemogenic gastruloids, which reach the E10 state containing pre-HSC (revised Fig 4A, 4S1A and Supplementary File 3). Instead, we observed rare myelo-erythroid (revised Fig. 4S2F-G) and myelo-lymphoid (revised Fig. 4E) engraftment upon in vivo maturation of haemogenic gastruloids with preserved 3D organisation. These statements are not contradictory. Nevertheless, we have now more cautiously attributed engraftment to the presence of progenitors as a generic designation, and not to pre-HSC (lines 412-414 and 588-592 in the revised manuscript).

      The statement '...low-level production of engrafting cells recapitulates their rarity in vivo, in agreement with the embryo-like qualities of the gastruloid system....' is incorrect. Firstly, no evidence has been provided to show the hGx has formed a dorsal aorta facsimile capable of generating cells with engrafting capacity. Secondly, although engrafting cells are rare in the AGM, approximately one per embryo, they are capable of robust and extensive engraftment upon transplantation.

      As indicated above, the statement in lines 412-414 now reads “Engraftment is erythromyeloid at 4 weeks and lympho-myeloid at 8 weeks, reflecting different classes of progenitors, putatively of YS-like and AGM-like affiliation.” To be clear, with our original statement we meant to highlight that the production of definitive AGM-like haematopoietic progenitors (not all of which are engrafting) in haemogenic gastruloids does not correspond to non-physiological single-lineage programming. We did and do not claim that we achieved production of HSC, which would be long-term engrafting.

      (4) Expression of MNX1 transcript and protein in hematopoietic cells in MNX1 rearranged acute myeloid leukaemia (AML) is one cause of AML in infants. In the hGX model of this disease, Mnx1 is overexpressed in the mESCs that are used to form gastruloids. Mnx1 overexpression seems to confer an overall growth advantage on the hGx and increase the serial replating capacity of the small number of hematopoietic cells that are generated. The inefficiency with which the hGx model generates hematopoietic cells makes it difficult to model this disease. The poor quality of the cytospin images prevents accurate identification of cells. The statement that the kit-expressing cells represent leukemic blast cells is not sufficiently validated to support this conclusion. What other stem cell genes are expressed? Surface kit expression also marks mast cells, frequently seen in clonogenic assays of blood cells. Flow cytometric and gene expression analyses using known markers would be required.

      The haemogenic gastruloid model generates haematopoietic and haemato-endothelial cells. MNX1 expands C-Kit+ cells at 144h, which we show to have a haemato-endothelial signature (see revised Fig. 3A-E, Supplementary File 2). We have added additional flow cytometry data showing that the replating cells from MNX1 express CD31 (Figure 6S1A-B).

      Serial replating of CFC assays is a conventional in vitro assay of leukaemia transformation. Critically, colony replating is not maintained in EV control cells, attesting to the transformation potential of MNX1. Although we have not fully traced the cellular hierarchy of MNX1-driven transformation in the haemogenic gastruloid system, the in vitro replating expands a C-Kit+ cell (revised Fig. 6E), which reflects the surface phenotype of the leukaemia, also recapitulated in the mouse model initiated by MNX1-overexpressing FL cells. Importantly, it recapitulates the transcriptional profile of MNX1 leukaemia patients (revised Fig. 7C), which is uniquely expressed by MNX1 144h and replated colony cells, but not by MNX1 216h gastruloid cells, arguing against a generic signature of MNX1 overexpression (revised Fig. 7B). Importantly, the MNX1-transformation of haemogenic gastruloid cells is superior to the FL leukaemia model at capturing the unique transcriptional features of MNX1-driven leukaemia, distinct from other forms of AML in the same age group (Fig 7 S1D-F). It is possible that this corresponds to a pre-leukaemia event, and we will explore this in future studies, which are beyond the proof-of-principle nature of this paper.

      (5) In human infant MNX1 AML, the mutation is thought to arise at the fetal liver stage of development. There is no evidence that this developmental stage is mimicked in the hGx model.

      We never claim that the haemogenic gastruloid model mimics the foetal liver. We propose that susceptibility to MNX1 is at the HE-to-EMP transition. Moreover, and importantly, contrary to the Reviewer’s statement, there is no evidence in the literature that the mutation arises in the foetal liver stage, just that the mutation arises before birth (PMID: 38806630), which is different. In a mouse model of MNX1 overexpression, the authors achieve leukaemia engraftment upon MNX1 overexpression in foetal liver, but not in bone marrow cells (PMID: 37317878). This is in agreement with a vulnerability of embryonic / foetal, but not adult cells to the MNX1 expression caused by the translocation. However, haematopoietic cells in the foetal liver originate from YS and AGM precursors, so the origin of the MNX1-susceptible cells can be in those locations, rather than the foetal liver itself.

      Reviewer #2 (Public review):

      Summary: 

      In this manuscript, the authors develop an exciting new hemogenic gastruloid (hGX) system, which they claim reproduces the sequential generation of various blood cell types. The key advantage of this cellular system would be its potential to more accurately recapitulate the spatiotemporal emergence of hematopoietic progenitors within their physiological niche compared to other available in vitro systems. The authors present a large set of data and also validate their new system in the context of investigating infant leukemia. 

      Strengths: 

      The development of this new in vitro system for generating hematopoietic cells is innovative and addresses a significant drawback of current in vitro models. The authors present a substantial dataset to characterize this system, and they also validate its application in the context of investigating infant leukemia. 

      Weaknesses: 

      The thorough characterization and full demonstration that the cells produced truly represent distinct waves of hematopoietic progenitors are incomplete. The data presented to support the generation of late yolk sac (YS) progenitors, such as lymphoid cells, and aortic-gonad-mesonephros (AGM)-like progenitors, including pre-hematopoietic stem cells (pre-HSCs), by this system are not entirely convincing. Given that this is likely the manuscript's most crucial claim, it warrants further scrutiny and direct experimental validation. Ideally, the identity of these progenitors should be further demonstrated by directly assessing their ability to differentiate into lymphoid cells or fully functional HSCs. Instead, the authors primarily rely on scRNA-seq data and a very limited set of markers (e.g., Ikzf1 and Mllt3) to infer the identity and functionality of these cells. Many of these markers are shared among various types of blood progenitors, and only a well-defined combination of markers could offer some assurance of the lymphoid and pre-HSC nature of these cells, although this would still be limited in the absence of functional assays.

      The identification of a pre-HSC-like CD45⁺CD41⁻/lo C-Kit⁺VE-Cadherin⁺ cell population is presented as evidence supporting the generation of pre-HSCs by this system, but this claim is questionable. This FACS profile may also be present in progenitors generated in the yolk sac such as early erythromyeloid progenitors (EMPs). It is only within the AGM context, and in conjunction with further functional assays demonstrating the ability of these cells to differentiate into HSCs and contribute to long-term repopulation, that this profile could be strongly associated with pre-HSCs. In the absence of such data, the cells exhibiting this profile in the current system cannot be conclusively identified as true pre-HSCs.

      We present 2 additional pieces of evidence to support our claims that we capture YS and AGM stages of haematopoietic development.

      (I) In the new Figures 4A and 4 S1A-C and Supplementary File 3 in the revised manuscript, we project our single-cell RNA-seq data onto (1) developing intra-embryonic pSP and AGM between E8 and E11 (Fig. 4A left, 4S1A) and (2) a single-cell RNA-seq study of HE development which combines haemogenic and haematopoietic cells from the YS, the developing HE and IAHC in the AGM, and FL (Fig. 4A centre, 4S1B). Our data maps E8.25-E10, and captures YS EMP and erythroid and myeloid progenitors, as well as AGM pre-HE, HE and IAHC, with some cells matching HSPC and LMPP, as suggested by the projection onto the Thambyrajah et al data set (already presented in the previous version of the manuscript, and now in Fig. 4A right and 4 S1C). The projection of the scRNA-seq data is presented in lines 314-355 of the revised manuscript. The scRNA-seq data itself was refocused on haemato-endothelial programmes as presented in the revised Fig. 3A-E, described in lines 267-303.

      (II) Given the difficulty in finding markers that specifically associate with AGM haematopoiesis, we inspected the possibility of capturing different regulatory requirements at different stages of gastruloid development mirroring differential effects in the embryo. Polycomb EZH2 is specifically required for EMP differentiation in the YS, but does not affect AGM-derived haematopoiesis; it is also not required for primitive erythroid cells (PMID: 29555646; PMID: 34857757). We treated haemogenic gastruloids from 120h onwards with either DMSO (0.05%) or GSK126 (0.5uM), and inspected the cellularity of gastruloids at 144h, which we equate with YS-EMP, and 216h – putatively AGM haematopoiesis. We show that EZH2 inhibition / GSK126 treatment specifically reduces %CD41+ cells at 144h, but does not reduce %CD41+ or %CD45+ cells at 216h. We have included this experiment in the manuscript in Fig. 2 S2B-C (in text: 209-221).

      These data, together with the scRNA-seq projections described, provide evidence for our claim that 144h haemogenic gastruloids capture YS EMPs, while CD41+ and CD45+ cells isolated at 216h reflect AGM progenitors. We cannot conclude as to the functional nature of the AGM cells from this experiment. The main text has been edited to clarify the experiments pertaining to distinguishing AGM and YS haematopoiesis (lines 180-187; 200-221; 268-304; 315-356).

      The engraftment data presented are also not fully convincing, as the observed repopulation is very limited and evaluated only at 4 weeks post-transplantation. The cells detected after 4 weeks could represent the progeny of EMPs that have been shown to provide transient repopulation rather than true HSCs. 

      In the original version of the manuscript, we stated that there is low level engraftment and did not claim to have generated HSC. Instead, we described cells with short-term engraftment potential. We agree with the Reviewer that the cells we show in the manuscript at 4 weeks could be EMPs (revised Fig. 4B-E and 4 S2D-G). Additionally, we now have 8-week analysis of implant recipients, in which we observed multi-lineage, again low-level, engraftment of the recipient bone marrow in 1:3 recipients (revised Fig. 4B-E and 4S2F-H). This engraftment is myeloid-lymphoid and therefore likely to have originated in a later progenitor. To be clear, we do not claim that this corresponds to the presence of HSC. It nevertheless supports the maturation of progenitors with engraftment potential. The limited amount of material was prioritised for flow cytometry staining, which did not allow PCR analysis. We rephrased Results and Discussion in lines 359-414 and 588-621, respectively, to rectify the nature of the engraftment.

      Reviewer #3 (Public review):  

      In this study, the authors employ a mouse ES-derived "hemogenic gastruloid" model which they generated and which they claim to be able to deconvolute YS and AGM stages of blood production in vitro. This work could represent a valuable resource for the field. However, in general, I find the conclusions in this manuscript poorly supported by the data presented. Importantly, it isn't clear what exactly are the "YS" and the "AGM"-like stages identified in the culture and where is the data that backs up this claim. In my opinion, the data in this manuscript lack convincing evidence that can enable us to identify what kind of hematopoietic progenitor cells are generated in this system. Therefore, the statement that "our study has positioned the MNX1-OE target cell within the YS-EMP stage (line 540)" is not supported by the evidence presented in this study. Overall, the system seems to be very preliminary and requires further optimization before those claims can be made.

      Specific comments below: 

      (1) The flow cytometric analysis of gastruloids presented in Figure 1 C-D is puzzling. There is a large % of C-Kit+ cells generated, but few VE-Cad+ Kit+ double positive cells. Similarly, there are many CD41+ cells, but very few CD45+ cells, which one would expect to appear toward the end of the differentiation process if blood cells are actually generated. It would be useful to present this analysis as consecutive gating (i.e. evaluating CD41 and CD45 within VE-Cad+ Kit+ cells, especially if the authors think that the presence of VE-Cad+ Kit+ cells is suggestive of EHT). The quantification presented in D is misleading as the scale of each graph is different.

      Fig. 1C-D provide an overview of haemogenic markers during the timecourse of haemogenic gastruloid differentiation, and do indeed show a late up-regulation of CD45, as the Reviewer points out would be expected. The %CD45+ cells is indeed low. However, we should point out that the haemogenic gastruloid protocol, although biased towards mesodermal outputs, does not aim to achieve pure haematopoietic specification, but rather to place it in its embryo-like context. We refute that the scale is misleading: it is a necessity to represent the data in a way that is interpretable by the reader, and we made sure from the outset that the gates (in C) are truly representative and annotated, as are the plot axes (in D). Consecutive gating at the 216h-timepoint is shown and quantified in Fig. 2S1D-F, or in the alternative consecutive gating suggested by the Reviewer, in Author response image 2 below. At the request of Reviewer 1, we also analysed CD31 and CD34 within CD41 and CD45 populations, again as validation of the emergent haematopoietic character of the cells obtained. This new analysis is shown in revised Fig. 2B, quantified in 2C.

      Author response image 2.

      Flow cytometry analysis of VE-cadherin+ cells in haemogenic gastruloids at 216h of the differentiation protocol, probing co-expression of CD45, CD41 and C-Kit.

      (2) The imaging presented in Figure 1E is very unconvincing. C-Kit and CD45 signals appear as speckles and not as membrane/cell surfaces as they should. This experiment should be repeated and nuclear stain (i.e. DAPI) should be included.

      We included the requested immunofluorescence staining in Figure 1E (216h). We also show the earlier timepoint of 192h here as Author response image 3. In text: lines 158-162.

      Author response image 3.

      Confocal images of haematopoietic production in haemogenic gastruloids. Wholemount, cleared haemogenic gastruloids were stained for CD45 (pseudo-coloured red) and C-Kit antigens (pseudo-coloured yellow) with indirect staining, as described in the manuscript. Flk1-GFP signal is shown in green. Nuclei are contrasted with DAPI. (A) 192h. (B) 216h.

      (3) Overall, I am not convinced that hematopoietic cells are consistently generated in these organoids. The authors should sort hematopoietic cells and perform May-Grunwald Giemsa stainings as they did in Figure 6 to confirm the nature of the blood cells generated.

      It is factual that the data are reproducible and complemented by functional assays shown in revised Fig. 2D-E, which clearly demonstrate haematopoietic output. The single-cell RNA-seq data also show expression of a haematopoietic programme, which we have complemented with biologically independent qRT-PCR analysis of the expression of key endothelial and haematopoietic marker and regulatory genes (revised Fig. 2F; in text: 200-209). As requested, we include Giemsa-Wright’s stained cytospins obtained at 216h to illustrate haematopoietic output. These are shown in revised Fig. 2S2A, in text: lines 194-199. Inevitably, the cytospins will be inconclusive as to the presence of endothelial-to-haematopoietic transition or the generation of haematopoietic stem/progenitor cells, as these cells do not have a distinctive morphology.

      (4) The scRNAseq in Figure 2 is very difficult to interpret. Specific points related to this: - Cluster annotation in Figure 2a is missing and should be included. 

      Why do the heatmaps show the expression of genes within sorted cells? Couldn't the authors show expression within clusters of hematopoietic cells as identified transcriptionally (which ones are they? See previous point)? Gene names are illegible.

      I see no expression of Hlf or Myb in CD45+ cells (Figure 2G). Hlf is not expressed by any of the populations examined (panels E, F, G). This suggests no MPP or pre-HSC are generated in the culture, contrary to what is stated in lines 242-245. (PMID 31076455 and 34589491). Later on, it is again stated that "hGx cells... lacked detection of HSC genes like Hlf, Gfi1, or Hoxa9" (lines 281-283). To me, this is proof of the absence of AGM-like hematopoiesis generated in those gastruloids.

      For a combination of logistic and technical reasons, we performed single-cell RNA-seq using the Smart-Seq2 platform, which is inherently low throughput. We overcame the issue of cell coverage by complementing whole-gastruloid transcriptional profiling at successive time-points with sorting of subpopulations of cells based on individual markers documented in Fig. 1. We clearly stated which platform was used as well as the number and type of cells profiled (Fig. 3S1 and lines 226-241 of the revised manuscript), and our approach is standard. Following suggestions of the Reviewers to further focus our analysis on the haemogenic cellular differentiation within the gastruloids, we revised the presentation of the scRNA-seq data to now provide UMAP projections with representation and quantification of individual genes, including the ones queried by the Reviewer in Fig. 3 and respective supplements. Specifically, re-clustering and highlighting of specific markers are shown in Figure 3A-D and presented in lines 267-303 of the revised manuscript. Complementary independent real-time quantitative (q)PCR analysis showing time-dependent expression of endothelial and haematopoietic markers is now in Figure 2F. In text: 200-208.

      (5) Mapping of scRNA-Seq data onto the dataset by Thambyrajah et al. is not proof of the generation of AGM HE. The dataset they are mapping to only contains AGM cells, therefore cells do not have the option to map onto something that is not AGM. The authors should try mapping to other publicly available datasets also including YS cells.

      We have done this and the data are presented in Figure 4A (Figure 4S1A) and Supplementary File. In text: 314-355. As detailed in response to Reviewer 1, we have conducted projections of our single-cell RNA-seq data against two studies which (1) capture arterial and haemogenic specification in the para-splanchnopleura (pSP) and AGM region between E8.0 and E11 (Hou et al, PMID: 32203131) (revised Fig. 4A and 4 S1A), and (2) uniquely capture YS, AGM and FL progenitors and the AGM endothelial-to-haematopoietic transition (EHT) in the same scRNA-seq dataset (Zhu et al, PMID: 32392346) (revised Fig. 4A and 4 S1B). Specifically in answering the Reviewers’ point, we show that different subsets of haemogenic gastruloid cells sorted on haemogenic surface markers C-Kit, CD41 and CD45 cluster onto pre-HE and HE, intra-aortic clusters and FL progenitor compartments, and to YS EMP and erythroid and myeloid progenitors. This lends support to our claim that the haemogenic gastruloid system specifies both YS-like and AGM-like cells. Please note that we now do point out that some CD41+ cells at 144h project onto IAC, as do cells at the later timepoints, suggesting that AGM-like and YS-EMP-like waves may overlap at the 144h timepoint (lines…). In the future, we will address the specific location of these cells, but that corresponds to a large-scale spatial transcriptomics analysis requiring extensive optimisation for section capture, which is beyond the scope of this manuscript and this revision.

      (6) Conclusions in Figure 3, named "hGx specify cells with preHSC characteristics" are not supported by the data presented here. Again, I am not convinced that hematopoietic cells can be efficiently generated in this system, and certainly not HSCs or pre-HSCs.

      We have provided evidence in the original manuscript, and now through additional experiments, that there is haematopoietic specification, including of progenitor cells, in the haemogenic gastruloid system. Molecular markers are shown in revised Fig. 2F and Fig. 3 and supplements; CFC assays are shown in revised Fig. 2D-E; cytospins are in revised Fig. 2 S2A; further analysis of 4-week implants and new analysis of 8-week implants (discussed below) are in revised Fig. 4 B-D and Fig. 4 S2 and we discussed the new scRNA-seq projections above. Importantly, we have never claimed, and again do not, that haemogenic gastruloids generate HSC. We accept the Reviewer’s comment that we have not provided sufficient evidence for the specification of pre-HSC-like cells and accordingly now refer more generically and conservatively to progenitors.

      FACS analysis in 3A is again very unconvincing. I do not think the population identified as C-Kit+ CD144+ is real. Also, why not try gating the other way around, as commonly done (e.g. VE-Cad+ Kit+ and then CD41/CD45)?

      Our gating strategy is not unconventional: we gated from the more populated gate onto the less abundant one to ensure that the results are numerically more robust. In the case of haemogenic gastruloids, unlike the AGM preparations the Reviewer may be referring to, CD41+ and CD45+ cells are more abundant as there is no circulation of more differentiated haematopoietic cells away from the endothelial structures. This said, we did perform the gating as suggested (Rev Fig. 2), indeed confirming that most VE-cad+ Kit+ cells are CD45+. Interestingly, VE-cad+Kit- cells are predominantly CD41+, reinforcing the haematopoietic nature of these cells.

      The authors must have tried really hard, but the lack of short- or long-engraftment in a number of immunodeficient mouse models (lines 305-313) really suggests that no blood progenitors are generated in their system. I am not familiar with the adrenal gland transplant system, but it seems like a very non-physiological system for trying to assess the maturation of putative pre-HSCs. The data supporting the engraftment of these mice, essentially seen only by PCR and in some cases with a very low threshold for detection, are very weak, and again unconvincing. It is stated that "BFP engraftment of the Spl and BM by flow cytometry was very low level albeit consistently above control (Fig. S4E)" (lines 337-338). I do not think that two dots in a dot plot can be presented as evidence of engraftment.

      We have presented the data with full disclosure and do not deny that the engraftment achieved is low-level and short-term, indicating incomplete maturation of definitive haematopoietic progenitors in the current haemogenic gastruloid system. Indeed, by not wanting to overstate the finding, we were deliberately conservative in our representative flow cytometry plots and focused on the PCR for sensitivity. We now present the full flow cytometry analysis for spleen where we preserved more cells after the genomic DNA extraction (revised Fig. 4C) and call the Reviewer’s attention to the fact that detection of BFP+ cells by PCR and flow cytometry in the recipient animals is consistent between the 2 methods (revised Fig. 4C and D; full gels previously presented now in Fig. 4S2C; sensitivity analysis was also previously available and is now in Fig. 4S2B). In addition, we have now also been able to detect low-level myelo-lymphoid engraftment in the bone marrow and spleen 8 weeks after adrenal implantation, again suggesting the presence of a small number of definitive haematopoietic progenitors that potentially mature from the 3 haemogenic gastruloids implanted (Fig. 4E and 4 S2F-G in the revised manuscript). We rephrased Results and Discussion at lines 359-414 and 589-621, respectively, to clarify the nature of the engraftment, which we attribute to progenitors.

      (7) Given the above, I find that the foundations needed for extracting meaningful data from the system when perturbed are very shaky at best. Nevertheless, the authors proceed to overexpress MNX1 by LV transduction, a system previously shown to transform fetal liver cells, mimicking the effect of the t(7;12) AML-associated translocation. Comments on this section:

      The increase in the size of the organoid when MNX1 is expressed is a very unspecific finding and not necessarily an indication of any hematopoietic effect of MNX1 OE.

      We agree with the Reviewer on this point; it is nevertheless a reproducible observation which we thought relevant to describe for completeness and data reproducibility.

      The mild increase of cKit+ cells (Figure 4E) at the 144hr timepoint and the lack of any changes in CD41+ or CD45+ cells suggests that the increase in Kit+ cells % is not due to any hematopoietic effect of MNX1 OE. No hematopoietic GO categories are seen in RNA seq analysis, which supports this interpretation. Could it be that just endothelial cells are being generated?

      The Reviewer is correct that the MNX1-overexpressing cells have a strong endothelial signature, which is present in patients (revised Fig. 5A). We investigated a potential link with C-Kit by staining cells from the replating colonies during the process of in vitro transformation with CD31. We observed that 40-50% of C-Kit+ cells (20-30% total colony cells) co-expressed CD31, at least at early plating. These cells co-exist with haematopoietic cells, namely Ter119+ cells, as expected from the YS-like erythroid and EMP-like affiliation of haematopoietic output from 144h-haemogenic gastruloids. These data are included in Fig. 6S1A-B (in text 506-507) of the revised manuscript.

      (8) There seems to be a relatively convincing increase in replating potential upon MNX1-OE, but this experiment has been poorly characterized. What type of colonies are generated? What exactly is the "proportion of colony forming cells" in Figures 5B-D? The colony increase is accompanied by an increase in Kit+ cells; however, the flow cytometry analysis has not been quantified.

      Given the inability to replate control EV cells, there is not a population to compare with in terms of quantification. The level of C-Kit+ represented in Fig. 6E of the revised manuscript is achieved at plate 2 or 3 (depending on the experiment), both of which are significantly enriched for colony-forming cells relative to control (revised Fig. 6B, D).  

      (9) Do hGx cells engraft upon MNX1-OE? This experiment, which appears not to have been performed, is essential to conclude that leukemic transformation has occurred.

      For the purpose of this study, we are satisfied with confirmation of in vitro transformation potential of MNX1 haemogenic gastruloids, which can be used for screening purposes. Although interesting, in vivo leukaemia engraftment from haemogenic gastruloids is beyond the scope of this study.

      Reviewer #2 (Recommendations for the authors):

      (1) Minor comments

      (a) I find the denomination "hGx" very confusing as it would suggest that these gastruloids are human, whereas, in fact, they are murine.

      We agree with the Reviewer on the confusing nomenclature and have edited the manuscript to call them “haemGx” instead.

      (b) I find the presence of mast cells in CFC of MNX1-OE cultures very puzzling as this does not bear any resemblance to human leukemia.

      We detect an enrichment of mast cell transcriptional programmes, as defined by the cell type repositories. While mast cells do not represent the leukaemic cells in patients, this ontology is likely to reflect the developmental stage and origin of the progenitors that are affected by MNX1.

      (2) I have a few suggestions to improve figures and tables clarity, to help readers better follow the data presented.

      (a) To enhance readability, it would be beneficial to highlight the genes mentioned in the text within the scRNA-seq figures. Many figures currently display over 30-40 genes in small font sizes, making it difficult to quickly locate specific genes discussed in the text. Additionally, implementing a color-coding system to categorize these genes according to their proposed lineages would improve clarity and organization.

      We have now performed major re-organisation and re-analyses of the scRNA-seq data, which we believe has improved the readability and clarity of the corresponding sections of the manuscript.

      (b) The data presented in Supplementary Table 1, along with other supplementary tables, are challenging to interpret due to insufficient annotations. Enhancing these tables with clearer and more detailed annotations would significantly improve clarity and aid readers in understanding the supplementary materials.

      Descriptive text has been added to accompany each Supplementary File to aid in understanding the results reported therein.

      Reviewer #3 (Recommendations for the authors):

      In addition to what was written in the public review, I would suggest the authors simplify and shorten the text. Currently, a lot of unnecessary detail is included which makes the story very hard to follow. Moreover, the authors should modify the figures to make them more comprehensible, especially for RNA-seq data.

      We have significantly re-arranged and shortened parts of the manuscript, particularly by focusing the Discussion. Results presentation has also been improved through additional analysis and graphic representation of the scRNA-seq data, which we believe has improved the readability and clarity.

    1. eLife Assessment

      This valuable study presents findings linking prophage carriage to lifestyle regulation in the marine bacterium Shewanella fidelis, with potential implications for niche occupation within a host (Ciona robusta) and mediation of host immune responses. The study leverages a unique animal model system that offers distinct advantages in identifying select phenotypes to present overall solid evidence that supports findings relating to the impact of a prophage on host-microbe interaction. Understanding the role of integrated lysogenic phages in bacterial fitness, both within a host and in the environment, is a significant concept in bacterial eco-physiology, potentially contributing to the success of certain strains.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript aims to elucidate the impact of a prophage within the genome of Shewanella fidelis on its interaction with the marine tunicate Ciona robusta. The authors made a deletion mutant of S. fidelis that lacks one of its two prophages. This mutant exhibited an enhanced biofilm phenotype, as assessed through crystal violet staining, and showed reduced motility. The authors examined the effect of prophage deletion on several genes that could modulate cyclic-diGMP levels. While no significant changes were observed under in vitro conditions, the gene for one protein potentially involved in cyclic-diGMP hydrolysis was overexpressed during microbe-host interactions. The mutant was retained more effectively within a one-hour timeframe, whereas the wild-type (WT) strain became more abundant after 24 hours. Fluorescence microscopy was used to visualize the localization patterns of the two strains, which appeared to differ. Additionally, a significant difference in the expression of one immune protein was noted after one hour, but this difference was not evident after 23 hours. An effect of VCBP-C addition on the expression of one prophage gene was also observed.

      Strengths:

      I appreciate how the authors integrate diverse expertise and methods to address questions regarding the impact of prophages on gut microbiome-host interactions. The chosen model system is appropriate, as it allows for high-throughput experimentation and the application of simple imaging techniques.

      Weaknesses:

      My primary concern is that the manuscript primarily describes observations without providing insight into the molecular mechanisms underlying the observed differences. It is particularly unclear how the presence of the prophage leads to the phenotypic changes related to bacterial physiology and host-microbe interactions. Which specific prophage genes are critical, or is the insertion at a specific site in the bacterial genome the key factor? While significant effects on bacterial physiology are reported under in vitro conditions, there is no clear attribution to particular enzymes or proteins. In contrast, when the system is expanded to include the tunicate, differences in the expression of a cyclic-diGMP hydrolase become apparent. Why do we not observe such differences under in vitro conditions, despite noting variations in biofilm formation and motility? Furthermore, given that the bacterial strain possesses two prophages, I am curious as to why the authors chose to target only one and not both.

      Regarding the microbe-host interaction, it is not clear why the increased retention ability of the prophage deletion strain did not lead to greater cell retention after 24 hours, especially since no differences in the immune response were observed at that time point.

      Concerning the methodological approach, I am puzzled as to why the authors opted for qPCR instead of transcriptomics or proteomics. The latter approaches could have provided a broader understanding of the prophage's impact on both the microbe and the host.

      Comments on revisions:

      While the authors were able to solve some of my issues, I see that other questions were not tackled.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript, "Prophage regulation of Shewanella fidelis 3313 motility and biofilm formation: implications for gut colonization dynamics in Ciona robusta", the authors are experimentally investigating the idea that integrated viruses (prophages) within a bacterial colonizer of the host Ciona robusta affect both the colonizer and the host. They found a prophage within the Ciona robusta colonizing bacterium Shewanella fidelis 3313, which affected both the bacteria and host. This prophage does so by regulating the phosphodiesterase gene pdeB in the bacterium when the bacterium has colonized the host. The prophage also regulates the activity of the host immune gene VCBP-C during early bacterial colonization. Prophage effects on both these genes affect the precise localization of the colonizing bacterium, motility of the bacterium, and bacterial biofilm formation on the host. Interestingly, VCBP-C expression also suppressed a prophage structural protein, creating a tripartite feedback loop in this symbiosis. This is exciting research that adds to the emerging body of evidence that prophages can have beneficial effects not only on their host bacteria but also on how that bacteria interacts in its environment. This study establishes the evolutionary conservation of this concept with intriguing implications of prophage effects on tripartite interactions.

      Strengths:

      This research effectively shows that a prophage within a bacterium colonizing a model ascidian affects both the bacterium and the host in vivo. These data establish the prophage effects on bacterial activity and expand these effects to the natural interactions within the host animal. The effects of the prophage through deletion on a suite of host genes are a strength, as shown by striking microscopy.

      Weaknesses:

      Unfortunately, global transcriptomics of the bacteria and the host during colonization by the prophage-containing and prophage-deleted bacteria (1 hour and 24 hours) would be suggested to better understand the tripartite interactions.

      Impact:

      The authors are correct to speculate that this research can have a significant impact on many animal microbiome studies, since bacterial lysogens are prevalent in most microbiomes. Screening for prophages, determining whether they are active, and "curing" the host bacteria of active prophages are effective tools for understanding the effects these mobile elements have on microbiomes. There are many potential effects of these elements in vivo, both positive and negative, this research is a good example of why this research should be explored.

      Context:

      The research area of prophage effects on host bacteria in vitro has been studied for decades, while these interactions in combination with animal hosts in vivo have been recent. The significance of this research shows that there could be divergent effects based on whether the study is conducted in vitro or in vivo. The in vivo results were striking. This is particularly so with the microscopy images. The benefit of using Ciona is that it has a translucent body which allows for following microbial localization. This is in contrast to mammalian studies where following microbial localization would either be difficult or near impossible.

      Comments on revisions:

      I am satisfied with the great amount of work that went into the comments provided by the reviewers. The figure presentations are more compelling for the story, and this latest revision is a very interesting read that should be considered for future microbiome studies.

    4. Reviewer #3 (Public review):

      In this manuscript, Natarajan and colleagues report on the role of a prophage, termed SfPat, in the regulation of motility and biofilm formation by the marine bacterium Shewanella fidelis. The authors investigate the in vivo relevance of prophage carriage by studying the gut occupation patterns of Shewanella fidelis wild-type and an isogenic SfPat- mutant derivative in a model organism, juveniles of the marine tunicate Ciona robusta. The role of bacterial prophages in regulating bacterial lifestyle adaptation and niche occupation is a relatively underexplored field, and efforts in this direction are appreciated.

      Comments on revisions:

      The authors have addressed my main concerns. While some responses remain somewhat ambiguous or defer key clarifications to future studies, I appreciate that not everything can be resolved within a single manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript aims to elucidate the impact of a prophage within the genome of Shewanella fidelis on its interaction with the marine tunicate Ciona robusta. The authors made a deletion mutant of S. fidelis that lacks one of its two prophages. This mutant exhibited an enhanced biofilm phenotype, as assessed through crystal violet staining, and showed reduced motility. The authors examined the effect of prophage deletion on several genes that could modulate cyclic-diGMP levels. While no significant changes were observed under in vitro conditions, the gene for one protein potentially involved in cyclic-diGMP hydrolysis was overexpressed during microbe-host interactions. The mutant was retained more effectively within a one-hour timeframe, whereas the wild-type (WT) strain became more abundant after 24 hours. Fluorescence microscopy was used to visualize the localization patterns of the two strains, which appeared to differ. Additionally, a significant difference in the expression of one immune protein was noted after one hour, but this difference was not evident after 23 hours. An effect of VCBP-C addition on the expression of one prophage gene was also observed.

      Strengths:

      I appreciate how the authors integrate diverse expertise and methods to address questions regarding the impact of prophages on gut microbiome-host interactions. The chosen model system is appropriate, as it allows for high-throughput experimentation and the application of simple imaging techniques.

      Weaknesses:

      My primary concern is that the manuscript primarily describes observations without providing insight into the molecular mechanisms underlying the observed differences. It is particularly unclear how the presence of the prophage leads to the phenotypic changes related to bacterial physiology and host-microbe interactions.

      We appreciate the overall, enthusiastic reviewer feedback.  The current manuscript presents experimental evidence of the biological impact of the deletion of a stably integrated prophage in the genome of Shewanella fidelis 3313. The molecular mechanisms responsible for these biological effects are currently unknown but based on the limited genetic insight of some predicted gene regions, we can speculate on prophage-mediated influences impacting swimming behaviors. Below, we address additional concerns raised by the reviewer.

      Which specific prophage genes are critical, or is the insertion at a specific site in the bacterial genome the key factor?  While significant effects on bacterial physiology are reported under in vitro conditions, there is no clear attribution to particular enzymes or proteins.

      In this particular case, it is not entirely clear, as most ORFs within the prophage region have unknown functions, i.e., predicted as hypothetical proteins. In addition, the original insertion site does not appear to interrupt any specific gene but may impact adjacent genes/pathways (Fig 1b). Enhanced annotations, along with future targeted deletion methods for distinct prophage segments, will help us better investigate which predicted gene regions influence the observed traits. This will deepen our understanding of the mechanisms that regulate prophage influence on these traits.

      In contrast, when the system is expanded to include the tunicate, differences in the expression of a cyclic-diGMP hydrolase become apparent. Why do we not observe such differences under in vitro conditions, despite noting variations in biofilm formation and motility? Furthermore, given that the bacterial strain possesses two prophages, I am curious as to why the authors chose to target only one and not both.

      Differences in expression patterns of c-di-GMP regulators were also noted in vitro, but they just missed the statistical significance threshold when rho was used as a bacterial reference gene. The expression pattern of pdeB was consistent among each biological replicate, however. In full transparency, pdeB qPCR was originally performed with recA as a reference standard (bioRxiv preprint, ver 1). Here, significant changes in pdeB expression were observed in the in vitro assays comparing WT and ΔSfPat. These results prompted us to study changes in pdeB expression during in vivo colonization experiments, which also revealed significant changes. However, there was a concern that a potential SOS response would also activate recA, despite our preliminary data suggesting SOS was not involved. As a precaution, we repeated the experiments with rho as a reference gene after it was identified as a stable reference. With rho as a reference gene, statistically significant responses were noted during in vivo colonization, but not in the in vitro assays.

      In the current manuscript, one prophage was targeted based on preliminary findings indicating that the SfPat prophage region influences behaviors likely to impact colonization of the Ciona robusta gut. A separate genetic segment was also previously targeted for deletion as a misidentified prophage-like region, but that strain is not included in the current description. The currently presented data indicate that the observed phenomena can be attributed to the SfPat prophage.

      Regarding the microbe-host interaction, it is not clear why the increased retention ability of the prophage deletion strain did not lead to greater cell retention after 24 hours, especially since no differences in the immune response were observed at that time point.

      A predominantly adherent (non-motile) phenotype would likely facilitate elimination within fecal strings. There is substantial evidence from multiple model systems that strong swimming ability enhances the exploration and colonization of mucosal surfaces. Swimming helps with the penetration of mucus layers, chemotaxis toward epithelial surfaces, and overall “decision-making” in terms of shifting from a free-swimming (planktonic) state in the lumen within dietary material to a more sessile, adherent phenotype at the mucosal surface.

      Concerning the methodological approach, I am puzzled as to why the authors opted for qPCR instead of transcriptomics or proteomics. The latter approaches could have provided a broader understanding of the prophage's impact on both the microbe and the host.

      We agree with the reviewer that a transcriptomics approach would provide a broader understanding of the prophage’s impact on the microbe and animal host. Future studies will include a full multi-omic evaluation of this interaction. 

      Reviewer #1 (Recommendations for the authors):

      Besides my above mentioned issues, I have a few more mini things:

      (A) What makes S. fidelis a persistent member of the host microbiome? Please elaborate more on quantitative studies in this respect.

      Shewanella species are stable members of the Ciona gut, and previous efforts (Dishaw et al, 2016) revealed that chitin and/or secreted host effectors could influence biofilm formation. The Ciona gut produces copious amounts of endogenous chitin-rich mucus, and a variety of bacteria have been identified that thrive under these conditions. In addition, versatile bacteria like Shewanella sp. likely expand the metabolic potential of filter-feeders like Ciona. Thus, our subsequent studies began to focus on these and other microbes isolated from the Ciona gut that appear to be stable residents. Identical strains have been recovered numerous times (since 2011) from this wild population of Ciona robusta.  

      (B) The authors use the word inter kingdom and refer to phage, bacterium and animal. As phages are not part of the three kingdoms of life I believe the terminology is wrong.

      Thank you for bringing this to our attention. In this context, we were referring to bacteria+phage as a unit and their interkingdom interaction with the animal host. But we recognize that this term can be misleading. Another, more appropriate term is ‘tripartite,’ and we have changed interkingdom to tripartite as appropriate, e.g., the abstract.

      (C) I like lines 55-61 and was expecting to see in the manuscript what of those things would be true for the chosen prophage.

      We looked at the coding region annotations within the prophage and the adjacent regions. The prophage coding regions are mostly annotated as unknown or predicted proteins, and a few as known phage-related components. We intend to reanalyze future and improved annotations and conduct deletion experiments targeting specific open reading frames (ORFs).

      (D) In line 76 the authors mention a Gödecke reference for Pseudomonas. I believe that this paper only deals with S. oneidensis.

      The inadvertent Gödecke reference has been removed.

      (E) All figures: The captions are too short to understand what the figures are showing and everything is too small and hard to read or see. Along these lines it is often unclear what the many datapoints show. Biological replicates, technical replicates....Overall figure 1 does not seem to contain much information.

      Figures and captions have been improved as suggested. Thank you for bringing this to our attention.

      (F) Figure 3 what are a and b showing?

      Figure and descriptive legend have been improved.

      (G) Figure 4: Why did the author check expression only for one gene after 1 h but several genes after 24 h?

      Since we observed that VCBP-C alters biofilms of S. fidelis 3313 in vitro (Dishaw et al 2016), we hypothesized that the bacteria may alter host VCBP-C expression and that the influence of integrated prophages may further modulate gene expression. Since VCBP-C is endogenously expressed in the gut of Ciona, we expected that early exposure/colonization (one hour) would be crucial for the bacterial-VCBP interactions. Hence, VCBP-C was our primary target. We then tested multiple immune response genes at 24 hours to get a more detailed understanding of the maturing immune responses. Future studies will expand our efforts using global transcriptomics to better understand the immune response during bacterial exposure and colonization events.

      (H) Do the authors mean stationary or localised?

      We are not sure about the context of the reviewer’s question here but we think our modifications have addressed these concerns. 

      Reviewer #2 (Public review):

      Summary:

      In the manuscript, "Prophage regulation of Shewanella fidelis 3313 motility and biofilm formation: implications for gut colonization dynamics in Ciona robusta", the authors are experimentally investigating the idea that integrated viruses (prophages) within a bacterial colonizer of the host Ciona robusta affect both the colonizer and the host. They found a prophage within the Ciona robusta colonizing bacterium Shewanella fidelis 3313, which affected both the bacteria and host. This prophage does so by regulating the phosphodiesterase gene pdeB in the bacterium when the bacterium has colonized the host. The prophage also regulates the activity of the host immune gene VCBP-C during early bacterial colonization. Prophage effects on both these genes affect the precise localization of the colonizing bacterium, motility of the bacterium, and bacterial biofilm formation on the host. Interestingly, VCBP-C expression also suppressed a prophage structural protein, creating a tripartite feedback loop in this symbiosis. This is exciting research that adds to the emerging body of evidence that prophages can have beneficial effects not only on their host bacteria but also on how that bacteria interacts in its environment. This study establishes the evolutionary conservation of this concept with intriguing implications of prophage effects on tripartite interactions.

      Strengths:

      This research effectively shows that a prophage within a bacterium colonizing a model ascidian affects both the bacterium and the host in vivo. These data establish the prophage effects on bacterial activity and expand these effects to the natural interactions within the host animal. The effects of the prophage through deletion on a suite of host genes are a strength, as shown by striking microscopy.

      Weaknesses:

      Unfortunately, there are abundant negative data that cast some limitations on the interpretation of the data. That is, examining specific gene expression has its limitations, which could be avoided by global transcriptomics of the bacteria and the host during colonization by the prophage-containing and prophage-deleted bacteria (1 hour and 24 hours). In this way, the tripartite interactions leading to mechanism could be better established.

      We thank the reviewer for their comments and recognize this important limitation. As a follow-up to the current study, we plan to perform more comprehensive global meta-transcriptomics analyses to better understand differentially expressed genes across both the host and microbe during colonization.

      Impact:

      The authors are correct to speculate that this research can have a significant impact on many animal microbiome studies, since bacterial lysogens are prevalent in most microbiomes. Screening for prophages, determining whether they are active, and "curing" the host bacteria of active prophages are effective tools for understanding the effects these mobile elements have on microbiomes. There are many potential effects of these elements in vivo, both positive and negative, this research is a good example of why this research should be explored.

      Context:

      The research area of prophage effects on host bacteria in vitro has been studied for decades, while these interactions in combination with animal hosts in vivo have been recent. The significance of this research shows that there could be divergent effects based on whether the study is conducted in vitro or in vivo. The in vivo results were striking. This is particularly so with the microscopy images. The benefit of using Ciona is that it has a translucent body which allows for following microbial localization. This is in contrast to mammalian studies where following microbial localization would either be difficult or near impossible.

      Reviewer #2 (Recommendations for the authors):

      In general, I found that the research shown in this manuscript is solid, and the manuscript is well-written. I have no specific comments about the writing of the manuscript that would be of benefit.

      Figure 1 would benefit from the shrinking of white space between panels a and b. Also, in panel b, it is very difficult to read the x-axis, the number of basepairs. It is suggested to increase this font size.

      Figure 1 has been improved as suggested.

      Figure 2 is fine, however, what do three asterisks (***) in panel a signify? It is not described in the legend. One minor point that affects data understanding as presented, the wildtype (WT) change in expression is normalized to itself, therefore always equaling 1.0. This method of presentation muddies the variation in gene expression in the presence of the prophage. This is not an issue in Figure 2, but does have an effect on understanding Figure 2 - figure supplement 1.

      Figure 2 - figure supplement 1, as stated above, the normalization of the WT change in gene expression to 1.0 makes it difficult to understand the results. Why is pilZ change in gene expression not significant in panel s1a? It seems the median change is 50%, or whatever averaging is done, it's unclear whether this is the median and whether the error bars are standard deviation or some other metric.

      These should be defined in the statistical analysis section of the methods or in the legend itself. Further, in panel s1b, why is the reduction in gene expression of pdeB statistically significant, while a similar reduction in gene expression of pleD is not statistically significant?

      RQ values were calculated from 2^(-ddCt). The error bars in the figures were calculated by adding or subtracting the standard error from RQ. Since WT was used as a reference value for qPCR, the RQ value was normalized as 1 for all replicates and nonparametric tests were used to calculate the statistical significance. The values for pilZ were very close to significant; a value of 0.063 was derived via the Wilcoxon test. Only the changes in expression of pdeB were determined to be statistically significant, via the Wilcoxon test.
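
      For readers unfamiliar with the calculation, the relative quantification described in this response can be sketched as below. This is a minimal illustration of the standard 2^(-ddCt) arithmetic only, not the authors' analysis script: the function name, the Ct values, and the pairing of mutant and WT replicates are all hypothetical, and the standard error is assumed here to be taken across replicate RQ values.

```python
from math import sqrt
from statistics import mean, stdev


def relative_quantity(ct_target_sample, ct_ref_sample,
                      ct_target_control, ct_ref_control):
    """Return RQ = 2^(-ddCt) for one replicate (hypothetical helper)."""
    d_ct_sample = ct_target_sample - ct_ref_sample      # dCt of the test strain
    d_ct_control = ct_target_control - ct_ref_control   # dCt of the WT control
    dd_ct = d_ct_sample - d_ct_control                  # ddCt
    return 2.0 ** (-dd_ct)


# Invented Ct values as (target gene, reference gene) pairs, one per replicate.
mutant = [(24.0, 18.0), (24.6, 18.1), (23.7, 17.9)]
wild_type = [(23.0, 18.0), (23.2, 18.0), (22.9, 18.0)]

rq_values = [
    relative_quantity(mt, mr, wt, wr)
    for (mt, mr), (wt, wr) in zip(mutant, wild_type)
]

# Error bars as described in the response: RQ plus or minus the standard error.
rq_mean = mean(rq_values)
sem = stdev(rq_values) / sqrt(len(rq_values))
error_bar = (rq_mean - sem, rq_mean + sem)

# The WT control compared against itself always gives RQ = 1, which is why
# the WT bars carry no variation in the normalized plots.
wt_rq = [relative_quantity(wt, wr, wt, wr) for wt, wr in wild_type]
```

      Because every WT replicate is fixed at 1 by construction, a test on such data effectively asks whether the mutant RQ values differ from 1, which is consistent with the nonparametric (Wilcoxon) approach the authors describe.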

      Figure 3 panels a and b would be helped by having the same y-axis for each. It is impressive the amount of WT bacterial colonization takes place in 24 hours, particularly in the absence of the prophage, but it does not appear as impressive when the axes are changed between panels. Similar axes should be considered for every comparative graph.

      Figure 3 - figure supplement 1 legend would benefit from the same description of the animal's digestive locations as in the legend in Figure 3.

      We appreciate these suggestions and have made these changes accordingly. We have remade and combined Figure 3a and b.

      Figure 4, while it is unfortunate that none of the immune genes evaluated had a response to the deletion of the SfPat prophage in S. fidelis 3313 at 24 hours, did any of these genes have an effect at 1 hour of evaluation as VCBP-C did?

      The expression of this expanded gene set was not evaluated at one hour. This time point will, however, be included in our global evaluation of gene expression in our future transcriptome sequencing effort.

      Figure 5, the only question I have with these data is whether or not there is a dose-dependent effect of VCBP-C on SfPat P5 expression?

      Prior studies have found VCBP-C can impact biofilm formation in Shewanella sp. in a dose-dependent manner (some of the data appears in Dishaw et al, 2016). However, we have not yet considered whether VCBP-C impacts the expression of SfPat P5 (a phage capsid component) in a dose-dependent manner. We will consider this in future experimental designs.

      It is mentioned in the introduction (and data shown in the preprint) that there is more than one active prophage in Shewanella fidelis 3313. The preprint data shows that the Mu prophages had little effect on the studies. It may be worth discussing the presence and lack of effects of these Mu prophages. It also may lead to some discussion about the complexities of polylysogeny (as discussed by Silpe, et al, Nature, 2023).

      A full-length, inducible, Mu-like prophage region has been identified in the genome that has not been targeted for deletion, but will be included in follow-up studies. An earlier incomplete genome assembly contributed to the incorrect targeting and deletion of a prior Mu-like region, which was discussed in an earlier preprint version. Discussion and references to that strain have been removed from the more recent preprint versions. For clarity, the current manuscript describes strains that remain focused on the SfPat prophage, noting its contribution to the observed behavioral changes / traits.

      Is there any spontaneous induction of SfPat in vitro or in vivo with temperature change (prophages have been induced with heat stress), excessive UV exposure, or mitomycin C treatment?

      Preliminary induction studies using UV, mitomycin C, and temperature have been completed, but remain inconclusive with SfPat due to inconsistent induction patterns.

      Could you speculate, or perhaps do the experiment, as to whether the addition of VCBP-C to S. fidelis 3313 cultures affects biofilm production? The deletion of SfPat leads to greater biofilm production in vitro, while exogenously added VCBP-C represses SfPat P5 expression; would VCBP-C addition lead to greater biofilm production? Lastly, and this may be a failure of my understanding, is VCBP-C able to bind to S. fidelis? If so, does the prophage alter the bacteria and, consequently, the ability of VCBP-C to bind to the bacteria?

      Our lab is actively working to better understand the physical interactions of VCBP-C and bacteria, particularly lysogenic bacteria. Deletion mutants are helping us better understand the potential influence of the bacterial accessory genome on interactions with host immune mediators. Biofilm assays have been done in the context of VCBP-C (Dishaw et al, 2016). Subsequently, we tested the influence of 50 µg/ml VCBP-C on WT and prophage KO-strains, which include SfPat KO along with neutral (control) regions of the genome. We found that the presence of VCBP-C reduced biofilm formation in WT and phage KO variants at 4 hrs and 24 hrs. However, at 12 hrs, VCBP-C treatment appears to increase biofilm formation in the phage-KO strain. While the role (if any) of SfMu remains unclear, these preliminary data imply the existence of a feedback circuit (influenced by time) where immune effector binding and prophage influence on host gene expression together shape retention outcomes in the gut microbiome. This hypothesis remains to be tested further.

      Author response image 1.

      WT S. fidelis 3313 was exposed in vitro to 50 µg/ml VCBP-C in stationary cultures. Biofilms were observed for 24 hrs. At 12 hrs, the presence of VCBP-C increased the amount of biofilms, whereas reduced biofilms were observed at 4 and 24 hrs. Our findings (manuscript Fig. 2a) reveal that SfPat contributes to biofilm formation, exposure to SfPat deletion mutants increases host VCBP-C expression (manuscript Fig. 4a), and VCBP-C binding to WT S. fidelis 3313 reduces the expression of SfPat P5 capsid protein (manuscript Fig. 5). These findings suggest that in vivo exposure/colonization assays would benefit from detailed time-course observations, which will be explored in follow-up experiments.

      Reviewer #3 (Public review):

      In this manuscript, Natarajan and colleagues report on the role of a prophage, termed SfPat, in the regulation of motility and biofilm formation by the marine bacterium Shewanella fidelis. The authors investigate the in vivo relevance of prophage carriage by studying the gut occupation patterns of Shewanella fidelis wild-type and an isogenic SfPat- mutant derivative in a model organism, juveniles of the marine tunicate Ciona robusta. The role of bacterial prophages in regulating bacterial lifestyle adaptation and niche occupation is a relatively underexplored field, and efforts in this direction are appreciated.

      While the research question is interesting, the work presented lacks clarity in its support for several major claims, and, at times, the authors do not adequately explain their data.

      Major concerns:

      (1) Prophage deletion renders the SfPat- mutant derivative substantially less motile and with a higher biofilm formation capacity than the WT (Fig. 2a-b). The authors claim the mutant is otherwise isogenic to the WT strain upon sequence comparison of draft genome sequences (I'll take the opportunity to comment here that GenBank accessions are preferable to BioSample accessions in Table 1). Even in the absence of secondary mutations, complementation is needed to validate functional associations (i.e., phenotype restoration). A strategy for this could be phage reintegration into the mutant strain (PMID: 19005496).

      We are currently investigating complementation strategies. However, there have been some challenges in re-infecting and/or reintegrating the prophage into the genome. A preferred integration site may be damaged due to the deletion approach. While the SfPat prophage has mostly predicted genes of unknown function or significance, we have begun prioritizing the deletion of distinct segments to help identify functional relevance.

      (2) The authors claim that the downshift in motility (concomitant with an upshift in biofilm formation) is likely mediated by the activity of c-di-GMP turnover proteins. Specifically, the authors point to the c-di-GMP-specific phosphodiesterase PdeB as a key mediator, after finding lower transcript levels for its coding gene in vivo (lines 148-151, Fig. 2c), and suggesting higher activity of this protein in live animals (!)(line 229). I have several concerns here:

      (2.1) Findings shown in Fig. 2a-b are in vitro, yet no altered transcript levels for pdeB were recorded (Fig. 2c). Why do the authors base their inferences only on in vivo data?

      (2.2) Somewhat altered transcript levels alone are insufficient for making associations, let alone solid statements. Often, the activity of c-di-GMP turnover proteins is local and/or depends on the activation of specific sensory modules - in the case of PdeB, a PAS domain and a periplasmic sensor domain (PMID: 35501424). This has not been explored in the manuscript, i.e., specific activation vs. global alterations of cellular c-di-GMP pools (or involvement of other proteins, please see below). Additional experiments are needed to confirm the involvement of PdeB. Gaining such mechanistic insights would greatly enhance the impact of this study.

      (2.3) What is the rationale behind selecting only four genes to probe the influence of the prophage on Ciona gut colonization by determining their transcript levels in vitro and in vivo? If the authors attribute the distinct behavior of the mutant to altered c-di-GMP homeostasis, as may be plausible, why did the authors choose those four genes specifically and not, for example, the many other c-di-GMP turnover protein-coding genes or c-di-GMP effectors present in the S. fidelis genome? This methodological approach seems inadequate to me, and the conclusions on the potential implication of PdeB are premature.

      We chose to study genes that were shown previously to influence biofilms and motility in a cyclic-di-GMP dependent manner in a Shewanella spp (Chao et al 2013, S Rakshe 2011). Future transcriptomic efforts and targeted deletion approaches will further define the specific influence of prophages.

      (3) The behavior of the WT strain and the prophage deletion mutant is insufficiently characterized. For instance, how do the authors know that the higher retention capacity reported for the WT strain with respect to the mutant (Fig. 3b) is not merely a consequence of, e.g., a higher growth rate? It would be worth investigating this further, ideally under conditions reflecting the host environment.

      To clarify the method, in vitro growth curves did not suggest any significant difference in growth rate between the WT and the deletion mutant strains. Subsequently, for the in vivo experiments, bacterial cultures were pelleted and resuspended in sterile, nutrient-free artificial seawater. This limits growth until the bacterial strains are introduced to the animals.

      (4) Related to the above, sometimes the authors refer to "retention" (e.g., line 162) and at other instances to "colonization" (e.g., line 161), or even adhesion (line 225). These are distinct processes. The authors have only tracked the presence of bacteria by fluorescence labeling; adhesion or colonization has not been assessed or demonstrated in vivo. Please revise.

      We thank the reviewer for this feedback; the manuscript has been revised accordingly. While we refer to our assays as ‘colonization assays,’ we report results of ‘retention’ of various bacterial strains in the ‘exposed’ animals. Furthermore, when fluorescent staining is utilized, we report retention in defined niches. Since colonization is likely a two-step process, i.e., 1) retention and 2) colonization or long-term establishment of these microbial communities, using these terms correctly is warranted. In separate (unpublished) surveys of adult animals taken from the field, identical strains have been recovered numerous times over a twelve-year period.

      (5) The higher CFU numbers for the WT after 24 h (line 161) might also indicate a role of motility for successful niche occupation or dissemination in vivo. The authors could test this hypothesis by examining the behavior of, e.g., flagellar mutants in their in vivo model.

      Interestingly, we find numerous flagellar/motility-associated protein coding genes like Flg, Fli and Fle present within the S. fidelis genome possessing an EAL domain, implicating them in the regulation of cyclic-di-GMP. Hence, a future global transcriptomic approach will help improve our understanding of the roles of these regulatory pathways.

      (6) The endpoint of experiments with a mixed WT-mutant inoculum (assumedly 1:1? Please specify) was set to 1 h, I assume because of the differences observed in CFU counts after 24 h. In vivo findings shown in Fig. 3c-e are, prima facie, somewhat contradictory. The authors report preferential occupation of the esophagus by the WT (line 223), which seems proficient from evidence shown in Fig. S3. Yet, there is marginal presence of the WT in the esophagus in experiments with a mixed inoculum (Fig. 3d) or none at all (Fig. 3e). Likewise, the authors claim preferential "adhesion to stomach folds" by the mutant strain (line 225), but this is not evident from Fig. 3e. In fact, the occupation patterns by the WT and mutant strain in the stomach in panel 3e appear to differ from what is shown in panel 3d. The same holds true for the claimed "preferential localization of the WT in the pyloric cecum," with Fig. 3d showing a yellow signal that indicates the coexistence of WT and mutant.

      The results section is reworded to improve clarity. The WT and KO are mixed 1:1 to achieve the 10<sup>7</sup> cfu count.

      (7) In general, and especially for in vivo data, there is considerable variability that precludes drawing conclusions beyond mere trends. One could attribute such variability in vivo to the employed model organism (which is not germ-free), differences between individuals, and other factors. This should be discussed more openly in the main text and presented as a limitation of the study.

      Yes, a salient feature of this model is that we can leverage genetic diversity in our experimental design, but it can introduce experimental variability.

      Even with such intrinsic factors affecting in vivo measurements, certain in vitro experiments, which are expected, in principle, to yield more reproducible results, also show high variability (e.g., Fig. 5). What do the authors attribute this variability to?

      For experiments involving VCBP-C protein, we can use affinity-purified protein recovered from live animals, or recombinant protein that we synthesize in-house (Dishaw et al 2011, 2016). In the latter, we often observe slight lot-to-lot variation in affinity for the target (the bacterial surface). To account for this variation and to ensure the observations are robust despite it, production lots can be mixed in additional biological replicates. As such, slight variability in the in vitro assays can be due to this batch effect.

      (8) Line 198-199: Why not look for potential prophage excision directly rather than relying on indirect, presumptive evidence based on qPCR?

      The decision to rely on qPCR of prophage structural genes was based on preliminary data, in particular among lysogens possessing more than one prophage. Neither the plaque assay nor SYBR Gold staining could distinguish among the particles, and TEM imaging was not sufficiently qualitative. Since these prophages do not exclusively produce particles when induced, qPCR targeting structural proteins was found to be most informative.

      Reviewer #3 (Recommendations for the authors):

      Other major comments:

      Line 137 (and Fig. 2 legend): The authors did not test chemotaxis towards any specific chemoeffector, only motility. Please correct and see below my comments about motility assays.

      The reviewer is correct; we have modified our descriptors.

      Lines 142-144: The authors conflate quorum sensing with c-di-GMP metabolism. If the authors measured the expression of genes "regulating cyclic di-GMP," it is likely because c-di-GMP is known to regulate the switch between planktonic and sessile lifestyles. However, whether this is mediated by quorum sensing is a separate issue that was not explored in this work. Please revise.

      Thank you; these changes were made accordingly.

      Line 150: c-di-GMP is not a quorum sensing signal; please correct.

      Yes, we corrected the inadvertent yet misleading statement.

      Line 193: Please clarify "RNA was extracted from the biofilms." If S. fidelis was grown on "MA [Marine Agar] for 24 h in the presence or absence of 50 µg/ml VCBP-C" (lines 192-193), was RNA isolated from colonies growing on the plates? Was VCBP-C added to the agar? This is also unclear in the Methods section (lines 381-384), where it seems the authors conducted this experiment using broth cultures in multiwell plates, removing the supernatant, and extracting RNA from the biofilms (i.e., cells adhered to the walls and bottom of the wells?). Why only biofilm cells?

      Thank you for bringing this to our attention. We have rewritten the appropriate sections and methods to improve clarity. Following our initial studies, which revealed differential bacterial phenotypes (biofilm formation and motility assays), we decided to target and investigate gene expression in the biofilms. This way, the sessile cells that were not part of the biofilm do not obfuscate the data.

      Lines 204-205: The authors should refer to the behavior of the mutant, since they did not test what happens upon prophage integration, but after prophage deletion.

      The wording has been changed accordingly.

      Lines 206-207: Please explain why the authors state that "these different bacterial phenotypes" (referring to altered biofilm formation and motility) "influence host immune responses in a manner consistent with influences on gut colonization dynamics". What specific relationship are the authors suggesting between these processes, and in what way is this "consistent"?

      We previously demonstrated (Dishaw et al 2016) that copious amounts of VCBP-C protein are present under normal conditions in the gut and mostly found tethered to chitin-rich mucus lining the gut epithelium. The up-regulation of VCBP-C within one hour of exposure to the SfPat mutant relative to the WT S. fidelis is consistent with a role for VCBP-C in modulating bacterial settlement dynamics (Dishaw et al 2016). The mutant phenotype of reduced swimming and increased biofilm production is a likely trigger for the increased production of this secreted immune effector that may influence the retention of this bacterial variant, relative to the WT.

      Line 229: Apart from what I noted above about the authors' claim regarding PdeB activity, I believe the figure referred to here should be Fig. 2, not Fig. 5.

      Thank you for catching that oversight. It has been corrected.

      Figure 1: Was hypothetical protein 2 included in the deletion?

      Yes, hypothetical protein 2 was included in the deletion.

      Figure 3a-b: It is challenging to interpret data on plots using so many colors - including what appears to be a white circle (?) in Fig. 3a. How many replicates are represented here? Is it indeed n=3 in Fig. 3a and n=6 in Fig. 3b?  

      Figure 3a is a bee swarm plot. Each color represents a biological replicate, and the smaller circles represent technical replicates. This format shows all of the data, including its spread. Regarding the number of replicates, 3a and 3b are different experiments, with 3a representing a biofilm assay with three biological replicates and 3b a motility assay with six biological replicates.

      Figure 3: An explanation for the abbreviation "FP" is missing.

      Thank you for catching this oversight. The abbreviation has been defined.

      Figure S3: FP, which is proficiently occupied by the WT strain (Fig. S3a), is not labeled in the images provided for the mutant (Fig. S3c-d). It would be helpful to show it for comparison.

      Those other images did not have fecal pellets to label; however, Figure 3c does show a fecal pellet for an animal exposed to both WT and the SfPat mutant.

      Questions and comments regarding methods:

      Lines 290-291, 307: Please indicate an approximate range for "room temperature."

      The information has been added to the revised manuscript.

      Lines 292, 302: Why use hybrid LB/MB broth and agar? And strictly speaking, which LB formula (Lennox/Luria/Miller)?

      The hybrid broth reduces the concentration of salts that can interfere in some assays. The LB formula was Luria, and it is now included in the manuscript.

      Lines 300-302: The conjugation procedure is poorly described. It seems the authors conducted conjugal transfer by biparental mating in broth culture by inoculating a single colony of S. fidelis 3313 into an already grown culture of the E. coli donor strain?

      The biparental mating was done on plates; the manuscript has been clarified.

      Motility assay concerns:

      Swimming motility is generally assayed in soft agar (0.25-0.3% w/v). Why did the authors use 0.5% low-melt agarose? Usually, agar is employed instead of agarose, and such a high concentration of solidifying agent typically prevents proper swimming (see e.g. Kearns 2010).

      Our laboratory uses low-melt agarose for phage propagation and other assays. We continued using it because we observed robust and reproducible results in the swarming and swimming motility assays. In addition, 0.5% agarose is less dense than 0.5% agar, and its consistency is similar to that of the lower percentage soft agar.

      Lines 316-317: Please clarify: what is the "overlay motility assay" that was carried out "overnight at RT and then inoculated onto the center of soft agar"? Was this a two-step experiment? How were bacteria inoculated (stabbed, injected)? If injected, what volume and cell density were used?

      Thank you for bringing this to our attention. The methods section has been revised for clarity.

      Line 319: Each variable tested in duplicate? From what I understand, the only variable measured in this test is the diameter of the swimming halos. Do the authors mean they used two biological replicates? If so, please indicate the number of technical replicates as well.

      Multiple biological replicates were performed, each time with two technical replicates. Two perpendicular measurements (of diameter) were recorded for each technical replicate to avoid bias. The methods section has been edited to improve clarity.
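
      One way to summarize such measurements is sketched below. It assumes, hypothetically, that the two perpendicular readings are averaged per technical replicate and that the technical replicates are then averaged per biological replicate. The response does not state how the readings were combined, so the function name and the aggregation rule are illustrative only.

```python
from statistics import mean


def halo_diameter(perpendicular_pairs):
    """Mean swimming-halo diameter for one biological replicate.

    perpendicular_pairs holds one (d1, d2) tuple per technical replicate,
    where d1 and d2 are diameters measured at right angles to each other.
    """
    # Averaging the two perpendicular readings reduces the effect of an
    # asymmetric halo on any single plate.
    per_plate = [mean(pair) for pair in perpendicular_pairs]
    # Average the technical replicates into one value per biological replicate.
    return mean(per_plate)
```

      With made-up readings in millimetres, `halo_diameter([(20.0, 22.0), (24.0, 26.0)])` averages the plates to 21.0 and 25.0 and returns 23.0.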

      Line 320: Were the swimming halos asymmetrical, hence the need to take two perpendicular measurements? If that was the case, it could indicate an excessive amount of solidifying agent.

      The halos were sometimes asymmetric, but to avoid variation across datasets, it became standard practice to measure perpendicular distances as stated above. 

      Regarding qPCR experiments:

      Please clarify how normalization of transcript levels was performed.

      It seems the authors conducted a double normalization, first with respect to the calibrator (rho), and again using the wild-type as a baseline reference for fold-change calculations (absence of error bars for WT data). If so, please specify on the vertical axes of the figures and in the Methods/figure legends.

      Since, in addition to rho, the authors assessed the expression stability of the "housekeeping" genes gyrB and recA, please also include the primers used for these genes.

      The appropriate manuscript sections have been updated for clarity. The bacterial qPCR was normalized to an internal standard, and then relative expression differences between SfPat and the WT were determined. The missing primer sequences have also been added.

      Observations:

      Figure 2a-b: It is intriguing that the remarkable reduction in motility of the mutant is not associated with a comparably significant increase in biofilm formation.

      A statistically significant increase in biofilm was observed, along with a decrease in motility. As is common in crystal violet assays, some of the tertiary structures were not very stable and likely washed out during processing.

      Additionally, it is noteworthy that data for the mutant in panel 2a exhibit minimal variability, with all OD570 recordings being around 3.0. Did the authors dilute the crystal violet elution solution after adding acetic acid, or might they have reached the saturation limit of the spectrophotometer?

      The eluted acetic acid was not diluted further, and significant changes were observed. If the solution had been further diluted, the observed changes might have been more pronounced. 

      Minor comments and recommendations:

      All the suggested changes below have been incorporated.

      • Line 55: "Antibiotic resistance determinants" might be preferable to "genes" to avoid using "genes" twice in the same sentence.

      • Line 75-76: Italicize Pseudomonas aeruginosa.

      • Line 134: Instead of "at least," specify the average fold-change.

      • Line 141: In the heading, refer to the influence of the "prophage" (singular) rather than "prophages" (plural).

      • Discussion (style): Consider using past tense for phrases like "we utilize..." (line 202); "we find..." (line 204), etc.

      • Line 365 and elsewhere: Consider "mRNA levels" or "transcript levels" instead of "gene expression".

      • Table 3: UQ950 is a strain, not a plasmid. I assume the plasmid carried by UQ950 is pSMV3.

    1. eLife Assessment

      This work provides valuable insights by introducing a post-translational extrusion mechanism that could reshape how we understand the coupling between DnaA activity and DNA-replication initiation. While solid evidence is presented for some of the key results, other claims rest on indirect proxies and could be improved.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Li and coworkers addresses the important and fundamental question of replication initiation in Escherichia coli, which remains open, despite many classic and recent works. It leverages single-cell mRNA-FISH experiments in strains with titratable DnaA and novel DnaA activity reporters to monitor DnaA activity peaks versus size. The authors find oscillations in DnaA activity and show that their peaks correlate well with the estimated population-average replication initiation volume across conditions and imposed dnaA transcription levels. The study also proposes a novel extrusion model where DNA-binding proteins regulate free DnaA availability in response to biomass-DNA imbalance. Experimental perturbations of H-NS support the model validity, addressing key gaps in current replication control frameworks.

      Strengths:

      I find the study interesting and well conducted, and I think its main strong points are:

      (1) the novel reporters obtained with systematic synthetic biology methods, and combined with a titratable dnaA strain.

      (2) the interesting perturbations (titration, production arrest, and H-NS).

      (3) the use of single-cell mRNA FISH to monitor transcripts directly.

      The proposed extrusion model is also interesting, though not fully validated, and I think it will contribute positively to the future debate.

      Weaknesses and Limitations:

      (1) A relevant limitation in novelty is that DnaA activity and concentration oscillations have been reported by the cited Iuliani and coworkers previously by dynamic microscopy, and to a smaller extent by the other cited study by Pountain and coworkers using mRNA FISH.

      (2) An important limitation is that the study is not dynamic. While monitoring mRNA is interesting and relevant, the current study is based on concentrations and not time variations (or nascent mRNA). Conversely, the study by Iuliani and coworkers, while having the drawback of monitoring proteins, can directly assess production rates. It would be interesting for future studies or revisions to monitor the strains and reporters dynamically, as well as using (as a control) the technique of this study on the chromosomal reporters used by Iuliani et al.

      (3) Regarding the mathematical models, a lot of details are missing regarding the definitions and the use of such models, which are only presented briefly in the Methods section. The reader is not given any tools to understand the predictions of different models, and no analytical estimates are used. The falsification procedures are not clear. More transparency and depth in the analysis are needed, unless the models are just used as a heuristic tool for qualitative arguments (but this would weaken the claims). The Berger model, for example, has many parameters and many regimes and behaviors. When models are compared to data (e.g., in Figure 2G), it is not clear which parameters were used, how they were fixed, and whether and how the model prediction depends on parameters.

      (4) Importantly, the main statement about tight correlations of peak volumes and average estimated initiation volume does not establish coincidence, and some of the claims by the authors are unclear in these respects (e.g., when they say "we resolve a 1:1 coupling between DnaA activity thresholds and replication initiation", the statement could be correct but is ambiguous). Crucially, the data rely on average initiation volumes (on which there seems to be an eternally open debate, also involving the authors), and the estimate procedure relies on assumptions that could lead to biases and uncertainties added to the population variability (in any case, error bars are not provided).

      (5) The delays observed by the authors (in both directions) between the peaks of DnaA-activity conditional averages with respect to volume and the average estimated initiation volumes are not incompatible with those observed dynamically by Iuliani and coworkers. The direct experiment to prove the authors' point would be to use a direct proxy of replication initiation, such as SeqA or DnaN, and monitor initiations and quantify DnaA activity peaks jointly, with dynamic measurements.

      (6) While not being an expert, I had some doubt that the fact that the reporters are on plasmid (despite a normalization control that seems very sensible) might affect the measurements. Also, I did not understand how the authors validated the assumptions that the reporters are sensitive to DnaA-ATP specifically. It seems this assumption is validated by previous studies only.

      Overall Appraisal:

      In summary, this appears as a very interesting study, providing valuable data and a novel hypothesis, the extrusion model, open to future explorations. However, given several limitations, some of the claims appear overstated. Finally, the text contains some self-evaluations, such as "our findings redefine the paradigm for replication control", etc., that appear exaggerated.

    3. Reviewer #2 (Public review):

      Summary:

      The authors show that in E. coli, the initiator protein DnaA oscillates post-translationally: its activity rises and peaks exactly when DNA replication begins, even if dnaA transcription is held constant. To explain this, they propose an "extrusion" mechanism in which nucleoid-associated proteins such as H-NS, whose amount grows with cell volume, dislodge DnaA from chromosomal binding sites; modelling and H-NS perturbations reproduce the observed drop in initiation mass and extra initiations seen after dnaA shut-down. Together, the data and model link biomass growth to replication timing through chromosome-driven, post-translational control of DnaA, filling gaps left by classic titration and ATP/ADP-switch models.

      Strengths:

      (1) Introduces an "extrusion" model that adds a new post-translational layer to replication control and explains data unexplained by classic titration or ATP/ADP-switch frameworks.

      (2) A major asset of the study is that it bridges the longstanding gap between DnaA oscillations and DNA-replication initiation, providing direct single-cell evidence that pulses of DnaA activity peak exactly at the moment of initiation across multiple growth conditions and genetic perturbations.

      (3) A tunable dnaA strain and targeted H-NS manipulations shift initiation mass exactly as the model predicts, giving model-driven validation across growth conditions.

      (4) A purpose-built Psyn66 reporter combined with mRNA-FISH captures DnaA-activity pulses with cell-cycle resolution, providing direct, compelling data.

      Weaknesses:

      (1) What happens to the (C+D) period and initiation time as the dnaA mRNA level changes? This is not discussed in the text or figure and should be addressed.

      (2) It is unclear what is meant by "relative dnaA mRNA level." Relative to what? Wild-type expression? Maximum expression? This should be explicitly defined.

      (3) It would be helpful to provide some intuition for why an increase in dnaA mRNA level leads to a decrease in initiation mass per ori and an increase in oriC copy number.

      (4) The titration and switch models do not explicitly include dnaA mRNA in the dynamics of DnaA protein. Yet, in Figure 2G, initiation mass is shown to decrease linearly with dnaA mRNA level in these models. How was dnaA mRNA level represented or approximated in these simulations?

      (5) Is Schaechter's law (i.e., exponential scaling of average cell size with growth rate) still valid under the different dnaA mRNA expression conditions tested?

      (6) The manuscript should explain more explicitly how the extrusion model implements post-translational control of DnaA and, in particular, how this yields the nonlinear drop in relative initiation mass versus dnaA mRNA seen in Figure 6E. Please provide the governing equation that links total DnaA, the volume-dependent "extruder" pool, and the threshold of free DnaA at initiation, and show - briefly but quantitatively - how this equation produces the observed concave curve.

      (7) Does this extrusion model reproduce the well-known adder per origin, i.e., is the interval from initiation to initiation an adder?

      (8) DnaA protein or activity is never measured; mRNA is treated as a linear proxy. Yet the authors' own narrative stresses post-translational (not transcriptional) control of DnaA. Without parallel immunoblots or activity readouts, it is impossible to know whether a six-fold mRNA increase truly yields a proportional rise in active DnaA.

      (9) Figure 2 infers both initiation mass and oriC copy number from bulk measurements (OD₆₀₀ per cell and rifampicin-cephalexin run-out) instead of measuring them directly in single cells. Any DnaA-dependent changes in cell size, shape, or antibiotic permeability could skew these bulk proxies, so the plotted relationships may not accurately reflect true initiation events.

    1. eLife Assessment

      This paper reports the development of proteins and small molecules that bridge LMO2, an oncogenic transcription factor in T-ALL, to E3 ligases (Cereblon and VHL), and demonstrates their effectiveness in degrading LMO2, causing growth arrest and inducing apoptosis in T cell lines in vitro. The findings are valuable because they provide evidence that intrinsically disordered proteins can be targeted for degradation by PROTAC-type chemicals. The paper also provides a route for rational PROTAC design based on intracellular antibody paratopes. Overall, the paper is supported by solid evidence and will be of interest to chemical biologists and cancer pharmacologists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe the degradation of an intrinsically disordered transcription factor (LMO2) via PROTACs (VHL and CRBN) in T-ALL cells. Given the challenges of drugging transcription factors, I find the work solid and a significant scientific contribution to the field.

      Strengths:

      (1) Validation of LMO2 degradation by starting with biodegraders, then progressing to chemical degraders.

      (2) Interrogation of the biology and downstream pathways upon LMO2 degradation (collateral degradation and apoptotic markers).

      (3) Cell line models that depend on or overexpress LMO2, compared with LMO2-null cell lines.

      (4) CRBN and VHL-derived PROTACs were synthesized and evaluated.

      Weaknesses:

      (1) The conventional method used to characterize PROTACs in the literature is to calculate the DC50 and Dmax of the degraders; I did not find this information in the manuscript.

      (2) The proteomics data is not very convincing, and it is not clear why LMO2 does not show in the volcano plot (were higher concentrations of the PROTAC tested? and why only VHL was tested and not CRBN-based PROTAC?).

      (3) The correlation between degradation potency and cell growth is not well-established (compare Figure 4C: P12-Ichikawa blots show great degradation at 24 and 48 hrs, but it is unclear if the cell growth in this cell line is any better than in PF-382 or MOLT-16) - Can the authors comment on the correlation between degradation and cell growth?

      (4) The PROTACs are not very potent (double-digit micromolar range?) - can the authors elaborate on any challenges in the optimization of the degradation potency?

      (5) The authors mentioned trying six iDAb-E3 ligase proteins; I would recommend listing the E3 ligases tried and commenting on the results in the main text.
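      The DC50 and Dmax values requested in weakness (1) can be estimated from a dose-response series. The following is a minimal sketch, not the authors' analysis: it assumes that Dmax is the largest percentage of protein lost relative to vehicle and that DC50 is the first concentration at which 50% of the protein is degraded, interpolated on a log10 concentration axis (some groups instead define DC50 as the half-maximal point of a fitted curve). The function names and the densitometry values are hypothetical and are not taken from the manuscript.

```python
import math


def dmax(remaining_pct):
    """Maximal degradation (%) given the % of protein remaining at each dose."""
    return 100.0 - min(remaining_pct)


def dc50(conc, remaining_pct, level=50.0):
    """First concentration at which remaining protein falls to `level` %.

    Interpolates linearly on a log10 concentration axis between the two
    doses that bracket the crossing. Returns None if the level is never
    crossed from above within the tested range.
    """
    points = list(zip(conc, remaining_pct))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo > level >= r_hi:
            frac = (r_lo - level) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical values for illustration only (concentrations in micromolar).
conc_um = [0.1, 1.0, 10.0, 100.0]
remaining = [95.0, 80.0, 20.0, 30.0]  # rebound at the top dose mimics a hook effect

estimated_dc50 = dc50(conc_um, remaining)
estimated_dmax = dmax(remaining)
```

      Taking the first crossing rather than a global fit keeps the estimate on the descending arm of the curve, which matters for PROTACs because degradation can weaken again at high concentrations.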

    3. Reviewer #2 (Public review):

      Summary:

      Sereesongsaeng et al. aimed to develop degraders for LMO2, an intrinsically disordered transcription factor activated by chromosomal translocation in T-ALL. The authors first focused on developing biodegraders, which are fusions of an anti-LMO2 intracellular domain antibody (iDAb) with cereblon. Following demonstrations of degradation and collateral degradation of associated proteins with biodegraders, the authors proceeded to develop PROTACs using antibody paratopes (Abd) that recruit VHL (Abd-VHL) or cereblon (Abd-CRBN). The authors show dose-dependent degradation of LMO2 in LMO2+ T-ALL cell lines, as well as concomitant dose-dependent degradation of associated bHLH proteins in the DNA-binding complex. LMO2 degradation via Abd-VHL was also determined to inhibit proliferation and induce apoptosis in LMO2+ T-ALL cell lines.

      Strengths:

      The topic of degrader development for intrinsically disordered proteins is of high interest, and the authors aimed to tackle a difficult drug target. The authors evaluated methods, including the development of biodegraders, as well as PROTACs that recruit two different E3 ligases. The study includes important chemical control experiments, as well as proteomic profiling to evaluate selectivity.

      Weaknesses:

      The overall degradation is relatively weak, and the mechanism of potential collateral degradation is not thoroughly evaluated. In addition, experiments comparing the authors' prior work with their anti-LMO2 iDAb or Abl-L are lacking, which would improve our understanding of the potential advantages of a degrader strategy for LMO2.

    1. eLife Assessment

      This well-designed, valuable study uses isotope tracing to analyse how iron limitation alters TCA cycle metabolism in Mycobacterium tuberculosis, revealing potential antibiotic targets for non-replicating bacteria in the host. The findings provide insights into metabolic remodelling under iron-limited conditions. Whilst some of the evidence is solid, the data around the GABA shunt is incomplete, requiring genetic validation, as was done for the glyoxylate shunt. Questions remain about the underlying mechanisms and their specific role in M. tuberculosis pathogenesis.

    2. Reviewer #1 (Public review):

      M. tuberculosis exhibits metabolic flexibility, enabling it to adapt to various environmental stresses, including antibiotic treatment. In this manuscript, Serafini et al. investigate the metabolic remodeling of M. tuberculosis used to survive iron-limited conditions by employing LC-MS metabolomics and 13C isotope tracing experiments. The results demonstrate that metabolic activity in the oxidative branch of the TCA cycle slows down, while the reductive branch is reverted to facilitate the biosynthesis of malate, which is subsequently secreted.

      Overall, this study is experimentally well-designed, particularly the use of 13C isotope tracing to monitor TCA cycle remodeling under iron-limited conditions. The findings are valuable as they offer potential new targets for antibiotics aimed at non-replicating M. tuberculosis occurring in the hosts. However, despite these strengths, the reviewer has concerns regarding the mechanistic basis underlying the observed metabolic remodeling and its role in M. tuberculosis pathogenesis.

      Major Comments:

      The authors argue that iron starvation is a physiologically relevant stressor encountered by M. tuberculosis post-infection. Using Erdman and H37Rv strains under DFO conditions, Erdman loses viability, whereas H37Rv maintains it. Nonetheless, both strains exhibit similar metabolic remodeling in the TCA cycle based upon metabolomics and isotope tracing data. The authors should clarify the specific metabolic adaptations in H37Rv that enable it to sustain viability under DFO conditions.

      The authors report no significant changes in NAD/NADH and ATP levels in H37Rv and Erdman exposed to DFO conditions. They observe TCA cycle remodeling, particularly the reversal of the reaction between OAA and MAL, catalyzed by malate dehydrogenase, an enzyme that uses NAD+ and NADH as cofactors. The directionality of this reaction likely depends on the relative levels of NAD+ and NADH. Additionally, other dehydrogenases, such as pyruvate DH and aKG DH, also require NAD+/NADH cofactors. In Figure 1I, NAD+ and NADH levels are monitored only at day 3 post-exposure to DFO conditions. Since Erdman loses viability after 2-3 weeks, the authors should include measurements of NAD+, NADH, and ATP levels at weekly intervals up to 3 weeks. Furthermore, glycine levels - which are linked to NAD+ recycling via the conversion of glyoxylate - should be measured under both HI and DFO conditions as an indirect indicator of the NAD+/NADH ratio.

      In Figure 2A, it is unclear why a 100-fold accumulation of aKG does not correspond proportionally to the accumulation of (iso)citrate.

      The authors state that fumarate, aKG, (iso)citrate, malate, and pyruvate are secreted under DFO conditions. While the secretion of aKG and pyruvate makes sense, given their marked intracellular accumulation, it is puzzling why (iso)citrate, malate, and fumarate are secreted even though there are no changes in their intracellular abundance. To rule out the possibility that these metabolites are released due to bacterial lysis rather than active secretion, the authors should analyze the 13C-labeled fractions of these metabolites in the culture filtrate using the M. tuberculosis culture in media containing 13C glycerol.

      To validate the role of the PCK-mediated reductive TCA cycle in malate biosynthesis and secretion under DFO conditions, the authors should generate a malate dehydrogenase (MDH) knockdown strain, considering that MDH is essential, and examine the 13C labeling patterns and NAD/NADH under DFO conditions.

      The authors also observe decreased GABA abundance and overall 13C labeling in DFO conditions, suggesting that the GABA shunt is the primary route for succinate biosynthesis under DFO conditions. Thus, it is strongly recommended that the authors perform a 13C glutamate tracing experiment to directly track labeling in aKG and GABA shunt metabolites, providing more definitive evidence for the involvement of the GABA shunt.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated how prolonged iron limitation (which stops growth but does not lead to cell death) alters central metabolism in M. tuberculosis. The major tool they used is metabolomics combined with stable isotope tracing. They show that the Krebs cycle is still active, despite the fact that it is dependent on some iron-dependent enzymes. They show that carbon flux through the oxidative branch of the Krebs cycle is stalled, resulting in the accumulation of metabolites, such as malate and alpha-ketoglutarate, that are partially secreted. Apparently, the carbon flux from glycolysis is partially diverted to the reductive branch of the Krebs cycle. This is not achieved by using the glyoxylate shunt but probably through the GABA shunt. This unprecedented split of the Krebs cycle and malate secretion allows a continuous flow of carbon through the core of carbon metabolism, overcoming the metabolic stalling triggered by iron starvation.

      Strengths:

      Novel insight into the central metabolism of a major pathogen and its adaptation to iron starvation. Carefully conducted experimentation. The paper ends with a clear and helpful model.

      Weaknesses:

      The authors show some surprising and important findings, but they would need a little more effort to really substantiate these. Especially the role of the GABA shunt should be genetically tested, as they did for ICL and the glyoxylate shunt.

      Also, dataset 1 is not very convincing: it is based only on transcriptomics and shown as up or down, which is not a strong base for major conclusions. As a minimum, one would want actual differences, preferably at the protein level, where it really counts.

    1. eLife Assessment

      This important paper reports the discovery of calcarins, a protein family that seems to be involved in calcification in the calcareous sponge Sycon ciliatum, significantly enhancing our understanding of the molecular and cellular mechanisms underlying spicule formation in sponges and the evolution of carbonate biomineralization. The conclusions are supported by compelling evidence based on an integrated analysis that combines transcriptomics, genomics, proteomics, and precise in situ hybridization. These findings will be of broad interest to cell biologists, biochemists, and evolutionary biologists.

    2. Reviewer #1 (Public review):

      To elucidate the mechanisms and evolution of animal biomineralization, Voigt et al. focused on the sponge phylum - the earliest branching extant metazoan lineages exhibiting biomineralized structures - with a particular emphasis on deciphering the molecular underpinnings of spicule formation. This study centered on calcareous sponges, specifically Sycon ciliatum, as characterized in previous work by Voigt et al. In S. ciliatum, two morphologically distinct spicule types are produced by a set of two different types of cells that secrete extracellular matrix proteins, onto which calcium carbonate is subsequently deposited. Comparative transcriptomic analysis between a region with active spicule formation and other body regions identified 829 candidate genes involved in this process. Among these, the authors focused on the calcarin gene family, which is analogous to the Galaxins, the matrix proteins known to participate in coral calcification. The authors performed three-dimensional structure prediction using AlphaFold, examined mRNA expression of Calcarin genes in spicule-forming cell types via in situ hybridization, conducted proteomic analysis of matrix proteins isolated from purified spicules, and carried out chromosome arrangement analysis of the Calcarin genes. Based on these analyses, it was revealed that the combination of Calcarin genes expressed during spicule formation differs between the founder cells - responsible for producing diactines and triactines - and the thickener cells that differentiate from them, underscoring the necessity for precise regulation of Calcarin gene expression in proper biomineralization. Furthermore, the observation that 4 Calcarin genes are arranged in tandem arrays on the chromosome suggests that two rounds of gene duplication followed by neofunctionalization have contributed to the intricate formation of S. ciliatum spicules. Additionally, similar subtle spatiotemporal expression patterns and tandem chromosomal arrangements of Galaxins during coral calcification indicate parallel evolution of biomineralization genes between S. ciliatum and aragonitic corals.

      Strength:

      The study presents detailed and convincing insights that point to parallel evolution of biomineralization in calcitic sponges and corals. This is supported by a comprehensive analysis employing a wide range of experimental approaches including protein tertiary structure predictions, gene expression profiling during calcification (RNA seq and Whole-mount in situ hybridization), and chromosomal sequence analysis.

      An integrative research approach, encompassing transcriptomic, genomic, and proteomic analyses as well as detailed FISH.

      High-quality FISH images of Calcarin genes, along with a concise summary clearly illustrating their expression patterns, are appreciated.

      It was suggested that thickener cells originate from founder cells. To the best of my knowledge, this is the first study to demonstrate trans-differentiation of sponge cells based on the cell-type specific gene expression, as determined by in situ hybridization.

      Overall, this is a high-quality piece of work that proposes a compelling scenario for biomineralization.

      Weaknesses:

      I found no significant weakness in this manuscript.

      Comments on revisions:

      The authors have addressed all of the questions and recommendations from the prior review.

    3. Reviewer #2 (Public review):

      Summary:

      This paper reports on the discovery of calcarins, a protein family that seems involved in calcification in the sponge Sycon ciliatum, based on specific expression in sclerocytes and detection by mass spectrometry within spicules. Two aspects stand out: (1) the unexpected similarity between Sycon calcarins and the galaxins of stony corals, which are also involved in mineralization, suggesting a surprising, parallel co-option of similar genes for mineralization in these two groups; (2) the impressively cell-type-specific expression of specific calcarins, many of which are restricted to either founder or thickener cells, and to either diactines, triactines, or tetractines. The finding that calcarins likely diversified at least partly by tandem duplications (giving rise to gene clusters) is a nice bonus.

      Strengths:

      I enjoyed the thoroughness of the paper, with multiple lines of evidence supporting the hypothesized role of calcarins: spatially and temporally resolved RNAseq, mass spectrometry, and whole-mount in situ hybridization using CISH and HCR-FISH (the images are really beautiful and very convincing). The structural predictions and the similarity to galaxins are very surprising and extremely interesting, as they suggest parallel evolution of biomineralization in sponges and cnidarians during the Cambrian explosion by co-option of the same "molecular bricks".

      Weaknesses:

      I did not detect any major weakness, beyond those inherent to working with sponges (lack of direct functional inhibition of these genes) or with fast-evolving gene families with complex evolutionary histories (lack of a phylogenetic tree that would clarify the history of galaxins/calcarins and related proteins).

      Comments on revisions:

      I am fully satisfied with the revision, and notably with the new Figure 3 which is now extremely informative and readable. Congratulations on a job well done.

    4. Reviewer #3 (Public review):

      Summary:

      Voigt et al. present a comprehensive study exploring the molecular mechanisms and evolution of biomineralization in the calcareous sponge Sycon ciliatum. Using a multi-omics approach, including comparative transcriptomics, proteomics, genomic analyses, and high-resolution in situ hybridization, the authors identify 829 candidate biomineralization genes, with a special focus on the calcarin gene family. These calcarins, structurally analogous to galaxins in stony corals, show cell-type- and spicule-type-specific expression patterns, revealed through meticulous FISH imaging. Chromosomal analysis further uncovers that several calcarin genes are arranged in tandem arrays, suggesting diversification via gene duplication and neofunctionalization. Notably, the study finds striking parallels between the calcarins of S. ciliatum and galaxins of aragonitic corals in terms of gene arrangement, tertiary structure predictions, and expression dynamics, pointing to a remarkable case of parallel evolution during the emergence of biomineralized skeletons in early metazoans.

      Strengths:

      The study is methodologically robust, integrating transcriptomic, proteomic, and genomic data with detailed cell biological analysis.

      High-quality, carefully annotated FISH images convincingly demonstrate the spatial expression patterns of calcarins.

      Novel evidence of sponge cell trans-differentiation is presented through cell-type-specific gene expression.

      The comparative perspective with coral galaxins is well-executed and biologically insightful, supported by structural predictions and chromosomal data.

      Figures and supplementary materials are thoughtfully revised for clarity and accessibility, addressing reviewer feedback.

      Weaknesses:

      Direct functional validation of calcarin roles in biomineralization is lacking, a limitation acknowledged by the authors and inherent to sponge models.

      The evolutionary history of calcarins and galaxins remains only partially resolved due to challenges in reconstructing phylogenies of fast-evolving gene families.

      Some initial figure annotations and definitions (e.g., "radial tube") required clarification, although these were addressed in revision.

      Overall, the work significantly advances our understanding of the molecular basis of biomineralization and its parallel evolution in early diverging metazoans.

      Comments on revisions:

      I would like to thank the authors for addressing all my comments and suggestions. I am OK with the revised version of the manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      To elucidate the mechanisms and evolution of animal biomineralization, Voigt et al. focused on the sponge phylum - the earliest branching extant metazoan lineages exhibiting biomineralized structures - with a particular emphasis on deciphering the molecular underpinnings of spicule formation. This study centered on calcareous sponges, specifically Sycon ciliatum, as characterized in previous work by Voigt et al. In S. ciliatum, two morphologically distinct spicule types are produced by a set of two different types of cells that secrete extracellular matrix proteins, onto which calcium carbonate is subsequently deposited. Comparative transcriptomic analysis between a region with active spicule formation and other body regions identified 829 candidate genes involved in this process. Among these, the authors focused on the calcarin gene family, which is analogous to the Galaxins, the matrix proteins known to participate in coral calcification. The authors performed three-dimensional structure prediction using AlphaFold, examined mRNA expression of Calcarin genes in spicule-forming cell types via in situ hybridization, conducted proteomic analysis of matrix proteins isolated from purified spicules, and carried out chromosome arrangement analysis of the Calcarin genes.

      Based on these analyses, it was revealed that the combination of Calcarin genes expressed during spicule formation differs between the founder cells - responsible for producing diactines and triactines - and the thickener cells that differentiate from them, underscoring the necessity for precise regulation of Calcarin gene expression in proper biomineralization. Furthermore, the observation that 4 Calcarin genes are arranged in tandem arrays on the chromosome suggests that two rounds of gene duplication followed by neofunctionalization have contributed to the intricate formation of S. ciliatum spicules. Additionally, similar subtle spatiotemporal expression patterns and tandem chromosomal arrangements of Galaxins during coral calcification indicate parallel evolution of biomineralization genes between S. ciliatum and aragonitic corals.

      Strengths: 

      (1) An integrative research approach, encompassing transcriptomic, genomic, and proteomic analyses as well as detailed FISH. 

      (2) High-quality FISH images of Calcarin genes, along with a concise summary clearly illustrating their expression patterns, are appreciated.

      (3) It was suggested that thickener cells originate from founder cells. To the best of my knowledge, this is the first study to demonstrate trans-differentiation of sponge cells based on the cell-type-specific gene expression, as determined by in situ hybridization.

      (4) The comparison between Calcarins of the calcitic sponge and Galaxins of aragonitic corals from various perspectives, including protein tertiary structure predictions, gene expression profiling during calcification, and chromosomal sequence analysis, reveals significant similarities between them.

      We thank the reviewer for this assessment. 

      (1) The conclusions of this paper are generally well supported by the data; however, some FISH images require clearer indication or explanation.

      We have modified Fig. 3 by including insets that indicate the depicted part of the sponge body and by changing the color scheme of the FISH images, as suggested by Reviewer #3. In accordance with the following comment, we decided to remove the single-channel views in Fig. 3A.

      (2) Figure S2 (B, C, D): The fluorescent signals in these images are difficult to discern. If the authors choose to present signals at such low magnification, enhancing the fluorescence signals would improve clarity. Additionally, incorporating Figure S2A as an inset within Figure S2E may be sufficient to convey the necessary information about signal localization. 

      We changed the figure according to the suggestions.

      (3) Figure S3A: The claim that Cal2-expressing spherical cells are closely associated with the choanoderm at the distal end of the radial tube is difficult to follow. Are these Cal2-expressing spherical cells interspersed among choanoderm cells, or are they positioned along the basal surface of the choanoderm? Clarifying their precise localization and indicating it in the image would strengthen the interpretation. 

      In the figure, the view is on the choanoderm that lines the inner surface of the radial tube. Our interpretation is that the spherical cells are positioned at the basal surface of the choanoderm. We updated Fig. S3, which now includes another view to support our interpretation and also indicate some choanocytes.

      (4) To further highlight the similarities between S. ciliatum and aragonitic corals in the molecular mechanisms of calcification, consider including a supplementary figure providing a concise depiction of the coral calcification process. This would offer valuable context for readers.

      We considered this suggestion, and have included such a supplementary figure (Fig. S9).

      Reviewer #2 (Public review): 

      Summary: 

      This paper reports on the discovery of calcarins, a protein family that seems involved in calcification in the sponge Sycon ciliatum, based on specific expression in sclerocytes and detection by mass spectrometry within spicules. Two aspects stand out: (1) the unexpected similarity between Sycon calcarins and the galaxins of stony corals, which are also involved in mineralization, suggesting a surprising, parallel co-option of similar genes for mineralization in these two groups; (2) the impressively cell-type-specific expression of specific calcarins, many of which are restricted to either founder or thickener cells, and to either diactines, triactines, or tetractines. The finding that calcarins likely diversified at least partly by tandem duplications (giving rise to gene clusters) is a nice bonus. 

      Strengths: 

      I enjoyed the thoroughness of the paper, with multiple lines of evidence supporting the hypothesized role of calcarins: spatially and temporally resolved RNAseq, mass spectrometry, and whole-mount in situ hybridization using CISH and HCR-FISH (the images are really beautiful and very convincing). The structural predictions and the similarity to galaxins are very surprising and extremely interesting, as they suggest parallel evolution of biomineralization in sponges and cnidarians during the Cambrian explosion by co-option of the same "molecular bricks". 

      Weaknesses: 

      I did not detect any major weakness, beyond those inherent to working with sponges (lack of direct functional inhibition of these genes) or with fast-evolving gene families with complex evolutionary histories (lack of a phylogenetic tree that would clarify the history of galaxins/calcarins and related proteins). 

      We thank the reviewer for this assessment and address the detailed comments below.

      Reviewer #3 (Public review):

      Summary: 

      The study explores the extent to which the biomineralization process in the calcitic sponge Sycon ciliatum resembles aragonitic skeleton formation in stony corals. To investigate this, the authors performed transcriptomic, genomic, and proteomic analyses on S. ciliatum and examined the expression patterns of biomineralization-related genes using in situ hybridization. Among the 829 differentially expressed genes identified in sponge regions associated with spicule formation, the authors focused on calcarin genes, which encode matrix proteins analogous to coral galaxins. The expression patterns of calcarins were found to be diverse but specific to particular spicule types. Notably, these patterns resemble those of galaxins in stony corals. Moreover, the genomic organization of calcarin genes in S. ciliatum closely mirrors that of galaxin genes in corals, suggesting a case of parallel evolution in carbonate biomineralization between calcitic sponges and aragonitic corals.

      Strengths: 

      The manuscript is well written, and the figures are of high quality. The study design and methodologies are clearly described and well-suited to addressing the central research question. Particularly noteworthy is the authors' integration of various omics approaches with molecular and cell biology techniques. Their results support the intriguing conclusion that there is a case of parallel evolution in skeleton-building gene sets between calcitic sponges and aragonitic corals. The conclusions are well supported by the data and analyses presented.

      Weaknesses: 

      The manuscript is strong, and I have not identified any significant weaknesses in its current form. 

      We thank the reviewer for the insight and address the detailed comments below.

      Reviewer #1 (Recommendations for the authors): 

      The description of the region "radial tube" is unclear. Please define and explain it at its first mention in the manuscript, and, if possible, refer to the appropriate figure(s) (e.g., Figure 1A). 

      We now explain radial tubes at the beginning of the Results and have added a label in Figure 1A: “Sycon ciliatum is a tube-shaped sponge with a single apical osculum and a sponge wall of radial tubes around the central atrium (Fig. 1A). The radial tubes are internally lined with choanoderm, which forms elongated chambers in an angle of approximately 90° to the tube axis”.

      Reviewer #2 (Recommendations for the authors): 

      Scientific suggestions: 

      (1) Page 13: "Despite their presence in the same orthogroups, the octocoral and stony coral proteins were only distantly related to the calcareous sponge calcarins (e.g., 12-24% identity between octocoral and calcareous sequences in orthogroup Cal 2-4-6), resulting in poor alignment. Their homology to calcarins, therefore, remains to be determined." Could 3D structures of these coral proteins be predicted with AlphaFold to substantiate (or nuance) the comparison with calcarins? 

      We ran additional AlphaFold predictions for two octocoral and two scleractinian galaxins. A galaxin-like sequence from Pinnigorgia flava was only a short fragment, so we did not attempt a structure prediction for it. The results show that the octocoral galaxin-like proteins show some structural similarity (12 beta-hairpins), whereas the scleractinian galaxin-like proteins differ from the sponge counterparts of the same orthogroup. We added this information to the Results and to the new Fig. S7.

      Minor improvements to the text: 

      (1)  Page 7 : "The expression of Cal1 to Cal8 was investigated using chromogenic in situ hybridization (CISH) and hairpin-chain reaction fluorescence in situ hybridization (HCR-FISH), confirming their presence in sclerocytes." - Figure 3 should be cited here. 

      We refer to the figure now.

      (2) Page 8-9: "Cal6 expression mirrors that of Cal2, occurring in rounded cells at the distal tip of radial tubes and in a ring of cells around the oscular ring." - Please cite a figure here. 

      We now refer to Fig. 3K.

      (3) Page 11-12: Please define eigengene, this term is not necessarily common knowledge. 

      We provide now a short definition in this sentence: “ The analysis provided eight meta-modules, of which four showed significant changes in expression module eigengenes —summary profiles that capture the overall expression pattern of each module— between samples with high spicule formation context (osculum region and regeneration stages older than four days) and samples with low spicule formation (sponge-wall and early regeneration stages until day 3-4) (Fig. S5).” 

      (4) Page 13: "Species without skeletons, such as the cnidarians Hydra, Actinia, Exaiptasia, and Nematostella, also possess galaxin-like proteins." This is too concise - can you explain what evidence was used? PANTHER, AlphaFold, OrthoFinder, Blastp...? 

      The evidence comes from PANTHER; we have clarified this by modifying the last sentence of the section.

      (5) Page 20: "We have identified calcarins, galaxin-like proteins, as crucial components of the biomineralization toolkit in calcareous sponges." I'm not sure you showed they are crucial (this would require functional evidence). Perhaps "novel" components or some other adjective would fit better. 

      We changed the adjective to “novel”.

      Suggestions for the figures: 

      (1) Figure 1A: radial tubes should be labelled. 

      A label was added.

      (2) Figure 3 is beautiful but hard to parse. The name of all markers should be written on each panel (notably B, C, and D) and ideally placed in a consistent position (top right corner?) so that the reader's eye doesn't have to look for them anew in each panel. Consider depicting the same gene with the same color in all panels if possible (confocal imaging gives virtual colors anyway, there's no reason to be bound to the real-life color of the fluorophores used - if that was the original intent). Finally, the red/green color scheme is not colorblind-readable, so please consider switching to another scheme (white/cyan/magenta, for example).

      We have updated the figure according to the suggestions. The names of all markers are now included on each panel. Placing them in the upper right corner was not feasible for all panels, so we adjusted their placement as needed. Recurring genes are shown in the same color where possible. To improve accessibility for individuals with red/green color vision deficiency, we adopted a cyan/magenta/yellow color scheme. Each HCR-FISH image was processed in ImageJ by splitting the image into channels, applying cyan, magenta, or yellow lookup tables, converting each channel to RGB, and then stacking and blending them using the Z-Project function with maximum intensity projection. Since the original channel information is not preserved after this processing, we provide the original red/green/blue version of the figure in the supplementary material in Fig. S11. Additionally, we added small sketches of Figure 1A to indicate the sponge body regions depicted, where relevant.
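
      The recoloring steps described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors’ ImageJ workflow: the function names (`apply_lut`, `blend_max`), the linear lookup-table weights, and the toy 2×2 channels are all assumptions made for this example.

```python
import numpy as np

# Assumed linear lookup tables: each display colour as RGB weights.
LUTS = {
    "cyan": (0.0, 1.0, 1.0),
    "magenta": (1.0, 0.0, 1.0),
    "yellow": (1.0, 1.0, 0.0),
}


def apply_lut(channel, colour):
    """Map a 2D grayscale channel (values in [0, 1]) to an RGB image."""
    weights = np.asarray(LUTS[colour], dtype=float)
    # Broadcasting turns shape (H, W) into (H, W, 3).
    return channel[..., np.newaxis] * weights


def blend_max(channels, colours):
    """Recolour each channel, stack the RGB images, and take the
    per-pixel maximum (a maximum intensity projection over the stack)."""
    rgb_stack = np.stack(
        [apply_lut(ch, col) for ch, col in zip(channels, colours)]
    )
    return rgb_stack.max(axis=0)


# Hypothetical toy data: three 2x2 channels with signal in different pixels.
demo = [np.zeros((2, 2)) for _ in range(3)]
demo[0][0, 0] = 1.0  # first marker only at pixel (0, 0)
demo[1][0, 1] = 1.0  # second marker only at pixel (0, 1)
demo[2][1, 0] = 0.5  # third marker, half intensity, at pixel (1, 0)

merged = blend_max(demo, ["cyan", "magenta", "yellow"])
```

      Because the projection keeps only the per-pixel maximum of the recolored images, the contribution of each original channel cannot be recovered from `merged`, which is consistent with the authors’ decision to supply the original red/green/blue figure separately.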

      (3) Figure S3: the blue staining is not explained. It is also unclear where choanocytes are - could individual choanocytes be indicated with arrows or lines? 

      We added the information to the figure legend. The blue channel shows “Autofluorescence detected with the Leica TXR filter (approx. 590–650 nm), included to help distinguish true signal from background autofluorescence observed in the FITC channel (used for Spiculin detection).”

      Reviewer #3 (Recommendations for the authors): 

      I have no major concerns about the manuscript - only minor edits and comments, which are listed below: 

      (1) On page 13, the authors refer to Figure S8; however, I believe this should be Figure S7. 

      We now refer to the correct figure. Because a new Fig. S7 was introduced, the correct reference is now Fig. S8.

      (2) On page 16, please correct "Spciulin" to "Spiculin". 

      Now corrected.

      (3) On page 17, there are two commas following "(Sycon)"; please remove one. 

      Corrected.

      (4) In the Data Accessibility section, none of the provided links appear to work. Please ensure all links are functional. 

      We apologize for this oversight and now provide working links. 

      (5) In Figure 3, the description of panel L is missing from the figure legend. 

      We added the description of this panel.

      (6) On page 39, change "Fig. 4" to "Figure 4" to maintain consistency throughout the manuscript. 

      Changed.

      (7) Figure S7 is not cited in the main text. Please, address this. 

      Corrected (see point 1 above).

      (8) In the legend for Table S2, the reference to Soubigou et al. (3) is incorrect, as it is not listed in the SI reference section. Please correct this. 

      Soubigou et al. (2020) is now included in the SI reference list.

    1. eLife Assessment

      This revised study provides fundamental insights into the differences in migratory primordial germ cells based on their anterior or posterior location. Through convincing methodology and analysis of single-cell RNA sequencing of an exceptionally large number of migratory primordial germ cells and surrounding somatic cells, the novel findings and datasets generated from this study provide many hypotheses of interest to germ cell biologists.

    2. Reviewer #1 (Public review):

      Summary:

      Migration of the primordial germ cells (PGCs) in mice is asynchronous, such that leading and lagging populations of migrating PGCs emerge. Prior studies found that interactions between the cells the PGCs encounter along their migration routes regulate their proliferation. In this study, the authors used single cell RNAseq to investigate PGC heterogeneity and to characterize their niches during their migration along the AP axis. Unlike prior scRNAseq studies of mammalian PGCs, the authors conducted a time course covering 3 distinct stages of PGC migration (pre, mid, and post migration) and isolated PGCs from defined somite positions along the AP axis. Doing so allowed the authors to uncover differences in gene expression between leading and lagging PGCs and their niches and to investigate how their transcript profiles change over time. Among the pathways with the biggest differences were regulators of actin polymerization and epigenetic programming factors and Nodal response genes. In addition, the authors report changes in somatic niches, specifically greater non-canonical WNT in posterior PGCs compared to anterior PGCs. This relationship between the hindgut epithelium and migrating PGCs was also detected in reanalysis of a previously published dataset of human PGCs. Using whole mount immunofluorescence, the authors confirmed elevated Nodal signaling based on detection of the LEFTY antagonists and targets of Nodal during late stage PGC migration. Taken together, the authors have assembled a temporal and spatial atlas of mouse PGCs and their niches. This resource and the data herein provide support for the model that interactions of migrating mouse PGCs with their niches influence their proliferation, cytoskeletal regulation, epigenetic state and pluripotent state.

      Overall, the findings provide new insights into heterogeneity among leading and lagging PGC populations and their niches along the AP axis, as well as comparisons between mouse and human migrating PGCs. The data are clearly presented, and the text is clear and well-written. This atlas resource will be valuable to reproductive and developmental biologists as a tool for generating hypotheses and for comparisons of PGCs across species.

      Strengths:

      (1) High quality atlas of individual PGCs prior to, during and post migration and their niches at defined positions along the AP axis.

      (2) Comparisons to available datasets, including human embryos, provide insight into potentially conserved relationships among PGCs and the identified pathways and gene expression changes.

      (3) Detailed picture of PGC heterogeneity.

      (4) Valuable resource for the field.

      (5) Some validation of Nodal results and further support for models in the literature based on less comprehensive expression analysis.

    3. Reviewer #2 (Public review):

      Summary:

      Germ cells go on to form sperm and eggs and are, therefore, critical for the survival of the species. This work addresses the question of how 'leading' and 'lagging' PGCs differ, molecularly, during their migration to the mouse genital ridges/gonads during fetal life (E9.5, E10.5, E11.5), and how this is regulated by different somatic environments encountered during the process of migration. E9.5 and E10.5 cells differed in expression of genes involved in canonical WNT signaling and focal adhesions. Differences in cell adhesion, actin cytoskeletal dynamics were identified between leading and lagging cells, at E9.5, before migration into the gonads. At E10.5, when some PGCs have reached the genital ridges, differences in Nodal signaling response genes and reprogramming factors were identified. This last point was verified by whole mount IF for proteins downstream of Nodal signaling, Lefty1/2. At E11.5, there was upregulation of genes associated with chromatin remodeling and oxidative phosphorylation. Some aspects of the findings were also found to be likely true in human development, established via analysis of a dataset previously published by others.

      Strengths:

      The work is strong in that a large number of PGCs were isolated and sequenced, along with associated somatic cells. The authors dealt with the problem of a very small number of migrating mouse PGCs by pooling cells from embryos (after ascertaining age matching using somite counting). 'Leading' and 'lagging' populations were separated by anterior and posterior embryo halves and the well-established Oct4-deltaPE-eGFP reporter mouse line was used.

      The most likely possible use of this fundamental information will be the incorporation of some aspects (e.g. the potential importance of Nodal signaling) into protocols for generation of in vitro derived gametes.

    4. Reviewer #3 (Public review):

      Summary:

      The migration of primordial germ cells (PGCs) to the developing gonad is a poorly understood yet essential step in reproductive development. Here, the authors examine whether there are differences in leading and lagging migratory PGCs using single-cell RNA sequencing of mouse embryos. Cleverly, the authors dissected embryonic trunks along the anterior-to-posterior axis prior to scRNAseq in order to distinguish leading and lagging migratory PGCs. After batch corrections, their analyses revealed several known and novel differences in gene expression within and around leading and lagging PGCs, intercellular signaling networks, as well as a number of genes upregulated upon gonad colonization. The authors then compared their datasets with publicly available human datasets to identify common biological themes. Altogether, this rigorous study reveals several differences between leading and lagging migratory PGCs, hints at signatures for different fates among the population of migratory PGCs, and provides new potential markers for post-migratory PGCs in both humans and mice. While many of the interesting hypotheses that arise from this work are not extensively tested, these data provide a rich platform for future investigations.

      Strengths:

      The authors have successfully navigated significant technical challenges to obtain a substantial number of mouse migratory primordial germ cells for robust transcriptomic analysis. Here, the authors were able to collect quality data on ~13,000 PGCs and ~7,800 surrounding somatic cells, which is ten times more PGCs than previous studies.

      The decision to physically separate leading and lagging primordial germ cells was clever and well-validated based on expected anterior-to-posterior transcriptional signatures.

      Within the PGCs and surrounding tissues, the authors found many gene expression dynamics they would expect to see both along the PGC migratory path as well as across developmental time, increasing confidence in the new differentially expressed genes they found.

      The comparison of their mouse-based migratory PGC datasets with existing human migratory PGC datasets is appreciated.

      The quality control, ambient RNA contamination elimination, batch correction, cell identification and analysis of scRNAseq data were thorough and well-done such that the new hypotheses and markers found through this study are dependable.

      The subsetting of cells in their trajectory analysis is appreciated, further strengthening their cell terminal state predictions.

      Weaknesses:

      There were few validation experiments within this study. For one such experiment, whether there is a difference in pSMAD2/3 along the AP axis is unclear and not quantified, as was nicely done for Lefty1/2.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Migration of the primordial germ cells (PGCs) in mice is asynchronous, such that leading and lagging populations of migrating PGCs emerge. Prior studies found that interactions between the cells the PGCs encounter along their migration routes regulate their proliferation. In this study, the authors used single cell RNAseq to investigate PGC heterogeneity and to characterize their niches during their migration along the AP axis. Unlike prior scRNAseq studies of mammalian PGCs, the authors conducted a time course covering 3 distinct stages of PGC migration (pre, mid, and post migration) and isolated PGCs from defined somite positions along the AP axis. Doing so allowed the authors to uncover differences in gene expression between leading and lagging PGCs and their niches and to investigate how their transcript profiles change over time. Among the pathways with the biggest differences were regulators of actin polymerization and epigenetic programming factors and Nodal response genes. In addition, the authors report changes in somatic niches, specifically greater non-canonical WNT in posterior PGCs compared to anterior PGCs. This relationship between the hindgut epithelium and migrating PGCs was also detected in reanalysis of a previously published dataset of human PGCs. Using whole mount immunofluorescence, the authors confirmed elevated Nodal signaling based on detection of the LEFTY antagonists and targets of Nodal during late stage PGC migration. Taken together, the authors have assembled a temporal and spatial atlas of mouse PGCs and their niches. This resource and the data herein provide support for the model that interactions of migrating mouse PGCs with their niches influence their proliferation, cytoskeletal regulation, epigenetic state and pluripotent state.

      Overall, the findings provide new insights into heterogeneity among leading and lagging PGC populations and their niches along the AP axis, as well as comparisons between mouse and human migrating PGCs. The data are clearly presented, and the text is clear and well-written. This atlas resource will be valuable to reproductive and developmental biologists as a tool for generating hypotheses and for comparisons of PGCs across species.

      Strengths:

      (1) High quality atlas of individual PGCs prior to, during and post migration and their niches at defined positions along the AP axis.

      (2) Comparisons to available datasets, including human embryos, provide insight into potentially conserved relationships among PGCs and the identified pathways and gene expression changes.

      (3) Detailed picture of PGC heterogeneity.

      (4) Valuable resource for the field.

      (5) Some validation of Nodal results and further support for models in the literature based on less comprehensive expression analysis.

      Weaknesses:

      (1) No indication of which sex(es) were used for the mouse data and whether or not sex-related differences exist or can be excluded at the stages examined. This should be clarified.

      We have added: “Embryos of both sexes were pooled without genotyping, as the timepoints analyzed were prior to sex specification” to both the Animals section of the Materials and Methods and the Figure 1 legend. In addition, bioinformatic evaluation of potential sex biases in Nodal-Lefty signaling using Y-chromosome gene expression is reported in supplementary figure 4 and discussed in Discussion paragraph 2.

      Reviewer #2 (Public review):

      Summary:

      This work addresses the question of how 'leading' and 'lagging' PGCs differ, molecularly, during their migration to the mouse genital ridges/gonads during fetal life (E9.5, E10.5, E11.5), and how this is regulated by different somatic environments encountered during the process of migration. E9.5 and E10.5 cells differed in expression of genes involved in canonical WNT signaling and focal adhesions. Differences in cell adhesion, actin cytoskeletal dynamics were identified between leading and lagging cells, at E9.5, before migration into the gonads. At E10.5, when some PGCs have reached the genital ridges, differences in Nodal signaling response genes and reprogramming factors were identified. This last point was verified by whole mount IF for proteins downstream of Nodal signaling, Lefty1/2. At E11.5, there was upregulation of genes associated with chromatin remodeling and oxidative phosphorylation. Some aspects of the findings were also found to be likely true in human development, established via analysis of a dataset previously published by others.

      Strengths:

      The work is strong in that a large number of PGCs were isolated and sequenced, along with associated somatic cells. The authors dealt with the problem of a very small number of migrating mouse PGCs by pooling cells from embryos (after ascertaining age matching using somite counting). 'Leading' and 'lagging' populations were separated by anterior and posterior embryo halves and the well-established Oct4-deltaPE-eGFP reporter mouse line was used.

      Weaknesses:

      The work seems to have been carefully done, but I do not feel the manuscript is very accessible, and I do not consider it well written. The novel findings are not easy to find. The addition of at least one figure to show the locations of putative signaling etc. would be welcome.

      Thank you for the excellent suggestion. Fig. 6 has been added to highlight the main novel findings of this work and integrate them among contributions of earlier studies to provide a more complete view of signaling pathways and cell behaviors governing PGC migration.

      (1) The initial discussion of CellRank analysis (under 'Transcriptomic shifts over developmental time...' heading) is somewhat confusing - e.g. If CellRank's 'pseudotime analysis' produces a result that seems surprising (some E9.5 cells remain in a terminal state with other E9.5 cells) and 'realtime analysis' produces something that makes more sense, is there any point including the pseudotime analysis (since you have cells from known timepoints)? Perhaps the 'batch effects' possible explanation (in Discussion) should be introduced here. Do we learn anything novel from this CellRank analysis? The 'genetic drivers' identified seem to be genes already known to be key to cell transitions during this period of development.

      Thank you for this important observation. We have clarified the text in this section and added the following to the pertinent Results section: “This discrepancy may reflect differences in differentiation potential of some E9.5 PGCs that end in a terminal state among anterior E9.5 PGCs, but could also result from technical batch effects generated during library preparation. These possible interpretations are further discussed in the Discussion section.” We have also added relevant thoughts on the implications of this finding to Discussion paragraphs 4 and 7. We feel that it is important to present both results to the reader, as it is challenging to differentiate between heterogeneous developmental and migratory potential among E9.5 anterior PGCs and differential influence of batch effects across sequencing libraries with the data available.

      (2) In Discussion - with respect to Y-chromosome correlation, it is not clear why this analysis would be done at E10.5, when E11.5 data is available (because some testis-specific effect might be more apparent at the later stage).

      Since we had identified autocrine Nodal signaling primarily in anterior late migratory PGCs at E10.5 and knew that Nodal signaling was involved in sex specification of testicular germ cells into prospermatogonia by E12.5, we wanted to determine whether the Nodal signaling in late migratory PGCs at E10.5 was likely to be a sex-specific effect or was common to PGCs in both sexes. This was assessed in Supplementary Figure 4 and determined unlikely to be related to sex specification of PGCs, as Nodal signaling was not strongly correlated with Y-chromosome transcripts in migratory PGCs. Assessing the relationship between Nodal signaling and Y-chromosome transcription at E11.5, when migration is complete, would be unlikely to help us further understand the dynamics of Nodal signaling during late PGC migration.

      (3) Figure 2A - it seems surprising that there are two clusters of E9.5 anterior cells

      Thank you for the interesting observation! One possibility is that the two states represent differential developmental competence as is suggested by the presence of one E9.5 anterior cluster along the differentiation trajectory in Fig 2A and one not within this differentiation trajectory. Another is that technical aspects of generating these sequencing libraries affected some cells more than others, resulting in clustering of highly affected and less affected cells, which would also be consistent with some E9.5 anterior cells lying within the differentiation trajectory and some not. Since it is challenging to differentiate between these possibilities with the data available, we have intentionally avoided overstating interpretations of this result in the manuscript text. We have included discussion of the potential implications of the transcriptional divergence you identify in Discussion paragraphs 4 and 7.

      (4) Figure 5F - there does seem to be more LEFTY1/2 staining in the anterior region, but also more germ cells as highlighted by GFP

      This is true; based on our selected anatomic landmarks for “anterior” and “posterior” as indicated in Methods, the “anterior” compartment typically contains more PGCs. Thus, we have included violin plots with all data points shown of signal intensities of both LEFTY1/2 and pSMAD2/3 in Fig. 5G and 5I so that the reader can evaluate the entire distribution of PGC signal intensities for each embryo.

      Reviewer #3 (Public review):

      Summary:

      The migration of primordial germ cells (PGCs) to the developing gonad is a poorly understood, yet essential step in reproductive development. Here, the authors examine whether there are differences in leading and lagging migratory PGCs using single-cell RNA sequencing of mouse embryos. Cleverly, the authors dissected embryonic trunks along the anterior-to-posterior axis prior to scRNAseq in order to distinguish leading and lagging migratory PGCs. After batch corrections, their analyses revealed several known and novel differences in gene expression within and around leading and lagging PGCs, intercellular signaling networks, as well as a number of genes upregulated upon gonad colonization. The authors then compared their datasets with publicly available human datasets to identify common biological themes. Altogether, this rigorous study reveals several differences between leading and lagging migratory PGCs, hints at signatures for different fates among the population of migratory PGCs, and provides new potential markers for post-migratory PGCs in both humans and mice. While many of the interesting hypotheses that arise from this work are not extensively tested, these data provide a rich platform for future investigations.

      Strengths:

      The authors have successfully navigated significant technical challenges to obtain a substantial number of mouse migratory primordial germ cells for robust transcriptomic analysis. Here the authors were able to collect quality data on ~13,000 PGCs and ~7,800 surrounding somatic cells, which is ten times more PGCs than previous studies.

      The decision to physically separate leading and lagging primordial germ cells was clever and well-validated based on expected anterior-to-posterior transcriptional signatures.

      Within the PGCs and surrounding tissues, the authors found many gene expression dynamics they would expect to see both along the PGC migratory path as well as across developmental time, increasing confidence in the new differentially expressed genes they found.

      The comparison of their mouse-based migratory PGC datasets with existing human migratory PGC datasets is appreciated.

      The quality control, ambient RNA contamination elimination, batch correction, cell identification and analysis of scRNAseq data were thorough and well-done such that the new hypotheses and markers found through this study are dependable.

      The subsetting of cells in their trajectory analysis is appreciated, further strengthening their cell terminal state predictions.

      Weaknesses:

      Although it is useful to compare their mouse-based dataset with human datasets, the authors used two different analysis pipelines for each dataset. While this may have been due to the small number of cells in the human dataset as mentioned, it does make it difficult to compare them.

      Direct comparisons between findings in human and mouse focused on CellChat cell-cell communication prediction results, which were conducted in an identical fashion using the same analysis methods for both datasets.

      There were few validation experiments within this study. For one such experiment, whether there is a difference in pSMAD2/3 along the AP axis is unclear and not quantified as was nicely done for Lefty1/2.

      Additional validation of the pSMAD2/3 signal intensity along the AP axis was performed and is now included in Fig. 5.

    1. eLife Assessment

      This valuable study highlights how the diversity of the malaria parasite population diminishes following the initiation of effective control interventions but quickly rebounds as control wanes. It also demonstrates that the asymptomatic reservoir is unevenly distributed across host age groups. The data presented are convincing and the work shows how genetic studies could be used to monitor changes in disease transmission.

    2. Reviewer #2 (Public review):

      In this manuscript, Tiedje and colleagues longitudinally track changes in parasite number across four time points as a way of assessing the effect of malaria control interventions in Ghana. Some of the study results have been reported previously, and in this publication, the authors focus on age-stratification of the results. Malaria prevalence was lower in all age groups after IRS. Follow-up with SMC, however, maintained lower parasite prevalence in the targeted age group but not the population as a whole. Additionally, they observe that diversity measures rebound more slowly than prevalence measures. This adds to a growing literature that demonstrates the relevance of asymptomatic reservoirs.

      Overall, I found these results clear, convincing, and well presented. There is growing interest in developing an expanded toolkit for genomic epidemiology in malaria, and detecting changes in transmission intensity is one major application. As the authors summarize, there is no one-size-fits-all approach, and the Bayesian MOIvar estimate developed here has the potential to complement currently used methods, particularly in regions with high diversity/transmission. I find its extension to a calculation of absolute parasite numbers appealing as this could serve as both a conceptually straightforward and biologically meaningful metric.

      As the authors address, their use of the term "census population size" is distinct from how the term is used in the population genetics literature. I therefore anticipate that parasite count will be most useful in an epidemiological context where the total number of sampled parasites can be contrasted with other metrics to help us better understand how parasites are divided across hosts, space, and time.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public review): 

      In this manuscript, Tiedje and colleagues longitudinally track changes in parasite numbers across four time points as a way of assessing the effect of malaria control interventions in Ghana. Some of the study results have been reported previously, and in this publication, the authors focus on age-stratification of the results. Malaria prevalence was lower in all age groups after IRS. Follow-up with SMC, however, maintained lower parasite prevalence in the targeted age group but not the population as a whole. Additionally, they observe that diversity measures rebound more slowly than prevalence measures. This adds to a growing literature that demonstrates the relevance of asymptomatic reservoirs. 

      Strengths:  

      Overall, I found these results clear, convincing, and well-presented. There is growing interest in developing an expanded toolkit for genomic epidemiology in malaria, and detecting changes in transmission intensity is one major application. As the authors summarize, there is no one-size-fits-all approach, and the Bayesian MOIvar estimate developed here has the potential to complement currently used methods, particularly in regions with high diversity/transmission. I find its extension to a calculation of absolute parasite numbers appealing as this could serve as both a conceptually straightforward and biologically meaningful metric.

      We thank the reviewer for this positive review of our results and approach.

      Weaknesses:

      While I understand the conceptual importance of distinguishing among parasite prevalence, mean MOI, and absolute parasite number, I am not fully convinced by this manuscript's implementation of "census population size".

      The reviewer remains unconvinced by our use of the term “census population size”. This appears to be because the quantity depends on sample size rather than representing a count of a whole population. To give context to our use, we state clearly in the study that the term describes a count of the parasite “strains” in an age-specific sample of a human population in a specified location undergoing malaria interventions.

      The reviewer has suggested instead using “sample parasite count”. We argue that this term is too specific and less applicable when the same concept is extrapolated to a different denominator, such as the population in a given area. Importantly, our ecological use of a census allows us to count the same strain more than once when it appears in different people.

      The authors reference the population genetic literature, but within the context of that field, "census population size" refers to the total population size (which, if not formally counted, can be extrapolated) as opposed to "effective population" size, which accounts for a multitude of demographic factors. There is often interesting biology to be gleaned from the magnitude of difference between N and Ne.

      As stated in the introduction, we have been explicit in saying that we are not using a population genetic framework. Exploration of N and Ne in population genetics has merit. How this is reconciled when using a “strain” definition and not neutral markers would need to be assessed.

      In this manuscript, however, "census population size" is used to describe the number of distinct parasites detected within a sample, not a population. As a result, the counts do not have an immediate population genetic interpretation and cannot be directly compared to Ne. This doesn't negate their usefulness but does complicate the use of a standard population genetic term.

      We are clear we are defining a census of parasite strains in an age-specific sample of a population living in two catchment areas of Bongo District. We appreciate the concern of the reviewer and have now further edited the relevant paragraphs in both the Introduction (Lines 75-80) and the Discussion (Lines 501-506) to make very clear the dependence of the reported quantity on sample size, but also its feasible extrapolation consistent with the census of a population. 

      In contrast, I think that sample parasite count will be most useful in an epidemiological context, where the total number of sampled parasites can be contrasted with other metrics to help us better understand how parasites are divided across hosts, space and time. However, for this use, I find it problematic that the metric does not appear to correct for variations in participant number. For instance, in this study, participant numbers especially varied across time for 1-5 year-olds (N=356, 216, 405, and 354 in 2012, 2014, 2015, and 2017 respectively).

      The reviewer has made an important point that for the purpose of comparisons across the four surveys or study time points (i.e., 2012, 2014, 2015, and 2017), we should "normalize" the number of individuals considered for the calculation of the "census population size". Given that this quantity is a sum of the estimated MOI<sub>var</sub>, we need to have constant numbers for its values to be compared across the surveys, within age group and the whole population. This is needed not only to get around the issue of the drop in 1-5 year olds surveyed in 2014 but to also stabilize the total number of individuals for the whole sample and for specific age groups. One way to do this is to use the smallest sample size for each age group across time, and to use that value to resample repeatedly for that number of individuals for surveys where we have a larger sample size. This has now been included in the manuscript as described in the Materials and Methods (Lines 329-341) and in the Results (Lines 415-430; see updated Figure 4 and Table supplement 7). This correction produces very similar results to those we had presented before.
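
      The resampling idea described in this response can be sketched as follows. This is a minimal illustration only, not the code used in the manuscript: the function name `subsampled_census`, the toy MOI values, the number of draws, and the choice to draw individuals without replacement are all assumptions made for the example.

```python
import random
from statistics import mean


def subsampled_census(moi_by_survey, n_draws=1000, seed=0):
    """Mean summed MOI per survey after subsampling to the smallest survey.

    moi_by_survey maps a survey label to a list of per-individual MOI
    estimates (hypothetical values here). Every survey is repeatedly
    subsampled, without replacement, to the size of the smallest survey,
    and the summed MOI of each subsample is averaged over the draws.
    """
    rng = random.Random(seed)
    # Common sample size: the smallest number of individuals in any survey.
    n_min = min(len(values) for values in moi_by_survey.values())
    result = {}
    for survey, values in moi_by_survey.items():
        totals = [sum(rng.sample(values, n_min)) for _ in range(n_draws)]
        result[survey] = mean(totals)
    return result


# Hypothetical per-individual MOI estimates for one age group.
toy = {
    "2012": [3, 1, 4, 2, 5, 2],
    "2014": [1, 0, 2],
    "2015": [2, 2, 1, 0, 3],
}
census = subsampled_census(toy, n_draws=200, seed=1)
```

      In this sketch the smallest survey is used in full on every draw, so its value is simply its summed MOI, while larger surveys are averaged over many equally sized subsamples, which makes the sums comparable across time points.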

      As stated in our previous response, we have used participant number in an interrupted time series where the population was sampled by age to look at age-specific effects of sequential interventions IRS and SMC. As shown in Table supplement 1, across the 16 age-specific samples of the total population, we have sampled very similar proportions of the population by age group across the four surveys. The only exception was the 1-5 year-old age group during the survey in 2014. We are happy to provide additional details to further clarify the lower number (or percentage) of 1-5 year olds (based on the total number of participants per survey) in 2014 (~12%; N = 216) compared to the other surveys conducted in 2012, 2015, and 2017 (~18-20%; N = 356, 405, and 354, respectively). Please see Table supplement 1 for the total number of participants surveyed in each of the four surveys (i.e., 2012, 2014, 2015, and 2017).

      This sample size variability is accounted for with other metrics like mean MOI. 

      We agree that mean MOI by age presents a way forward with variable samples to scale up. Please see updated Figure supplement 8.  

      In sum, while the manuscript opens up an interesting discussion, I'm left with an incomplete understanding of the robustness and interpretability of the new proposed metric.

      We thank you for your opinion. We have further edited the manuscript to make clear our choice of the term and the issue of sample size.  We believe the proposed terminology is meaningful as explained above.

      Reviewer #3 (Public review): 

      Summary

      The manuscript coins a term "the census population size" which they define from the diversity of malaria parasites observed in the human community. They use it to explore changes in parasite diversity in more than 2000 people in Ghana following different control interventions. 

      Strengths:

      This is a good demonstration of how genetic information can be used to augment routinely recorded epidemiological and entomological data to understand the dynamics of malaria and how it is controlled. The genetic information does add to our understanding, though by how much is currently unclear (in this setting it says the same thing as age stratified parasite prevalence), and its relevance moving forward will depend on the practicalities and cost of the data collection and analysis. Nevertheless, this is a great dataset with good analysis and a good attempt to understand more about what is going on in the parasite population.

      Thank you to the reviewer for their supportive assessment of our research.

      Weaknesses

      None

      Reviewer #3 (Recommendations for the authors): 

      New figure supplement 8 - x-axis says percentage but goes between 0-1, so is a proportion

      We thank the reviewer for bringing this to our attention. We have amended the x-axis labels accordingly for Figure supplement 8.

    1. eLife Assessment

      This study presents fundamental new findings introducing a new approach for the reprogramming of brain glial cells to corticospinal neurons. The data is highly compelling, with multiple lines of evidence demonstrating the success of this new assay. These exciting findings set the stage for future studies of the potential of these reprogrammed cells to form functional connections in vivo and their utility in clinical conditions where corticospinal neurons are compromised.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Ozkan et al. presents compelling evidence demonstrating the latent potential of glial precursors of the adult cerebral cortex for neuronal reprogramming. The findings substantially advance our understanding of the potential of endogenous cells in the adult brain to be reprogrammed. Moreover, they describe a molecular cocktail that directs reprogramming toward corticospinal neurons (CSN).

      Strengths:

      Experimentally, the work is compelling and beautifully designed. The work provides a characterization of endogenous progenitors, genetic strategies to isolate them, and proof of concept of exploiting these progenitors' potential to produce a specific desired neuronal type with "a la carte" combination of transcription factors.

      Weaknesses:

      This study demonstrates reprogramming in vitro. Future research will need to assess how these reprogrammed corticospinal neurons integrate and function under physiological conditions and in models of trauma or neurodegeneration.

      Although still in its early stages, neural reprogramming holds significant promise. This study reinforces the hope that, in the future, it may be possible to restore lost or damaged neurons through targeted cellular reprogramming.

    3. Reviewer #2 (Public review):

      Summary:

      Here the authors show a novel direct neuronal reprogramming model using a very pure culture system of oligodendrocyte progenitor cells and demonstrate hallmarks of corticospinal neurons to be induced when using Neurogenin2, a dominant-negative form of Olig2 in combination with the CSN master regulator Fezf2.

      Strengths:

      This is a major achievement as the specification of reprogrammed neurons towards adequate neuronal subtypes is crucial for repair and is still largely missing. The work is carefully done, and the comparison of the neurons induced only by Neurogenin 2 versus the NVOF cocktail is very interesting and convincingly demonstrates a further subtype specification by the cocktail.

      Weaknesses:

      As carefully as it is done in vitro, the identity of projection neurons can best be assessed in vivo. If this is not possible, it could be interesting to co-culture different brain regions and see if these neurons reprogrammed with the cocktail, indeed preferentially send out axons to innervate a co-cultured spinal cord versus other brain region tissue.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Ozkan et al. presents compelling evidence demonstrating the latent potential of glial precursors of the adult cerebral cortex for neuronal reprogramming. The findings substantially advance our understanding of the potential of endogenous cells in the adult brain to be reprogrammed. Moreover, they describe a molecular cocktail that directs reprogramming toward corticospinal neurons (CSN).

      Strengths:

      Experimentally, the work is compelling and beautifully designed, with no major caveats. The main conclusions are fully supported by the experiments. The work provides a characterization of endogenous progenitors, genetic strategies to isolate them, and proof of concept of exploiting these progenitors' potential to produce a specific desired neuronal type with "a la carte" combination of transcription factors.

      Weaknesses:

      Some issues need to be addressed or clarified before publication. The manuscript requires editing. It is dense and rich in details while in other parts there are a few mistakes.

      We thank the reviewer for their excellent summary and for their extremely positive review of our paper. We are pleased that the experimental design and conclusions were judged to be well-supported.

      We have revised the paper to enhance clarity, include additional relevant citations, and refine terminology in some sections of the original version.

      We appreciate the reviewer’s thoughtful review and agree that these revisions enhance the paper.

      Reviewer #2 (Public Review):

      Summary:

      Here the authors show a novel direct neuronal reprogramming model using a very pure culture system of oligodendrocyte progenitor cells and demonstrate hallmarks of corticospinal neurons to be induced when using Neurogenin2, a dominant-negative form of Olig2 in combination with the CSN master regulator Fezf2.

      Strengths:

      This is a major achievement as the specification of reprogrammed neurons towards adequate neuronal subtypes is crucial for repair and still largely missing. The work is carefully done and the comparison of the neurons induced only by Neurogenin 2 versus the NVOF cocktail is very interesting and convincingly demonstrates a further subtype specification by the cocktail.

      Weaknesses:

      As carefully as it is done in vitro, the identity of projection neurons can best be assessed in vivo. If this is not possible, it could be interesting to co-culture different brain regions and see if these neurons reprogrammed with the cocktail, indeed preferentially send out axons to innervate a co-cultured spinal cord versus other brain region tissue.

      We appreciate the reviewer’s positive evaluation of our work and their recognition of its significance in advancing neuronal subtype specification through directed differentiation of endogenous progenitors. 

      We agree with the reviewer’s suggestion that a very interesting future stage of this work would be to investigate the projection neuron identity in vivo. We aim to pursue follow-up studies to investigate in vivo integration and connectivity of such neurons generated by directed differentiation from endogenous SOX6+/NG2+ cortical progenitors. As the reviewer insightfully suggests, co-culturing different brain regions with these neurons could offer an alternative strategy to partially assess potential preferential connectivity into cultured spinal cord vs. alternate tissue.

      We agree with the reviewer that future investigation in vivo will further strengthen the implications of this work.

      Reviewer #3 (Public Review):

      Summary:

      Ozkan, Padmanabhan, and colleagues aim to develop a lineage reprogramming strategy towards generating subcerebral projection neurons from endogenous glia with the specificity needed for disease modelling and brain repair. They set out by targeting specifically Sox6-positive NG2 glia. This choice is motivated by the authors' observation that the early postnatal forebrain of Sox6 knockout mice displays marked ectopic expression of the proneural transcription factor (TF) Neurog2, suggesting a latent neurogenic program may be derepressed in NG2 cells, which normally express Sox6. Cultured NG2 glia transfected with a construct ("NVOF") encoding Neurog2, the corticofugal neuron-specifying TF Fezf2, and a constitutive repressor form of Olig2 are efficiently reprogrammed to neurons. These acquire complex morphologies resembling those of mature endogenous neurons and are characterized by fewer abnormalities when compared to neurons induced by Neurog2 alone. NVOF-induced neurons, as a population, also express a narrower range of cortical neuron subtype-specific markers, suggesting narrowed subtype specification, a potential step forward for Neurog2-driven neuronal reprogramming. Comparison of NVOF- and Neurog2-induced neurons to endogenous subcerebral projection neurons (SCPN) also indicates Fezf2 may aid Neurog2 in directing the generation of SCPN-like neurons at the expense of other cortical neuronal subtypes.

      Strengths:

      The report describes a novel, highly homogeneous in vitro system amenable to efficient reprogramming. The authors provide evidence that Fezf2 shapes the outcome of Neurog2-driven reprogramming towards a subcerebral projection neuron identity, consistent with its known developmental roles. Also, the use of the modified RNA for transient expression of Neurog2 is very elegant.

      Weaknesses:

      The molecular characterization of NVOF-induced neurons is carried out at the bulk level, therefore not allowing to fully assess heterogeneity among NVOF-induced neurons. The suggestion of a latent neurogenic potential in postnatal cortical glia is only partially supported by the data from the Sox6 knockout. Finally, some of the many exciting implications of the study remain untested.

      Discussion:

      The study has many exciting implications that could be further tested. For example, an ultimate proof of the subcerebral projection neuron identity would be to graft NVOF cells into neonatal mice and study their projections. Another important implication is that Sox6-deficient NG2 glia may not only express Neurog2 but activate a more complete neurogenic programme, a possibility that remains untested here.

      Also, is the subcerebral projection neuron dependent on the starting cell population? Could other NG2 glia, not expressing Sox6, also be coaxed by the NVOF cocktail into subcerebral projection neurons? And if not, do they express other (Sox) transcription factors that render them more amenable to reprogramming into other cortical neuron subtypes? The authors state that SOX6-positive NG2 glia are a quiescent progenitor population. Given that NG2 glia is believed to undergo proliferation as a whole, are Sox6-positive NG2 glia an exception from this rule? Finally, the authors seem to imply that subcerebral projection neurons and Sox6-positive NG2 glia are lineage-related. However, direct evidence for this conjecture seems missing.

      We appreciate the reviewer’s thoughtful and detailed review of this work. We especially appreciate the positive evaluation of the work and the highlighting of multiple strengths of our approach, including the role of Fezf2 in refining neuronal subtype identity and the use of modified RNA to enable transient expression of Neurog2.

      We acknowledge the reviewer’s comment that single-cell transcriptomic analysis would indeed provide a more granular view of likely heterogeneity. This current study focuses on investigating the feasibility of directed differentiation of corticospinal-like neurons from endogenous progenitors. Future work employing single-cell sequencing could indeed help delineate the heterogeneity of neurons generated by directed differentiation, and potentially contribute toward identification of potential molecular roadblocks in different subsets.

      Regarding the suggestion that SOX6-deficient NG2+ progenitors might activate a broader neurogenic program, we agree that this is an intriguing possibility. We are currently conducting in-depth investigation of the loss of SOX6 function in NG2+ progenitors, and we aim to submit this quite distinct work for separate publication.

      The reviewer raises an important point about whether SOX6+/NG2+ progenitors and subcerebral projection neurons are indeed normally lineage-related. In the current work, we utilized postnatal cortical SOX6+/NG2+ progenitors that are thought to be largely derived from EMX1+ and GSH2+ ventricular zone neural progenitors. Our unpublished data from the separate study noted above indicate that SOX6 is expressed by both these lineages in vivo. Since subcerebral projection neurons are derived from EMX1+ ventricular zone progenitors (SOX6-expressing), at least some of the SOX6+/NG2+ progenitors are expected to share a lineage relationship with subcerebral projection neurons. While our data strongly suggest such a link, we agree that direct lineage-tracing could be pursued in future work.

      Finally, we agree with the reviewer’s suggestion that in vivo transplantation to assess the identity and connectivity of neurons generated by directed differentiation would be very interesting, and is a natural next phase of this work. We aim to pursue such work in future investigations.

      We again thank the reviewer for their insightful comments.

      Reviewer #1 (Recommendations For The Authors): 

      The most important clarification for me concerns the initial description of the progenitors. I think there is a mistake with the transgenic line NG2. The dsRed mouse used in Figure 1 C is not described until later in the results describing Figure 2. This was confusing. Moreover, perhaps this is a reason why I get confused and do not understand how the authors conclude that SOX6+ cells are a subset of NG2-positive cells. Panel C shows the opposite. Please correct the description and show the quantification of data in panel 1C.

      We thank the reviewer for their thoughtful review and for highlighting this important point. We appreciate the reviewer pointing out the benefit of further clarity regarding the NG2.DsRed transgenic mouse description in Figure 1C. We have revised the text to clarify the use of the transgenic line and ensure that the DsRed mouse is properly introduced. Additionally, we have further clarified the description explaining the basis for concluding that SOX6+ cells are a subset of NG2+ cells and further integrate this conclusion with the data presented.

      During cell sorting from the cortices of NG2.DsRed mice, we observe two distinct populations of NG2-DsRed+ cells based on fluorescence intensity in FACS: NG2-DsRed “bright” and NG2-DsRed “dim” populations. The NG2-DsRed “dim” population consists of a heterogeneous mix of NESTIN+ progenitors, GFAP+ astrocytes/progenitors, a subset of NG2+ cells, and other unidentified cells. In contrast, the DsRed “bright” population includes a broader group of progenitors that also give rise to oligodendrocytes (please see Zhu, Bergles, and Nishiyama 2008), along with pericytes.

      Previous studies have shown that, while dorsal/pallial VZ progenitors express SOX6 during embryonic development, SOX6 expression becomes restricted to interneurons postnatally (these do not express NG2 proteoglycan; Azim et al., 2009) and to the broader group of NG2+ progenitors that also give rise to oligodendrocytes. The ICC image in Fig. 1C shows bright NG2+ cells in the cortex, many of which express SOX6. Thus, we conclude that SOX6+ cells constitute a subset of NG2-DsRed+ cells. 

      In a similar line, the work is beautiful, but the manuscript can gain a lot from shortening and some more editing. For example:

      (1) In the abstract, the word inappropriate should be removed. It seems to me that is an unnecessary subjective qualification - it is hardly possible that in biology we found repression of something inappropriate.

      We have removed the word “inappropriate”.

      (2) FACS-purify these genetically accessible....establish a pure culture. Genetically accessible is nice, and I understand that it conveys that they can be traced in the mouse, but everything is genetically accessible with the right tool, and perhaps it is more informative to explain which gene or report is used for the isolation. These cells are not accessible in humans. Also, I consider it best to remove pure- the culture is pure (purified by FACS) cells.

      We have revised the text to specify the gene/reporter used for isolation instead of using "genetically accessible", and we removed "pure", since FACS purification is already explicitly mentioned.

      (3) In the initial paragraph in the results: "They are exposed to the same morphogen gradients throughout embryonic development, and thus, compared to distant cell types, have similar epigenomic and transcription landscapes." This is proven in the cited publication, but the way it is stated here seems a bit of an unnecessary overstatement. The hypothesis stated after this paragraph is as good as it is with or without this argument.

      We have revised the text and simplified the statement. We agree that the hypothesis remains clear and well-supported without this emphasis.

      (4) In the result sections, "two distinct populations of DsRed-positive cells were identified based on fluorescence intensity" - I know it is correct, but when reading the percentages, I was confused because those percentages divided the population into three fractions. What the authors do not explain is that they discard the intermediate-expressing population.

      We appreciate the reviewer highlighting this inadvertent point of confusion. We erred by discussing only the two populations of central interest to us (DsRed-bright and DsRed-dim), and did not explicitly mention the DsRed-negative population. We have now clarified the text to include all three cell populations and their percentages of the total cells in all three populations (in the original manuscript and still now, ~75-78% were DsRed-negative). We have also further clarified that only DsRed-Bright cells (identified as progenitors) were used for all subsequent experiments.

      These examples illustrate the type of editing that would be appreciated but which is entirely up to the authors.

      We thank the reviewer for their thoughtful suggestions toward improving clarity and precision. We have incorporated these recommendations, along with suggestions from the other two reviewers, in the revised paper.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors start their results section by showing in situ Hybridization for Ngn2 in control and Sox6KO mice. These control sections do not look convincing, as there is not even some signal in the adult VZ-SVZ region and virtually no background. Please show sections where some positive signal can also be detected in the control sections.

      We agree with the reviewer that making direct comparisons in ISH experiments is an important point. In our ISH experiments, to ensure consistency and appropriate comparisons, we process WT and KO sections together and stop the signal development simultaneously. We could have extended the development time to enhance WT signal to a detectable level, but that would have led to excessive background and over-saturated signal in the KO sections.

      To address the reviewer’s point, we have added a new supplementary figure with an additional pair of WT and KO sections, along with reference data from the Allen Brain Atlas. The WT section shows faint Neurog2 expression in the dentate gyrus region of the hippocampus, while the KO section confirms very substantial upregulation of Neurog2 in the absence of SOX6 function. These additional data enhance the clarity and depth of our results.

      Please see the following link for the Allen Brain Atlas ISH data demonstrating that Neurog2 expression in the postnatal (P4) SVZ/SGZ is inherently low (https://developingmouse.brain-map.org/experiment/show/100093831).

      (2) As a hallmark of projection neurons is where they send their axons, it would be important to include a biological assay for this. Of course, in vivo experiments would be great, but if this is not possible, the authors could co-culture sections from the late embryonic cortex, striatum, and spinal cord to see if the reprogrammed neurons preferentially extend their axons towards one of these targets (as normally developing neurons would, see e.g. Bolz et al., 1990).

      We agree with the reviewer’s suggestion that a very interesting future stage of this work would be to investigate the projection neuron identity including connectivity in vivo. We aim to pursue follow-up studies to investigate in vivo integration and connectivity of such neurons generated by directed differentiation from endogenous SOX6+/NG2+ cortical progenitors. As the reviewer insightfully suggests, co-culturing different brain regions with these neurons could offer an alternative strategy to partially assess potential preferential connectivity into cultured spinal cord vs. alternate tissue. This area of investigation is of substantial interest to our lab, and we aim to pursue it in the coming years; it is a very large undertaking by either approach.

      (3) However, if the loss of Sox6 is sufficient for Ngn2 to be upregulated, why did the authors not pursue this approach in their reprogramming experiments? Are these endogenous levels sufficient for reprogramming? Please add some OPC cultures from WT and KO mice to explore their conversion to neurons and possibly combine them with Olig2VP16 and Fezf2.

      We thank the reviewer for this insightful comment and for raising this broader area of inquiry regarding whether SOX6 might be down-regulated to enhance induction of neurogenesis. We are writing a separate manuscript regarding function of SOX6 in these progenitors during normal or molecularly manipulated development. We investigate function of SOX6 using both whole body null mice and a series of conditional null mice. We aim to post that work as a preprint and submit it for review and publication in the coming months. Beyond that work, the potential strategy of downregulating SOX6 function while simultaneously upregulating other molecular controls to refine directed neuronal differentiation is also of substantial interest to us, and we aim to pursue this in follow-up work. Though these are both interesting questions/topics, we respectfully submit that these broad areas of parallel, complex, and future investigation would substantially expand the scope of work in this paper, so we aim to address them in separate studies.

      (4) Please indicate independent biological replicates as individual data points in all histograms, i.e. also in Figure 2K, Figure 4I, S2H.

      We have updated the figure legends indicating the biological replicates, and explained the broad media optimization that was used successfully in all further experiments.

      (5) GFP labelling in Figures S2K-N is not convincing - too high background. Please optimize.

      We have redesigned this figure and now present it as a new supplementary figure, with GFP pseudocolored in gray and enlarged subpanels for improved visualization of cell morphology.

      Reviewer #3 (Recommendations For The Authors):

      This is an extremely well-written manuscript with very exciting implications. Obviously, not all can be tested here. Some of the suggestions are relatively easy and may be worth testing right away, others may require more extensive study in the future. In my view, completing some of the points below could make this paper a landmark study.

      I start with the key questions:

      (1) Do grafted NVOF cells give rise to subcerebral projection neurons in vivo?

      We agree with the reviewer’s suggestion that a very interesting future stage of this work would be to investigate the projection neuron identity including connectivity in vivo. As noted above in response to Reviewer 2, we aim to pursue follow-up studies to investigate in vivo integration and connectivity of such neurons generated by directed differentiation from endogenous SOX6+/NG2+ cortical progenitors. This question is of substantial interest to us, and we aim to pursue it in the coming years; as the reviewer notes, this is a very large undertaking, and beyond the scope of this paper.

      (2) What is the fate of the Sox6 deficient NG2 glia that express Neurog2? One could isolate these cells and subject them to scRNA sequencing to see how far neurogenesis proceeds without addition of exogenous factors.

      We thank the reviewer for this insightful question. As noted in our response to Reviewer 2, we are writing a separate manuscript regarding function of SOX6 in these progenitors during normal or molecularly manipulated development. We investigate function of SOX6 using both whole body null mice and a series of conditional null mice. We aim to post that work as a preprint and submit it for review and publication in the coming months, likely in early summer. We respectfully submit that this broad area of parallel, complex investigation would substantially expand the scope of work in this paper and make this paper too complex and multi-directional, so we aim to publish them as separate papers for the benefit of clarity for readers.

      (3) Obviously, what happens to Sox6-deficient (or non-deficient cells) when forced to express NVOF? In this context, it might be fair to cite Felske et al (PLoS Biol, 2023) who report Neurog2 and Fezf2-induced reprogramming in the postnatal brain. In their model, these authors did not distinguish between converted astrocytes and NG2 glia. Thus, some of the reprogrammed cells may comprise the SOX6-positive cells described here.

      We thank the reviewer for highlighting for us that we inadvertently omitted referencing the important paper by Felske et al., 2023. We have now included this citation. 

      We thank the reviewer for raising this broader area of inquiry regarding whether SOX6 might be down-regulated to enhance induction of neurogenesis. Beyond the work noted above regarding function of SOX6 in these progenitors during normal or molecularly manipulated development, the potential strategy of downregulating SOX6 function while simultaneously upregulating other molecular controls to refine directed neuronal differentiation is of substantial interest to us, and we aim to pursue this in follow-up work. We again respectfully submit that this area of complex, future investigation should be addressed in future studies.

      Very interesting unaddressed questions include:

      (1) Are Sox6+ NG2 glia of dorsal origin? This is implied but not shown. One could use Emx1Cre lines to assess this. Are Sox6+ glia and subcerebral projection neurons clonally related? This may be more challenging. In this context, it might be again fair to refer to Herrero-Navarro et al (Science Advances 2021) who show that glia lineage related to nearby neurons gives rise to induced neurons with regional specificity.

      The reviewer raises an important question regarding the competence of SOX6+/NG2+ progenitors from distinct origins to generate corticospinal-like neurons by directed differentiation. In ongoing unpublished work, we have identified SOX6 expression by NG2+ progenitors of the three lineages derived from ventricular zone progenitors that express either Emx1, Gsh2, or Nkx2.1 transcription factors. The EMX1+ lineage-derived SOX6+/NG2+ progenitors are directly lineage related to cortical projection neurons. As the reviewer suggests, future experiments could explore potential differences in competence between these three populations.

      We again thank the reviewer for highlighting for us that we also inadvertently omitted referencing the exciting study by Herrero-Navarro that addresses the question of regional heterogeneity within astrocytes and the differential reprogramming potential related to their origins. We have now cited this paper in the manuscript.

      (2) Do other NG2 glia not give rise to subcerebral projection neurons when challenged with NVOF? Thus, how important is Sox6 expression really?

      The question of the specific competence of dorsal/cortical SOX6+/NG2+ progenitors to differentiate into corticospinal-like neurons, and the strategy of downregulating SOX6 function while simultaneously upregulating other molecular controls to direct neuronal differentiation, are both of great interest to us. In pilot experiments, we observed reduced competence of ventrally derived SOX6+/NG2+ progenitors to generate similar neurons. We plan to pursue the SOX6 manipulation in follow-up work.

      (3) Do Sox6+ NG2 glia proliferate like other NG2 glia and thereby represent a replenishable pool of progenitors?

      Yes; as noted in the text shortly after Figure 1, and as presented in Figure S3l-L, these progenitors proliferate robustly in response to the mitogens PDGF-A and FGF2.

      (4) How heterogenous are the NVOF-induced neurons? The bulk highlights the overall specificity, but does not tell whether all cells make it equally well.

      We agree with the reviewer that this is an interesting question. ICC analysis (Fig. 4G-4H) presents the variation in the levels of a few functionally important proteins in the population of NVOF-induced neurons. This variation could be due to any or all of at least three possibilities: 1) potential diversity in the population of purified SOX6+/NG2+ progenitors; 2) technical variability in the amount of NVOF plasmid delivered to individual progenitors during transfection; and/or 3) natural stochastic TF-level variations generating closely related neuron types, which also occur during normal development. Future experiments could explore these questions.

    1. eLife Assessment

      The authors use a range of techniques to examine the role of Aurora Kinase A (AurA) in trained immunity. The study is hypothesis driven, it uses solid experimental approaches, and the data are presented in a logical manner. The findings are valuable to the trained immunity field because they provide an in-depth look at a common inducer of trained immunity, beta-glucan.

    2. Reviewer #1 (Public review):

      In this updated and improved manuscript, the authors investigate the role of Aurora Kinase A (AurA) in trained immunity, following a broader drug screening aimed at finding inhibitors of training. They show AurA is important for trained immunity by looking at the different aspects and layers of training using broad omics screening, followed up by a more detailed investigation of specific mechanisms. The authors finalised the investigation with an in vivo MC-38 cancer model where AurA inhibition reduces beta-glucan's antitumour effects.

      Strengths:

      The experimental methods are generally well-described. I appreciate the authors' broad approach to studying different key aspects of trained immunity (from comprehensive transcriptome/chromatin accessibility measurements to detailed mechanistic experiments). Approaching the hypothesis from many different angles inspires confidence in the results. Furthermore, the large drug-screening panel is a valuable tool as these drugs are readily available for translational drug-repurposing research.

      In response to the rebuttal, I would like to compliment and thank the authors for the large amount of work they have done to improve this manuscript. They have removed most of my previous concerns and confusions, and explained some of their approaches in a way that I now agree with them - a great learning opportunity for me as well.

      Weaknesses:

      (1) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (2) The authors have removed most of my concerns. Regarding the use of unpaired tests because that is what is often done in the literature: I still don't agree with this, nor do I think that 'common practice' is a solid argument to justify the approach. However, we can agree to disagree, as I know indeed that many people argue over when paired tests are appropriate in these types of experiments. I appreciate that n=2 for sequencing experiments is justifiable in the way these analyses are used as exploratory screening methods with later experimental validation. I also want to thank the authors for reporting biological replicates where relevant and (I should have mentioned this in my original review also) I appreciate they validate some findings in a separate cell line - many papers neglect this important step.

      (3) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (4) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (5) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (6) The authors have adequately responded to my comments and updated the manuscript accordingly. They have actually gone above and beyond.

      (7) I would like to thank the authors for highlighting this information and taking away my confusion. The authors have adequately responded to my comments and updated the manuscript accordingly.

      (8) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (9) I still think adding the 'alisertib alone' control would be of great added value, but I can see how it is unreasonable to ask the authors to redo those experiments.

      (10) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (11) The authors have adequately responded to my comments and updated the manuscript accordingly.

      (12) I thank the authors for their work to repeat this experiment with my suggestions included. I am convinced by this nice data. I would recommend that the authors put the data from New Figure 4 also in the manuscript as it adds value to the manuscript (unless I just missed it, I don't see it in Figure 6 or the supplement). Not every reader may look at the reviewer comments/rebuttal documents.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the inhibition of Aurora A and its impact on β-glucan-induced trained immunity via the FOXO3/GNMT pathway. The study demonstrates that inhibition of Aurora A leads to overconsumption of SAM, which subsequently impairs the epigenetic reprogramming of H3K4me3 and H3K36me3, effectively abolishing the training effect.

      Strengths:

      The authors identify the role of Aurora A through small molecule screening and validation using a variety of molecular and biochemical approaches. Overall, the findings are interesting and shed light on the previously underexplored role of Aurora A in the induction of β-glucan-driven epigenetic change.

      Weaknesses:

      Given the established role of histone methylations, such as H3K4me3, in trained immunity, it is not surprising that depletion of the methyl donor SAM impairs the training response. Nonetheless, this study provides solid evidence supporting the role of Aurora A in β-glucan-induced trained immunity in murine macrophages. The in vivo data on the antitumor effect of trained immunity are insufficient to support the final claim, as Alisertib could inhibit Aurora A in cell types other than myeloid cells.

      Revision:

      The authors have satisfactorily addressed the majority of my concerns. In particular, the new bone marrow transplantation data convincingly demonstrate that Aurora A inhibition with Alisertib abolishes the β-glucan-trained antitumor effect, an essential finding supporting the manuscript's conclusions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work regards the role of Aurora Kinase A (AurA) in trained immunity. The authors claim that AurA is essential to the induction of trained immunity. The paper starts with a series of experiments showing the effects of suppressing AurA on beta-glucan-trained immunity. This is followed by an account of how AurA inhibition changes the epigenetic and metabolic reprogramming that are characteristic of trained immunity. The authors then zoom in on specific metabolic and epigenetic processes (regulation of S-adenosylmethionine metabolism & histone methylation). Finally, an inhibitor of AurA is used to reduce beta-glucan's anti-tumour effects in a subcutaneous MC-38 model.

      Strengths:

      With the exception of my confusion around the methods used for relative gene expression measurements, the experimental methods are generally well-described. I appreciate the authors' broad approach to studying different key aspects of trained immunity (from comprehensive transcriptome/chromatin accessibility measurements to detailed mechanistic experiments). Approaching the hypothesis from many different angles inspires confidence in the results (although not completely - see weaknesses section). Furthermore, the large drug-screening panel is a valuable tool as these drugs are readily available for translational drug-repurposing research.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      (1) The manuscript contains factual inaccuracies such as:

      (a) Intro: the claim that trained cells display a shift from OXPHOS to glycolysis based on the paper by Cheng et al. in 2014; this was later shown to be dependent on the dose of stimulation and actually both glycolysis and OXPHOS are generally upregulated in trained cells (pmid 32320649).

      We thank the reviewer for pointing out this inaccuracy, and we have revised our statement to ensure an accurate and up-to-date description in the manuscript. We are aware that trained immunity involves different metabolic pathways, including both glycolysis and oxidative phosphorylation [1, 2]. We also measured the oxygen consumption rate (please see the response to comment 8 of Reviewer #1) but observed no obvious increase in oxygen consumption in trained BMDMs in our experimental setting. As the reviewer pointed out, this might depend on the dose of stimulation.

      (b) Discussion: Trained immunity was first described as such in 2011, not decades ago.

      We are sorry for the inaccurate description, and we have corrected the statement in our revised manuscript as “Although the concept of ‘trained immunity’ has been proposed since 2011, the detailed mechanisms that regulate trained immunity are still not completely understood.”

      (2) The authors approach their hypothesis from different angles, which inspires a degree of confidence in the results. However, the statistical methods and reporting are underwhelming.

      (a) Graphs depict mean +/- SEM, whereas mean +/- SD is almost always more informative. (b) The use of 1-tailed tests is dubious in this scenario. Furthermore, in many experiments/figures the case could be made that the comparisons should be considered paired (the responses of cells from the same animal are inherently not independent due to their shared genetic background and, up until cell isolation, the same host factors like serum composition/microbiome/systemic inflammation etc). (c) It could be explained a little more clearly how multiple testing correction was done and why specific tests were chosen in each instance.

      We sincerely thank the reviewer for this thoughtful comment. (a) In the revised manuscript, data from animal experiments in which trained immunity was induced in vivo are presented as mean ± SD, while results from cell-based experiments are presented as mean ± SEM. (b) We have replaced the one-tailed test with a two-tailed test (see Figure 3J in the revised manuscript, with updated P value label). We agree that cells derived from the same animal and subjected to different treatment conditions may be deemed paired data, and we reanalyzed our data using paired statistical tests. While this led to a slight reduction in statistical significance for some comparisons, the overall trends remained consistent, and our biological interpretation remains unchanged. Because unpaired statistical tests are commonly used for in vitro experiments in the literature [3, 4], we have retained the unpaired test results here. (c) We have provided a detailed description of how multiple comparisons were performed in the revised figure legends.
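      To make the two statistical points in this exchange concrete, the following minimal sketch contrasts SD with SEM and a paired with an unpaired (Welch) t statistic. It is an illustration only, not the authors' analysis: the function names and the three cytokine readouts per condition are hypothetical, and only the Python standard library is used.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM); SEM = SD / sqrt(n), so it shrinks as n grows."""
    sd = statistics.stdev(values)
    return sd, sd / math.sqrt(len(values))


def unpaired_t(a, b):
    """Welch's t statistic: treats the two groups as independent samples."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def paired_t(a, b):
    """Paired t statistic: uses within-animal differences (a[i] pairs with b[i])."""
    if len(a) != len(b):
        raise ValueError("paired data need one value per animal in each condition")
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical readouts: one value per mouse, same mouse order in both lists.
trained = [10.0, 14.0, 18.0]
trained_plus_inhibitor = [8.0, 11.0, 16.0]

print("SD, SEM:", sd_and_sem(trained))
print("unpaired t:", unpaired_t(trained, trained_plus_inhibitor))
print("paired t:", paired_t(trained, trained_plus_inhibitor))
```

      In this made-up example the between-animal spread is large but the within-animal shift is consistent, so the paired statistic is much larger than the unpaired one. With real data the direction and size of the difference depend on how strongly measurements from the same animal are correlated.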

      (d) Most experiments are done with n = 3, some experiments are done with n = 5. This is not a lot. While I don't think power analyses should be required for simple in vitro experiments, I would be wary of drawing conclusions based on n = 3. It is also not indicated if the data points were acquired in independent experiments. ATAC-seq/RNA-seq was, judging by the figures, done on only 2 mice per group. No power calculations were done for the in vivo tumor model.

      We are sorry for the confusion caused by our description in the figure legends. For the in vivo experiments, we determined the sample size (n=5, where n refers to the number of mice used as biological replicates) by referring to the animal numbers used for similar experiments in the literature. In addition, according to a reported resource equation approach for calculating sample size in animal studies [5], n=5-7 is suitable for most of our mouse experiments. The in vitro cell assays were performed in at least three independent experiments (BMs isolated from different mice), each experiment was independently replicated at least three times, and points represent biological replicates in our revised manuscript. In Figure 1A, 5 biological replicates are presented to carefully determine a working concentration of alisertib that would not significantly affect the viability of trained macrophages and that was subsequently used in all related cell-based experiments. As for the sequencing data, we acknowledge the reviewer's concern regarding the small sample size (n=2) in our RNA-seq/ATAC-seq experiments. We consider the sequencing experiments mainly an exploratory/screening approach, and we performed rigorous quality control and normalization of the sequencing data to ensure the reliability of our findings. For RNA-seq data analysis, we referred to the DESeq2 manual, which specifies that its statistical framework is based on the negative binomial distribution and is capable of robustly inferring differential gene expression with a minimum of two replicates per group. Therefore, the inclusion of two replicates per group was deemed sufficient for our analysis. Nevertheless, the genomic and transcriptomic sequencing data were used primarily for preliminary screening, and the candidates were extensively validated through additional experiments. For example, we conducted ChIP followed by qPCR to detect enrichment of active histone modifications in the Il6 and Tnf regions, further verifying the increased accessibility of trained immunity-induced inflammatory genes.
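      For readers unfamiliar with the resource equation mentioned above, a minimal sketch follows. It assumes the commonly cited form for a simple group comparison, E = N - k (total animals minus number of groups), with E kept between 10 and 20. Whether reference [5] uses exactly this form, and how many groups the authors' designs contain, are assumptions here. The function name and the group counts are hypothetical.

```python
def group_size_range(num_groups, e_min=10, e_max=20):
    """Per-group n that keeps E = N - k within [e_min, e_max].

    With equal group sizes, N = k * n, so E = k * (n - 1) and
    n must lie between e_min / k + 1 and e_max / k + 1.
    """
    if num_groups < 2:
        raise ValueError("need at least two groups to compare")
    smallest = -(-e_min // num_groups) + 1  # ceiling division, then + 1
    largest = e_max // num_groups + 1       # floor division, then + 1
    return smallest, largest


# Hypothetical designs with three and four groups.
for k in (3, 4):
    low, high = group_size_range(k)
    print(f"{k} groups: n = {low} to {high} animals per group")
```

      Under these assumptions, a three-group design gives 5 to 7 animals per group, which happens to match the range quoted in the response. Designs with more groups give smaller per-group numbers.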

      (e) Furthermore, the data spread in many experiments (particularly BMDM experiments) is extremely small. I wonder if these are true biological replicates, meaning each point represents BMDMs from a different animal? (disclaimer: I work with human materials where the spread is of course always much larger than in animal experiments, so I might be misjudging this.).

      Thanks for your comments. In our initially submitted manuscript, some of the statistical results were presented as representative data (technical replicates) from one of three independent biological replicates. These included the BMDM experiments showing suppression and rescue of trained immunity under different inhibitors or activators (see original Figure 1B-C, Figure 5D, and Figure 5H, corresponding to Figure 1B-C, Figure 5D, and Figure 5H, respectively, in our revised manuscript). Other experimental data, including the CCK8 experiment, metabolic assays and ChIP-qPCR, were biological replicates. In response to your valuable suggestion, we have revised the manuscript to present all statistical results as biological replicates from three independent experiments (presented as mean ± SEM), and we have provided all the original data for the statistical analyses (please see Appendix 2 in the resubmission system).

      (3) Maybe the authors are reserving this for a separate paper, but it would be fantastic if the authors would report the outcomes of the entire drug screening instead of only a selected few. The field would benefit from this as it would save needless repeat experiments. The list of drugs contains several known inhibitors of training (e.g. mTOR inhibitors) so there must have been more 'hits' than the reported 8 Aurora inhibitors.

      Thank you for your suggestion; we have briefly reported the outcomes of the entire drug screen in the revised manuscript. The targets of our epigenetic drug library fall into several major classes, including the Aurora kinase family, histone methyltransferases and demethylases (HMTs and KDMs), acetyltransferases and deacetylases (HDACs and SIRTs), the JAK-STAT kinase family, AKT/mTOR/HIF, the PARP family, and the BRD family (see New Figure 1, related to Figure 1-figure supplement 1B in the revised manuscript). Notably, previous studies have reported that inhibition of the mTOR-HIF1α signaling axis suppresses trained immunity [6]. Our screening results also indicated that most inhibitors targeting mTOR-HIF1α signaling exhibit an inhibitory effect on trained immunity. Additionally, cyproheptadine, a specific inhibitor of SETD7, which is required for trained immunity as previously reported [7], was also identified in our screen.

      JAK-STAT signaling is closely linked to the interferon signaling pathway, and certain JAK kinase inhibitors also target SYK and TYK kinases. A previous drug library screening study has reported that SYK inhibitors suppressed trained immunity [8]. Consistently, our screening results reveal that most JAK kinase inhibitors exhibit suppressive effects on trained immunity.

      BRD (Bromodomain) and Aurora are well-established kinase families in the field of oncology. Compared to BRD, the clinical applications of Aurora kinase inhibitors are still at an early stage. In previous studies using inflammatory arthritis models where trained immunity was established, both adaptive and innate immune cells exhibited upregulated expression of AurA [9, 10]. Our study provides further evidence supporting an essential role of AurA in trained immunity, showing that AurA inhibition leads to the suppression of trained immunity.

      (4) Relating to the drug screen and subsequent experiments: it is unclear to me in supplementary figure 1B which concentrations belong to secondary screens #1/#2 - the methods mention 5 µM for the primary screen and "0.2 and 1 µM" for secondary screens, is it in this order or in order of descending concentration?

      Thank you for your comments; we are sorry for the unclear labelling of these results in the original manuscript (related to Figure 1-supplement 1C). We performed the secondary drug screen at two concentrations, and the concentrations corresponding to secondary screens #1 and #2 are 0.2 and 1 μM, respectively. That is, they are listed in this order, not in descending order of concentration.

      (a) It is unclear if the drug screen was performed with technical replicates or not - the supplementary figure 1B suggests no replicates and quite a large spread (in some cases lower concentration works better?)

      Thank you for your question. The drug screen was performed without technical replicates because it was intended as an initial screen, and each hit needed to be verified individually in subsequent experiments. Yes, we observed that a lower concentration worked better in some cases. We speculate that this is because a drug's effect correlates positively with its concentration only within a specific range. In our primary screen, however, we simply chose one concentration for all drugs. This is a limitation of our screen, and we acknowledge it in the Discussion.

      (5) The methods for (presumably) qPCR for measuring gene expression in Figure 1C are missing. Which reference gene was used and is this a suitably stable gene?

      We are sorry for this omission. The mRNA expression of Il6 and Tnf in trained BMDMs was analyzed by quantitative real-time PCR using the ΔΔCt method, and the results were normalized to untrained BMDMs with Actb (β-actin) as the reference gene, a well-documented gene with stable expression in macrophages. We have added a description of the gene expression measurements to Materials and Methods in our revised manuscript.
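      The ΔΔCt calculation referred to above can be sketched as follows. This is a generic illustration rather than the authors' script. It assumes roughly 100% amplification efficiency, so that one cycle corresponds to a two-fold difference. The function name and the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by the ddCt method.

    dCt  = Ct(target) - Ct(reference), computed per condition
    ddCt = dCt(sample) - dCt(control)
    fold = 2 ** (-ddCt), assuming ~100% PCR efficiency
    """
    dct_sample = ct_target_sample - ct_ref_sample
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_sample - dct_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values: Il6 as target, Actb as reference gene.
fold = fold_change_ddct(
    ct_target_sample=22.0,   # Il6 in trained BMDMs
    ct_ref_sample=16.0,      # Actb in trained BMDMs
    ct_target_control=25.0,  # Il6 in untrained BMDMs
    ct_ref_control=16.0,     # Actb in untrained BMDMs
)
print(f"Il6 fold change, trained vs untrained: {fold:.1f}")
```

      The reviewer's question about reference-gene stability matters because any treatment-induced shift in the reference Ct feeds directly into ΔCt and therefore into the reported fold change.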

      (6) From the complete unedited blot image of Figure 1D it appears that the p-Aurora and total Aurora are not from the same gel (discordant number of lanes and positioning). This could be alright if there are no/only slight technical errors, but I find it misleading as it is presented as if the actin (loading control to account for aforementioned technical errors!) counts for the entire figure.

      We are very sorry for this omission. In the original data, p-Aurora and total Aurora were from different gels. In this experiment, membrane stripping and reprobing after the p-Aurora antibody did not work well, so we could not obtain all results from one gel. We therefore ran another gel with the same samples, blotted it with the anti-Aurora antibody, and used β-tubulin as the loading control for total AurA (please see New Figure 2A, also related to original Figure 1D). We have provided the source data for β-tubulin from the same membrane as total AurA (please see Figure 1-source data). To avoid any potential for misleading readers, we have repeated this experiment and updated the figure (please see New Figure 2B, also related to Figure 1D in the revised manuscript) with phospho-AurA, total AurA and β-actin from the same gel. The bands for phospho-AurA (T288) were obtained using a new antibody (Invitrogen, 44-1210G), and we have revised this information in Materials and Methods. We have provided data from three biological replicates to confirm the experimental result (also see New Figure 2B, related to Figure 1D in the revised manuscript), and the raw data have been added to the source data for Figure 1.

      (7) Figure 2: This figure highlights results that are by far not the strongest ones - I think the 'top hits' deserve some more glory. A small explanation on why the highlighted results were selected would have been fitting.

      We appreciate the valuable suggestion. Figure 2 (see also Figure 2 in the revised manuscript) presents the chromatin landscape affected by AurA inhibition, confirming that AurA inhibition impaired the β-glucan-induced activation of key genes involved in pro-inflammatory macrophage activation. In Figure 2B, we highlighted a few classical downregulated GO terms, including “regulation of growth”, “myeloid leukocyte activation” and “MAPK cascade” (see also Figure 2B in the revised manuscript). Of these, “regulation of growth” is a known function of Aurora A and was highlighted simply to show that alisertib indeed inhibited Aurora A function in vivo as expected, while “myeloid leukocyte activation” and “MAPK cascade” show the impaired accessibility of pro-inflammatory genes. In Figure 2F (see also Figure 2F in the revised manuscript), we highlighted downregulated KEGG terms such as “JAK-STAT signaling pathway”, “TNF signaling pathway” and “NF-kappa B signaling pathway”, as these pathways are highly relevant to trained immunity. Meanwhile, the KEGG term “FOXO signaling pathway” (see also Figure 2G in the revised manuscript) was highlighted to confirm the anti-inflammatory effect of alisertib in trained BMDMs, which is further illustrated in Figure 5 (see also Figure 5 in the revised manuscript, showing that FOXO3 acts downstream of AurA). Some top hits, such as “positive regulation of cell adhesion” in Figure 2B, and “pathway of neurodegeneration” and "ubiquitin mediated proteolysis" in Figure 2F and 2G, are not directly related to trained immunity, so we did not highlight them; however, they may provide useful information for future investigation of other functions of Aurora A.

      (8) Figure 3 incl supplement: the carbon tracing experiments show more glucose-carbon going into TCA cycle (suggesting upregulated oxidative metabolism), but no mito stress test was performed on the seahorse.

      We appreciate this question raised by the reviewer. We previously performed Seahorse XF analysis to measure the oxygen consumption rate (OCR) in β-glucan-trained BMDMs. The results showed no obvious increase in oxidative phosphorylation (OXPHOS), as indicated by OCR, under β-glucan stimulation (related to Figure 3-figure supplement 1A), although the carbon tracing experiments showed more glucose-derived carbon entering the TCA cycle. We speculate that this discrepancy between increased glucose incorporation into the TCA cycle and unchanged OXPHOS may reflect a characteristic metabolic reprogramming induced by trained immunity. The increased incorporation of glucose-derived carbon into the TCA cycle likely serves a biosynthetic purpose, supplying intermediates for anabolic processes, rather than augmenting mitochondrial respiration [6]. Moreover, the unchanged OXPHOS may be attributed to a reduced reliance on fatty acid oxidation (catabolism), with glucose-derived acetyl-CoA becoming the predominant substrate. Thus, while overall OXPHOS remains stable, the glucose contribution to the TCA cycle increases. This is in line with reports showing that trained immunity promotes fatty acid synthesis (anabolism) [11]. Alternatively, the partial decoupling of the TCA cycle from OXPHOS could result from the diversion of intermediates such as fumarate out of the cycle.

      Oxygen consumption rate (OCR) after a mito stress test upon sequential addition of oligomycin (Oligo, 1 μM), FCCP (1 mM), and Rotenone/antimycin (R/A, 0.5 μM), in BMDMs with different treatment for 24 h. β-glucan, 50 μg/mL; alisertib, 1 μM.

      (9) Inconsistent use of an 'alisertib-alone' control in addition to 'medium', 'b-glucan', 'b-glucan + alisertib'. This control would be of great added value in many cases, in my opinion.

      Thank you for your comment. We appreciate that including an “alisertib-alone” group throughout all the experiments might further strengthen the results. However, the aim of the current study was to investigate the role of Aurora kinase A in trained immunity. Therefore, in most settings, we did not include a group treated with alisertib alone, without β-glucan stimulation.

      (10) Figure 4A: looking at the unedited blot images, the blot for H3K36me3 appears in its original orientation, whereas other images appear horizontally mirrored. Please note, I don't think there is any malicious intent but this is quite sloppy and the authors should explain why/how this happened (are they different gels and the loading sequence was reversed?)

      Thank you for pointing out this error. After checking the original data, we found that we had indeed misassembled the orientation of several blots in the original data submitted. We went through the assembly process and found that the blots in the original data were oriented according to the loading sequence but were not saved correctly, so the orientations in Figure 4A were not consistent with the unedited blot images. We are sorry for this careless mistake, and we have double-checked that all blots are correctly assembled in the revised manuscript. We have also provided three replicates of the Western blot results showing that the level of H3K36me3 in trained BMDMs was reduced by alisertib (as seen in New Figure 7 at recommendation 2 of Reviewer #2).

      (11) For many figures, for example prominently figure 5, the text describes 'beta-glucan training' whereas the figures actually depict acute stimulation with beta-glucan. While this is partially a semantic issue (technically, the stimulation is 'the training-phase' of the experiment), this could confuse the reader.

      We thank the reviewer for the suggestion; we have revised our wording to ensure clarity and avoid any inconsistencies that might lead to misunderstanding.

      (12) Figure 6: Cytokines, especially IL-6 and IL-1β, can be excreted by tumour cells and have pro-tumoral functions. This is not likely in the context of the other results in this case, but since there is flow cytometry data from the tumour material it would have been nice to see also intracellular cytokine staining to pinpoint the source of these cytokines.

      We thank the reviewer for the suggestion. In Figure 6, we performed assays in a mouse tumor model and found that trained immunity upregulated the levels of cytokines such as IL-6 in tumor tissue, and that this upregulation was reduced by alisertib administration. To rule out the possibility that the detected cytokines, such as IL-6, came from tumor cells, we performed intracellular cytokine staining of single cells isolated from tumor tissues (please see New Figure 4). The results showed that only a small fraction of non-immune cells (CD45⁻ population) expressed IL-6 (0.37% ± 0.11%), whereas a significantly higher proportion of IL-6-positive cells was observed among the CD45⁺ population (deemed immune cells, 13.66% ± 1.82%), myeloid cells (CD45⁺CD11b⁺, 15.60% ± 2.19%), and, in particular, macrophages (CD45⁺CD11b⁺F4/80⁺, 37.24% ± 3.04%). These findings strongly suggest that immune cells, especially macrophages, are the predominant source of IL-6 within the tumor microenvironment. Moreover, we also detected a higher IL-6-positive population among myeloid cells and macrophages (please see Figure 6I in the revised manuscript).

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the inhibition of Aurora A and its impact on β-glucan-induced trained immunity via the FOXO3/GNMT pathway. The study demonstrates that inhibition of Aurora A leads to overconsumption of SAM, which subsequently impairs the epigenetic reprogramming of H3K4me3 and H3K36me3, effectively abolishing the training effect.

      Strengths:

      The authors identify the role of Aurora A through small molecule screening and validation using a variety of molecular and biochemical approaches. Overall, the findings are interesting and shed light on the previously underexplored role of Aurora A in the induction of β-glucan-driven epigenetic change.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      Given the established role of histone methylations, such as H3K4me3, in trained immunity, it is not surprising that depletion of the methyl donor SAM impairs the training response. Nonetheless, this study provides solid evidence supporting the role of Aurora A in β-glucan-induced trained immunity in murine macrophages. The in vivo data on the antitumor effect of trained immunity are insufficient to support the final claim, as alisertib could inhibit Aurora A in cell types other than myeloid cells.

      We appreciate the question raised by the reviewer. Although SAM generally acts as a methyl donor, whether the epigenetic reprogramming in trained immunity is directly linked to SAM metabolism had not been formally tested previously. In our study, we provided evidence suggesting that SAM maintenance is necessary to support trained immunity. As for the in vivo tumor model, we agree that alisertib may inhibit Aurora A in many cell types besides myeloid cells. To further address the reviewer’s concern, we have performed the suggested bone marrow transplantation experiment (trained mice as donors and naïve mice as recipients) to verify the contribution of myeloid cell-mediated trained immunity to the antitumor effect (please see New Figure 8, also related to Figure 6C, 6D and Figure 6-figure supplement 1B and 1C in revised manuscript).

      Reviewer #1 (Recommendations for the authors):

      Some examples of spelling errors and other mistakes (by far not a complete list):

      (a) Introduction, second sentence: reads as if Candida albicans (which should be italicised and capitalised properly) and BCG are microbial polysaccharide components.

      (b) Methods: ECAR is ExtraCellular Acidification Rate, not 'Extracellular Acid Ratio'

      (c) Figure 2C: β-glucan is misspelled in the graph title.

      (d) TNFα has been renamed to 'TNF' for a long time now.

      (e) Inconsistent use of Tnf and Tfnα (the correct gene symbol is Tnf) (NB: this field does not allow me to italicise gene symbols)

      (f) Figure supplement 1B: 'secdonary'

      (g) Caption of figure 4: "Turkey's multiple-comparison test"

      (h) etc

      I would ask the authors that they please go over the entire manuscript very carefully to correct such errors.

      We apologize for these errors and careless mistakes. We greatly appreciate your suggestions and have carefully proofread the revised manuscript to make sure that no further mistakes remain.

      Please also address the points I raised in the public review about statistical approaches. Even more important than the relatively low 'n' is my question about biological replicates. Please clarify what you mean by 'biological replicate'. If you are able to repeat at least the in vitro experiments (if this is too much work, pick the most important ones) a few more times, this would really strengthen the results.

      Thank you for your comment. Our biological replicates refer to independently repeated experiments using bone marrow cells isolated from different mice, and n represents the number of mice used. We repeated each experiment at least three times using BMDMs isolated from different mice (n = 3, biological replicates). Specifically, we repeated several in vitro experiments showing that inhibition of AurA upregulated GNMT in trained BMDMs, that the transcription factor FOXO3 acted as a key protein in AurA-mediated GNMT expression to control trained immunity, and that an mTOR agonist rescued trained immunity inhibited by alisertib (see New Figure 5, related to Figure 5B-C, Figure 5H in revised manuscript). Additionally, we have provided data from three biological replicates to show the β-glucan-induced phosphorylation of AurA (see comment 6 of reviewer#1) and the changes in histone modification markers under AurA inhibition and GNMT deficiency (see recommendation 2 of reviewer#2). We also repeated the in vivo tumor model to analyze intratumoral cytokines (see recommendation 12 of reviewer#1).

      Finally: the authors report 'no funders' during submission, but the manuscript contains funding details. Please modify this in the eLife submission system if possible.

      Thank you for the kind reminder. We have modified the funding information in the submission system.

      Reviewer #2 (Recommendations for the authors):

      (1) I have the following methodological and interpretative comments for consideration:

      Aurora A has been previously implicated in M1 macrophage differentiation and NF-κB signaling. What is the effect of Aurora A inhibition on basal LPS stimulation? Considering that β-glucan + Ali also skews macrophage priming towards an M2 phenotype, as shown in Fig. 2E, further clarification on this point would strengthen the study.

      Thanks for your suggestion. A previous study showed that AurA was upregulated in LPS-stimulated macrophages and that inhibition of AurA downregulated M1 markers in LPS-stimulated macrophages through the NF-κB pathway but did not affect IL-4-induced M2 macrophage polarization [12]. Consistently, we also found that AurA inhibition downregulated the inflammatory response upon basal LPS stimulation, as shown by a decreased IL-6 level (see New Figure 6). In original Figure 2E (also related to Figure 2E in revised manuscript), we showed increased accessibility of Mrc1 and Chil3 under “β-glucan +Ali” before re-challenge, both of which are typical M2 macrophage markers. Motif analysis showed that AurA inhibition would upregulate genes controlled by PPARγ (STAT6 was not predicted). Unlike STAT6, a classical transcription factor that controls IL-4- or IL-13-dependent M2 polarization (M2a), PPARγ mediates polarization toward M2c and mainly controls anti-inflammatory cellular metabolism independently of IL-4 or IL-13. Thus, we speculate that inhibition of AurA might promote non-classical M2 polarization, and the details warrant future investigation.

      (2) In Figure 4A, it looks like H3K27me3 is also significantly upregulated by β-glucan and inhibited by Ali. How many biological replicates were performed for these experiments? It would be beneficial to include densitometric analyses to visualize differences across multiple Western blot experiments for better reproducibility and quantitative assessment. In addition, what is the effect of treatment with Ali alone on the epigenetic profiling of macrophages?

      We are sorry for this confusion. Each experiment was performed with at least three independent biological replicates. In original Figure 4-figure supplement 1 (also related to Figure 4-figure supplement 1 in the revised manuscript), we presented the densitometric analysis results from three independent Western blot experiments, which showed that β-glucan did not affect H3K27me3 levels under our experimental conditions. Data from three biological replicates for histone modification are shown as follows (New Figure 7, related to Figure 4-figure supplement 1 in revised manuscript). We appreciate that an assay for “Ali alone” in macrophages may add more value to the findings. However, the aim of the current study was to investigate the role of Aurora kinase A in trained immunity, and we know that alisertib itself would not induce or suppress trained immunity. Therefore, in most settings, we did not test the effect of alisertib alone without β-glucan stimulation.

      (3) The IL-6 and TNF concentrations exhibit considerable variability (Fig. 3K and Fig. 5H), ranging from below 10 pg/mL to 500-1000 pg/mL. Please specify the number of replicates for these experiments and provide more detail on how variability was managed. Including this information would enhance the robustness of the conclusions.

      Thank you for your comment. These experiments were replicated at least three times using BMDMs isolated from different mice. The observed variations in cytokine concentrations may be attributed to factors such as differences in cell density, variability among individual mice, and the passage number of the MC38 cells used for supernatant collection. We have prepared a new batch of BMDMs, repeated the experiment, and provided consistent results in the revised manuscript (please see Figure 5H in revised manuscript). Data for biological replicates have been provided (please see Appendix 2 in the resubmission system).

      (4) The impact of Aurora A inhibition on β-glucan-induced anti-tumor responses appears complex. Specifically, GNMT expression is significantly upregulated in F4/80- cells, with stronger effects compared to F4/80+ cells as seen in Fig. 6D. To discern whether this is due to the abolishment of trained immunity in myeloid cells or an effect of Ali on tumor cells which inhibit tumor growth, I suggest performing bone marrow transplantation. Transplant naïve or trained donor BM into naïve recipients, followed by MC38 tumor transplantation, to clarify the mechanistic contribution of trained immunity versus off-target effects.

      Thanks for your valuable suggestion. Following your suggestion, we have performed bone marrow transplantation to clarify that alisertib acts on the BM cells to inhibit the anti-tumor effect induced by trained immunity (see New Figure 8, related to Figure 6C-D in revised manuscript). As the results below show, transplantation of trained BM cells conferred antitumor activity in recipient mice, whereas transplantation of trained BM cells with alisertib treatment did not confer such activity, further demonstrating that alisertib inhibited AurA in trained BM cells to impair their antitumor activity.

      References

      (1) Ferreira, A.V., et al., Metabolic Regulation in the Induction of Trained Immunity. Semin Immunopathol, 2024. 46(3-4): p. 7.

      (2) Keating, S.T., et al., Rewiring of glucose metabolism defines trained immunity induced by oxidized low-density lipoprotein. J Mol Med (Berl), 2020. 98(6): p. 819-831.

      (3) Cui, L., et al., N(6)-methyladenosine modification-tuned lipid metabolism controls skin immune homeostasis via regulating neutrophil chemotaxis. Sci Adv, 2024. 10(40): p. eadp5332.

      (4) Yu, W., et al., One-Carbon Metabolism Supports S-Adenosylmethionine and Histone Methylation to Drive Inflammatory Macrophages. Mol Cell, 2019. 75(6): p. 1147-1160 e5.

      (5) Arifin, W.N. and W.M. Zahiruddin, Sample Size Calculation in Animal Studies Using Resource Equation Approach. Malays J Med Sci, 2017. 24(5): p. 101-105.

      (6) Cheng, S.C., et al., mTOR- and HIF-1α-mediated aerobic glycolysis as metabolic basis for trained immunity. Science, 2014. 345(6204): p. 1250684.

      (7) Keating, S.T., et al., The Set7 Lysine Methyltransferase Regulates Plasticity in Oxidative Phosphorylation Necessary for Trained Immunity Induced by β-Glucan. Cell Rep, 2020. 31(3): p. 107548.

      (8) John, S.P., et al., Small-molecule screening identifies Syk kinase inhibition and rutaecarpine as modulators of macrophage training and SARS-CoV-2 infection. Cell Rep, 2022. 41(1): p. 111441.

      (9) Glant, T.T., et al., Differentially expressed epigenome modifiers, including aurora kinases A and B, in immune cells in rheumatoid arthritis in humans and mouse models. Arthritis Rheum, 2013. 65(7): p. 1725-35.

      (10) Jeljeli, M.M. and I.E. Adamopoulos, Innate immune memory in inflammatory arthritis. Nat Rev Rheumatol, 2023. 19(10): p. 627-639.

      (11) Ferreira, A.V., et al., Fatty acid desaturation and lipoxygenase pathways support trained immunity. Nat Commun, 2023. 14(1): p. 7385.

      (12) Ding, L., et al., Aurora kinase A regulates M1 macrophage polarization and plays a role in experimental autoimmune encephalomyelitis. Inflammation, 2015. 38(2): p. 800-11.

    1. eLife Assessment

      This manuscript reports a large series of experiments to investigate specific aspects of plant adaptation, leveraging genetic and genomic resources of Arabidopsis thaliana. The study provides convincing evidence for local adaptation in this highly selfing plant. This is an important dataset contributing to the developing understanding of non-linear selection in plants and beyond.

    2. Reviewer #1 (Public review):

      Summary:

      As a general phenomenon, adaptation of populations to their respective local conditions is well-documented, though not universally. In particular, local adaptation has been amply demonstrated in Arabidopsis thaliana, the focal species of this research, which is naturally highly selfing. Here, the authors report assays designed to evaluate the spatial scale of fitness variation among source populations and sites, as well as temporal variability in fitness expression. Further, they endeavor to identify traits and genomic regions that contribute to the demonstrated variation in fitness.

      Strengths:

      With many (200) inbred accessions drawn from throughout Sweden, the study offers an unusually fine sampling of genetic variation within this much-studied species, and through assays in multiple sites and years, it amply demonstrates the context-dependence of fitness expression. It supports the general phenomenon of local adaptation, with multiple nuances. Other examples exist, but it is of value to have further cases illustrating not only the context-dependence of fitness expression but also the sometimes idiosyncratic nature of fitness variation. I commend the authors on their cautionary language in relation to inferences about the roles of particular genomic regions (e.g., l. 140-144; l. 227).

      Weaknesses:

      To my mind, the manuscript is written primarily for the Arabidopsis community. This community is certainly large, but there are many evolutionary biologists who could appreciate this work but are not invited to do so. The authors could address the broader evolution community by acknowledging more of the relevant work of others (I've noted a few references in my comments to the authors). At least as important, the authors could make clearer the fact that A. thaliana is (almost) strictly selfing and how this feature of its biology both enables such a study and also limits inferences from it. Further, it seems to me, though I could be wrong, that readers would appreciate a more direct, less discursive style of writing, and one that makes the broader import of the focal questions clearer.

      As a reader, I would value seeing estimates of the overall fitness of the accessions in the different conditions, i.e., by combining the survival and fecundity results of the common garden experiments.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this study was to find evidence for local adaptation in survival and fecundity of the model plant Arabidopsis thaliana. The authors grew a large set of Swedish Arabidopsis accessions at four common garden sites in northern and southern Sweden. Accessions were grown from seed in trays, which were laid on the ground at each site in late summer, screened for survival in fall and the following spring, and fecundity was determined from rosette size and seed production in spring. Experiments were complemented by 'selection experiments', in which seeds of the same accessions were sown in plots, and after two years of growth, plants were sampled to determine fitness from genotype frequencies, providing a more comprehensive evaluation of lifetime fitness than can be gleaned from fecundity alone.

      As the main result, southern accessions had higher mortality in northern sites in one of two years, but also suffered more slug damage in southern sites in one year, indicating a potential link between frost tolerance and herbivore resistance. Fecundity of accessions was highest when growing close to the 'home' environment, but while accessions from one sand dune population in southern Sweden had among the lowest fecundities overall, they consistently had the highest fitness in the selection experiment. Accessions from this population had large seed size and rapid root growth, which might be related to establishment success when arriving in a new, partially occupied habitat. However, neither trait could fully explain the very high fitness of this population, suggesting the presence of other, unmeasured traits.

      Overall, the authors could provide clear evidence of local adaptation in different traits for some of their experiments, but they also highlight high temporal and spatial variability that makes prediction of microevolutionary change so challenging.

      Strengths:

      A major strength of this study is the highly comprehensive evaluation of different fitness-related traits of Arabidopsis under natural conditions. The evaluation of survival and fecundity in common garden experiments across four sites and two years provides an estimate of variability and consistency of results. The addition of the 'selection experiment' provides an extended view on plant fitness that is both original and interesting, in particular highlighting potential limitations of 'fitness-proxies' such as seed production that don't take into account seedling establishment and competitive exclusion.

      Throughout the study, the authors have gone to impressive depths in exploring their data, and particularly the discovery of 'native volunteers' in selection experiment plots and their statistical treatment is very elegant and has resulted in compelling conclusions. Also, while the authors are careful in the interpretation of their GWAS results, they nonetheless highlight a few interesting gene candidates that may be underlying the observed plant adaptations, and which likely will stimulate further research.

      Overall, the authors provide a rich new resource that is relevant and interesting both in the context of general evolutionary theory as well as more specifically for molecular biology.

      Weaknesses:

      While the repetition of the common garden experiments over two years is certainly better than no repetition (hence its mention also under 'strengths'), the very high variability found between the two years highlights the need for more extensive temporal replication. In this context, two temporal replicates are the bare minimum, and more repeats in time would be necessary to draw any kind of conclusion about the role of 'high mortality' and 'low mortality' years for the microevolution of Arabidopsis. It also seems that the authors missed an opportunity to explore potentially causal variation among years, as they did not attempt to relate winter mortality to actual climatic variables, even though they discuss winter harshness as a potential predictor.

      The low temporal replication also makes the accidental slug herbivory appear somewhat random. Potted plants are notoriously susceptible to slug herbivory, and while it is certainly nice that slug damage predominantly affected one group of accessions, it nonetheless raises the question of whether this reflects a 'real' selection pressure that plants commonly face in their respective local environments.

      The addition of the 'selection experiment' is certainly original and provides valuable additional insights, but again, it seems a bit questionable which natural process really has affected this outcome. While the genetic and statistical analysis of this experiment seems to be state-of-the-art, the experimental design is rather rudimentary compared to more standard selection experiments. Specifically, the authors added seeds from greenhouse-grown mothers to experimental plots and only sampled plants two years later. This means that, potentially, the first very big bottleneck was germination under natural conditions, which may have already excluded many of the accessions before they had a chance to grow. While this certainly is one type of selection, it is not exactly the type of selection that a 2-year selection experiment is set up to measure. Either initially establishing the selection experiment from plants instead of seeds, or genotyping the population over several generations, would have substantially strengthened the conclusions that could be drawn from this experiment. Also, the complete lack of information on population density is a bit problematic. It is not clear if there were other (non-Arabidopsis) plants present in the plots, how many Arabidopsis plants were established, if numbers changed over the year, etc. Given all of these limitations, calling this a 'selection experiment' is in fact somewhat misleading.

      Despite these weaknesses, the authors could achieve their main goals, and despite the somewhat minimal temporal replication, they were lucky to sample two fairly distinct years that provided them with interesting variation, which they could partially explain using the variation among their accessions. Overall, this study will likely make an important contribution to the field of evolutionary biology, and it is another very strong example of how the extensive molecular tools in Arabidopsis can be leveraged to address fundamental questions in evolution and ecology, to an extent that is not (yet) possible in other plant systems.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript presents a large common garden experiment across Sweden using solely local germplasm. Additionally, there is a collection of selection experiments that begin investigating the factors shaping fecundity in these populations. This provides an impressive amount of data and analysis investigating the underlying factors involved. Together, this helps support the data showing that fluctuations and interactions are key components determining Arabidopsis fitness and are more broadly applicable across plant and non-plant species.

      Strengths:

      The field trials are well conducted with extensive effort and sampling. Similarly, while the genetic analysis is complex, it is well conducted and reflects the complexity of dealing with population structure that may be intricately linked to adaptive structure. This has no real solution, and the option of presenting results with and without correction is likely the only appropriate option.

      Weaknesses:

      A significant finding from this study was that fecundity is shaped more by yearly fluctuations and their interaction with genotype than it is by the main effect of location or genotype. Another significant finding is that the strength of selection can be quite strong, with nearly 5x ranges across accessions. It should be noted that there are a number of other studies using Arabidopsis in the wild with multiple years and locations that found similar observations beyond the Oakley citation. In general, the context of how these findings relate to existing knowledge in Arabidopsis is a bit underdeveloped.

      The effects of the populations across the locations seem to rely on individual tests and PC analysis. It would seem to be possible to incorporate these tests more directly in the linear modeling analysis, and it isn't quite clear why this wasn't conducted.

      I'm a bit puzzled by the discussion on how to find causative loci. This seems to focus solely on GWAS as the solution, with a goal to sequence vast numbers of individuals. But the loci that the manuscript discussed were found by a combination of structured mapping populations followed by molecular validation that then informed the GWAS. As such, I'm unsure if the proposed future approach of more sequencing is the best when a more balanced approach integrating diverse methods and population types will be more useful.

    5. Author response:

      Reviewer #1 (Public review):

      Summary: 

      As a general phenomenon, adaptation of populations to their respective local conditions is well-documented, though not universally. In particular, local adaptation has been amply demonstrated in Arabidopsis thaliana, the focal species of this research, which is naturally highly selfing. Here, the authors report assays designed to evaluate the spatial scale of fitness variation among source populations and sites, as well as temporal variability in fitness expression. Further, they endeavor to identify traits and genomic regions that contribute to the demonstrated variation in fitness.  

      Strengths: 

      With many (200) inbred accessions drawn from throughout Sweden, the study offers an unusually fine sampling of genetic variation within this much-studied species, and through assays in multiple sites and years, it amply demonstrates the context-dependence of fitness expression. It supports the general phenomenon of local adaptation, with multiple nuances. Other examples exist, but it is of value to have further cases illustrating not only the context-dependence of fitness expression but also the sometimes idiosyncratic nature of fitness variation. I commend the authors on their cautionary language in relation to inferences about the roles of particular genomic regions (e.g., l. 140-144; l. 227).

      Weaknesses: 

      To my mind, the manuscript is written primarily for the Arabidopsis community. This community is certainly large, but there are many evolutionary biologists who could appreciate this work but are not invited to do so. The authors could address the broader evolution community by acknowledging more of the relevant work of others (I've noted a few references in my comments to the authors). At least as important, the authors could make clearer the fact that A. thaliana is (almost) strictly selfing and how this feature of its biology both enables such a study and also limits inferences from it. Further, it seems to me, though I could be wrong, that readers would appreciate a more direct, less discursive style of writing, and one that makes the broader import of the focal questions clearer.

      We agree that connecting the paper better to the broader field is desirable, and will try to do this in the revision. As for how selfing matters, there certainly are some things we can discuss, but a general discussion is probably a suitable topic for a review/opinion article!

      As a reader, I would value seeing estimates of the overall fitness of the accessions in the different conditions, i.e., by combining the survival and fecundity results of the common garden experiments.

      Combining estimates would be possible in the common garden experiments, and would bring us somewhat closer to total fitness estimates, although as noted by another reviewer (and also emphasized by us), the time scale of our experiment is not sufficient to evaluate the trade-off between survival and fecundity. Furthermore, we would still be missing the establishment component of fitness, which we found to be extremely important. Therefore, little would be gained by combining the estimates, while we would lose the resolution needed to disentangle the fitness components. We thus decided to focus on the individual fitness components and leave consideration of their joint effect for the Discussion.

      Reviewer #2 (Public review):

      Summary: 

      The goal of this study was to find evidence for local adaptation in survival and fecundity of the model plant Arabidopsis thaliana. The authors grew a large set of Swedish Arabidopsis accessions at four common garden sites in northern and southern Sweden. Accessions were grown from seed in trays, which were laid on the ground at each site in late summer, screened for survival in fall and the following spring, and fecundity was determined from rosette size and seed production in spring. Experiments were complemented by 'selection experiments', in which seeds of the same accessions were sown in plots, and after two years of growth, plants were sampled to determine fitness from genotype frequencies, providing a more comprehensive evaluation of lifetime fitness than can be gleaned from fecundity alone. 

      To clarify, fecundity was determined from total plant area using photos of the mature stems, not from the rosettes or direct counting of seeds. That said, it is true that our fecundity estimate was well correlated with rosette area. Furthermore, we validated our fecundity estimates by showing that they were highly correlated with seed production estimated by measuring and counting siliques on a separate set of plants grown under common garden conditions in one of our sites (Brachi et al. 2022).

      As the main result, southern accessions had higher mortality in northern sites in one of two years, but also suffered more slug damage in southern sites in one year, indicating a potential link between frost tolerance and herbivore resistance. Fecundity of accessions was highest when growing close to the 'home' environment, but while accessions from one sand dune population in southern Sweden had among the lowest fecundities overall, they consistently had the highest fitness in the selection experiment. Accessions from this population had large seed size and rapid root growth, which might be related to establishment success when arriving in a new, partially occupied habitat. However, neither trait could fully explain the very high fitness of this population, suggesting the presence of other, unmeasured traits.

      Overall, the authors could provide clear evidence of local adaptation in different traits for some of their experiments, but they also highlight high temporal and spatial variability that makes prediction of microevolutionary change so challenging. 

      Strengths: 

      A major strength of this study is the highly comprehensive evaluation of different fitness-related traits of Arabidopsis under natural conditions. The evaluation of survival and fecundity in common garden experiments across four sites and two years provides an estimate of variability and consistency of results. The addition of the 'selection experiment' provides an extended view on plant fitness that is both original and interesting, in particular highlighting potential limitations of 'fitness-proxies' such as seed production that don't take into account seedling establishment and competitive exclusion. 

      Throughout the study, the authors have gone to impressive depths in exploring their data, and particularly the discovery of 'native volunteers' in selection experiment plots and their statistical treatment is very elegant and has resulted in compelling conclusions. Also, while the authors are careful in the interpretation of their GWAS results, they nonetheless highlight a few interesting gene candidates that may be underlying the observed plant adaptations, and which likely will stimulate further research. 

      Overall, the authors provide a rich new resource that is relevant and interesting both in the context of general evolutionary theory as well as more specifically for molecular biology. 

      Weaknesses:

      While the repetition of the common garden experiments over two years is certainly better than no repetition (hence its mention also under 'strengths'), the very high variability found between the two years highlights the need for more extensive temporal replication. In this context, two temporal replicates are the bare minimum, and more repeats in time would be necessary to draw any kind of conclusion about the role of 'high mortality' and 'low mortality' years for the microevolution of Arabidopsis. It also seems that the authors missed an opportunity to explore potentially causal variation among years, as they did not attempt to relate winter mortality to actual climatic variables, even though they discuss winter harshness as a potential predictor.

      We agree that two years is insufficient to understand how variation in selective pressures compound over time to generate micro-evolutionary change. The eight-year data in Oakley et al. (2023), which we discuss in the paper, support this. Our results are nonetheless sufficient to demonstrate the idiosyncratic nature of selection. In the revision, we will further emphasize that far longer time series would be needed for definitive conclusions.

      Our short time series is also why we do not try to correlate with climate data, as this would amount to doing statistics with four data points (essentially two groups of accessions, N vs. S, with mostly homogeneous climates within groups, and two years).

      The low temporal replication also makes the accidental slug herbivory appear somewhat random. Potted plants are notoriously susceptible to slug herbivory, and while it is certainly nice that slug damage predominantly affected one group of accessions, it nonetheless raises the question of whether this reflects a 'real' selection pressure that plants commonly face in their respective local environments.

      We agree with this point as well. The evidence for selection on glucosinolates by generalist herbivores such as slugs is fairly strong, but the precise agent is not known, and probably varies over time and space. Our results merely demonstrate one possibility (and we will clarify this in the revision).

      The addition of the 'selection experiment' is certainly original and provides valuable additional insights, but again, it seems a bit questionable which natural process really has affected this outcome. While the genetic and statistical analysis of this experiment seems to be state-of-the-art, the experimental design is rather rudimentary compared to more standard selection experiments. Specifically, the authors added seeds from greenhouse-grown mothers to experimental plots and only sampled plants two years later. This means that, potentially, the first very big bottleneck was germination under natural conditions, which may have already excluded many of the accessions before they had a chance to grow. While this certainly is one type of selection, it is not exactly the type of selection that a 2-year selection experiment is set up to measure. Either initially establishing the selection experiment from plants instead of seeds, or genotyping the population over several generations, would have substantially strengthened the conclusions that could be drawn from this experiment.

      We agree that more data would have been beneficial, and we do not make strong claims about the nature of selection. Among other phenotypes, we mention dormancy, and note that existing dormancy estimates do not predict fitness in our selection experiments. In addition, the same seed batches germinated uniformly in the common-garden experiments with minimal stratification (we will note this in the revision).

      Also, the complete lack of information on population density is a bit problematic. It is not clear if there were other (non-Arabidopsis) plants present in the plots, how many Arabidopsis plants were established, if numbers changed over the year, etc. Given all of these limitations, calling this a 'selection experiment' is in fact somewhat misleading. 

      Seeds were introduced into sites that appeared appropriate for A. thaliana, leaving the background community intact. We provided information on sowing density; the density of plants (A. thaliana and other species) that we obtained during the course of the experiments varied considerably between sites, much like in natural populations, although we lack systematic measurements. We will provide more information (including photos) in the revision.  

      Despite these weaknesses, the authors could achieve their main goals, and despite the somewhat minimal temporal replication, they were lucky to sample two fairly distinct years that provided them with interesting variation, which they could partially explain using the variation among their accessions. Overall, this study will likely make an important contribution to the field of evolutionary biology, and it is another very strong example of how the extensive molecular tools in Arabidopsis can be leveraged to address fundamental questions in evolution and ecology, to an extent that is not (yet) possible in other plant systems. 

      Reviewer #3 (Public review)

      Summary: 

      The manuscript presents a large common garden experiment across Sweden using solely local germplasm. Additionally, there is a collection of selection experiments that begin investigating the factors shaping fecundity in these populations. This provides an impressive amount of data and analysis investigating the underlying factors involved. Together, this helps support the data showing that fluctuations and interactions are key components determining Arabidopsis fitness and are more broadly applicable across plant and non-plant species. 

      Strengths: 

      The field trials are well conducted with extensive effort and sampling. Similarly, while the genetic analysis is complex, it is well conducted and reflects the complexity of dealing with population structure that may be intricately linked to adaptive structure. This has no real solution, and presenting results with and without correction is likely the only appropriate option.

      Weaknesses: 

      A significant finding from this study was that fecundity is shaped more by yearly fluctuations and their interaction with genotype than it is by the main effect of location or genotype. Another significant finding is that the strength of selection can be quite strong, with nearly 5x ranges across accessions. It should be noted that there are a number of other studies using Arabidopsis in the wild with multiple years and locations that found similar observations beyond the Oakley citation. In general, the context of how these findings relate to existing knowledge in Arabidopsis is a bit underdeveloped. 

      We shall remedy this in the revision (see also comments by Reviewer #1).

      The effects of the populations across the locations seem to rely on individual tests and PC analysis. It would seem to be possible to incorporate these tests more directly in the linear modeling analysis, and it isn't quite clear why this wasn't conducted. 

      The fecundity estimates were modelled for all experiments simultaneously, and the results are presented in Figure 6 to explore the relative importance of genotype effects and interaction terms including genotypes. For survival and fecundity, the BLUPs are generated from linear mixed models fitted for all experiments simultaneously, including a random intercept effect for the genotypes within experiments. A principal component analysis is used to explore the pattern of accession effects (BLUPs) on fecundity (Figure 7); this will be explained in the Methods.

      I'm a bit puzzled by the discussion on how to find causative loci. This seems to focus solely on GWAS as the solution, with a goal to sequence vast numbers of individuals. But the loci that the manuscript discussed were found by a combination of structured mapping populations followed by molecular validation that then informed the GWAS. As such, I'm unsure if the proposed future approach of more sequencing is the best when a more balanced approach integrating diverse methods and population types will be more useful.

      We are puzzled by this comment in return. Our statement about more sequencing (penultimate sentence of discussion) was referring to achieving a better understanding of the history of migration and selection rather than identifying causative loci. Happy for clarification!

      References

      Brachi, Benjamin, Daniele Filiault, Hannah Whitehurst, Paul Darme, Pierre Le Gars, Marine Le Mentec, Timothy C. Morton, et al. 2022. “Plant Genetic Effects on Microbial Hubs Impact Host Fitness in Repeated Field Trials.” Proceedings of the National Academy of Sciences of the United States of America 119 (30): e2201285119.

      Oakley, Christopher G., Douglas W. Schemske, John K. McKay, and Jon Ågren. 2023. “Ecological Genetics of Local Adaptation in Arabidopsis: An 8-Year Field Experiment.” Molecular Ecology, June. https://doi.org/10.1111/mec.17045.

    1. eLife Assessment

      This valuable study provides a 3D standardised anatomical atlas of the brain of an orb-weaving spider. The authors describe the brain's shape and its inner compartments - the neuropils - and add information on the distribution of a number of neuroactive substances such as transmitters and neuropeptides. Through the use of histological and microscopy methods, the authors provide a more complete view of an arachnid brain than previous studies and also present convincing evidence about the organisation and homology of brain regions. The work will serve as a reference for future studies on spider brains and will enable comparisons of brain regions with insects so that the evolution of these structures can be inferred across arthropods.

    2. Reviewer #1 (Public review):

      Summary:

      Artiushin et al. establish a comprehensive 3D atlas of the brain of the orb-web building spider Uloborus diversus. First, they use immunohistochemistry detection of synapsin to mark and reconstruct the neuropils of the brain of six specimens and they generate a standard brain by averaging these brains. Onto this standard 3D brain, they plot immunohistochemical stainings of major transmitters to detect cholinergic, serotonergic, octopaminergic/tyraminergic and GABAergic neurons, respectively. Further, they add information on the expression of a number of neuropeptides (Proctolin, AllatostatinA, CCAP, and FMRFamide). Based on this data and 3D reconstructions, they extensively describe the morphology of the entire synganglion, the discernible neuropils, and their neurotransmitter/neuromodulator content.

      Strengths:

      While 3D reconstruction of spider brains and the detection of some neuroactive substances have been published before, this seems to be the most comprehensive analysis so far, both in terms of the number of substances tested and the ambition to analyze the entire synganglion. Interestingly, besides the previously described neuropils, they detect a novel brain structure, which they call the tonsillar neuropil.

      Immunohistochemistry, imaging, and 3D reconstruction are convincingly done, and the data are extensively visualized in figures, schemes, and very useful films, which allow the reader to work with the data. Due to its comprehensiveness, this dataset will be a valuable reference for researchers working on spider brains or on the evolution of arthropod brains.

      Weaknesses:

      As expected for such a descriptive groundwork, new insights or hypotheses are limited, apart from the first description of the tonsillar neuropil. A more comprehensive labeling in the panels of the mentioned structures would help to follow the descriptions. The reconstruction of the main tracts of the brain would be a very valuable complementary piece of data.

    3. Reviewer #2 (Public review):

      Summary

      Artiushin et al. created the first three-dimensional atlas of a synganglion in the hackled orb-weaver spider, which is becoming a popular model for web-building behavior. Immunohistochemical analysis with an impressive array of antisera reveals subcompartments of neuroanatomical structures described in other spider species as well as previously undescribed arachnid structures: the protocerebral bridge, hagstone, and paired tonsillar neuropils. The authors describe the spider's neuroanatomy in detail and discuss similarities and differences from other spider species. The final section of the discussion examines the homology between onychophoran and chelicerate arcuate bodies and mandibulate central bodies.

      Strengths

      The authors set out to create a detailed 3D atlas and accomplished this goal.

      Exceptional tissue clearing and imaging of the nervous system reveal the three-dimensional relationships between neuropils and some connectivity that would not be apparent in sectioned brains.

      A detailed anatomical description makes it easy to reference structures described between the text and figures.

      The authors used a large palette of antisera which may be investigated in future studies for function in the spider nervous system and may be compared across species.

      Weaknesses

      It would be useful for non-specialists if the authors would introduce each neuropil with some orientation about its function or what kind of input/output it receives, if this is known for other species, especially for those structures that are not described in other arthropods, like the opisthosomal neuropil. Are there implications for neuroanatomical findings in this paper on the understanding of how web-building behaviors are mediated by the brain?

      Likewise, where possible, it would be helpful to have some discussion of the implications of certain neurotransmitters/neuropeptides being enriched in different areas. For example, GABA would signal areas of inhibitory connections, such as inhibitory input to mushroom bodies, as described in other arthropods. In the discussion section on relationships between spider and insect midline neuropils, are there similarities in expression patterns between those described here and in insects?

    4. Reviewer #3 (Public review):

      Summary:

      This is an impressive paper that offers a much-needed 3D standardized brain atlas for the hackled-orb weaving spider Uloborus diversus, an emerging organism of study in neuroethology. The authors used a detailed immunohistological whole-mount staining method that allowed them to localize a wide range of common neurotransmitters and neuropeptides and map them on a common brain atlas. Through this approach, they discovered groups of cells that may form parts of neuropils that had not previously been described, such as the 'tonsillar neuropil', which might be part of a larger insect-like central complex. Further, this work provides unique insights into the previously underappreciated complexity of higher-order neuropils in spiders, particularly the arcuate body, and hints at a potentially important role for the mushroom bodies in vibratory processing for web-building spiders.

      Strengths:

      To understand brain function, data from many experiments on brain structure must be compiled to serve as a reference and foundation for future work. As demonstrated by the overwhelming success in genetically tractable laboratory animals, 3D standardized brain atlases are invaluable tools - especially as increasing amounts of data are obtained at the gross morphological, synaptic, and genetic levels, and as functional data from electrophysiology and imaging are integrated. Among 'non-model' organisms, such approaches have included global silver staining and confocal microscopy, MRI, and, more recently, micro-computed tomography (X-ray) scans used to image multiple brains and average them into a composite reference. In this study, the authors used synapsin immunoreactivity to generate an averaged spider brain as a scaffold for mapping immunoreactivity to other neuromodulators. Using this framework, they describe many previously known spider brain structures and also identify some previously undescribed regions. They argue that the arcuate body - a midline neuropil thought to have diverged evolutionarily from the insect central complex - shows structural similarities that may support its role in path integration and navigation.

      Having diverged from insects such as the fruit fly Drosophila melanogaster over 400 million years ago, spiders are an important group for study - particularly due to their elegant web-building behavior, which is thought to have contributed to their remarkable evolutionary success. How such exquisitely complex behavior is supported by a relatively small brain remains unclear. A rich tradition of spider neuroanatomy emerged in the previous century through the work of comparative zoologists, who used reduced silver and Golgi stains to reveal remarkable detail about gross neuroanatomy. Yet, these techniques cannot uncover the brain's neurochemical landscape, highlighting the need for more modern approaches, such as those employed in the present study.

      A key insight from this study involves two prominent higher-order neuropils of the protocerebrum: the arcuate body and the mushroom bodies. The authors show that the arcuate body has a more complex structure and lamination than previously recognized, suggesting it is insect central complex-like and may support functions such as path integration and navigation, which are critical during web building. They also report strong synapsin immunoreactivity in the mushroom bodies and speculate that these structures contribute to vibratory processing during sensory feedback, particularly in the context of web building and prey localization. These findings align with prior work that noted the complex architecture of both neuropils in spiders and their resemblance (and in some cases greater complexity) compared to their insect counterparts. Additionally, the authors describe previously unrecognized neuropils, such as the 'tonsillar neuropil,' whose function remains unknown but may belong to a larger central complex. The diverse patterns of neuromodulator immunoreactivity further suggest that plasticity plays a substantial role in central circuits.

      Weaknesses:

      My major concern, however, is that some of the authors' neuroanatomical descriptions rely too heavily on inference rather than what is currently resolvable from their immunohistochemistry stains alone.

    1. eLife Assessment

      This manuscript presents an in-depth analysis of gene expression across multiple brown algal species with differing life histories, providing convincing evidence for the conservation of life cycle-specific gene expression. While largely descriptive, the study is an important step forward in understanding the core cellular processes that differ between life cycle phases, and its findings will be of broad interest to developmental and evolutionary biologists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have examined gene expression between life cycle stages in a range of brown macroalgae to examine whether there are conserved aspects of biological features.

      Strengths:

      The manuscript incorporates large gene expression datasets from 10 different species and therefore enables a comprehensive assessment of the degree of conservation of different aspects of gene expression and underlying biology.

      The findings represent an important step forward in our understanding of the core aspects of cell biology that differ between life cycle phases and provide a substantial resource for further detailed studies in this area. Convincing evidence is provided for the conservation of life-cycle-specific gene expression between species, particularly in core housekeeping gene modules.

      Weaknesses:

      I found a few weaknesses in the methodology and experimental design. I think the manuscript could have been clearer when linking the findings to the biology of the brown algae.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Ratchinski et al presents a comprehensive analysis of developmental and life history gene expression patterns in brown algal species. The manuscript shows that the degree of generation bias or generation-specific gene expression correlates with the degree of dimorphism. It also reports conservation of life cycle features within generations and marked changes in gene expression patterns in Ectocarpus in the transition between gamete and early sporophyte. The manuscript also reports considerable conservation of gene expression modules between two representative species, particularly in genes associated with conserved functional characteristics.

      Strengths:

      The manuscript represents a considerable "tour de force" dataset and analytical effort. While the data presented is largely descriptive, it is likely to provide a very useful resource for studies of brown algal development and for comparative studies with other developmental and life cycle systems.

      Weaknesses:

      Notwithstanding the well-known issues associated with inferring function from transcriptomics-only studies, no major weaknesses were identified by this reviewer.

    1. eLife Assessment

      This study presents useful findings on how the transient absence of visual input (i.e., darkness) affects tactile neural encoding in the somatosensory cortex. The evidence supporting the authors' claims is incomplete, as key conclusions rely on subtle differences in surface roughness discriminability between sensory conditions, whose physiological underpinnings remain unclear. Potential methodological confounds are also not fully addressed. With additional analyses and methodological clarifications, this work could substantially inform neuroscientists studying cross-modal interactions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate how short-term visual deprivation influences tactile processing in the primary somatosensory cortex (S1) of sighted rats. They justify the study based on previous studies that have shown that long-term blindness can enhance tactile perception, and aim to investigate the neural mechanisms underlying rapid, short-term cross-modal plasticity. The authors recorded local field potentials from S1 as rats encountered different tactile textures (smooth and rough sandpaper) under light and dark conditions. They used deep learning techniques to decode the neural signals and assess how tactile representations changed across the four different conditions. Their goal was to uncover whether the absence of visual cues leads to a rapid reorganization of tactile encoding in the brain.

      Strengths:

      The study effectively integrates high-density local field potential (LFP) recordings with convolutional neural network (CNN) analysis. This combination allows for decoding high-dimensional population-level signals, revealing changes in neural representations that traditional analyses (e.g., amplitude measures) failed to detect. The custom treadmill paradigm permits independent manipulation of visual and tactile inputs under stable locomotion conditions. Gait analysis confirms that motor behavior was consistent across conditions, strengthening the conclusion that neural changes are due to sensory input rather than movement artifacts.

      Weaknesses:

      (1) While the study interprets the emergence of more distinct texture representations in the dark as evidence of rapid cross-modal plasticity, the claim rests on correlational data from a short-term manipulation and decoding analysis. The authors show that CNN-derived feature embeddings cluster more clearly by texture in the dark, but this does not directly demonstrate plasticity in the classical sense (e.g., synaptic or circuit-level reorganization).

      (2) Although gait was controlled, changes in arousal or exploratory behavior in light versus dark conditions might contribute to the observed neural differences. These factors are acknowledged but not directly measured (e.g., via pupillometry or cortical state indicators).

      (3) Moreover, the time course of the observed changes (within 10 minutes) is quite rapid, and while intriguing, the study does not include direct evidence that the underlying circuits were reorganized - only that population-level signals become more discriminable. As such, the term "plasticity" may overstate the conclusions and should be interpreted with caution unless validated by additional causal or longitudinal data.

      (4) The study highlights the forelimb region of S1 and a post-contact temporal window as particularly important for decoding texture, based on occlusion and integrated gradient analyses. However, this finding may be somewhat circular: The LFPs were aligned to forelimb contact, and the floor textures were sensed primarily via the forelimbs, making it unsurprising that forelimb electrodes were most informative. The observed temporal window corresponds directly to the event-aligned epoch, and while it may shift slightly in duration in the dark, this could reflect general differences in sensory gain or arousal, rather than changes in stimulus-specific encoding. Thus, while these findings are consistent with somatotopy and context-dependent dynamics, they do not provide strong independent evidence for novel spatial or temporal organization.

      (5) While the neural data suggest enhanced tactile representations, the study does not assess whether rats' actual tactile perception improved. Without a behavioral readout (e.g., discrimination accuracy), claims about perceptual enhancement remain speculative.

      (6) In addition to point 4, the authors discuss implications for sensory rehabilitation, including Braille training and haptic feedback enhancement. However, the lack of actual chronic or even more acute pathological sensory deprivation, behavioral data, or subsequent intervention in this study limits the ability to draw translational conclusions. It remains unknown whether the more distinct neural representations observed actually translate into better tactile performance, discriminability, or perception. Additionally, extrapolating from rats walking on sandpaper in the dark to human rehabilitative contexts is speculative without a clearer behavioral or mechanistic bridge. The potential is certainly there, but the claim is currently aspirational rather than empirically grounded.

      (7) While the CNN showed good performance, details on generalization robustness and validation (e.g., cross-validation folds, variance across animals) are not deeply discussed. Also, while explainability tools were used, interpretability of CNNs remains limited, and more transparent models (e.g., linear classifiers or dimensionality reduction) could offer complementary insights.

      Therefore, while the authors raise interesting hypotheses around rapid plasticity, somatotopic dynamics, and rehabilitation, the evidence for each is indirect. Stronger claims would require causal experiments, behavioral readouts, and mechanistic specificity beyond what the current data can provide.
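
      The validation concern in point (7) can be made concrete with a small sketch of leave-one-animal-out splitting, in which each fold holds out every trial from one animal. This is an illustration under stated assumptions, not the authors' pipeline: the function name `leave_one_animal_out` and the animal labels are hypothetical, and the decoder that would be trained on each fold is omitted.

```python
def leave_one_animal_out(animal_ids):
    """Yield (held_out, train_idx, test_idx), holding out one animal per fold.

    animal_ids: one label per trial, naming the animal the trial came from.
    These labels are hypothetical; the study's data layout is not known here.
    """
    for held_out in sorted(set(animal_ids)):
        # Train on all trials from the other animals.
        train_idx = [i for i, a in enumerate(animal_ids) if a != held_out]
        # Test only on trials from the held-out animal.
        test_idx = [i for i, a in enumerate(animal_ids) if a == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical usage: five trials recorded from three animals.
trial_animals = ["rat1", "rat1", "rat2", "rat3", "rat3"]
for held_out, train_idx, test_idx in leave_one_animal_out(trial_animals):
    print(held_out, train_idx, test_idx)
```

      Reporting decoding accuracy per held-out animal, rather than a single pooled value, would expose the variance across animals that the point asks about.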

    3. Reviewer #2 (Public review):

      Summary:

      Yamashiro et al. investigated how the transient absence of visual input (i.e., darkness) impacts tactile neural encoding in the rat primary somatosensory cortex (S1). They recorded local field potentials (LFPs) using a 32-channel array implanted in forelimb and hindlimb primary somatosensory cortex while rats walked on smooth or rough textures under illuminated and dark conditions. Employing a convolutional neural network (CNN), they successfully decoded both texture and lighting conditions from the LFPs. The authors conclude that the subtle differences in LFP patterns underlie tactile representation of surface roughness and become more distinct in darkness, suggesting a rapid cross-modal reorganization of the neural code for this sensory feature.

      Strengths:

      (1) The manuscript addresses a valuable question regarding how sensory cortices adapt dynamically to changes in sensory context.

      (2) Utilization of machine learning (CNNs) allowed the authors to go beyond conventional amplitude-based analyses, potentially uncovering a subtle but interesting phenomenon.

      Weaknesses:

      (1) Despite applying explainability techniques to the CNN-based decoder, the study does not clearly demonstrate the precise "subtle, high-dimensional patterns" exploited by the CNN for surface roughness decoding, limiting the physiological interpretability of the results. Additional analyses (e.g., detailed waveform morphology analysis on grand averages, time-frequency decompositions, or further use of explainability methods) are necessary to clarify the exact nature of the discriminative activity features enabling the CNN to decode surface roughness and how these change with the sensory context (i.e., in light or darkness).

      (2) The claim regarding cross-modal representation reorganization heavily relies on a silhouette analysis (Figure 5C), which shows a modest effect size and borderline statistical significance (p≈0.05 with n=9+2). More rigorous statistical quantification, such as permutation tests and reporting underlying cluster distances for all animals, would strengthen confidence in this finding.

      (3) While the authors recorded in the somatosensory cortex, primarily known for its tactile responsivity, I would be cautious not to rule out a priori the presence of crossmodal (visual) responses in the area. In this case, the stronger texture separation in darkness might be explained by the absence of some visually-evoked potentials (VEPs) rather than genuine cross-modal reorganization. Clarification is needed to rule out visual interference and this would strengthen the claim.

      (4) Behavioural controls are limited to gross gait parameters; more detailed analyses of locomotor behavior and additional metrics (e.g., pupil size or locomotor variance) would robustly rule out potential arousal or motor confounds.

      (5) The consistent ordering of trials (10 minutes of light then 10 minutes of dark) could introduce confounds such as fatigue or satiation (and also related arousal state), which should be controlled by analyzing sessions with reversed condition ordering.

      (6) The focus on forelimb-aligned LFP analyses raises the possibility that hindlimb-aligned data might yield different conclusions, suggesting alignment effects might bias the results.

      (7) The authors' dismissal of amplitude-based metrics as ineffective is inadequately substantiated. A clearer demonstration (e.g., event-related waveforms averaged by conditions, presented both spatially and temporally) would support this claim.

      (8) Wording ambiguity regarding "attribution score" versus "activation amplitude" (Figure 5) complicates the interpretation of key findings. This distinction must be clarified for proper assessment of the results.

      (9) Generalization across animals remains unaddressed. The current within-subject decoding setup limits conclusions regarding shared neural representations across individuals. Adopting cross-validation strategies and exploring between-animal analyses would add significant value to the manuscript.
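
      The permutation test suggested in point (2) could take the following form: a minimal sketch of an exact sign-flip test on per-animal paired differences (for example, silhouette score in darkness minus silhouette score in light). The function name and the example values are hypothetical and are not data from the study. The test assumes that, under the null hypothesis, each animal's difference is equally likely to be positive or negative.

```python
from itertools import product


def paired_sign_flip_test(differences):
    """Exact two-sided sign-flip permutation test on paired differences.

    Every possible assignment of signs to the per-animal differences is
    enumerated, which is feasible for small samples (2**n patterns).
    Returns the fraction of patterns whose absolute mean is at least as
    large as the observed absolute mean.
    """
    n = len(differences)
    observed = abs(sum(differences) / n)
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, differences)) / n)
        # Small tolerance so that ties with the observed value are counted.
        if stat >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical per-animal differences (dark minus light); not study data.
example_differences = [0.04, 0.01, 0.06, -0.02, 0.03, 0.05, 0.02, -0.01, 0.07]
print(paired_sign_flip_test(example_differences))
```

      Because the enumeration is exact, the smallest attainable two-sided p-value with n animals is 2 / 2**n, which is worth stating alongside any result obtained from a small sample.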

    1. eLife Assessment

      This important study addresses how wing morphology and kinematics change across hoverflies of different body sizes. The authors provide convincing evidence that there is no significant correlation between body size and wing kinematics across 28 species and instead argue that non-trivial changes in wing size and shape evolved to support flight across the size range. Overall, this paper illustrates the power and beauty of an integrative approach to animal biomechanics and will be of broad interest to biologists, physicists and engineers.

    2. Reviewer #2 (Public review):

      Summary

      Le Roy et al. quantify wing morphology and wing kinematics across twenty-eight and eight hoverfly species, respectively; the aim is to identify how weight support during hovering is ensured across body sizes. Wing shape and relative wing size vary non-trivially with body mass, but wing kinematics are reported to be size-invariant. On the basis of these results, it is concluded that weight support is achieved solely through size-specific variations in wing morphology, and that these changes enabled hoverflies to decrease in size. Adjusting wing morphology may be preferable compared to the alternative strategy of altering wing kinematics, because kinematics may be subject to stronger evolutionary and ecological constraints, dictated by the highly specialised flight and ecology of the hoverflies.

      Strengths

      The study deploys a vast array of challenging techniques, including flight experiments, morphometrics, phylogenetic analyses, and numerical simulations; it thus illustrates both the power and beauty of an integrative approach to animal biomechanics. The question is well motivated, the methods appropriately designed, and the discussion elegantly places the results in broad biomechanical, ecological, and evolutionary context. In many ways, this work provides a blueprint for work in evolutionary biomechanics; the breadth of both the methods and the discussion reflects outstanding scholarship.

      Weaknesses

      The work presents a mechanical analysis that is focused solely on aerodynamics; but these aerodynamic demands impose no less relevant demands on the primary engine that drives wing movement: muscle. The relation between the assumed null hypotheses, the observed empirical allometric relations, and the power and work demand they place on muscle remains unclear. Though this is clearly a minor weakness, future work will have to address the link between aerodynamics, wing shape, wing dynamics, and musculoskeletal system in more detail, as discussed briefly by the authors.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      The paper is well written and the figures well laid out. The methods are easy to follow, and the rationale and logic for each experiment are easy to follow. The introduction sets the scene well, and the discussion is appropriate. The summary sentences throughout the text help the reader.

      The authors have done a lot of work addressing my previous concerns and those of the other Reviewers.

      We are pleased that the revised manuscript satisfactorily addresses the previous concerns of the reviewer.

      Reviewer #2 (Public review):

      Summary

      Le Roy et al quantify wing morphology and wing kinematics across twenty eight and eight hoverfly species, respectively; the aim is to identify how weight support during hovering is ensured across body sizes. Wing shape and relative wing size vary non-trivially with body mass, but wing kinematics are reported to be size-invariant. On the basis of these results, it is concluded that weight support is achieved solely through size-specific variations in wing morphology, and that these changes enabled hoverflies to decrease in size. Adjusting wing morphology may be preferable compared to the alternative strategy of altering wing kinematics, because kinematics may be subject to stronger evolutionary and ecological constraints, dictated by the highly specialised flight and ecology of the hoverflies.

      Strengths

      The study deploys a vast array of challenging techniques, including flight experiments, morphometrics, phylogenetic analyses, and numerical simulations; it so illustrates both the power and beauty of an integrative approach to animal biomechanics. The question is well motivated, the methods appropriately designed, and the discussion elegantly places the results in broad biomechanical, ecological, and evolutionary context.

      We thank the reviewer for appreciating the strengths of our study.

      Weaknesses

      (1) In assessing evolutionary allometry, it is key to pinpoint the variation expected from changes in size alone. The null hypothesis for wing morphology is well-defined (isometry), but the equivalent predictions for kinematic parameters, although specified, are insufficiently justified, and directly contradict classic scaling theory. A detailed justification of the "kinematic similarity" assumption, or a change in the null hypothesis, would substantially strengthen the paper, and clarify its evolutionary implications.

      We agree with the reviewer that a clearly articulated null hypothesis is crucial for interpreting scaling relationships. When carefully reviewing our manuscript, we realized that we had nowhere stated one, which might have led to misinterpretation. In the revised manuscript, we therefore now explicitly state our newly defined null hypotheses (lines 120–125, 340–352) and how we tested them (lines 359–360).

      In fact, we define two alternative null hypotheses: (1) weight support is maintained across sizes using allometric scaling of wing morphology only, and thus wingbeat kinematics are kept constant (kinematic similarity); (2) weight support is maintained across sizes using allometric scaling of wingbeat kinematics, while wing morphology scales isometrically (morphological similarity).

      According to the first null hypothesis, the second-moment-of-area of the wing should scale linearly with body mass, resulting in negative allometry of S<sub>2</sub> relative to body mass (S<sub>2</sub> ∼ m<sup>1</sup> < m<sup>4/3</sup>). According to the second null hypothesis, the product of wingbeat frequency and amplitude should scale with mass under negative allometry (ω ∼ ƒ A<sub>ϕ</sub> ∼ m<sup>-1/6</sup>). We test these alternative null hypotheses using Phylogenetic Generalized Least Squares (PGLS) regressions of the morphology and kinematics metrics against body mass.
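
      The two exponents quoted above can be recovered by simple exponent bookkeeping. The following is a minimal sketch, assuming a quasi-steady lift model in which the aerodynamic force scales as F ∼ ω<sup>2</sup> S<sub>2</sub> and weight support requires F ∼ m; this form of the force model and the function names are illustrative assumptions, not taken from the manuscript.

```python
from fractions import Fraction

# Assumption (illustrative, not quoted from the manuscript): quasi-steady lift
# scales as F ~ omega^2 * S2, and weight support requires F ~ m^1.

def required_s2_exponent(omega_exponent):
    """Mass exponent of S2 needed for weight support, given omega ~ m^omega_exponent."""
    return 1 - 2 * omega_exponent

def required_omega_exponent(s2_exponent):
    """Mass exponent of omega needed for weight support, given S2 ~ m^s2_exponent."""
    return (1 - s2_exponent) / 2

# Null hypothesis 1: kinematic similarity (omega ~ m^0), morphology must adjust.
h1_s2 = required_s2_exponent(Fraction(0))

# Null hypothesis 2: morphological similarity (S2 ~ length^4 ~ m^(4/3)),
# kinematics must adjust.
isometric_s2 = Fraction(4, 3)
h2_omega = required_omega_exponent(isometric_s2)

print(h1_s2, h2_omega)  # prints: 1 -1/6
```

      Under these assumptions, the first hypothesis gives S<sub>2</sub> ∼ m<sup>1</sup>, which lies below the isometric expectation m<sup>4/3</sup>, and the second gives ω ∼ m<sup>-1/6</sup>.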

      Furthermore, in our revised manuscript, we now also better explain the use of the "kinematic similarity" assumption as a theoretical scenario that is neither physically, biomechanically, nor physiologically sustainable across sizes, and that we use merely to define our null hypotheses (lines 340-351). This is made particularly explicit in a new subsection named “Theoretical considerations” (lines 448–461). Note that our second null hypothesis is thus not that hoverflies fly under "kinematic similarity", but that wingbeat kinematics scales under negative allometry (ω∼ƒ A<sub>ϕ</sub>∼m<sup>-1/6</sup>), which we assume is in line with the classic scaling theory that the reviewer refers to.

      We sincerely thank the reviewer for making us aware that we had not explicitly stated our null hypotheses; introducing these new null hypotheses removed the confusion about the assumptions in our study.

      (2) By relating the aerodynamic output force to wing morphology and kinematics, it is concluded that smaller hoverflies will find it more challenging to support their body mass--a scaling argument that provides the framework for this work. This hypothesis appears to stand in direct contrast to classic scaling theory, where the gravitational force is thought to present a bigger challenge for larger animals, due to their disadvantageous surface-to-volume ratios. The same problem ought to occur in hoverflies, for wing kinematics must ultimately be the result of the energy injected by the flight engine: muscle. Much like in terrestrial animals, equivalent weight support in flying animals thus requires a positive allometry of muscle force output. In other words, if a large hoverfly is able to generate the wing kinematics that suffice to support body weight, an isometrically smaller hoverfly should be, too (but not vice versa). Clarifying the relation between the scaling of muscle mechanical input, wing kinematics, and weight support would help resolve the conflict between these two contrasting hypotheses, and considerably strengthen the biomechanical motivation and evolutionary interpretation.

      We agree with the reviewer that, due to disadvantageous surface-to-volume ratios, larger animals are more challenged to maintain weight-support, and that this is also the case for hovering hoverflies. In the current manuscript, we do not aim to challenge this universal scaling law of muscle force with body mass.

      Instead, we here focus merely on how the flight propulsion system (wing morphology and kinematics) scale with size, and how this allows hovering hoverflies to maintain weight support. We also fully agree with the reviewer that in theory, “if a large hoverfly is able to generate the wing kinematics that suffice to support body weight, an isometrically smaller hoverfly should be, too”. This aligns in fact with our second null hypothesis where wingbeat frequency should scale as ƒ∼m<sup>-1/6</sup>, to maintain weight support under morphological isometry.

      In our study, we show that this null hypothesis is rejected (lines 511-517, and line 525), and thus hoverflies primarily adjust their wing morphology to maintain in-hovering weight-support across sizes, and wingbeat kinematics is in fact highly conserved. Why this specific flight kinematics is so strongly conserved is not known, and thus a key topic in the discussion section of our manuscript.

      We agree with the reviewer that muscle physiology might be an important driver for this conserved kinematics, but also aerodynamic efficiency and maneuverability could be key aspects here. In our revised manuscript, we now discuss these three aspects in more detail (lines 762-775). We also mention that we aim to address this outstanding question in future studies, by including muscle physiology in our animal flight studies and by studying the aerodynamics and maneuver kinematics of hoverflies in more detail.

      Moreover, in our revised introduction section, we now also mention explicitly that the capability for maintaining in-flight weight-support scales inversely with animal size, due to the negative isometric scaling of muscle force with body mass (lines 52-56). Furthermore, we removed all statements that might suggest the opposite. We hope that these adjustments helped resolve the apparent conflict between our null hypotheses and general muscle scaling laws.

      Finally, in the Discussion section (lines 770-775), we now more explicitly acknowledge that wing motion is ultimately driven by the flight motor musculature, and that a full biomechanical interpretation must consider the scaling of muscle mechanical input alongside wing kinematics and morphology. While we decided to keep the focus primarily on aerodynamic constraints in this study, we agree that future work integrating both aerodynamic and physiological scaling will be essential to fully resolve these contrasting perspectives.

      (3) One main conclusion-- that miniaturization is enabled by changes in wing morphology--is insufficiently supported by the evidence. Is it miniaturization or "gigantism" that is enabled by (or drives) the non-trivial changes in wing morphology? To clarify this question, the isolated treatment of constraints on the musculoskeletal system vs the "flapping-wing based propulsion" system needs to be replaced by an integrated analysis: the propulsion of the wings, is, after all, due to muscle action. Revisiting the scaling predictions by assessing what the engine (muscle) can impart onto the system (wings) will clarify whether non-trivial adaptations in wing shape or kinematics are necessary for smaller or larger hovering insects (if at all!).

      In many ways, this work provides a blueprint for work in evolutionary biomechanics; the breadth of both the methods and the discussion reflects outstanding scholarship.

      In response to the first review round, we have removed all references to “miniaturization,” as our data does not allow us to infer evolutionary trajectories of body size (i.e., whether lineages have become smaller or larger over time). We now frame our conclusion more conservatively: that changes in wing morphology enable small hoverflies to maintain weight support despite the aerodynamic disadvantages imposed by isometric scaling.

      We fully agree that an integrated biomechanical framework, explicitly linking muscle mechanical output with wing kinematics and morphology, would significantly strengthen the study. However, we believe that performing an integrated analysis assessing the scaling of muscle input into the wing is beyond the current scope, which focuses specifically on the aerodynamic consequences of morphological and kinematic variation (see reply above).

      Reviewer #3 (Public review):

      This paper addresses an important question about how changes in wing morphology vs. wing kinematics change with body size across an important group of high-performance insects, the hoverflies. The biomechanics and morphology convincingly support the conclusions that there is no significant correlation between wing kinematics and size across the eight specific species analyzed in depth and that instead wing morphology changes allometrically. The morphological analysis is enhanced with phylogenetically appropriate tests across a larger data set incorporating museum specimens.

      The authors have made very extensive revisions that have significantly improved the manuscript and brought the strength of conclusions in line with the excellent data. Most significantly, they have expanded their morphological analysis to include museum specimens and removed the conclusions about evolutionary drivers of miniaturization. As a result, the conclusion about morphological changes scaling with body size rather than kinematic properties is strongly supported and very nicely presented with a strong complementary set of data. I only have minor textual edits for them to consider.

      We thank the reviewer for this positive feedback. We are pleased to hear that the revised manuscript is satisfactory.

      Reviewer #2 (Recommendations For The Authors):

      My main remaining qualm remains the null hypothesis for the scaling of kinematic parameters - all weaknesses come back to this point. I appreciate that the authors now specify an expectation, but they offer no justification. This is a problem, because the expectation dictates the interpretation of the results and is thus crucial to some of the key claims (including one in the paper title!): the choice made by the authors indeed implies that hovering is harder for small hoverflies, so that the reported changes in size-specific wing morphology are to be interpreted as an adaptation that enables miniaturization. However, why is this choice appropriate over alternatives that would predict the exact opposite, namely that hovering is harder for larger hoverflies?

      In my original review, I suggested that the authors may address this key question by considering the scaling of muscle mechanical output, and provided a quick sketch of what such an argument would look like, both in classic textbook scaling theory, and in the framework of more recent alternative approaches. The authors have decided against an implementation of this suggestion, providing various versions of the following justification in their reply: "our study focuses precisely on this constraint on the wing-based propulsion system, and not on the muscular motor system." I am puzzled by this distinction, which also appears in the paper: muscle is the engine responsible for wing propulsion. How can one be assessed independent of the other? The fact that the two must be linked goes straight to the heart of the difficulty in determining the null hypotheses for the allometry of kinematic and dynamic parameters: they must come from assertions on how muscle mechanical output is expected to vary with size, and so couple muscle mechanical output to the geometry of the wing-based propulsion system. What if not muscle output dictates wing kinematics?

      I fully agree with the authors that null hypotheses on kinematic parameters are debatable. But then the authors should debate their choice, and at least assess the plausibility of its implications (note that the idea of "similarity" in scaling does not translate to equal or invariant, but is tied closely to dimensional analysis - so one cannot just proclaim that kinematic similarity implies no change in kinematic parameters). I briefly return to the same line of argument I laid out in the initial review to provide such an assessment:

      Conservation of energy implies:

      W = 1/2 I ω<sup>2</sup>

      where I is the mass moment of inertia and W is the muscle work output. Under isometry, I ∝ m<sup>5/3</sup>, the authors posit ω ∝ m<sup>0</sup>, and it follows at once that they predict W ∝ m<sup>5/3</sup>. That is, the "kinematic similarity" hypothesis presented in the paper implies that larger animals can do substantially more work per unit body mass than small animals (unless the authors have an argument why wing angular velocity is independent of muscle work capacity, and I cannot think of one). This increase in work output is in contradiction with the textbook prediction, going all the way back to Borelli and Hill: isogeometric and isophysiological animals ought to have a constant mass-specific work output. So why, according to the authors, is this an incorrect expectation, i.e., how do they justify the assumption ω ∝ m<sup>0</sup> and its implication W ∝ m<sup>5/3</sup>? How can larger animals do more mass-specific work, or, equivalently, what stops smaller animals from delivering the same mass-specific work? If non-trivial adaptations such as larger relative muscle mass enable larger animals to do more work, how does this fit within the interpretation suggested by the authors that the aerodynamics of hovering require changes in small animals?

      A justification of the kinematic similarity hypothesis, alongside answers to the above questions, is necessary, not only to establish a relation to classic scaling theory, but also because a key claim of the paper hinges on the assumed scaling relationship: that changes in wing morphology enable hovering in small hoverflies. If I were to believe Borelli, Hill and virtually all biomechanics textbooks, the opposite should be the case: combining constant mass-specific work output with eq. 1, one retrieves F ∝ m<sup>2/3</sup>, so that weight support presents a bigger challenge for larger animals; the allometry of wing morphology should then be seen as an adaptation that enables hovering in larger hoverflies - the exact opposite of the interpretation offered by the authors.
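
      The reviewer's argument can be retraced by tracking mass exponents. The sketch below is illustrative only: it assumes that eq. 1 of the manuscript has the quasi-steady form F ∝ ω<sup>2</sup> S<sub>2</sub>, and that under isometry I ∝ m<sup>5/3</sup> and S<sub>2</sub> ∝ m<sup>4/3</sup>; the function and variable names are hypothetical.

```python
from fractions import Fraction

# Isometric (geometrically similar) scaling exponents with respect to body mass.
I_EXP = Fraction(5, 3)   # mass moment of inertia: I ~ m * length^2
S2_EXP = Fraction(4, 3)  # second moment of wing area: S2 ~ length^4

def work_exponent(omega_exp):
    """W = 1/2 * I * omega^2, so W ~ m^(I_EXP + 2 * omega_exp)."""
    return I_EXP + 2 * omega_exp

def omega_exponent_for_work(work_exp):
    """Invert W ~ I * omega^2 to get the mass exponent of omega."""
    return (work_exp - I_EXP) / 2

def force_exponent(omega_exp):
    """Assumed form of eq. 1: F ~ omega^2 * S2."""
    return 2 * omega_exp + S2_EXP

# Case 1: size-invariant angular velocity (omega ~ m^0).
w_invariant = work_exponent(Fraction(0))   # work scales as m^(5/3)
mass_specific = w_invariant - 1            # work per unit mass scales as m^(2/3)

# Case 2: classic scaling theory, constant mass-specific work (W ~ m^1).
omega_classic = omega_exponent_for_work(Fraction(1))  # omega ~ m^(-1/3)
f_classic = force_exponent(omega_classic)             # F ~ m^(2/3)

print(w_invariant, mass_specific, omega_classic, f_classic)
```

      Under these assumptions, force grows more slowly than weight (m<sup>2/3</sup> versus m<sup>1</sup>) in the classic case, which is why that framework predicts weight support to be harder for larger animals.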

      Now, as it so happens, I disagree with classic scaling theory on this point, and instead believe that there are good reasons to assume that muscle work output varies non-trivially with size. The authors can find a summary of the argument for this disagreement in the initial review, or in any of the following references:

      Labonte, D. A theory of physiological similarity for muscle-driven motion. PNAS, 2023, 120, e2221217120

      Labonte, D.; Bishop, P.; Dick, T. & Clemente, C. J. Dynamics similarity and the peculiar allometry of maximum running speed. Nat Comms., 2024, 15, 2181

      Labonte, D. & Holt, N. Beyond power limits: the kinetic energy capacity of skeletal muscle. J Exp Bio, 2024, 227, jeb247150

      Polet, D. & Labonte, D. Optimal gearing of musculoskeletal systems. Integr Org Biol, 2024, 64, 987-1006

      I am asking neither that the authors agree with the above references nor that they cite them. But I do expect that they critically discuss and justify their definition of kinematic similarity, its relation to expectation from classic scaling theory, and the implications for their claim that hovering is harder for small animals. I do note that the notion of "physiological similarity" introduced in the above references predicts a size-invariant angular velocity for small animals, that small animals should be able to do less mass-specific work, and that average muscle force output can grow with positive allometry even for isogeometric systems. These predictions appear to be consistent with the data presented by the authors.

      We agree with the reviewer that our null hypothesis was not clearly articulated in our previous version of the manuscript, and that this might have led to a misinterpretation of the merits and limitations of our study. In the revised manuscript, we therefore now explicitly introduce our null hypotheses in the Introduction (lines 120–125), we define these in the Methods section (lines 340–360), test these in the Results section (lines 511–517), and reflect on the results in the Discussion (lines 602–610). We thank the reviewer for pointing out this lack of clarity in our manuscript, because addressing it clarified the study significantly. See our replies in the “Public Review” section for details.

      Minor points

      L56: This is somewhat incomplete and simplistic; to just give one alternative option, weight support with equivalent muscle effort could also be ensured by a change in gearing (see eg Biewener's work). It is doubtful whether weight support is a strong selective force, as any animal that can move will be able to support its weight. The impact of scaling on dynamics is thus arguably more relevant.

      We thank the reviewer for pointing out that our original sentence may be too simplistic. We now briefly mention alternative mechanisms (suggested by the reviewer) to provide more nuance (line 56-58).

      L58: I am not aware of any evidence that smaller animals have reduced the musculature dedicated to locomotion beyond what is expected from isometry; please provide a reference for this claim or remove it.

      We removed that claim.

      The authors use both isometry and geometric similarity. As they also talk about muscle, solely geometric similarity (or isogeometry) may be preferable, to avoid confusion with isometric muscle contractions.

      To avoid confusion, we now use “geometric similarity” wherever the use of isometry might be ambiguous.

      L86: negative allometry only makes sense if there is a justified expectation for isometry - I suggest to change to "The assumed increase in wingbeat frequency in smaller animals" or similar, or to clarify the kinematic similarity hypothesis.

      We edited the sentence as suggested.

      L320: This assertion is somewhat misleading. Musculoskeletal systems are unlikely to be selected for static weight support. Instead, they need to allow movement. Where movement is possible, weight support is trivially possible, and so weight support should rarely, if ever, be a relevant constraint. At most, the negative consequence of isometry on weight support would be that a larger fraction of the muscle mass needs to be active in larger animals to support the weight.

      We fully agree with the reviewer that musculoskeletal systems are unlikely to be selected for static loads, as the ability to move dynamically in the real world is crucial for survival. That said, we here look at hovering flight, which is far from static. In fact, hovering flight is among the energetically most costly movement patterns found in nature, due to the required high-frequency wingbeat motions (Dudley 2002). Rapid maneuvers are of course more power demanding, but hovering is a good proxy for this. For example, in fruit flies maximum force production in rapid evasive maneuvers is only two times the force produced during hovering (Muijres et al., 2014).

      We agree with the reviewer that it is important to explicitly mention the differences in functional demands on the motor system in hovering and maneuvering flight, and thus we now do so in both the introduction and discussion sections (lines 116-118 and 762-765, respectively).

      Dudley, Robert. The biomechanics of insect flight: form, function, evolution. Princeton university press, 2002.

      Muijres, F. T., et al. "Flies evade looming targets by executing rapid visually directed banked turns." Science 344.6180 (2014): 172-177.

      Reviewer #3 (Recommendations For The Authors):

      Throughout, check use of "constrains" vs. "constraints"

      Thank you for pointing this out. We have corrected these errors.

      Line 52 do you mean lift instead of thrust?

      We agree with the reviewer that the use of “thrust” might be confusing in the context of hovering flight, and thus we replaced “flapping-wing-based aerodynamic thrust-producing system” with the “flapping-wing-based propulsion system”. This way, we no longer use the word thrust in this context, and only use lift as the upward-directed force required for weight-support.

      Line 60 "face also constrains" wording

      Corrected.

      Line 79 Viscous forces only "dominate" at Re<1 and so this statement only refers to very very small insects which I suspect are far below the scale of the hoverflies considered (likely Re ~100) although maybe not for the smallest 3 mg ones?

      Indeed, viscous forces do not “dominate” force production at the Reynolds numbers of our flying insects. We thank the reviewer for pointing out this incorrect statement, which we corrected in the revised manuscript.

      Line 85 again thrust doesn't seem to be right

      Agreed. See reply 3.2.

      533 "maximized" should probably be "increased"

      We now use “increased”.

      Line 705-710 The new study by Darveau might help resolve this a bit because of the reliability of this relationship across and between orders. Darveau, C.-A. (2024). Insect Flight Energetics And the Evolution of Size, Form, And Function. Integrative And Comparative Biology icae028.

      We thank the reviewer for this highly relevant reference, which was unfortunately not included in the original manuscript. In connection with this work, we now further discuss the relationship between wing size allometry and deviations from the expected scaling of wingbeat frequency (lines 730-735).

    1. eLife Assessment

      This valuable observational study was conducted in Dar es Salaam, Tanzania, to investigate potential associations between genetic variation in Mycobacterium tuberculosis and human host vs. disease severity. The authors conclude that human genetic ancestry did not contribute to tuberculosis severity and the evidence supporting this is generally convincing. The findings have significance for the understanding of the influence of host/bacillary genetics on tuberculosis disease.

    2. Reviewer #2 (Public review):

      Summary:

      This manuscript reports the results of an observational study conducted in Dar es Salaam, Tanzania, investigating potential associations between genetic variation in M. tuberculosis and human host vs. disease severity. The headline finding is that no such associations were found, either for host / bacillary genetics as main effects or for interactions between them.

      Strengths:

      Strengths of the study include its large size and rigorous approaches to classification of genetic diversity for host and bacillus.

      Comments on revisions:

      The authors have responded satisfactorily to comments raised.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This Tanzanian study focused on the relationship between human genetic ancestry, Mycobacterium tuberculosis complex (MTBC) diversity, and tuberculosis (TB) disease severity. The authors analyzed the genetic ancestry of 1,444 TB patients and genotyped the corresponding MTBC strains isolated from the same individuals. They found that the study participants predominantly possess Bantu-speaking genetic ancestry, with minimal European and Asian ancestry. The MTBC strains identified were diverse and largely resulted from introductions from South or Central Asia. Unfortunately, no associations were identified between human genetic ancestry, the MTBC strains, or TB severity. The authors suggest that social and environmental factors are more likely to contribute to TB severity in this setting.

      Strengths:

      In comparison to other studies investigating the role of human genetics in TB phenotypes, this study is relatively large, with more than 1,400 participants.

      The matched human-MTBC strain collection is valuable and offers the opportunity to address questions about human-bacterium co-evolution.

      Weaknesses:

      Although the authors had genome-wide genotyping and whole genome sequencing data, they only compared the associations between human ancestry and MTBC strains. Given the large sample size, they had the opportunity to conduct a genome-wide association study similar to that of Muller et al. (https://doi.org/10.1016/j.ygeno.2021.04.024).

      Thank you very much for taking the time to carefully review our manuscript and for your suggestions and comments. In another published study using the same cohort (https://doi.org/10.1101/2023.05.11.23289848), we performed a genome-wide association analysis between the genome-wide SNPs of the host and the genome-wide SNPs from the paired MTBC strains. In the current work we were interested in testing specifically if host ancestry and pathogen genotype family, as well as their interaction, were associated with differences in disease severity, a clinical phenotype with direct consequences for both host and pathogen fitness. The study of Müller et al, referred to by the reviewer, investigates whether MTBC families of strains causing disease in two patient cohorts (South Africa and Ghana) were associated with particular human SNPs assessed genome-wide. In that study, clinical phenotypes were not assessed and human ancestries, in a much broader sense than the ones used in our current study, were used as covariates. To leverage the genome-wide information and the clinical variables collected in our study, we have now added a genome-wide association analysis of all the human SNPs with disease severity measures while adjusting for covariates (age, sex, smoking, cough duration, socioeconomic status, history of previous TB, malnutrition, education level, and drug resistance status) and for human population stratification. Yet, no statistically significant associations were detected (L243-249).

      The authors tested whether human genetic ancestry is associated with TB severity. However, the basis for this hypothesis is unclear. The studies cited as examples all focused on progression to active TB (from a latent infection state), which should not be conflated with disease severity. It is difficult to ascertain whether the role of genetic ancestry in disease severity would be detectable through this study design, as some participants might simply have been sicker for longer before being diagnosed (despite the inquiry about cough duration). This delay in diagnosis would not be influenced solely by human genetics, which is the conclusion of the study.

      Evidence that mortality and natural recovery from TB vary by disease presentation spectrum comes from studies carried out before the introduction of anti-TB chemotherapy. Patients with mild disease presentation, as measured by radiology at the time of diagnosis, had higher odds of recovering naturally compared to those with advanced disease (doi: 10.5588/ijtld.23.0254, doi: 10.1164/arrd.1960.81.6.839). Given the deleterious effects of an MTBC infection leading to symptomatic disease on human fitness, we hypothesized that natural selection has acted on human traits underlying TB disease severity. If those traits are heritable, one would expect to find underlying genetic variation in human populations. In addition, because certain MTBC genotype families and human populations have co-existed for at least a few centuries to a few millennia, we hypothesized that some of that genetic variation could be related to human ancestry. We have added more details to the introduction to make our rationale clearer (L118-127). In our patient cohort, we observed a large variation in disease severity using three approximations: TB-Score, X-Ray score and bacterial burden in sputa (Ct-value as determined with GeneXpert). However, the reviewer is absolutely correct that patients in our study are being diagnosed at different stages of disease, confounding our analysis. This is a limitation of our study which cannot be fully accounted for by including cough duration, as we also acknowledged in the manuscript (L343-346).

      Additionally, the study only included participants who attended the TB clinic.

      Yes, this is related to the previous point: our study only considers patients who felt ill enough to visit the TB clinic, potentially excluding patients who had less severe disease, as acknowledged.

      Including healthy controls from the general population would have provided an interesting comparison to see if ancestry proportions differ.

      We agree that it would be interesting to compare the ancestries of healthy controls to the ancestries of TB patients from the same population. However, that would be especially informative with respect to TB susceptibility and would not necessarily inform disease severity traits and their underlying genetics. The similarity between the ancestry proportions of our cohort and those of neighboring countries such as Kenya, Malawi and Mozambique in publicly available genomic data suggests that there would be no major differences between TB patients and healthy controls.

      Although the authors suggest that social and environmental factors contribute to TB severity, only age, smoking, and HIV status were characterised in the study.

      Based on the comments of both reviewers, we added the following variables as covariates in the regression models: socioeconomic status (the ratio between the household income and the number of individuals in the household), malnutrition, education level, and whether the case was a relapse/reinfection or a new case.

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports the results of an observational study conducted in Dar es Salaam, Tanzania, investigating potential associations between genetic variation in M. tuberculosis and human host vs. disease severity. The headline finding is that no such associations were found, either for host / bacillary genetics as main effects or for interactions between them.

      Strengths:

      Strengths of the study include its large size and rigorous approaches to classification of genetic diversity for host and bacillus.

      Weaknesses:

      (1) There are some limitations of the disease severity read-outs employed: X-ray scores and Xpert cycle thresholds from sputum analysis can only take account of pulmonary disease. CXR is an insensitive approach to assessing 'lung damage', especially when converted to a binary measure. What was the basis for selection of Ralph score of 71 to dichotomise patients? If outcome measures were analysed as continuous variables, would this have been more sensitive in capturing associations of interest?

      Thank you very much for taking the time to carefully review our manuscript and for your suggestions and comments.  

      We recruited active TB patients with pulmonary TB disease who were sputum smear-positive and GeneXpert-positive. In this study we aimed at obtaining paired samples from both the patient and the strain, and in the current analysis we aimed at testing whether human ancestry and its interaction with the strain genotype could explain differences in disease severity. It is often difficult to obtain microbiological cultures from extra-pulmonary cases, and including those cases would not have been possible at the scale of this cohort. We also believe that extra-pulmonary TB is of less relevance for the question we are addressing because, in exclusively extra-pulmonary cases, disease severity is not linked with bacterial transmission. However, extra-pulmonary TB can be extremely severe, and it would be very interesting to explore the potential role of human genetic variation underlying extra-pulmonary TB in future studies.

      As to the insensitivity of CXR for measuring lung damage, we would argue that it depends on what is being assessed. As a rationale for the Ralph score, its inventors argue that, as in other grading methods, the proportion of affected lung and/or cavitation is important to assess severity. It has been described as a “validated method for grading CXR severity in adults with smear-positive pulmonary TB that correlates with baseline clinical and microbiological severity and response to treatment, and is suitable for use in clinical trials” (https://thorax.bmj.com/content/thoraxjnl/65/10/863.full.pdf). While the validation of the score is convincing in that study, and the score has been used in several TB studies and trials, the low proportion of HIV co-infections might have been a limitation. Indeed, as shown in our previous publication, in our cohort of patients, chest X-ray scores were significantly lower in HIV-infected TB patients (https://doi.org/10.1371/journal.ppat.1010893). In the current analysis, regression analyses performed for the CXR severity and for the other severity measures did not include HIV co-infected patients.

      We obtained the same pattern of results using a continuous outcome. However, an assumption of linear regression was violated: the residuals were not normally distributed, owing to the bimodal distribution of the scores in our dataset. The threshold of 71 for the Ralph score has been used by others in previous studies; in its original description it was suggested as the optimal cut-off point for predicting a positive sputum smear status after two months, which in turn has been shown to predict unfavorable outcomes (https://doi.org/10.1136/thx.2010.136242). Another study showed that a Ralph score higher than 71 was significantly associated with a longer duration of symptoms, higher clinical scores and a lower BMI (doi: 10.5603/ARM.2018.0032).

      (2) There is quite a lot of missing data, especially for TB scores - could this have introduced bias? This issue should be mentioned in the discussion.

      While we have a TB-score available for each patient, the chest X-ray score is missing for many patients. However, this missingness is random and due either to the absence of an X-ray picture or to X-ray pictures of such poor quality that the radiologists could not assess them. When stating that there is a lot of missing data for the TB scores, we assume that the reviewer was referring to the “missing N” columns in Table 1. There, the number of observations missing in each of the disease severity measures actually relates to the explanatory variables (i.e. MTBC genotype and human ancestries). This table includes all patients that had either a bacterial genome or a human genome/genotype available (N = 1904). As an example, for the TB-score as outcome variable, the MTBC genotype was determined for 1471 patients while it was missing for 433 patients. For X-ray scores, on the other hand, 177 patients had a severe X-ray score, 849 a mild one, and for 878 patients there was no X-ray score available. As for the Ct-value, despite the fact that the patients were recruited by the clinical team based on a positive GeneXpert result, these results were not always available to us.

      (3) The analysis adjusted for age, sex, HIV status, age, smoking and cough duration - but not for socio-economic status. This will likely be a major determinant of disease severity. Was adjustment made for previous TB (i.e. new vs repeat episode) and drug-sensitivity of the isolate? Cough duration will effectively be a correlate/consequence of more severe disease - thus likely highly collinear with disease severity read-outs - not a true confounder. How does removal of this variable from the model affect results? Data on socioeconomic status should be added to models, or if not possible then lack of such data should be noted as a limitation.

      Out of the 1904 patients that have either human or bacterial genomic data available, 48 were relapses (2.5%). The means of the disease severity measures suggest that relapses have a higher CXR score, but the TB-score and Ct-values did not differ. Based on the comments of both reviewers, we added the following variables as covariates to the regression models: socioeconomic status (the ratio between the household income and the number of individuals in the household), malnutrition as examined by a doctor, education level, whether the case was a relapse/reinfection or a new case, and whether the causative strain had resistance to any anti-TB drug. The results did not change. Cough duration could also be a consequence of more severe disease, as pointed out by the reviewer. We now present the results excluding cough duration as a variable from the model; however, this also did not affect the results.

      (4) Recruitment at hospitals may have led to selection bias due to exclusion of less severe, community cases. The authors already acknowledge this limitation in the Discussion however.

      (5) Introduction: References refer to disease susceptibility, but the authors should also consider the influences of host/pathogen genetics on host response - both in vitro (PMIDs 11237411, 15322056) and in vivo (PMID 23853590). The last of these studies encompassed a broader range of ethnic variation than the current study, and showed associations between host ancestry and immune response - null results from the current study may reflect the relative genetic homogeneity of the population studied.

      We thank the reviewer for these suggestions which we have added to the introduction. 

      Reviewer #1 (Recommendations for the authors):

      Minor Comments:

      (1) The authors should be careful when using the term "Bantu" as opposed to "Bantu-speaking". (i.e. referring to the language group). The term is considered offensive in some settings.

      We thank the reviewer for raising this important concern; we have revised the term throughout the manuscript.

      (2) There are several "(Error! Reference source not found)" phrases in the place of references throughout the document.

      We thank the reviewer for pointing this out; this has been corrected in the revised version.

      (3) Please correct line 365: "... sequencing (WGS) the patient...." to "... sequencing (WGS) of the patient...."

      (4) The figures in the supplementary PDF are not numbered and some are cut-off (I think it is Supplementary Figure S2).

      This has been corrected in the revised version.

      Reviewer #2 (Recommendations for the authors):

      Typographical errors

      (1) There are multiple instances where references have not pulled through to the text, e.g. line 126 (Error! Reference source not found.)

      We thank the reviewer for pointing this out; this has been corrected in the revised version.

      (2) Line 239: have been show - have been shown?

      Thank you, this mistake has been corrected in the revised version.

    1. eLife Assessment

      This important study shows that the activity of hypothalamic hypocretin/orexin neurons (HONs) correlates with body movement over multiple behaviors. Compelling evidence, supported by sophisticated, cutting-edge tools and data analyses, highlights a link that appears to be unique to HONs. This work should be of interest to scientists studying peptidergic neurons, movement, energy regulation, and brain-body coordination.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Tesmer and colleagues uses fiber photometry recordings, sophisticated analysis of movement, and deep learning algorithms to provide compelling evidence that activity in hypothalamic hypocretin/orexin neurons (HONs) correlates with net body movement over multiple behaviors. By examining projection targets, the authors show that hypocretin/orexin release differs in projection targets to the locus coeruleus and substantia nigra, pars compacta. Ablation of HONs does not cause differences in the power spectra of movements. Movement tracking ability of HONs is independent of HON activity that correlates with blood glucose levels. Finally, the authors show that body movement is not encoded to the same extent in other neural populations.

      Strengths:

      The major strengths of the study are the combination of fiber photometry recordings, analysis of movement in head-fixed mice, and sophisticated classification of movement using deep learning algorithms. The experiments seem to be well performed, and the data are well presented, visually. The data support the main conclusions of the manuscript.

      Weaknesses:

      To some degree, it is already known that hypocretin/orexin neurons correlate with movement and arousal, although this manuscript studies this correlation with unprecedented sophistication and scale.

      Taken together, this study is likely to be impactful to the field and our understanding of HONs across behavioral states.

    3. Reviewer #4 (Public review):

      Summary:

      Using a head-fixed approach, the authors show a rapid impact of movement on the activity level of hypothalamic orexin/hypocretin neurons.

      Strengths:

      The head-fixed approach is great to isolate specific movements and their impact on neuronal activity.

      Weaknesses:

      Many of the weaknesses that were noted in the previous round of review have been addressed.

    4. Reviewer #5 (Public review):

      Summary:

      Hypothalamic hypocretin/orexin neurons are well-known to be involved in arousal, muscle tone and energy metabolism. Using a combination of fiber photometry, video-based movement assessments, and deep learning algorithms, the authors provide compelling evidence that the activity of these neurons correlates with net body movement over multiple behaviors and is independent of nutritional state. The authors also demonstrate that hypocretin/orexin release differs between two downstream projection sites, the locus coeruleus and substantia nigra, and are able to distinguish the activity in these sites that is due to inputs from these hypothalamic neurons vs. from other subcortical populations. The authors also convincingly show that the correlation between body movement and hypocretin/orexin neuron activity is much stronger compared to other subcortical regions. However, hypocretin/orexin neuron ablation does not affect the power spectra of movements, an observation that appears at odds with their overall conclusions.

      Strengths:

      The multidisciplinary approach using multiple state-of-the-art tools is supported by a rigorous experimental design and strong statistical analyses. The authors have been highly responsive to previous critiques. Concerns of another reviewer regarding the confound between arousal and movement have been addressed by new pupillometry data as a measure of arousal and multivariate analyses to distinguish between the contributions of arousal vs. movement to hypocretin/orexin neuron activity. The new data in Figure 2H added in response to a suggestion by Reviewer 3 particularly strengthens the paper.

      Weaknesses:

      Reviewer 2 mentioned that previous studies using orexin antagonists in rodents have largely found inconsistent effects of antagonizing orexin signaling on simple motor activity and pointed out that these studies are not referenced here. The authors respond that "orexin antagonism - or optogenetic silencing of HONs - evokes either reduced locomotion, or no effect on locomotor movements" and add references to paragraph 4 of the Discussion. Aside from the fact that 2 of the 3 references added are from the senior author, none address the fact that orexin antagonists induce sleep and that optogenetic silencing of these cells creates a condition where sleep can ensue with short latency - results that certainly affect body movement/locomotor activity.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Tesmer and colleagues uses fiber photometry recordings, sophisticated analysis of movement, and deep learning algorithms to provide compelling evidence that activity in hypothalamic hypocretin/orexin neurons (HONs) correlates with net body movement over multiple behaviors. By examining projection targets, the authors show that hypocretin/orexin release differs in projection targets to the locus coeruleus and substantia nigra, pars compacta. Ablation of HONs does not cause differences in the power spectra of movements. The movement-tracking ability of HONs is independent of HON activity that correlates with blood glucose levels. Finally, the authors show that body movement is not encoded to the same extent in other neural populations.

      Strengths:

      The major strengths of the study are the combination of fiber photometry recordings, analysis of movement in head-fixed mice, and sophisticated classification of movement using deep learning algorithms. The experiments seem to be well performed, and the data are well presented, visually. The data support the main conclusions of the manuscript.

      We thank the reviewer for their supportive feedback.

      Weaknesses:

      The weaknesses are minor, mostly consisting of writing and data visualization throughout the manuscript. To some degree, it is already known that hypocretin/orexin neurons correlate with movement and arousal, although this manuscript studies this correlation with unprecedented sophistication and scale. It is also unfortunate that most of the experiments throughout the study were only performed in male mice. Taken together, this study is likely to be impactful to the field and our understanding of HONs across behavioral states.

      We agree that disentangling movement from arousal is an important aspect, and in the revised manuscript, we now include new data and analyses towards this (pupillometry to directly assess arousal, and multivariate analysis to assess contributions of arousal vs movement to HON activity). In addition, we now implement many of the reviewer’s recommendations regarding writing, data presentation, and visual clarity (see our replies in the “recommendations for authors” section).

      Reviewer #1 (Recommendations for the authors):

      Some recommendations for the authors:

      (1) The first sentence of the Introduction states: "Neural activity related to body movement recently received much attention." I would rephrase or clarify this statement, as neuroscientists have been studying neural activity related to body movement for decades.

      The reviewer is correct. Our intention was to highlight the resurgence of movement-related neurosciences enabled by modern techniques such as deep learning applied to video data (e.g. DeepLabCut, etc). The passage has been updated for clarity.

      (2) The Introduction also states that HONs orchestrate "consciousness and arousal." I would delete the word "consciousness," as consciousness represents a lofty, global concept that is challenging to define and quantify in humans, let alone mice.

      We used the word consciousness to be consistent with current literature on the function of the mouse hypothalamus (e.g. Nat Neurosci 2016 Feb;19(2):290-8). But we agree it is not necessary here, and so we followed the advice to delete it.

      (3) The authors state that HON dynamics were recorded while mice were head-fixed while on a running wheel. For clarity, it would be helpful to visualize this head-fixation in Figures 1A and 5B. It would also be helpful to clarify how certain behaviors (e.g. grooming, chewing) were performed and recorded while the mouse was head-fixed.

      In the revised manuscript, updated graphics with a head-fixed mouse have now been added to relevant figures. Representative RGB frames (colors representing sequential frames) of each behaviour have been added to Figure 2A.

      (4) In the legend for Figure 1A, the reference to Gonzalez et al. 2016 seems out of place (at least the reader should be informed why the text is referring to this previous study). Additionally, because the references are ordered by number instead of alphabetically, it would be more helpful to refer to a numbered reference rather than a name.

      Gonzalez et al. 2016 references the source of the AAV construct used in this figure. This has been moved to the methods. Following eLife formatting guidelines, references will be alphabetized upon publication.

      (5) In Figure 3F, it would be helpful to show visual validation that the HON-DTR method indeed ablates all HONs. This is depicted conceptually, but representative figures would be much more convincing.

      A representative histological slice is now included for both wild type (WT) and HON-DTR mice in the new Figure 4B.

      Reviewer #2 (Public review):

      Summary:

      Despite several methodological strengths, the major and highly significant drawback is the confound of arousal with movement. This confound is not resolved, so the results could be explained by previously established relationships between orexin and arousal/wakefulness.

      This is an excellent point, and we agree. To address this directly in the revised manuscript, we now include new data and analyses towards this (pupillometry to directly assess arousal, and multivariate analysis to assess contributions of arousal vs movement to HON activity).

      Strengths:

      The authors show that orexin neuron activity is associated with body movement and that this information is conveyed irrespective of the fasted state. They also report differences in different orexin target brain regions for orexin release during movement. This paper contains an impressive array of cutting-edge techniques to examine a very important brain system, the orexin-hypocretin system. The authors offer an original perspective on the function of this system. The authors showed that orexin neuron activity scales to some degree with the magnitude of body movement change; this is unaffected by a fasted state and seems to be somewhat unique to orexin neurons.

      The investigation of other genetically defined subcortical neuron populations to determine the specificity of findings is also a strength, as is the ability to quantify movement and use deep learning to classify specific behaviors, which adds sophistication to the analysis. The authors also show heterogeneity in orexin projections to specific target nuclei, which is interesting.

      The authors "speculate that narcolepsy-cataplexy, caused by HON loss-of-function, is perhaps explained by oscillations into unwanted sleep-states and motor programs due to impaired control loops for wakefulness and movement". This is quite an interesting aspect of their work and deserving of further study.

      We thank the reviewer for their supportive feedback.

      Weaknesses:

      Despite the strengths, there are several major and minor weaknesses that detract significantly from the study.

      My main concern with this work is the confound of arousal with movement so that correlations with one might reflect a relationship instead with the other. The orexin system is well known to play an important role in arousal, with elevated activity of orexin neurons reported for waking and high arousal. Orexin signaling has also been strongly associated with motivation, which also is associated with arousal and movement. The authors offer no compelling evidence that the relationships they describe between different movements and orexin signaling do not simply reflect the known relationship between arousal and motivation.

      The authors could address this concern by including classical arousal measurements, eg, cortical EEG recorded simultaneously with movements. Often, EEG arousal occurs independently of movement, so this could provide one approach to disentangling this confound. The idea that orexin signaling plays a role in arousal rather than movement is supported by their finding that orexin lesions using the orexin-DTR mouse model did not impact movements. In contrast, prior lesion and pharmacologic studies have found that decreased orexin signaling significantly decreases arousal and waking.

      Another way they could test their idea would be to paralyze and respirate animals so that orexin activity could be recorded without movement. Alternatively, animals could be trained to remain motionless to receive a reward. Thus, there are several ways to test the overall hypothesis of this work that have not been examined here.

      The authors propose that "a simple interpretation of their results is that, via HON movement tracking, the brain creates a "wake up" signal in proportion to movement". This seems to argue for the role of the orexin system in arousal and motivation rather than in movement per se.

      Thank you. We agree that disentangling between arousal and movement is indeed critical. A classic approach is a multivariate analysis, wherein multiple simultaneously recorded “predictors” of HON activity – such as arousal and movement - can be directly compared. While EEG arousal is an option, another well-accepted metric for arousal is pupil diameter. Using n = 7 mice, we now simultaneously record HON activity, movement, running speed, pupil size fluctuations, and ocular movements:

      We then fit a partial least squares multivariate regression (a regression type more robust to collinearity) using the movement metric, pupil size, and ocular movements as predictors of orexin neuron activity. Consistent with previous publications, we found that pupil size alone has a positive correlation with hORX.GCaMP6s (~0.45). However, using a drop-one feature analysis in multivariate regression, we found that movement had the highest % contribution to statistically explaining orexin neuron activity. Here are the new results (which we now added as Fig. 7A-B).

      Author response image 1.
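
      For readers unfamiliar with drop-one feature analysis, the idea can be sketched in a few lines. The sketch below is illustrative only and is not the authors' analysis code: it assumes a one-component PLS1 fit on z-scored variables, measures each predictor's contribution as the loss in R² when that predictor is removed, and uses synthetic traces whose coefficients (0.8, 0.3, 0.1) and collinearity are arbitrary choices. The number of PLS components, the preprocessing, and the exact definition of "% contribution" used in the manuscript are not stated in the response.

```python
import math
import random


def zscore(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pls1_r2(columns, y):
    """R^2 of a one-component PLS1 fit; columns and y must be z-scored."""
    n = len(y)
    # Weight vector: proportional to each predictor's covariance with y.
    w = [sum(xi * yi for xi, yi in zip(col, y)) for col in columns]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Latent scores: projection of every sample onto the weight vector.
    scores = [sum(w[j] * columns[j][i] for j in range(len(columns)))
              for i in range(n)]
    # Regress y on the single latent score.
    q = sum(s * yi for s, yi in zip(scores, y)) / sum(s * s for s in scores)
    rss = sum((yi - q * s) ** 2 for s, yi in zip(scores, y))
    tss = sum(yi * yi for yi in y)
    return 1.0 - rss / tss


# Synthetic stand-ins for the recorded traces (assumption: arbitrary values).
rng = random.Random(0)
n = 2000
movement = [rng.gauss(0, 1) for _ in range(n)]
# Pupil size is made partly collinear with movement (correlation about 0.5).
pupil = [0.5 * m + math.sqrt(0.75) * rng.gauss(0, 1) for m in movement]
eye = [rng.gauss(0, 1) for _ in range(n)]
activity = [0.8 * m + 0.3 * p + 0.1 * e + 0.3 * rng.gauss(0, 1)
            for m, p, e in zip(movement, pupil, eye)]

predictors = {"movement": zscore(movement),
              "pupil": zscore(pupil),
              "eye": zscore(eye)}
y = zscore(activity)

r2_full = pls1_r2(list(predictors.values()), y)

# Drop-one analysis: refit without each predictor and record the loss in R^2.
r2_loss = {}
for name in predictors:
    kept = [col for other, col in predictors.items() if other != name]
    r2_loss[name] = r2_full - pls1_r2(kept, y)

for name, loss in r2_loss.items():
    print(f"{name}: R^2 loss when dropped = {loss:.3f}")
```

      Because the synthetic activity trace is built mostly from the movement trace, dropping movement costs the most R² here by construction; the sketch shows the mechanics of the comparison, not a result.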

      Furthermore, we also expanded this analysis to incorporate the different frequencies found in HON dynamics, using empirical mode decomposition. We found that pupil size had a maximum correlation at lower HON frequencies than the movement metric, while ocular movements were maximally correlated in higher frequencies (now added as Fig. 7D,E).

      Overall, this analysis suggests that – while HONs encode both movement and arousal – arousal and movement do not always co-fluctuate at the same timescales, and their impacts on HONs can be disentangled in a number of ways. We now mention this in revised text on page 5.

      There are several studies that have examined the effect of orexin antagonist treatment in rodents on locomotor and other motor activities. These studies have largely found no consistent effect of antagonizing orexin signaling, especially at the OxR1 receptor, on simple motor activity. These studies are not referenced here but should be taken into account in the authors' conclusions.

      We agree. Prior studies found that orexin antagonism – or optogenetic silencing of HONs – evokes either reduced locomotion, or no effect on locomotor movements. We now added text and references to paragraph 4 of Discussion, summarising this.

      Figure 3, panel F: I understand HON-DTR is a validated model but a picture of HONs ablation is necessary, including pictures of HONs outputs ablation within the SNc and LC.

      A representative histological slice is now included for both wild type (WT) and HON-DTR mice in the new Figure 4B. Because HONs are only found in the hypothalamus, somatic deletion of HONs in this region will result in axonal degradation in output regions.

      The discussion lacks a more extensive paragraph on the distinct signal and role of Ox>SNc and Ox-LC projections.

      We now added sentences discussing potential implications of this to Discussion (middle of paragraph 4).

      Reviewer #2 (Recommendations for the authors):

      Minor weaknesses

      A very important movement in rodents is head orientation, especially given the limitation in ocular movement. However, this paper used a fixed head model which obviated this movement and did not attempt to analyze ocular movements.

      Analysing ocular movements is something we had not considered but is very easy to check using pupillometry. In n = 7 mice, we recorded both orexin neurons, and ocular movements captured through an infrared camera under constant lighting. Ocular movements had a small positive correlation with orexin neuron photometry (r = ~0.26). See response to the public review above.

      Author response image 2.

      The "HON" abbreviation is not commonly used for orexin neurons, and I suggest replacing that with a more well-known abbreviation.

      To the best of our knowledge, there is no universally agreed or best-known abbreviation for hypocretin/orexin neurons (we agree it would be nice if there was one!). “HONs” is a simple first letter abbreviation of hypocretin/orexin neurons, which acknowledges the two names for this peptide given by the original discoverers (de Lecea et al, and Sakurai et al, in 1998). Although this may not be the perfect abbreviation, we have kept it for now, also to be consistent with the large number (>10) of other published studies that recently used this abbreviation.

      The graphs showing Pearson's r values do not demonstrate a very strong correlation between neural activity and movement change; they also lack validation of genetic expression/ablation in some cases. The results would more strongly support the conclusions if statistically significant correlations could be demonstrated between activity and movement.

      We agree that a correlation of ~0.68 is probably not worthy of a “very strong” classification. While there is no universal ruleset for categorizing the strength of a correlation, we have toned down our language throughout the manuscript.

      Comment regarding statistical testing of correlations: we are cautious about relying on correlation significance testing for large sample sizes (~48’000 photometry & video samples in a 40-minute session). In our case, correlations were always extremely significant (p < 0.0001). The reason for this is that correlation p-values become “too big to fail” (see Lin et al. 2013) with inflated sample size. We therefore refrain from commenting on p-values and rather report between or within-subjects statistical tests, or tests against zero. See four example experiments below.

      Author response image 3.

      Citation: Lin, M., Lucas, H. C., Jr & Shmueli, G. Research Commentary—Too Big to Fail: Large Samples and the p-Value Problem. Information Systems Research 24, 906–917 (2013).
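
      The sample-size effect described above can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' code: it converts a Pearson r into a t-statistic, approximates the two-sided p-value with a standard normal tail (reasonable for large n, slightly optimistic for small n), and uses r = 0.02 as an arbitrary example of a negligible correlation.

```python
import math


def pearson_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r from n samples."""
    # Standard t-statistic for testing r against zero (n - 2 degrees of freedom).
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Normal approximation to the t-distribution tail (assumption: large n).
    return math.erfc(abs(t) / math.sqrt(2.0))


weak_r = 0.02  # hypothetical, practically negligible correlation

p_small = pearson_p_value(weak_r, 100)      # a modest sample
p_large = pearson_p_value(weak_r, 48_000)   # 20 Hz for 40 minutes

print(f"n = 100:    p = {p_small:.3f}")
print(f"n = 48000:  p = {p_large:.2e}")
```

      The same weak correlation is far from significant at n = 100 but falls below 0.0001 at n = 48,000, which is why the p-value alone says little about the strength of coupling in long recordings.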

      The rationale for looking at running speed, general movement, and specific types of nonlocomotor movements could be clarified and explained more thoroughly in the introduction. Why is it important to distinguish between locomotion (represented here with running) and all other movements? Presumably, this is because orexin is known to regulate arousal/locomotion. What evidence is there for orexin's role in other types of movements, which are being grouped together in Figure 1? This could be laid out in more detail in the Introduction. Relatedly, it is not very clear in the text whether the correlation between movement and orexin neuron activity includes movement related to running.

      The main focus of our paper is on movement in general (i.e. video pixel difference, described in Results and Methods). This movement metric includes everything captured by the video and is agnostic to the type of movement or behaviour. To connect this to some of the specific innate movements/behaviours typically studied in the mouse literature (running, grooming, sniffing, etc.), we also present the analyses in Figure 2. We have attempted to explain this better in the revised section 1 of Results.

      What exactly is being correlated in Figure 1C (and throughout the rest of the paper?) Is this the average signal correlated with the average movement change over the entire recording time? This could be more explicitly stated in methods/results. The correlations themselves/p-values could be shown in addition to/instead of Pearson's r values. Are the correlations themselves significant? This would strengthen the claim that orexin activity is strongly coupled to the magnitude of body movement change. As another example, in Figure 2D, there are no statistics reported on the correlation between movement metric and average neural signal. In Figure 6G, orexin neuron activity is more strongly correlated with movement than MVe glut neurons, but are either of these correlations significant? The correlation between MVe glut activity and movement overall seems similar to that of orexin neurons, and may be worth noting more explicitly.

      Throughout the paper, we have recorded both neural activity (photometry) and movement at 20 Hz. This would generate, for example, 48’000 samples of photometry and movement from a 40-minute session. All the samples were used to calculate a Pearson’s r between variables. To clarify this, we now added the subtext “whole-session” to relevant figures, as well as a clarification in the methods.
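
      As a worked illustration of the sample count and of a whole-session correlation, the following minimal sketch computes both. It is not the authors' pipeline: the two traces are placeholder signals invented for the example, and only the 20 Hz rate and 40-minute duration come from the response.

```python
import math

SAMPLE_RATE_HZ = 20      # from the response
SESSION_MINUTES = 40     # from the response
n_samples = SAMPLE_RATE_HZ * 60 * SESSION_MINUTES  # 20 * 60 * 40 = 48000


def pearson_r(x, y):
    """Pearson correlation coefficient computed over all paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Placeholder traces (assumption): a slow oscillation standing in for the
# movement metric, and a linearly rescaled copy standing in for photometry.
movement = [math.sin(i / 200.0) for i in range(n_samples)]
photometry = [2.0 * m + 1.0 for m in movement]

# One r value per session, using every sample rather than session averages.
r_whole_session = pearson_r(movement, photometry)
print(n_samples, round(r_whole_session, 6))
```

      Because the placeholder photometry trace is an exact linear function of the movement trace, r is 1 here; real traces would give a lower value.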

      Individual experiment correlations for orexin neurons and MVe glut neurons were always significant p<0.0001, even after a Bonferroni multiple comparisons correction was applied to each population. See the “too big to fail” nature of correlation hypothesis testing above.
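
      The whole-session correlation described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis code: the signal model, the function names, and the random seed are assumptions made for the example. It also shows why very small correlations reach significance at this sample size, under the simplifying assumption of independent samples (consecutive photometry samples are in practice autocorrelated).

```python
import numpy as np

SAMPLE_RATE_HZ = 20      # sampling rate stated in the response
SESSION_MINUTES = 40     # example session length stated in the response


def samples_per_session(rate_hz, minutes):
    """Number of samples recorded in one session."""
    return int(rate_hz * minutes * 60)


def whole_session_pearson_r(x, y):
    """Pearson's r computed over every sample of the session."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def t_statistic(r, n):
    """t statistic for H0: r = 0, assuming n independent samples."""
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))


# Synthetic stand-ins for the movement metric and the photometry trace.
rng = np.random.default_rng(0)
n = samples_per_session(SAMPLE_RATE_HZ, SESSION_MINUTES)
movement = rng.gamma(shape=2.0, scale=1.0, size=n)
photometry = 0.5 * movement + rng.normal(0.0, 1.0, size=n)

r = whole_session_pearson_r(photometry, movement)
print(f"n = {n}, r = {r:.3f}")
print(f"t for r = 0.02 at this n: {t_statistic(0.02, n):.2f}")
```

      With tens of thousands of samples per session, the p-value mostly reflects sample size, so the effect size (r) is the more informative quantity.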

      It could be made clearer at the end of Figure 2 that orexin neuron activity is tracking the magnitude of movement change (shown in Figure 2D), not that it is encoding different types of movement.

      We intended the original Figure 2E to illustrate this concept; however, this panel caused a great deal of confusion for several readers and was perhaps ill-conceived. We have replaced Figure 2E with a new panel that more directly addresses the reviewer’s statement. We can construct three models in which orexin neuron activity is predicted from the behavioral classification (sometimes called “one-hot” encoding) and/or the movement metric.

      Model 1 predicts orexin neuron activity using only a categorical predictor of behavioral state. Model 2 uses only the movement metric, and model 3 allows a different movement-metric correlation within each behavioral state. We can compare these models using AIC (Akaike Information Criterion), which is a point estimate. While the most complex model, model 3, was the best, model 2 was much closer to model 3 than model 1 was. Similarly, model 2 was much better than model 1. From this we conclude that the magnitude of movement change is a more powerful predictor than behavioral state (“type of movement”). This is now Figure 2E.
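
      The three-model comparison can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' code: it uses ordinary least squares with a Gaussian AIC, synthetic data in which activity is generated from movement alone, and hypothetical function and variable names. The resulting ranking is therefore a property of the synthetic data, not a reproduction of Figure 2E.

```python
import numpy as np


def gaussian_aic(y, X):
    """AIC of an ordinary least-squares fit with Gaussian residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = y.size
    k = X.shape[1] + 1  # regression coefficients plus residual variance
    rss = float(resid @ resid)
    return n * np.log(rss / n) + 2 * k


def design_matrices(state, movement, n_states):
    """Design matrices for the three models described in the response."""
    one_hot = np.eye(n_states)[state]                    # behavioral state
    x_state = one_hot                                    # model 1
    x_move = np.column_stack([np.ones_like(movement), movement])  # model 2
    x_full = np.column_stack([one_hot, one_hot * movement[:, None]])  # model 3
    return x_state, x_move, x_full


# Synthetic data (assumption): five states with different typical movement,
# and neural activity that depends on movement magnitude only.
rng = np.random.default_rng(1)
n_obs, n_states = 5000, 5
state = rng.integers(0, n_states, size=n_obs)
state_scale = np.array([0.25, 0.5, 0.75, 1.0, 1.25])
movement = rng.gamma(shape=2.0, scale=state_scale[state])
activity = 0.8 * movement + rng.normal(0.0, 0.5, size=n_obs)

x_state, x_move, x_full = design_matrices(state, movement, n_states)
aic_state = gaussian_aic(activity, x_state)
aic_movement = gaussian_aic(activity, x_move)
aic_full = gaussian_aic(activity, x_full)
print(f"AIC model 1 (state only):       {aic_state:.1f}")
print(f"AIC model 2 (movement only):    {aic_movement:.1f}")
print(f"AIC model 3 (state x movement): {aic_full:.1f}")
```

      Lower AIC indicates a better trade-off between fit and complexity; only differences between models fitted to the same data are meaningful.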

      It would be interesting to see the raw movement metric data as shown in Figures 1 and 2 in the DTR mice to show that ablating orexin neurons does not impair the movement profile seen in Figures 1 and 2.

      The requested visualization has been added to Figure 4B.

      Validation that orexin was selectively ablated in these mice would be ideal.

      Histology (see response to public review) was added to a new Figure 4B.

      Figure 4A - OxLight expression in SNc does not look very robust.

      Please note that this is a membrane-targeted indicator; the staining it produces is thus much weaker than that of cytosolic indicators such as the calcium indicator GCaMP.

      Figure 4 - It would be beneficial to see the same correlations that were done in Figures 1 and 2 to show OxLight activity vs. movement metric. Are they correlated?

      Individual traces showed significant correlations between OxLight signal and movement, and the population averages revealed similar trends:

      Author response image 4.

      Figure 6B - Targeting of MVe neurons does not look very specific. The sample size for orexin-targeted mice should be re-stated in the figure legend for clarity.

      Legend has been updated to clarify n = 15 for orexin-targeted mice.

      Some citations didn't seem to match what was being referenced in the text. Similarly, in the legend for Figure 1C, the statistics do not match what is reported in the text. In Figure 1, the sample size is not noted in the text. When referring to running in Figure 1, is this referring to running speed? Perhaps the language could be more consistent.

      These typos (due to a rounding error) in the legend and text have been corrected. Sample size has been added to the text, and we have changed Figure 1D to clarify we are referring to running speed. We moved some citations to improve clarity.

      Methods - where were Cre mice obtained from?

      Sources now better referenced in Methods (JAX or Parlato et al).

      Figure 1, panel C: The authors compared Pearson's r-coefficient results for each animal and for each variable. However, it would be interesting to show the correlation curves for each variable as well here. Also, there is mention of a strong correlation but it is unclear whether these correlations are significant.

      See below for an example mouse.

      Author response image 5.

      Figure 3, panel F: I understand HON-DTR is a validated model, but a picture of orexin ablation is necessary, including pictures of orexin fiber ablation within the SNc and LC.

      See our reply to the public review above.

      Figure 5, Panel A: Same comment as Figure 1, panel C.

      We have similarly clarified the panel and legend.

      Page 4: The authors mention "Within the 1st and 4th quartile of blood glucose, movement-HON correlations were not significantly different." Please add the figures.

      The requested plot has been added to Figure 6, panel G.

      Reviewer #3 (Public review):

      Summary

      The study presents an investigation into how hypothalamic orexin neurons (HONs) track body movement with high precision. Using techniques including fiber photometry, video-based movement metrics, and empirical mode decomposition (EMD), the authors demonstrate that HONs encode net body movement consistently across a range of behaviors and metabolic states. They compare the ability of HONs to track body movement with that of other subcortical neural populations, and thereby distinguish HON activity from the activity of those populations.

      Strengths:

      The study characterizes HONs activity as key indicators of movement and arousal, and this method may have potential implications for understanding sleep disorders, energy regulation, and brain-body coordination. Overall, I think this is a very interesting story, with novel findings and implications about sensorimotor systems in animals. The manuscript is clearly written and the evidence presented is rigorous. The conclusions are well supported by experimental data with clear statistical analyses.

      We thank the reviewer for their supportive feedback.

      Weaknesses/suggestions:

      There are a couple of issues I think the authors could address to make the paper better and more complete:

      (1) The study primarily focuses on steady-state behaviors. It would be interesting if the authors' current dataset allows analyses of HON dynamics during transitions between behavioral states (e.g., resting to running or grooming to sniffing). This could provide additional insights into how HONs adapt to rapid changes in body movement.

      This is a fantastic idea, and easy to check using our classification CNN. We identified the six most frequent behavioral transitions and plotted them in Figure 2H. HONs show rapid dynamics in activity aligned with behavioral changes.

      These changes are very similar to the movement magnitude along these transitions, which is now also plotted in Figure 2G.
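
      Identifying the most frequent behavioral transitions from a frame-by-frame classification can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the label names and the function name are assumptions, and the input is taken to be one behavior label per video frame.

```python
from collections import Counter


def most_frequent_transitions(labels, top_n=6):
    """Count changes of behavior label between consecutive frames.

    Returns the top_n (from_label, to_label) pairs with their counts,
    ignoring frames where the label does not change.
    """
    labels = list(labels)
    transitions = Counter(
        (prev, curr) for prev, curr in zip(labels, labels[1:]) if prev != curr
    )
    return transitions.most_common(top_n)


# Hypothetical per-frame labels from a behavior classifier.
frame_labels = ["rest", "rest", "run", "run", "rest", "groom", "rest", "run"]
print(most_frequent_transitions(frame_labels, top_n=6))
```

      The frame indices at which each selected transition occurs could then be used to align the neural signal and the movement metric.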

      (2) Given the established role of HONs in arousal and wakefulness, the study could further investigate how movement-related HON dynamics interact with arousal states. For example, does HON encoding of movement differ during sleep versus wakefulness?

      To further investigate how movement encoding interacts with arousal, we now include quantification and analysis of pupil-linked arousal (see new Figure 7). We agree it would be interesting to look at what happens during sleep, especially REM sleep when some HONs are thought to be active where there is no/little body movement, but this is beyond the scope of the present study.

      (3) Although HON ablation experiments suggest that HONs do not shape movement frequency profiles, it would be more compelling if the authors could investigate whether HONs contribute to specific types of movements (e.g., fine motor vs. gross motor movements) or modulate movement initiation thresholds.

      We performed this analysis using the k-means classifier for small/large movements. Consistent with previous results, we found no significant effect (p = 0.2767) of genotype on the frequency of identified small (fine) or large (gross) movement clusters. This plot has been added to Figure 4E.

      (4) The heterogeneous movement-related orexin dynamics observed in the LC and SNc raise intriguing questions about the circuit-level mechanisms underlying these differences. Optogenetic or chemogenetic manipulation of these projections could validate the functional implications of these dynamics.

      We agree. We now discuss some implications of this in revised Discussion (paragraph 4). Please note that previous work already demonstrated that orexin action in the SNc can produce locomotion (referenced in the paragraph), though we agree that further work would be valuable.

      Reviewer #3 (Recommendations for the authors):

      Additional feedback:

      (1) Figure 1C: the individual data points are hard to track or see. Consider using a larger marker face to help data visualization. Similar issues can be found in Figures 2C, 2E, 5E, 6C, 6F, and 6G.

      The thickness of the lines and scatterplot markers has been increased.

      (2) First Section of Results: the authors claim to use a deep-learning network to automatically classify video recordings into five distinct behaviors. However, several issues need to be addressed here:

      a. In Results, the corresponding sentence lacks a reference to the Methods Section.

      Reference has been added to the text.

      b. In Methods, the description of the CNN model is quite limited, lacking many basic, necessary components including necessary references to published papers, the model training, characterization (only an overall accuracy is not enough), as well as dataset definition, preparation, augmentation (if any), etc.

      We have expanded the methods section regarding the CNN model.

      (3) First Section of Results: in the second paragraph, the authors claim that "Overall, these results reveal HON population activity precisely tracks a general degree of body movement across recorded behaviors." This is not accurate. To indicate that HONs activity tracks the general degree of body movement across behavior states, they need to further show that behavioral states with similar levels of movement metrics can be differentiated via HON activities. However, as they showed in Figure 2D, some behaviors with similar values of movement metric do not seem to be easily discerned by HON activity levels.

      We agree with you, and this is also what we originally intended to convey – now reworded for clarity.

      (4) Technical issue: Figures 3B, 3C, 3G, using local regression to plot the solid lines makes them touch negative values, which does not make sense for "power proportion" (this quantity is always non-negative).

      This is a good point. To fix this, we first log-transformed the power metric, then performed a local regression, and used the link function to transform the model predictions back to %-units for visualization. This has been noted in the methods.
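
      The log-transform, smooth, and back-transform procedure can be sketched as below. This is a minimal illustration under assumptions, not the authors' implementation: it uses a simple tricube-weighted local linear fit written in NumPy, synthetic strictly positive data, and hypothetical function names. Here the back-transformation is exponentiation, the inverse of the log transform, which keeps the plotted curve positive.

```python
import numpy as np


def local_linear_fit(x, y, x_eval, frac=0.5):
    """Tricube-weighted local linear regression evaluated at x_eval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(3, int(np.ceil(frac * x.size)))  # points in each neighborhood
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        dist = np.abs(x - x0)
        bandwidth = max(np.sort(dist)[k - 1], 1e-12)
        weights = np.clip(1.0 - (dist / bandwidth) ** 3, 0.0, None) ** 3
        root_w = np.sqrt(weights)
        # Weighted least squares for a line centered on x0;
        # the intercept is the fitted value at x0.
        design = np.column_stack([np.ones_like(x), x - x0]) * root_w[:, None]
        coef, *_ = np.linalg.lstsq(design, y * root_w, rcond=None)
        fitted[i] = coef[0]
    return fitted


def smooth_positive(x, y, x_eval, frac=0.5):
    """Smooth on the log scale, then map back to the original units."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("log transform requires strictly positive values")
    return np.exp(local_linear_fit(x, np.log(y), x_eval, frac=frac))


# Synthetic power proportions (in %) that decay toward zero.
rng = np.random.default_rng(2)
freq = np.linspace(0.1, 10.0, 60)
power = (20.0 * np.exp(-freq) + 0.01) * np.exp(rng.normal(0.0, 0.3, freq.size))
grid = np.linspace(0.1, 10.0, 25)
smoothed = smooth_positive(freq, power, grid)
print(smoothed.min() > 0)
```

      Because the curve is obtained by exponentiating the smoothed log values, it cannot cross zero regardless of how close the data come to it.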

      (5) Figure 3G: For a better comparison, consider combining the two plots into a single plot.

      The two plots have been merged as shown in Figure 4C.

      (6) Figure 5E: For a better data visualization, the current pair of plots can be consolidated into one single plot where the x-axis is Move and the y-axis is dGlu. In this way, it is easier to understand and the orthogonality as claimed in the manuscript can be more apparent.

      The requested plot has been added as Figure 6F.

    1. eLife Assessment

      This manuscript describes a novel approach for assessing cognitive function in freely moving mice in their home-cage, without human involvement. The authors provide convincing evidence in support of the tasks they developed to capture a variety of complex behaviors and demonstrate the utility of a machine learning approach to expedite the acquisition of task demands. This work is important given its potential utility for other investigators interested in studying mouse cognition.

    2. Reviewer #1 (Public review):

      Summary:

      This is a new and important system that can efficiently train mice to perform a variety of cognitive tasks in a flexible manner. It is innovative and opens the door to important experiments in the neurobiology of learning and memory.

      Strengths:

      Strengths include: high n's, a robust system, task flexibility, comparison of manual-like training vs constant training, circadian analysis, comparison of varying cue types, long-term measurement, and machine teaching.

      Weaknesses:

      I find no major problems with this report.

      Comments on revisions:

      My concerns have been addressed now.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Yu et al. describes a novel approach for collecting complex and different cognitive phenotypes in individually housed mice in their home cage. The authors report a simple yet elegant design that they developed for assessing a variety of complex and novel behavioral paradigms autonomously in mice.

      Strengths:

      The data are strong, the arguments are convincing, and I think the manuscript will be highly cited given the complexity of behavioral phenotypes one can collect using this relatively inexpensive ($100/box) and high-throughput procedure (without the need of human interaction). Additionally, the authors include a machine learning algorithm to correct for erroneous strategies that mice develop which is incredibly elegant and important for this approach, as mice will develop odd strategies when given complete freedom.

      Weaknesses:

      A limitation to this approach is that it requires mice to be individually housed for days to months. This is now adequately addressed in the discussion.

      A major issue with continuous self-paced tasks such as the autonomous d2AFC used by the authors is that the inter-trial intervals can vary significantly. Mice may do a few trials, lose interest and disengage from the task for several hours. This is problematic for data analysis that relies on trial duration to be similar between trials (e.g., reinforcement learning algorithms). The authors now provide information regarding task engagement of the mice across a 24 hour cycle (e.g., trials started, trials finished across a 24 h period).

      Movies - it would be beneficial for the authors to add commentary to the video (hit, miss trials). It was interesting watching the mice but not clear whether they were doing the task correctly or not. The new videos adequately address these concerns.

      The strength of this paper (from my perspective) is the potential utility it has for other investigators trying to get mice to do behavioral tasks. However, not enough information was provided about the construction of the boxes, interface, and code for running the boxes. If the authors are not willing to provide this information through eLife, GitHub, or their own website then my evaluation of impact and significance of this paper would go down significantly. This information is now available to readers.

      Minor concerns

      Learning rate is confusing for Figure 3 results as it actually refers to trials to reach criterion, and not the actual rate of learning (e.g., slope). This has been modified in the manuscript.

      Comments on revisions:

      The authors have addressed all my concerns regarding this very exciting manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      In this set of experiments, the authors describe a novel research tool for studying complex cognitive tasks in mice, the HABITS automated training apparatus, and a novel "machine teaching" approach they use to accelerate training by algorithmically providing trials to animals that provide the most information about the current rule state for a given task.

      Strengths:

      There is much to be celebrated in an inexpensively constructed, replicable training environment that can be used with mice, which have rapidly become the model species of choice for understanding the roles of distinct circuits and genetic factors in cognition. Lingering challenges in developing and testing cognitive tasks in mice remain, however, and these are often chalked up to cognitive limitations in the species. The authors' findings, however, suggest that instead we may need to work creatively to meet mice where they live. In some cases it may be that mice may require durations of training far longer than laboratories are able to invest with manual training (up to over 100k trials, over months of daily testing) but that the tasks are achievable. The "machine teaching" approach further suggests that this duration could be substantially reduced by algorithmically optimizing each trial presented during training to maximize learning.

      Weaknesses:

      Cognitive training and testing in rodent models fill a number of roles. Sometimes, investigators are interested in within-subjects questions - querying a specific circuit, genetically defined neuron population, or molecule/drug candidate, by interrogating or manipulating its function in a highly trained animal. In this scenario, a cohort of highly trained animals which have been trained via a method that aims to make their behavior as similar as possible is a strength.

      However, often investigators are interested in between-subjects questions - querying a source of individual differences that can have long term and/or developmental impacts, such as sex differences or gene variants. This is likely to often be the case in mouse models especially, because of their genetic tractability. In scenarios where investigators have examined cognitive processes between subjects in mice who vary across these sources of individual difference, the process of learning a task has been repeatedly shown to be different. The authors recognize that their approach is currently optimized for testing within-subjects questions, but begin to show how between-subjects questions might be addressed with this system.

      The authors have perhaps shown that their main focus is highly-controlled within-subjects questions, as their dataset is almost exclusively made up of several hundred young adult male mice, with the exception of 6 females in a supplemental figure. It is notable that these female mice do appear to learn the two-alternative forced choice task somewhat more rapidly than the males in their cohort, and the authors suggest that future work with this system could be used to uncover strategies that differ across individuals.

      Considering the implications for mice modeling relevant genetic variants, it is unclear to what extent the training protocols and especially the algorithmic machine teaching approach would be able to inform investigators about the differences between their groups during training. For investigators examining genetic models, it is unclear whether this extensive training experience would mitigate the ability to observe cognitive differences, or select for the animals best able to overcome them - eliminating the animals of interest. Likewise, the algorithmic approach aims to mitigate features of training such as side biases, but it is worth noting that the strategic uses of side biases in mice, as in primates, can benefit learning, rather than side biases solely being a problem. However, the investigators may be able to highlight variables selected by the algorithm that are associated with individual strategies in performing their tasks, and this would be a significant contribution.

      A final, intriguing finding in this manuscript is that animal self-paced training led to much slower learning than "manual" training, by having the experimenter introduce the animal to the apparatus for a few hours each day. Manual training resulted in significantly faster learning, in almost half the number of trials on average, and with significantly fewer omitted trials. This finding does not necessarily argue that manual training is universally a better choice, because it led to more limited water consumption. However, it suggests that there is a distinct contribution of experimenter interactions and/or switching contexts in cognitive training, for example, by activating an "occasion setting" process to accelerate learning for a distinct period of time. Limiting experimenter interactions with mice may be a labor saving intervention, but may not necessarily improve performance. This could be an interesting topic of future investigation, of relevance to understanding how animals of all species learn.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This is a new and important system that can efficiently train mice to perform a variety of cognitive tasks in a flexible manner. It is innovative and opens the door to important experiments in the neurobiology of learning and memory. 

      Strengths: 

      Strengths include: high n's, a robust system, task flexibility, comparison of manual-like training vs constant training, circadian analysis, comparison of varying cue types, long-term measurement, and machine teaching. 

      Weaknesses: 

      I find no major problems with this report. 

      Minor weaknesses: 

      (1) Line 219: Water consumption per day remained the same, but number of trials triggered was more as training continued. First, is this related to manual-type training? Also, I'm trying to understand this result quantitatively, since it seems counter-intuitive: I would assume that with more trials, more water would be consumed since accuracy should go up over training (so more water per average trial). Am I understanding this right? Can the authors give more detail or understanding to how more trials can be triggered but no more water is consumed despite training? 

      Thanks for the comment. We would like to clarify the phenomenon described in Line 219: as the training advanced, the number of trials triggered by mice per day gradually decreased (rather than increased, as mentioned in the comment) for both the manual and autonomous groups of mice (Fig. 2H left). The performance, as you mentioned, improved over time (Fig. 2D and 2E), leading to an increased probability of obtaining water and thus a relatively stable daily water intake (Fig. 2H middle). We believe this stable daily intake is the minimum amount of water required by the mice under the conditions of autonomous behavioral training. To make the statement clearer, we have indicated the corresponding figure numbers in the text.

      Results “… As shown in Fig. 2H, autonomous training yielded significantly higher number of trial/day (980 ± 25 vs. 611 ± 26, Fig. 2H left) and more volume of water consumption/day (1.65 ± 0.06 vs. 0.97 ± 0.03 ml, Fig. 2H middle), which resulted in monotonic increase of body weight that was even comparable to the free water group (Fig.2H right). In contrast, the body weight in manual training group experienced a sharp drop at the beginning of training and was constantly lower than autonomous group throughout the training stage (Fig. 2H right).”

      (2) Figure 2J: The X-axis should have some label: at least "training type". Ideally, a legend with colors can be included, although I see the colors elsewhere in the figure. If a legend cannot be added, then the color scheme should be explained in the caption.

      Thanks for the suggestion. The labels with corresponding colors for x-axis have been added for Fig. 2J.

      (3) Figure 2K: What is the purple line? I encourage a legend here. The same legend could apply to 2J.

      Thanks for the suggestion. The legend has been added for Fig. 2K.

      (4) Supplementary Figure S2 D: I do not think the phrase "relying on" is correct. Instead, I think "predicted by" or "correlating with" might be better. 

      We thank the reviewer for the valuable suggestion. The phrase has been changed to ‘predicted by’ for better suitability.

      Figure S2 “(D), percentage of trials significantly predicted by different regressors during task learning. …”

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript by Yu et al. describes a novel approach for collecting complex and different cognitive phenotypes in individually housed mice in their home cage. The authors report a simple yet elegant design that they developed for assessing a variety of complex and novel behavioral paradigms autonomously in mice. 

      Strengths: 

      The data are strong, the arguments are convincing, and I think the manuscript will be highly cited given the complexity of behavioral phenotypes one can collect using this relatively inexpensive ($100/box) and high throughput procedure (without the need for human interaction). Additionally, the authors include a machine learning algorithm to correct for erroneous strategies that mice develop which is incredibly elegant and important for this approach as mice will develop odd strategies when given complete freedom. 

      Weaknesses:

      (1) A limitation of this approach is that it requires mice to be individually housed for days to months. This should be discussed in depth. 

      Thank you for raising this important point. We agree that the requirement for individual housing of mice during the training period is a limitation of our approach, and we appreciate the opportunity to discuss this in more depth. In the manuscript, we have added a section to the Discussion to address this limitation, including the potential impact of individual housing on the mice, the rationale for individual housing in our study, and efforts or alternatives to mitigate the effects of individual housing.

      Discussion “… Firstly, our experiments were confined to single-housed mice, which is known to influence murine behavior and physiology, potentially affecting social interaction and stress levels [76]. In our study, individual housing was necessary to ensure precise behavioral tracking, eliminate competitive interactions during task performance, and maintain consistent training schedules without disruptions from cage-mate disturbances. However, the potential of group-housed training has been explored with technologies such as RFID [28,29,32–34] to distinguish individual mice, potentially improving the training efficiency and facilitating research of social behaviors [77]. Notably, it has been shown that simultaneous training of group-housed mice, without individual differentiation, can still achieve criterion performance [25].”

      (2) A major issue with continuous self-paced tasks such as the autonomous d2AFC used by the authors is that the inter-trial intervals can vary significantly. Mice may do a few trials, lose interest, and disengage from the task for several hours. This is problematic for data analysis that relies on trial duration to be similar between trials (e.g., reinforcement learning algorithms). It would be useful to see the task engagement of the mice across a 24-hour cycle (e.g., trials started, trials finished across a 24-hour period) and approaches for overcoming this issue of varying inter-trial intervals. 

      Thank you for your insightful comment regarding the variability in inter-trial intervals and its potential impact on data analysis. We agree that this is an important consideration for continuous self-paced tasks.

      In our original manuscript, we showed the general task engagement across the 24-hour cycle (Fig. 2K), which revealed two peaks of engagement during the dark cycle and relatively fewer trials during the light cycle. To facilitate analyses requiring consistent trial durations, we defined trial blocks as sequences between two no-response trials. Notably, approximately 66.6% of trials occurred within blocks of >5 consecutive trials (Fig. 2L), which may be particularly suitable for such analyses.

      In the revised manuscript, we also added a histogram of inter-trial intervals for both the autonomous and manual training paradigms in HABITS (Fig. S2H), which shows that around 55.2% and 77.5% of the intervals are shorter than 2 seconds in autonomous and manual training, respectively.
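
      The block definition and the two summary fractions described above can be sketched as follows. This is an illustrative sketch, not the HABITS analysis code: the function names are hypothetical, and it assumes that no-response trials delimit blocks without being counted in them, that ">5 consecutive trials" means a block length of at least 6, and that fractions are taken over responded trials.

```python
def block_lengths(responded):
    """Lengths of runs of responded trials delimited by no-response trials."""
    lengths = []
    run = 0
    for has_response in responded:
        if has_response:
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def fraction_in_long_blocks(lengths, min_length=6):
    """Fraction of responded trials that fall in blocks of >= min_length."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(n for n in lengths if n >= min_length) / total


def fraction_short_intervals(intervals_s, threshold_s=2.0):
    """Fraction of inter-trial intervals strictly below threshold_s."""
    intervals_s = list(intervals_s)
    if not intervals_s:
        return 0.0
    return sum(1 for v in intervals_s if v < threshold_s) / len(intervals_s)


# Hypothetical session: True = trial with a response, False = no response.
responded = [True] * 3 + [False] + [True] * 7 + [False] + [True]
lengths = block_lengths(responded)
print(lengths)
print(fraction_in_long_blocks(lengths))
print(fraction_short_intervals([0.5, 1.5, 2.0, 30.0]))
```

      Restricting an analysis to trials inside long blocks is one way to obtain sequences with comparatively uniform inter-trial intervals.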

      Results “… We found that more than two-thirds of the trials were performed in >5-trial blocks (Fig. 2L left), such that more than 55% of the trials had an inter-trial interval of less than 2 seconds (Fig. S2H).”

      Regarding approaches to mitigate the issue of varying inter-trial intervals, we observed that manual training (i.e., manually transferring mice to HABITS for ~2 hr/day) resulted in more trials with short inter-trial intervals (Fig. S2H), suggesting that constrained access time promotes task engagement and reduces interval variability. Fig. 2L also indicated that the average correct rate increased and the early-lick rate decreased as block length increased. This approach could be valuable for studies where consistent trial timing is critical. In the context of our study, we could, for example, introduce a light cue that prompts the animals to engage during a fixed period each day.

      Discussion “… In contrast, the self-paced nature of autonomous training may permit greater variability in attentional engagement [83] and inter-trial-intervals, which could be problematic for data analysis relying on consistent intervals and/or engagements. Future studies should explore how controlled contextual constraints enhance learning efficiency and whether incorporating such measures into HABITS could optimize its performance.”

      (3) Movies - it would be beneficial for the authors to add commentary to the video (hit, miss trials). It was interesting watching the mice but not clear whether they were doing the task correctly or not. 

      Thanks for the reminder. We have added subtitles to both videos. Since supplementary video 1 was not recorded with sound, the correctness of the trials was hard to judge. We have replaced it with another video that has a clear sound recording, and the subtitles now provide detailed commentary.

      (4) The strength of this paper (from my perspective) is the potential utility it has for other investigators trying to get mice to do behavioral tasks. However, not enough information was provided about the construction of the boxes, interface, and code for running the boxes. If the authors are not willing to provide this information through eLife, GitHub, or their own website then my evaluation of the impact and significance of this paper would go down significantly. 

      Thanks for this important comment. We would like to clarify that the construction methods, GUI, and code for our system, together with the PCB and CAD files (newly uploaded), have already been made publicly available at https://github.com/Yaoyao-Hao/HABITS. Additionally, we have open-sourced all the code and raw data for all training protocols (https://doi.org/10.6084/m9.figshare.27192897). We will continue to maintain these resources in the future.

      Minor concerns: 

      (5) Learning rate is confusing for Figure 3 results as it actually refers to trials to reach the criterion, and not the actual rate of learning (e.g., slope).

      Thanks for pointing this out. The ‘learning rate’ which refers to trial number to reach criterion has been changed to ‘the number of trials to reach criterion’.

      Reviewer #3 (Public review): 

      Summary: 

      In this set of experiments, the authors describe a novel research tool for studying complex cognitive tasks in mice, the HABITS automated training apparatus, and a novel "machine teaching" approach they use to accelerate training by algorithmically providing trials to animals that provide the most information about the current rule state for a given task. 

      Strengths: 

      There is much to be celebrated in an inexpensively constructed, replicable training environment that can be used with mice, which have rapidly become the model species of choice for understanding the roles of distinct circuits and genetic factors in cognition. Lingering challenges in developing and testing cognitive tasks in mice remain, however, and these are often chalked up to cognitive limitations in the species. The authors' findings, however, suggest that instead, we may need to work creatively to meet mice where they live. In some cases, it may be that mice may require durations of training far longer than laboratories are able to invest with manual training (up to over 100k trials, over months of daily testing) but the tasks are achievable. The "machine teaching" approach further suggests that this duration could be substantially reduced by algorithmically optimizing each trial presented during training to maximize learning. 

      Weaknesses: 

      (1) Cognitive training and testing in rodent models fill a number of roles. Sometimes, investigators are interested in within-subjects questions - querying a specific circuit, genetically defined neuron population, or molecule/drug candidate, by interrogating or manipulating its function in a highly trained animal. In this scenario, a cohort of highly trained animals that have been trained via a method that aims to make their behavior as similar as possible is a strength. 

      However, often investigators are interested in between-subjects questions - querying a source of individual differences that can have long-term and/or developmental impacts, such as sex differences or gene variants. This is likely to often be the case in mouse models especially, because of their genetic tractability. In scenarios where investigators have examined cognitive processes between subjects in mice who vary across these sources of individual difference, the process of learning a task has been repeatedly shown to be different. The authors do not appear to have considered individual differences except perhaps as an obstacle to be overcome. 

      The authors have perhaps shown that their main focus is highly-controlled within-subjects questions, as their dataset is almost exclusively made up of several hundred young adult male mice, with the exception of 6 females in a supplemental figure. It is notable that these female mice do appear to learn the two-alternative forced-choice task somewhat more rapidly than the males in their cohort.

      Thank you for your insightful comments and for highlighting the importance of considering both within-subject and between-subject questions in cognitive training and testing in rodent models. We acknowledge that our study primarily focused on highly controlled within-subject questions. However, the datasets we provided do show preliminary evidence relevant to ‘between-subject’ questions. Key observations include:

      The large variability in learning rates among mice observed in Fig. 2I;

      The overall learning rate difference between male and female subjects (Fig. 2D vs. Fig. S2G);

      The varying nocturnal behavioral patterns (Fig. 2K), etc.

      We recognize the value of exploring between-subject differences in mouse models and now discuss this in more detail in the Discussion.

      Discussion “Our study was designed to standardize behavior for the precise interrogation of neural mechanisms, specifically addressing within-subject questions. However, investigators are often interested in between-subject differences—such as sex differences or genetic variants—which can have long-term behavioral and cognitive implications [72,74]. This is particularly relevant in mouse models due to their genetic tractability [75]. Although our primary focus was not on between-subject differences, the dataset we generated provides preliminary evidence for such investigations. Several behavioral readouts revealed individual variability among mice, including large disparities in learning rates across individuals (Fig. 2I), differences in overall learning rates between male and female subjects (Fig. 2D vs. Fig. S2G), variations in nocturnal behavioral patterns (Fig. 2K), etc.”

      (2) Considering the implications for mice modeling relevant genetic variants, it is unclear to what extent the training protocols and especially the algorithmic machine teaching approach would be able to inform investigators about the differences between their groups during training. For investigators examining genetic models, it is unclear whether this extensive training experience would mitigate the ability to observe cognitive differences, or select the animals best able to overcome them - eliminating the animals of interest. Likewise, the algorithmic approach aims to mitigate features of training such as side biases, but it is worth noting that the strategic uses of side biases in mice, as in primates, can benefit learning, rather than side biases solely being a problem. However, the investigators may be able to highlight variables selected by the algorithm that are associated with individual strategies in performing their tasks, and this would be a significant contribution.

      Thank you for the insightful comments. We acknowledge that the extensive training experience, particularly through the algorithmic machine teaching approach, could potentially influence the ability to observe cognitive differences between groups of mice with relevant genetic variants. However, our study design and findings suggest that this approach can still provide valuable insights into individual differences and the strategies used by the animals during training. First, the behavioral readouts mentioned above (including learning rate, engagement pattern, etc.) can reveal a number of differences among mice. Second, detailed modelling analysis (with logistic regression) can further dissect the strategies that mice use over the course of training (Fig. S2B). We have in fact highlighted some variables selected by the regression that are associated with individual strategies in performing the tasks (Fig. S2C), and these strategies can differ between manual and autonomous training groups (Fig. S2D). We have included these points in the Discussion for further clarity.

      Discussion “… Furthermore, a detailed logistic regression analysis dissected the strategies mice employed during training (Fig. S2B). Notably, the regression identified variables associated with individual task-performance strategies (Fig. S2C), which also differed between manually and autonomously trained groups (Fig. S2D). Thus, our system could facilitate high-throughput behavioral studies exploring between-subject differences in the future.”

      (3) A final, intriguing finding in this manuscript is that animal self-paced training led to much slower learning than "manual" training, by having the experimenter introduce the animal to the apparatus for a few hours each day. Manual training resulted in significantly faster learning, in almost half the number of trials on average, and with significantly fewer omitted trials. This finding does not necessarily argue that manual training is universally a better choice because it leads to more limited water consumption. However, it suggests that there is a distinct contribution of experimenter interactions and/or switching contexts in cognitive training, for example by activating an "occasion setting" process to accelerate learning for a distinct period of time. Limiting experimenter interactions with mice may be a labor-saving intervention, but may not necessarily improve performance. This could be an interesting topic of future investigation, of relevance to understanding how animals of all species learn.

      Thank you for your insightful comments. We agree that the finding that manual training led to significantly faster learning compared to self-paced training is both intriguing and important. One possible reason is the limited duration of engagement imposed by the experimenter in manual training, which forced the mice to concentrate more on the trials (and thus omit fewer trials) than in autonomous training. Your suggestion that experimenter interactions might activate an "occasion setting" process is particularly interesting. In the context of our study, we could, for example, introduce a light that serves as a cue prompting the animals to engage; when the light is off, the task would no longer be accessible to the mice, simulating the manual training situation. We agree that this could be an interesting topic for future investigation that might create a more conducive environment for learning, thereby accelerating learning.

      Discussion “… Lastly, while HABITS achieves criterion performance in a similar or even smaller number of days overall compared to manual training, it requires more trials to reach the same learning criterion (Fig. 2G). We hypothesize that this difference in trial efficiency may stem from the constrained engagement duration imposed by the experimenter in manual training, which could compel mice to focus more intensely on task execution, resulting in fewer trial omissions (Fig. 2F). In contrast, the self-paced nature of autonomous training may permit greater variability in attentional engagement [83] and inter-trial intervals, which could be problematic for data analyses relying on consistent intervals and/or engagement. Future studies should explore how controlled contextual constraints enhance learning efficiency and whether incorporating such measures into HABITS could optimize its performance.”

      Reviewer #2 (Recommendations for the authors):

      As I mentioned in the weaknesses, I did not see code or CAD drawings for their home cages and how these interact with a computer.

      Thanks for the comment. We would like to clarify that the construction methods, GUI, code for our system, PCB and CAD files (newly uploaded) have already been made publicly available on https://github.com/Yaoyao-Hao/HABITS.

    1. eLife Assessment

      This important study explores the power of computational methods to predict lifespan-extending small molecules, demonstrating that while these methods significantly increase hit rates, experimental validation remains essential. The study uses all-trans retinoic acid in Caenorhabditis elegans as a model, providing genetic and transcriptomic insights into its longevity effects. The data are compelling in describing a robust, computationally informed screening process for discovering compounds that extend lifespan in this species.

    2. Reviewer #1 (Public review):

      Summary:

      This study highlights the strengths of using predictive computational models to inform C. elegans screening studies of compounds' effects on aging and lifespan. The authors primarily focus on all-trans retinoic acid (atRA), one of the 5 compounds (out of 16 tested) that extended C. elegans lifespan in their experiments. They show that atRA has positive effects on C. elegans lifespan and age-related health, while it has more modest and inconsistent effects (i.e., some detrimental impacts) for C. briggsae and C. tropicalis. In genetic experiments designed to evaluate contributing mediators of lifespan extension with atRA exposure, it was found that 150 µM of atRA did not significantly extend lifespan in akt-1 or akt-2 loss-of-function mutants, nor in animals with loss of function of aak-2, or skn-1 (in which atRA had toxic effects); these genes appear to be required for atRA-mediated lifespan extension. hsf-1 and daf-16 loss-of-function mutants both had a modest but statistically significant lifespan extension with 150 µM of atRA, suggesting that these transcription factors may contribute towards mediating atRA lifespan extension, but that they are not individually required for some lifespan extension. RNAseq assessment of transcriptional changes in day 4 atRA-treated adult wild type worms revealed some interesting observations. Consistent with the study's genetic mutant lifespan observations, many of the atRA-regulated genes with the greatest fold-change differences are known regulated targets of daf-2 and/or skn-1 signaling pathways in C. elegans. hsf-1 loss-of-function mutants show a shifted atRA transcriptional response, revealing a dependence on hsf-1 for ~60% of the atRA-downregulated genes. On the other hand, RNAseq analysis in aak-2 loss-of-function mutants revealed that aak-2 is only required for less than a quarter of the atRA transcriptional response. All together, this study is a proof of the concept that computational models can help optimize C. elegans screening approaches that test compounds' effects on lifespan, and provides comprehensive transcriptomic and genetic insights into the lifespan-extending effects of all-trans retinoic acid (atRA).

      Strengths:

      A clearly described and well-justified account describes the approach used to prioritize and select compounds for screening, based on using the top candidates from a published list of computationally ranked compounds (Fuentealba et al., 2019) that were cross-referenced with other bioinformatics publications to predict anti-aging compounds, after de-selecting compounds previously evaluated in C. elegans as per the DrugAge database. 16 compounds were tested at 4-5 different concentrations to evaluate effects on C. elegans lifespan.

      Robust experimental design was undertaken evaluating the lifespan effects of atRA, as it was tested on three strains each of C. elegans, C. briggsae, and C. tropicalis, with trial replication performed at three distinct laboratories. These observations extended beyond lifespan to include evaluations of health metrics related to swimming performance.

      In-depth analyses of the RNAseq data of whole-worm transcriptional responses to atRA revealed interesting insights into regulator pathways and novel groups of genes that may be involved in mediating lifespan-extension effects (e.g., atRA-induced upregulation of sphingolipid metabolism genes, atRA-upregulation of genes in a poorly-characterized family of C. elegans paralogs predicted to have kinase-like activity, and disproportionate downregulation of collagen genes with atRA).

      Weaknesses:

      The authors' computational-based compound screening approach led to a ~30% prediction success rate for compounds that could extend the median lifespan of C. elegans. However, follow-up experiments on the top compounds highlighted the fact that some of these observed "successes" could be driven by indirect, confounding effects of these compounds on the bacterial food source, rather than direct beneficial effects on C. elegans physiology and lifespan. For instance, this appeared to be the case for the "top" hit of propranolol. Other compounds were not tested with metabolically inert or killed bacteria to preclude the possibility of bacteria-produced metabolites exerting observed effects; this might be a useful future direction to consider.

      Transcriptomic analyses of atRA effects were extensive in this study, but discussions of potential non-transcriptional effects of key proposed regulators (such as AMPK) were limited. For instance, other outputs of aak-2/AMPK (non-transcriptional changes to metabolic balance, autophagy, etc.) might account for its requirement for mediating lifespan extension effects, since aak-2 was not required for a major proportion of atRA transcriptional responses.

      Comments on revisions:

      In their revisions, the authors resolved all of my initial recommendations, and I have no additional suggestions.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Banse et al. experimentally validate the power of computational approaches that predict anti-aging molecules using the multi-species approach of the Caenorhabditis Intervention Testing Program (CITP). Filtering candidate molecules based on transcriptional profiles, ML models, literature searches, and the DrugAge database, they selected 16 compounds for testing. Of those, eight did not affect C. elegans' lifespan, three shortened it, and five extended C. elegans' lifespan, resulting in a hit rate of over 30%. Of those five, they then focused on all-trans-retinoic acid (atRA), a compound that has previously resulted in contradictory effects. The lifespan-extending effect of atRA was consistent in all C. elegans strains tested, was absent in C. briggsae, and a small effect was observed in some C. tropicalis strains. Similar results were obtained for measures of healthspan. The authors then investigated the mechanism of action of atRA and showed that it was only partially dependent on daf-16 but required akt-1, akt-2, skn-1, hsf-1, and, to some degree, pmk-1. The authors further investigate the downstream effects of atRA exposure by conducting RNAseq experiments in both wild-type and mutant animals to show that some, but surprisingly few, of the gene expression changes that are observed in wild-type animals are lost in the hsf-1 and aak-2 mutants.

      Strengths:

      Overall, this study is well-conceived and executed as it investigates the effect of atRA across different concentrations, strains, and species, including life and health span. Revealing the variability between sites, assays, and the method used is a powerful aspect of this study. It will do a lot to dispel the nonsensical illusion that we can determine a per cent increase in lifespan to the precision of two floating point numbers.

      An interesting and potentially important implication arises from this study. The computational selection of compounds was agnostic regarding strain or species differences and was predominantly based on observations made in mammalian systems. The hit rate calculated is based on the results of C. elegans and not on the molecules' effectiveness in C. briggsae or C. tropicalis. If it were, the hit rate would be much lower. How is that? It would suggest that ML models and transcriptional data obtained from mammals have a higher predictive value for C. elegans than for the other two species. This selectivity for C. elegans over C. tropicalis and C. briggsae seems both puzzling and unexpected. The predictions for longevity were based on the transcriptional data in cell lines. Would it be feasible to compare the mammalian data to the transcriptional data in Fig. 5 and see how well they match? While this is clearly beyond the focus of this study, an implied prediction is that running RNAseqs for all these strains exposed to atRA would reveal that the transcriptional changes observed in the strains where it extends lifespan the most should match the mammalian data best. Otherwise, how could the mammalian datasets be more predictive of the effects in C. elegans than in C. briggsae or C. tropicalis? There are a lot of IFs in this prediction, but such an experiment would reconsider and validate the basis on which the original predictions were made.

      Weaknesses:

      Many of the most upregulated genes, such as cyps and pgps, are xenobiotic response genes upregulated in many transcriptional datasets from C. elegans drug studies. Their expression might be necessary to deal with atRA breakdown metabolites to prevent toxicity rather than confer longevity. Because atRA is very light-sensitive and has toxic breakdown metabolites, this may explain some of the differences observed between the lifespan machine and standard assay practices. However, the authors provide a potential explanation for that observation.

      Comments on revisions:

      The authors have adequately addressed my concerns and the paper is suitable for publication.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Banse et al., demonstrate that combining computer prediction with genetic analysis in distinct Caenorhabditis species can streamline the discovery of aging interventions by taking advantage of the diverse pool of compounds that are currently available. They demonstrate that through careful prioritization of candidate compounds, they are able to accomplish a 30% positive hit rate for interventions that produce significant lifespan extensions. Within the positive hits, they focus on all-trans retinoic acid (atRA) and discover that it modulates lifespan through conserved longevity pathways such as AKT-1 and AKT-2 (and other conserved Akt-targets such as Nrf2/SKN-1 and HSF1/HSF-1) as well as through AAK-2, a conserved catalytic subunit of AMPK. To better understand the genetic mechanisms behind lifespan extension upon atRA treatment, the authors perform RNAseq experiments using a variety of genetic backgrounds for cross comparison and validation. Using this current state-of-the-art approach for studying gene expression, the authors determine that atRA treatment produces gene expression changes across a broad set of stress-response and longevity-related pathways. Overall, this study is important since it highlights the potential of combining traditional genetic analysis in the genetically tractable organism C. elegans with computational methods that will become even more powerful with the swift advancements being made in artificial intelligence. The study possesses both theoretical and practical implications not only in the field of aging, but also in related fields such as health and disease. Most of the claims in this study are supported by solid evidence, but the conclusions can be refined with a small set of additional experiments or re-analysis of data.

      Strengths:

      (1) The criteria for prioritizing compounds for screening are well defined and easy to replicate (Figure 1), even for scientists with limited experience in computational biology. The approach is also adaptable to other systems or model organisms.

      (2) I commend the researchers for doing follow-up experiments with the compound propranolol to verify its effect on lifespan (Figure 2- figure supplement 2), given the observation that it affected the growth of OP50. To prevent false hits in the future, the reviewer recommends the use of inactivated OP50 for future experiments to remove this confounding variable.

      (3) The sources of variation (Figure 3-figure supplement 2) are taken into account and demonstrate the need for advancing our understanding of how inter-individual variation shapes the lifespan phenotype.

      (4) The inclusion of the C. elegans swim test alongside the lifespan assays provides further evidence of atRA-induced improvement in longevity.

      (5) The RNAseq approach was performed in a variety of genetic backgrounds, which allowed the authors to determine the relationship between AAK-2 and HSF-1 regulation of the retinoic acid pathway in C. elegans, specifically, that the former functions downstream of the latter.

      Weaknesses:

      (1) The authors demonstrate that atRA extends lifespan in a species-specific manner (Figure 3). Specifically, this extension only occurs in the species C. elegans, yet the title implies that atRA-induced lifespan extension occurs in different Caenorhabditis species when it is clearly not the case. While the authors state that failure to observe phenotypes in C. briggsae and C. tropicalis is a common feature of CITP tests, they do not speculate as to why this phenomenon occurs.

      (2) There are discrepancies between the lifespan curves obtained by hand (Figure 3-Figure supplement 1) and those obtained using the automated lifespan machine (Figure 3-supplement 3). Specifically, in the automated lifespan assays, there are drastic changes in the slope of the survival curve that do not occur in the manual assays, which suggests that confounding factors may still operate or produce additional variation in ALM experiments despite relatively well-controlled environmental conditions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study highlights the strengths of using predictive computational models to inform C. elegans screening studies of compounds' effects on aging and lifespan. The authors primarily focus on all-trans retinoic acid (atRA), one of the 5 compounds (out of 16 tested) that extended C. elegans lifespan in their experiments. They show that atRA has positive effects on C. elegans lifespan and age-related health, while it has more modest and inconsistent effects (i.e., some detrimental impacts) for C. briggsae and C. tropicalis. In genetic experiments designed to evaluate contributing mediators of lifespan extension with atRA exposure, it was found that 150 µM of atRA did not significantly extend lifespan in akt-1 or akt-2 loss-of-function mutants, nor in animals with loss of function of aak-2, or skn-1 (in which atRA had toxic effects); these genes appear to be required for atRA-mediated lifespan extension. hsf-1 and daf-16 loss-of-function mutants both had a modest but statistically significant lifespan extension with 150 µM of atRA, suggesting that these transcription factors may contribute towards mediating atRA lifespan extension, but that they are not individually required for some lifespan extension. RNAseq assessment of transcriptional changes in day 4 atRA-treated adult wild-type worms revealed some interesting observations. Consistent with the study's genetic mutant lifespan observations, many of the atRA-regulated genes with the greatest fold-change differences are known regulated targets of daf-2 and/or skn-1 signaling pathways in C. elegans. hsf-1 loss-of-function mutants show a shifted atRA transcriptional response, revealing a dependence on hsf-1 for ~60% of the atRA-downregulated genes. On the other hand, RNAseq analysis in aak-2 loss-of-function mutants revealed that aak-2 is only required for less than a quarter of the atRA transcriptional response. All together, this study is proof of the concept that computational models can help optimize C. elegans screening approaches that test compounds' effects on lifespan, and provides comprehensive transcriptomic and genetic insights into the lifespan-extending effects of all-trans retinoic acid (atRA).

      Strengths:

      (1) A clearly described and well-justified account describes the approach used to prioritize and select compounds for screening, based on using the top candidates from a published list of computationally ranked compounds (Fuentealba et al., 2019) that were cross-referenced with other bioinformatics publications to predict anti-aging compounds, after de-selecting compounds previously evaluated in C. elegans as per the DrugAge database. 16 compounds were tested at 4-5 different concentrations to evaluate effects on C. elegans lifespan.

      (2) Robust experimental design was undertaken evaluating the lifespan effects of atRA, as it was tested on three strains each of C. elegans, C. briggsae, and C. tropicalis, with trial replication performed at three distinct laboratories. These observations extended beyond lifespan to include evaluations of health metrics related to swimming performance.

      (3) In-depth analyses of the RNAseq data of whole-worm transcriptional responses to atRA revealed interesting insights into regulator pathways and novel groups of genes that may be involved in mediating lifespan-extension effects (e.g., atRA-induced upregulation of sphingolipid metabolism genes, atRA-upregulation of genes in a poorly-characterized family of C. elegans paralogs predicted to have kinase-like activity, and disproportionate downregulation of collagen genes with atRA).

      We thank the reviewer for highlighting the strengths of our paper.

      Weaknesses:

      (1) The authors' computational-based compound screening approach led to a ~30% prediction success rate for compounds that could extend the median lifespan of C. elegans. However, follow-up experiments on the top compounds highlighted the fact that some of these observed "successes" could be driven by indirect, confounding effects of these compounds on the bacterial food source, rather than direct beneficial effects on C. elegans physiology and lifespan. For instance, this appeared to be the case for the "top" hit of propranolol; other compounds were not tested with metabolically inert or killed bacteria. In addition, there are no comparative metrics provided to compare this study's ~30% success rate to screening approaches that do not use computational predictions.

      We do test whether compounds have a direct effect on bacterial growth. We have modified the text to clarify that fact. There may be potential lifespan effects from atRA due to changes in bacterial metabolites; however, exploring that more fully is beyond the scope of the current work.

      We very much appreciate the question regarding relative success. An appropriate benchmark for “hit rate” is perhaps best provided by Petrascheck, Ye & Buck (2007), who conducted a large-scale screen of 88,000 compounds for effects on adult lifespan in C. elegans. They found an initial screening hit rate of 1.2% (1083/88000); these hits were then retested, giving a verified hit rate of 0.13% (115/88000) and a retest failure rate of 89% (968/1083). Similarly, Lucanic et al. (2016) screened 30,000 compounds, with an initial hit rate of approximately 1.7% (~500/30000); of these, 180 were selected for retesting, resulting in a final verified hit rate of 0.19% (57/29680), which is comparable to the Petrascheck et al. result. The text in the discussion has been modified to include these studies.
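
      The percentages quoted above follow directly from the counts given in the response. As a minimal sketch for readers who want to check the arithmetic, the snippet below recomputes each rate. The function name `pct` and the labels in `counts` are illustrative choices made here, not part of the authors' analysis; the count of 500 for the initial Lucanic et al. hits is approximate in the source, and the 5/16 entry is this study's hit rate as stated in the reviews.

```python
def pct(hits: int, total: int) -> float:
    """Return hits/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total


# (hits, total) pairs copied from the counts quoted in the text.
# Labels are illustrative; 500 is approximate in the source ("~500").
counts = {
    "Petrascheck 2007, initial hits": (1083, 88000),
    "Petrascheck 2007, verified hits": (115, 88000),
    "Petrascheck 2007, retest failures": (968, 1083),
    "Lucanic 2016, initial hits (approx.)": (500, 30000),
    "Lucanic 2016, verified hits": (57, 29680),
    "This study, lifespan-extending compounds": (5, 16),
}

for label, (hits, total) in counts.items():
    print(f"{label}: {hits}/{total} = {pct(hits, total):.2f}%")
```

      Note that the screens differ in design (for example, in how hits were retested), so the rates are a rough benchmark rather than a like-for-like comparison.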

      (2) Transcriptomic analyses of atRA effects were extensive in this study, but evaluations and discussions of non-transcriptional effects of key proposed regulators (such as AMPK) were limited. For instance, non-transcriptional effects of aak-2/AMPK might account for its requirement for mediating lifespan extension effects, since aak-2 was not required for a major proportion of atRA transcriptional responses.

      We naturally agree with the reviewer that non-transcriptional effects are possible and well worth pursuing in future work. However, these effects will still show within our study, as any upstream non-transcriptional effects are likely to reveal themselves in downstream transcriptional changes, as measured here.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Banse et al. experimentally validate the power of computational approaches that predict anti-aging molecules using the multi-species approach of the Caenorhabditis Intervention Testing Program (CITP). Filtering candidate molecules based on transcriptional profiles, ML models, literature searches, and the DrugAge database, they selected 16 compounds for testing. Of those, eight did not affect C. elegans' lifespan, three shortened it, and five extended C. elegans' lifespan, resulting in a hit rate of over 30%. Of those five, they then focused on all-trans-retinoic acid (atRA), a compound that has previously resulted in contradictory effects. The lifespan-extending effect of atRA was consistent in all C. elegans strains tested, was absent in C. briggsae, and a small effect was observed in some C. tropicalis strains. Similar results were obtained for measures of healthspan. The authors then investigated the mechanism of action of atRA and showed that it was only partially dependent on daf-16 but required akt-1, akt-2, skn-1, hsf-1, and, to some degree, pmk-1. The authors further investigate the downstream effects of atRA exposure by conducting RNAseq experiments in both wild-type and mutant animals to show that some, but surprisingly few, of the gene expression changes that are observed in wild-type animals are lost in the hsf-1 and aak-2 mutants.

      Strengths:

      Overall, this study is well conceived and executed as it investigates the effect of atRA across different concentrations, strains, and species, including life and health span. Revealing the variability between sites, assays, and the method used is a powerful aspect of this study. It will do a lot to dispel the nonsensical illusion that we can determine a percent increase in lifespan to the precision of two floating point numbers.

      An interesting and potentially important implication arises from this study. The computational selection of compounds was agnostic regarding strain or species differences and was predominantly based on observations made in mammalian systems. The hit rate calculated is based on the results of C. elegans and not on the molecules' effectiveness in C. briggsae or C. tropicalis. If it were, the hit rate would be much lower. How is that? It would suggest that ML models and transcriptional data obtained from mammals have a higher predictive value for C. elegans than for the other two species. This selectivity for C. elegans over C. tropicalis and C. briggsae seems both puzzling and unexpected. The predictions for longevity were based on the transcriptional data in cell lines.

      This is a common observation in the CITP for which we do not currently have a satisfying explanation. For whatever reason, C. elegans is much more responsive to compounds than other species, much like it is more responsive to RNAi and other environmental interventions. It may be less active in detoxifying external agents than the other species, although this is just speculation at the moment. We continue to investigate this question, but that work is beyond the scope of the present paper.

      Would it be feasible to compare the mammalian data to the transcriptional data in Figure 5 and see how well they match? While this is clearly beyond the focus of this study, an implied prediction is that running RNAseqs for all these strains exposed to atRA would reveal that the transcriptional changes observed in the strains where it extends lifespan the most should match the mammalian data best. Otherwise, how could the mammalian datasets be more predictive of the effects in C. elegans than of those in C. briggsae or C. tropicalis? There are a lot of IFs in this prediction, but such an experiment would reconsider and validate the basis on which the original predictions were made.

      These questions are worth pursuing in the future but are beyond the scope of the current work.

      Weaknesses:

      Many of the most upregulated genes, such as cyps and pgps, are xenobiotic response genes upregulated in many transcriptional datasets from C. elegans drug studies. Their expression might be necessary to deal with atRA breakdown metabolites to prevent toxicity rather than confer longevity. Because atRA is very light sensitive, the toxicity of its breakdown metabolites may explain some of the differences observed between the lifespan machine effects and standard assay practices.

      This is certainly a possibility, although we often observe longer lifespans on the ALM, perhaps because they themselves are stressful, thereby providing a more sensitive background environment for detecting positive stress response modulators.

      Reviewer #3 (Public review):

      Summary:

      In this study, Banse et al., demonstrate that combining computer prediction with genetic analysis in distinct Caenorhabditis species can streamline the discovery of aging interventions by taking advantage of the diverse pool of compounds that are currently available. They demonstrate that through careful prioritization of candidate compounds, they are able to accomplish a 30% positive hit rate for interventions that produce significant lifespan extensions. Within the positive hits, they focus on all-trans retinoic acid (atRA) and discover that it modulates lifespan through conserved longevity pathways such as AKT-1 and AKT-2 (and other conserved Akt-targets such as Nrf2/SKN-1 and HSF1/HSF-1) as well as through AAK-2, a conserved catalytic subunit of AMPK. To better understand the genetic mechanisms behind lifespan extension upon atRA treatment, the authors perform RNAseq experiments using a variety of genetic backgrounds for cross-comparison and validation. Using this current state-of-the-art approach for studying gene expression, the authors determine that atRA treatment produces gene expression changes across a broad set of stress-response and longevity-related pathways. Overall, this study is important since it highlights the potential of combining traditional genetic analysis in the genetically tractable organism C. elegans with computational methods that will become even more powerful with the swift advancements being made in artificial intelligence. The study possesses both theoretical and practical implications not only in the field of aging but also in related fields such as health and disease. Most of the claims in this study are supported by solid evidence, but the conclusions can be refined with a small set of additional experiments or re-analysis of data.

      Strengths:

      (1) The criteria for prioritizing compounds for screening are well-defined and easy to replicate (Figure 1), even for scientists with limited experience in computational biology. The approach is also adaptable to other systems or model organisms.

      (2) I commend the researchers for doing follow-up experiments with the compound propranolol to verify its effect on lifespan (Figure 2 Supplement 2), given the observation that it affected the growth of OP50. To prevent false hits in the future, the reviewer recommends the use of inactivated OP50 for future experiments to remove this confounding variable.

      (3) The sources of variation (Figure 3, Figure Supplement 2) are taken into account and demonstrate the need for advancing our understanding of the lifespan phenotype due to inter-individual variation.

      (4) The addition of the C. elegans swim test in addition to the lifespan assays provides further evidence of atRA-induced improvement in longevity.

      (5) The RNAseq approach was performed in a variety of genetic backgrounds, which allowed the authors to determine the relationship between AAK-2 and HSF-1 regulation of the retinoic acid pathway in C. elegans, specifically, that the former functions downstream of the latter.

      We thank the reviewer for highlighting these strengths.

      Weaknesses:

      (1) The filtering of compounds for testing using the DrugAge database requires that the database is consistently updated. In this particular case, even though atRA does not appear in the database, the authors themselves cite literature that has already demonstrated atRA-induced lifespan extension, which should have precluded this compound from the analysis in the first place.

      As often happens in science, this work was initiated before Statzer et al. (2021) was published. As such, it is included in the test set.

      (2) The threshold for determining positive hits is arbitrary, and in this case, a 30% positive hit rate was observed when the threshold is set to a lifespan extension of around 5% based on Figure 1B (the authors fail to explicitly state the cut-off for what is considered a positive hit).

      Any compound that statistically increases lifespan is considered a positive hit by the CITP. The CITP in general is powered to detect minimum effect sizes of 5%.

      (3) The authors demonstrate that atRA extends lifespan in a species-specific manner (Figure 3). Specifically, this extension only occurs in the species C. elegans, yet the title implies that atRA-induced lifespan extension occurs in different Caenorhabditis species when it is clearly not the case. While the authors state that failure to observe phenotypes in C. briggsae and C. tropicalis is a common feature of CITP tests, they do not speculate as to why this phenomenon occurs.

      Please see the comment above.

      (4) There are discrepancies between the lifespan curves by hand (Figure 3 Figure Supplement 1) and using the automated lifespan machine (Figure 3 Supplement 3). Specifically, in the automated lifespan assays, there are drastic changes in the slope of the survival curve which do not occur in the manual assays. This may be due to improper filtering of non-worm objects, improper annotation of death times, or improper distribution of plates in each scanner.

      Our storyboarding SOP ensures that discrepancies in the shape of the curve are unlikely to be due to annotation errors. We check every page of the storyboard by hand, so all non-worm objects are excluded. Furthermore, the first and last ~10% of deaths are checked by hand (as we observed that these time points are the most likely to be wrongly called by the software), with a few deaths chosen at random from the middle to ensure that the software is calling death times accurately. If we find a high number of inaccurately called deaths, the entire plate is annotated by hand. For this specific experiment, 18% of the total deaths were hand annotated. Plates are randomly distributed across each scanner in an effort to prevent bias. As noted above, it does appear that the ALM environment and the “by hand” environment are somewhat different.

      (5) The authors miss an opportunity to determine whether the lifespan extension phenotype attributed to the retinoic acid pathway is mostly transcriptional in nature or whether some of it is post-transcriptional. The authors even state "that while aak-2 is absolutely required for the longevity effects of atRA, aak-2 is required only for a small proportion (~1/4) of the transcriptional response", suggesting that some of the effects are post-transcriptional. Further information could have been obtained had the authors also performed RNAseq analysis on the tol-1 mutant, which exhibited an enhanced response to atRA compared to wild-type animals, and compared the magnitude of gene expression changes between the tol-1 mutant and all other genetic backgrounds for which RNAseq was performed.

      Reviewer #1 (Recommendations for the authors):

      (1) Will the raw RNA-seq data be publicly deposited? Please clarify. This would strengthen the value of the study.

      All data is available. We have clarified this in the text.

      (2) Since all-trans retinoic acid is a metabolite of vitamin A, it seems important to include a discussion of and reference to the recent study "SKN-1/NRF2 upregulation by vitamin A is conserved from nematodes to mammals and is critical for lifespan extension in Caenorhabditis elegans" (Sirakawin et al., Cell Reports 2024). Sirakawin et al. include data that corroborate and expand on the findings of the current study, including the observation that vitamin A reduces whole-body lipid deposition (agrees with some of the transcriptional findings in the current study); that vitamin A protects against oxidative stress; that vitamin A elevates expression of gst-5, skn-1, and pmk-1; and that loss-of-function mutation of skn-1 has similar effects to the current study, in terms of suppressing lifespan-extending effects of vitamin A. In addition, adding some discussion of oxidative stress would strengthen this work, in light of widespread perceptions of the antioxidant properties of vitamin A (and its metabolites).

      Thank you for this suggestion. We have added this citation to the discussion.

      (3) Minor typo: Lines 341-342 - After a sentence that contains the phrase "collagen and neuropeptide related genes", the next sentence uses the term "the latter" in reference to the collagen genes (should be "the former").

      Edited in text.

      (4) Minor correction: In Figure 6, the information in the figure legend is swapped for figure panels A) and B).

      Edited in figure caption.

      (5) To me, the subtitle heading "Loss of AMPK leads to a unique transcriptional profile in response to atRA treatment" (Line 403) is misleading, considering the contents of the text in that section, and the data presented in Figure 6.

      We have altered this heading to reflect this comment.

      Reviewer #2 (Recommendations for the authors):

      Using different colors for the different testing sites would make Figure 3 more readable.

      Edited so that each lab is represented by a different shade of green.

      Reviewer #3 (Recommendations for the authors):

      It would be interesting to investigate the effect of even higher concentrations of atRA as it has been reported that atRA accumulation is associated with deleterious phenotypes in mice (Snyder et al., 2020, FASEB J).

      We tested the highest concentration (150 uM) based on the solubility of the compound using our standardized plate treatment protocol, so we are unable to test higher concentrations.  

      A good first guess for a downstream retinoid receptor is nhr-23, which is the homolog of the vertebrate ROR genes. Stehlin-Gaon et al. (2003, Nat Struct Mol Biol) have shown that atRA is a ligand for the orphan nuclear receptor RORβ. It might be interesting to study the effects of atRA on an nhr-23::AID (auxin inducible degron) background. This would allow you to circumvent the developmental phenotypes as a result of nhr-23 knockdown. Patrick/Stephen

      A few notes on the text/figures:

      Line 342: I believe the authors meant "former" instead of "latter".

      Corrected in text.

      Line 346: Can you also highlight col-144 in Fig. 5 S1?

      This is not really feasible, as it is in the cluster near where the axes meet (red arrow).

      Line 400: CUB pathogen - based on Figure 6 Supp 1, this occurs in aak-2 and not in hsf-1.

      Great catch by the reviewer. We have updated the figure with the correct information.

      Line 414: hedgehog-like signaling - occurs in hsf-1 instead of aak-2. Similar inconsistencies occur in lines 415 (sterol), 417 (C-type lectin), and 418 (unassigned pathogens)

      We have updated the text to eliminate potential conflicts/confusion in the presentation here.

      Line 434: I believe the authors meant Figure "6" instead of "7"

      Edited in text.

      Line 475: Is it "fifteen" or "sixteen" compounds initially targeted?

      Edited in text.

      Can you please include the population sizes for the lifespan assays if not yet included in the detailed protocol to be published in FigShare (to which I currently do not have access)?

      Added “50 animals per petri plate” to Lifespan Assay methods section; additionally, all sample sizes are included as a summary tab in each dataset on figshare.com (10.6084/m9.figshare.c.6320690).

    1. eLife Assessment

      The authors of this important study investigate how telomere length regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in a reduction in its binding to the hTERT promoter, while short telomeres restore TRF2 binding in the hTERT promoter, recruiting repressor complexes like PRC2, and suppressing hTERT expression. There is convincing support for the claims and the findings should be of broad interest for cell biologists and those working in fields where telomeres alter function, such as cancer and aging.

    2. Reviewer #1 (Public review):

      Summary:

      The authors in this study extensively investigate how telomere length (TL) regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in a reduction in its binding to the hTERT promoter. In contrast, short telomeres restore TRF2 binding in the hTERT promoter, recruiting repressor complexes like PRC2, and suppressing hTERT expression. The study presents several significant findings revealing a previously unknown mechanism of hTERT regulation by TRF2 in a TL-dependent manner.

      Strengths:

      (1) A previously unknown mechanism linking telomere length and hTERT regulation through the non-telomeric TRF2 protein has been established, strengthening our understanding of telomere biology.

      (2) The authors used both cancer cell lines and iPSCs to showcase their hypothesis and multiple parameters to validate the role of TRF2 in hTERT regulation.

      (3) Comprehensive integration of the recent literature findings and implementation in the current study.

      (4) In vivo validation of the findings.

      (5) Rigorous controls and well-designed assays have been used.

      Comments on current version:

      The current version of the manuscript has addressed all the reviewers' concerns to the best of its ability. However, understanding the limitations of the authors, exploring ALT cell lines for the current mechanism would be desirable in the future.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors in this study extensively investigate how telomere length (TL) regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in a reduction in its binding to the hTERT promoter. In contrast, short telomeres restore TRF2 binding in the hTERT promoter, recruiting repressor complexes like PRC2, and suppressing hTERT expression. The study presents several significant findings revealing a previously unknown mechanism of hTERT regulation by TRF2 in a TL-dependent manner.

      Strengths:

      (1) A previously unknown mechanism linking telomere length and hTERT regulation through the non-telomeric TRF2 protein has been established, strengthening our understanding of telomere biology.

      (2) The authors used both cancer cell lines and iPSCs to showcase their hypothesis and multiple parameters to validate the role of TRF2 in hTERT regulation.

      (3) Comprehensive integration of the recent literature findings and implementation in the current study.

      (4) In vivo validation of the findings.

      (5) Rigorous controls and well-designed assays have been used.

      Weaknesses:

      (1) The authors should comment on the cell proliferation and morphology of the engineered cell lines with ST or LT.

      The cell proliferation and morphology of the engineered cells were monitored during experiments. With a doubling time within 16-18 hours, all the cancer cell line pairs used in the study were counted and seeded equally before experiments.

      No significant difference in morphology or cell count (before harvesting for experiments) was noted for the stable cell lines, namely, HT1080 ST-HT1080 LT, HCT116 p53 null scrambled control-HCT116 p53 null hTERC knockdown.

      MDAMB 231 cells were treated with guanine-rich telomere repeats (GTR) over a period of 12 days, as per the protocol mentioned in Methods. Because GTR treatment on alternate days in serum-free media was followed by replenishment with serum-supplemented media, we noted that cells would undergo periodic delay in their proliferation (or transient arrest) aligning with the GTR oligo-feeding cycles and appeared somewhat larger in comparison to their parental untreated cells.

      Next, the cells with Cas9-telomeric sgRNA mediated telomere trimming were maintained transiently (till 3 days after transfection). During this time, no significant change in morphology or cell proliferation was observed in any of the cell lines, namely HCT116 or HEK293T Gaussia Luciferase reporter cells. iPSCs were also monitored. However, no change in morphology or cellular proliferation was observed during the 5 days post-transfection and antibiotic selection.  

      (2) Also, the entire study uses engineered cell lines with artificially elongated or shortened telomeres that conclusively demonstrate the role of hTERT regulation by TRF2 in a telomere length-dependent manner, but using ALT-negative cell lines with naturally short telomeres vs those with long telomeres would give a better perspective. Primary cells can also be used in this context.

      The reviewer correctly highlights (as we also acknowledge in the Discussion) that our study primarily utilizes engineered cell lines with artificially elongated or shortened telomeres. We agree that using ALT-negative cells with naturally short versus long telomeres would provide additional perspective. However, a key challenge in this experimental setup is the inherent variation in TRF2 protein levels among these cell types—a parameter central to our hypothesis. Comparing observations across such non-isogenic cell line pairs presents experimental limitations as these would require extensive normalization for multiple factors and introduce additional complexities, which would be difficult to interpret with clarity.

      We had also explored primary cells, specifically foreskin fibroblasts and MRC5 lung fibroblasts, as suggested by the reviewer. However, we encountered two significant challenges. To achieve a notable telomere length difference of at least 20%, these primary cells had to undergo a minimum of 25 passages. During this period, we observed a substantial decline in their proliferation capacity and an increased tendency toward replicative senescence. Additionally, we noted a significant reduction in TRF2 protein levels as the primary cells aged, consistent with findings from Fujita K et al., 2010 (Nat Cell Biol.), which reported p53-induced, Siah-1-mediated proteasomal degradation of TRF2. Due to these practical limitations, we focused on cancer cell lines with respective isogenic backgrounds, ensuring a controlled experimental framework. On the other hand, this opens new avenues for future research to explore broader implications. Investigating other primary cell types that may not present these challenges could be a valuable direction for future studies.

      (3) The authors set up time-dependent telomere length changes by dox induction, which may differ from the gradual telomere attrition or elongation that occurs naturally during aging, disease progression, or therapy. This aspect should be explored.

      In this study, we utilized a Doxycycline-inducible hTERT expression system to modulate telomere length in cancer cells, aiming to capture any gradual changes that might occur upon steady telomerase induction or overexpression—an event frequently observed in cancer progression. We monitored telomere length and telomerase activity at regular intervals (Supplementary Figure 2), noting a gradual increase until a characteristic threshold was reached, followed by a reversal to the initial telomere length.

      While this model provides interesting insights in the context of cancer cells, it does not replicate the conditions of aging or therapeutic intervention. We agree that exploring telomere length-dependent regulation of hTERT in normal aging cells is an important avenue for future research. Investigating TRF2 occupancy on the hTERT promoter in response to telomere length alterations through therapeutic interventions, such as telomestatin or imetelstat (telomerase inhibitors) and 6-thio-2’-deoxyguanosine (telomere damage inducer), would provide valuable insights and warrants further exploration.

      (4) How does the hTERT regulation by TRF2 in a TL-dependent manner affect the ETS binding on hTERT mutant promoter sites?

      In our previous study (Sharma et al., 2021, Cell Reports), we experimentally demonstrated that GABPA and TRF2 do not compete for binding at the mutant hTERT promoter (Figure 4M-R). Silencing GABPA in various mutant hTERT promoter cells did not increase TRF2 binding. While GABPA has been reported to show increased binding at the mutant promoter compared to the wild-type (Bell et al., 2015, Science), no telomere length (TL) sensitivity has been noted yet. In the current manuscript, we show that telomere alterations in hTERT mutant cells (that do not form the promoter G-quadruplex) do not significantly affect TRF2 occupancy at the promoter, reinforcing our earlier findings that G-quadruplex formation is crucial for TRF2 recruitment. Since TRF2 binding is not affected, this would not impact GABPA binding. Therefore, a change in TL is unlikely to influence ETS binding by GABPA.

      (5) Stabilization of the G-quadruplex structures in ST and LT conditions along with the G4 disruption experimentation (demonstrated by the authors) will strengthen the hypothesis.

      We agree with the reviewer’s suggestion that stabilizing G-quadruplex (G4) structures in mutant promoter cells under ST and LT conditions would further strengthen our hypothesis. From our ChIP experiments on hTERT promoter mutant cells following G4 stabilization with ligands, as reported in Sharma et al. 2021 (Figure 5G), we observed that TRF2 occupancy was regained in the telomere-length unaltered versions of -124G>A and -146G>A HEK293T Gaussia luciferase cells (referred to as LT cells in the current manuscript).

      (6) The telomere length and the telomerase activity are not very consistent (Figure 2A, and S1A, Figure 4B and S3). Please comment.

      In this study, we employed both telomerase-dependent and independent methods for telomere elongation.

      HT1080 model: Telomere elongation resulted from constitutive overexpression of hTERC and hTERT, leading to a direct correlation with telomerase activity.

      HCT116 (p53-null) model: hTERC silencing in ST cells, a known limiting factor for telomerase activity, resulted in significantly lower telomerase activity and a 1.5-fold telomere length difference.

      MDAMB231 model: Guanine-rich telomeric repeat (GTR) feeding induced telomere elongation through recombinatorial mechanisms (Wright et al., 1996), leading to significant telomere length gain but no notable change in telomerase activity.

      HCT116 Cas9-telomeric sgRNA model: Telomere shortening occurred without modifying telomerase components, resulting in a minor, insignificant increase in telomerase activity (Figure 2A, S1).

      Regarding xenograft-derived HT1080 ST and LT cells (Figure 4B, S3), the observed variability in telomere length and telomerase activity may stem from infiltrating mouse cells, which naturally have longer telomeres and higher telomerase activity than human cells. Since in the reported assay tumour masses were not sorted to exclude mouse cells, using species-specific markers or fluorescently labelled HT1080 cells in future experiments would minimize bias. However, even though telomere length and telomerase activity assays cannot differentiate for cross-species differences, mRNA analysis and ChIP experiments performed specifically for hTERT and hTERC mRNA levels, TRF2 occupancy, and H3K27me3 enrichment on hTERT promoter (Figure 4B–E) strongly support our conclusions.

      (7) Please comment on the other telomere-associated proteins or regulatory pathways that might contribute to hTERT expression based on telomere length.

      The current study provides experimental evidence that TRF2, a well-characterized telomere-binding protein, mediates crosstalk between telomeres and the regulatory region of the hTERT gene in a telomere length-dependent manner. Given the observed link between hTERT expression and telomere length, it is likely that additional telomere-associated proteins and regulatory pathways contribute to this regulation.

      The remaining shelterin complex components—POT1, hRap1, TRF1, TIN2, and TPP1—may play crucial roles in this context, as they are integral to telomere maintenance and protection (Stewart J et al., 2012 Mutat Res.). Additionally, several DNA damage response (DDR) proteins, which interact with telomere-binding factors and help preserve telomere integrity, could potentially influence hTERT regulation in a telomere length-dependent manner (Longhese M, 2008 Genes & Development). However, direct interactions or regulatory roles would require further experimental validation. Another group of proteins with potential relevance in this mechanism are the sirtuins, which directly associate with telomeres and are known to positively regulate telomere length, undergoing repression upon telomere shortening (Amano H et al., 2019 Cell Metabolism, Amano H, Sahin E 2019 Molecular & Cellular Oncology). Notably, SIRT1 has been reported to interact with telomerase (Lee SE et al., 2024, Biochem Biophys Res Commun.), while SIRT6 has been implicated in TRF2 degradation (Rizzo et al. 2017) and telomerase activation (Chen J et al. 2021, Aging). Given their roles in telomere homeostasis, sirtuins may serve as key mediators of telomere length-dependent hTERT regulation.

      Based on this suggestion, we have included the above in Discussion.

      Reviewer #2 (Public review):

      Summary:

      Telomeres are key genomic structures linked to everything from aging to cancer. These key structures at the end of chromosomes protect them from degradation during replication and rely on a complex made up of human telomerase RNA gene (hTERC) and human telomerase reverse transcriptase (hTERT). While hTERC is expressed in all cells, the amount of hTERT is tightly controlled. The main hypothesis being tested is whether telomere length itself could regulate the hTERT enzyme. The authors conducted several experiments with different methods to alter telomere length and measured the binding of key regulatory proteins to this gene. It was generally observed that the shortening of telomere length leads to the recruitment of factors that reduce hTERT expression and lengthening of telomeres has the opposite effect. To rule out direct chromatin looping between telomeres and hTERT as driving this effect, artificial constructs were designed and inserted a significant distance away, and similar results were obtained.

      Overall, the claims of telomere length-dependent regulation of hTERT are supported throughout the manuscript.

      Strengths:

      The paper has several important strengths. Firstly, it uses several methods and cell lines that consistently demonstrate the same directionality of the findings. Secondly, it builds on established findings in the field but still demonstrates how this mechanism is separate from that which has been observed. Specifically, designing and implementing luciferase assays in the CCR5 locus supports that direct chromatin looping isn't necessary to drive this effect with TRF2 binding. Another strength of this paper is that it has been built on a variety of other studies that have established principles such as G4-DNA in the hTERT locus and TRF2 binding to these G4 sites.

      Weaknesses:

      The largest technical weakness of the paper is that minimal replicates are used for each experiment. I understand that these kinds of experiments are quite costly, and many of the effects are quite large; however, experiments such as the flow cytometry or the iPSC telomere length and activity assays appear to be based on a single sample, and several are based upon two or at most three biological replicates. If samples were added, the main effects would likely hold, and many of the assays using GAPDH as a control would result in significant differences between the groups. This unnecessarily weakens the strength of the claims.

      We appreciate the reviewer’s recognition of the resource-intensive nature of our experiments, and we are confident in the robustness of the observed results. Due to the project’s timeline constraints and the need for consistency across experiments, we have reported findings based on 3 biological replicates with appropriate statistical analysis.

      Regarding the fibroblast-iPSC model, we would like to clarify that we have presented data from two independent biological replicates, each consisting of a fibroblast and its derived iPS cell pair, rather than a single sample. Additionally, the Tel-FACS assays involved analysing at least 10,000 events, ensuring statistical significance in all cases.

      Another detail that weakens the confidence in the claims is that throughout the manuscript there are several examples of the control group with zero variance between any of the samples: e.g. Figure 2K, Figure 3N, and Figure 6G. It is my understanding that a delta delta method has been used for calculation (though no exact formula is reported and would assist in understanding). If this is the case, then an average of the control group would be used to calculate that fold change and variance would exist in the group. The only way I could understand those control group samples always set to 1 is if a tube of cells was divided into conditions and therefore normalized to the control group in each case. A clearer description in the figure legend and methods would be required if this is what was done and repeated measures ANOVA and other statistics should accompany this.

      The above point has been raised by the reviewer in the 'Recommendations for Authors' section as well. We have addressed it in detail in that section, citing each figure where the reviewer noted a concern regarding the lack of variance. Changes made in the manuscript have also been highlighted there.

      We would like to clarify that, throughout the manuscript, fold changes were previously calculated independently for each biological replicate by normalizing treated conditions to their corresponding control (untreated or Day 0) sample within the same replicate. This means that the control group is normalized to 1 individually in each replicate, resulting in an apparent lack of variance in the control when plotted. The normalization was not performed using an averaged control value across replicates. As such, the absence of visible variance in the control group reflects the normalization method rather than a true lack of variability in the underlying data.
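
      The effect of the two normalization schemes can be illustrated with a minimal sketch. The values below are hypothetical (they are not data from the manuscript) and the function names are chosen only for illustration. Normalizing each replicate to its own control forces every control value to exactly 1, whereas normalizing to the control mean across replicates leaves visible spread in the control group.

```python
from statistics import mean, stdev

# Hypothetical raw signal (arbitrary units), one control/treated pair per biological replicate.
control = [10.0, 12.0, 8.0]
treated = [20.0, 30.0, 12.0]


def per_replicate_fold_change(control, treated):
    """Normalize each replicate to its own control sample (paired design)."""
    ctrl_fc = [c / c for c in control]  # every control becomes exactly 1.0
    trt_fc = [t / c for c, t in zip(control, treated)]
    return ctrl_fc, trt_fc


def mean_control_fold_change(control, treated):
    """Normalize every sample to the control mean taken across replicates."""
    ref = mean(control)
    return [c / ref for c in control], [t / ref for t in treated]


paired_ctrl, paired_trt = per_replicate_fold_change(control, treated)
pooled_ctrl, pooled_trt = mean_control_fold_change(control, treated)

print("per-replicate control:", paired_ctrl, "SD =", stdev(paired_ctrl))
print("mean-normalized control:", pooled_ctrl, "SD =", stdev(pooled_ctrl))
```

      In this sketch the per-replicate scheme gives a control standard deviation of zero by construction, which matches the appearance of the control bars the reviewer noted.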

      In the revised version of the manuscript, we have carefully considered the reviewer’s comments and applied changes wherever appropriate. For example (detailed response in the ‘Recommendations for Authors’ section), in datasets where two distinct stable cell lines are compared (e.g., HT1080 ST/LT and HCT p53-null ST/LT), unpaired statistical analysis is more appropriate. Hence, we have updated these panels accordingly and indicated the statistical methods used in the figure legends and Methods section. However, in experiments where cells were indeed seeded separately and subsequently subjected to experimental conditions—representing paired samples—we have chosen not to make any changes. A clearer description of this procedure has, however, been added to the Methods and figure legends to ensure full transparency.

      We believe this approach accurately reflects the experimental design, appropriately addresses the reviewer’s concerns regarding variance and statistical analysis, and ensures clarity and rigor in data reporting.

      A final technical weakness of the paper is the data in Figure 5 where the modified hTERT promoter was inserted upstream of the luciferase gene. Specifically, it is unclear why data was not directly compared between the constructs that could and could not form G4s to make this point. For this reason, the large variance in several samples, and minimal biological replicates, this data was the least convincing in the manuscript (though other papers from this laboratory and others support the claim, it is not convincing standalone data).

      We appreciate the reviewer's thoughtful feedback on the presentation of the luciferase assay data in Figure 5. The data for the wild-type hTERT promoter (capable of forming G4 structures) was previously reported in Figure 2G-K. To avoid redundancy in data presentation, we initially chose to report the results of the mutated promoter separately. However, we recognize that directly comparing the wild-type and mutated promoter constructs within the same figure would provide clearer context and strengthen the interpretation of the results. In light of this, we have updated Figure 5 in the revised manuscript to include the data for both constructs, ensuring a more comprehensive and informative comparison.

      The second largest weakness of the paper is formatting.

      When I initially read the paper without a careful reading of the methods, I thought that the authors did not have appropriate controls, meaning that if a method is applied to lengthen, there should be one that is not lengthened, and when a method is applied to shorten, one which is not shortened should be analysed as well. In fact, this is what the authors have done with isogenic controls. However, by describing all samples as either telomere short or telomere long, while this simplifies the writing and the colour scheme, it makes it less clear that each experiment is performed relative to an unmodified control. I would suggest putting the isogenic control in one colour, the artificially shortened in another, and the artificially lengthened in another.

      Similarly, the graphs, in general, should be consistent with labelling. Figure 2 was the most confusing. I would suggest one dotted line with cell lines above it, and then the method of either elongation or shortening below it. I.e. HT1080 above, hTERC overexpression below, MDAMB-231 above guanine terminal repeats below, like was done on the right. Figure 2 readability would also be improved by putting hTERT promoter GAPDH (-ve control) under each graph that uses this (Panel B and Panel C not just Panel C). All information is contained in the manuscript but one must currently flip between figure legends, methods, and figures to understand what was done and this reduces clarity for the reader.

      We thank the reviewer again for their thoughtful suggestions regarding figure formatting and colour coding to improve clarity. We fully understand the rationale for proposing separate colours for unmodified, telomere-shortened, and telomere-lengthened groups, as this could make the experimental design more immediately apparent. However, after careful consideration, we believe that implementing this change across all figures may unintentionally reduce clarity in other aspects of the data presentation (shown in other figures). This is further explained below.

      Specifically, applying three distinct colours throughout would make it harder to visually track key biological trends—such as changes in chromatin occupancy—across different models. For instance, the same colour could represent opposing regulatory patterns in distinct contexts (e.g., upregulation in one model and downregulation in another), which will make these figures difficult to understand. We feel that maintaining a consistent colour scheme based on telomere status—i.e., long telomeres (LT) vs short telomeres (ST)—across figures facilitates better comparison of biological outcomes across different experimental systems.

      Nevertheless, to address the reviewer’s concern about clarity in experimental design, we have added more detailed descriptions of the methodology and model systems used, in both the Methods and figure legend sections. These updates aim to make it easier for the reader to follow which groups serve as isogenic controls versus modified samples, without disrupting the consistency of data visualization.

      We hope this strikes a balance between improving clarity and preserving the interpretability of the broader biological trends presented in our manuscript.

      Please note that we have incorporated the reviewer’s suggestion to indicate details of model generation for the HT1080 and MDA-MB-231 cell lines in Figure 2. To quote the reviewer:

      “I would suggest one dotted line with cell lines above it, and then the method of either elongation or shortening below it. I.e. HT1080 above, hTERC overexpression below, MDAMB-231 above guanine terminal repeats below, like was done on the right.”

      We have also placed the hTERT promoter GAPDH (-ve control) label under each graph in Figure 2, rather than only at the end of Panel C, as suggested by the reviewer.

      Reviewer #1 (Recommendations for the authors):

      (1) Please check for grammatical errors throughout the manuscript.

      We have gone through the manuscript thoroughly and corrected grammatical errors wherever they were detected.

      (2) Please use both the FACS and qPCR-based assays to check telomere length in all the experiments to strengthen the observations.

      We would like to thank the reviewer for this valuable suggestion. We confirm that both FACS- and qPCR-based assays were performed to assess telomere length in our experiments. In the original submission, we chose to present primarily the FACS-based data in the main figures. This decision was based on the inherent differences in the measurement principles of the two methods, which can lead to discrepancies in the reported fold changes. We were concerned that presenting both datasets side by side in the main figures might lead to confusion for readers who are not directly familiar with the nuances of telomere length assays.

      However, in light of the reviewer’s suggestion, we have now included the qPCR-based data as Supplementary Figure 1A, and updated the manuscript text and figure legends accordingly to reflect this addition.

      (3) Correct the labeling in the legend (Figure 2).

      We have corrected the legend of Figure 2. We thank the reviewer for pointing this out.

      (4) In Figure 6B, why does the TRF WT condition have higher hTERT expression than the UT condition?

      We thank the reviewer for noting that the hTERT mRNA levels, as estimated by FISH in Figure 6B, appear slightly higher in TRF2 WT overexpressing HT1080 cells compared to the untransfected (UT) condition. Specifically, the average mean intensity values (a.u.) were 53 for UT and 57 for WT. Although this difference was not statistically significant, we acknowledge the reviewer's observation. Currently, we do not have a clear explanation for this small, non-significant variation.

      Importantly, using the same FISH-based method, we observed a significant upregulation of hTERT mRNA levels upon TRF2 R17H overexpression compared to both UT and TRF2 WT conditions, supporting our key conclusions.

      Additionally, qRT-PCR analysis of hTERT mRNA levels in cells stably expressing TRF2 WT (induced by doxycycline) consistently showed a significant downregulation compared to the uninduced (equivalent to UT in the microscopy experiments) state. These results were robust and reproducible across three different cell lines, including HT1080. Consistently, TRF2 R17H expression led to significant upregulation of hTERT mRNA levels upon induction.

      Together, these complementary findings strengthen the validity of our observations.

      (5) Is the difference in telomere length between ST and LT in Fig. 5B significant? (especially the right panel, -146G>A).

      We consistently worked with approximately 20–30% telomere shortening in HEK293 cells across all three cell types (WT promoter, -124G>A, and -146G>A), as this range was reproducibly achieved within the experimental timeframe without risking excessive telomere trimming. The reported telomere length differences are based on FACS analysis of more than 10,000 events per condition, providing strong statistical significance. Importantly, while the absolute differences in telomere length may appear modest, their biological impact is evident in the distinct cellular characteristics observed between ST and LT cell pairs.

      Reviewer #2 (Recommendations for the authors):

      As mentioned above it was somewhat unclear why so many instances of control groups had no variance between them. A more complete reporting of the formulas used to calculate the results, and methods (if samples were divided from a single source into different conditions) would be appreciated.

      We thank the reviewer for their valuable and detailed feedback. The instances where the control groups appeared to lack variance were mainly mRNA data (Figures 2D, 3G, 3N), luciferase activity (Figure 2K), and in vitro methyltransferase activity (Figure 6G). We address each of them in turn below.

      In Figure 2D, for the MDA-MB-231 GTR oligo and HCT116 telomere trimming datasets, the untreated cells were seeded separately and subsequently used to generate the treated conditions within the same experiment. Thus, these two datasets represent paired experimental conditions. Fold changes were calculated independently for each replicate (paired samples), and the fold changes across replicates were plotted. Because the control group serves as a common baseline within each pair and fold changes are normalized individually, minimal variance appears across controls. Given the experimental design, we believe no change is necessary for these panels. However, we have provided additional clarification regarding the calculation formulas and sample handling in the Methods section to avoid any ambiguity.

      For the ST/LT versions in HT1080 and HCT p53-null background cells, while each replicate could technically be treated as paired, these could be treated as four distinct stable cell lines. Hence, we agree it would be appropriate to apply unpaired statistical analysis for these datasets. We have updated the plots accordingly and described the statistical methods in detail in the figure legends and Methods section.
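
      The distinction between paired and unpaired analysis can be sketched as follows. This is an illustrative example with hypothetical numbers, not the analysis code used for the manuscript; it computes only the t statistics (no p-values) using the Python standard library. When replicates differ strongly in baseline but shift consistently within each pair, the paired statistic is much larger than the unpaired (Welch) statistic computed from the same values.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(a, b):
    """t statistic for paired samples: mean within-pair difference over its standard error."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def welch_t(a, b):
    """t statistic for independent samples with unequal variances (Welch's form)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


# Hypothetical measurements: large replicate-to-replicate baseline differences,
# but a consistent increase within each replicate.
untreated = [10.0, 20.0, 30.0]
treated = [12.0, 23.0, 34.0]

print("paired t:", paired_t(untreated, treated))
print("Welch t:", welch_t(untreated, treated))
```

      The choice therefore matters most when between-replicate variation is large relative to the treatment effect, which is why the design of each experiment (samples split from a common source versus independent stable lines) determines which test applies.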

      Figure 3G and 3N depict the doxycycline-induced cells which follow the design: untreated and dox-treated conditions were seeded from the same batch of cells into separate flasks and treated differently. Hence, these are also paired cases, and fold changes were calculated per replicate before plotting. Therefore, we believe no changes are necessary for these panels. However, we have provided more details regarding sample handling in the Methods section to avoid any ambiguity.

      In Figure 2K, we had previously plotted the fold change in luciferase activity over short telomere (ST) cells for each independent biological replicate. However, to address the reviewer’s concern about the absence of variance in the control group, we have now plotted the luminescence signal (normalised to total protein). We have also updated Figure 5E accordingly and included the WT promoter data along with the mutant cell line data, as suggested in the reviewer’s public comment.

      In Figure 6G, as each replicate of the in vitro methyltransferase activity used different batches of purified protein, there are inherent batch differences that were accounted for by normalizing each replicate internally. Fold changes were then determined for each replicate separately, as previously described. The fold changes across replicates were plotted, and significance between different conditions was tested using two-way ANOVA. To address the reviewer’s comment to show variance in the control, we have now plotted individual replicates.

      We believe these revisions, along with the expanded methods clarification, will fully address the reviewer's concerns and accurately reflect the experimental design and statistical analysis applied.

      Many times, in the manuscript a / is used to indicate both directions. For example: "Genes distal from telomeres (for instance 60 Mb from the nearest telomere) were activated/repressed in a TL-dependent way"... "Resulting increase/decrease in non-telomeric promoter-bound TRF2 affected gene expression". For readability, either this can be replaced with a directionless word like altered, changed, etc, or the writer can list both directions.

      We thank the reviewer for the careful reading and thoughtful suggestions. In the manuscript, we have used the ‘/’ symbol to indicate opposing directions, followed by the word ‘respectively’ to relate these directions to their corresponding outcomes, wherever appropriate. However, as rightly pointed out, certain sentences would benefit from alternative constructions for improved clarity and readability. We have therefore reviewed the manuscript and revised such sentences, making minor modifications wherever necessary, as outlined below.

      We found hTERT was transcriptionally altered depending on telomere length (TL).

      Notably, another conceptually distinct mechanism of TL-dependent gene regulation was reported which influenced genes spread throughout the genome: expression of genes distal from telomeres (for instance 60 Mb from the nearest telomere) was altered in a TL-dependent way, but without physical telomere looping interactions.

      Second, the shortening or elongation of telomeres led to the release or sequestration of telomeric TRF2, respectively, thereby increasing or decreasing the availability of TRF2 at non-telomeric promoters and affecting gene expression.

      A non-necessary, but potentially extra convincing experiment to perform would be to use a combination of light-activated, or ligand-activated cas9 telomere trimming and guanine terminal repeat additions in the same cell line. Like the dox experiments, this would show over time how altering telomere length alters the recruitment of heterochromatin factors and hTERT levels. Executing the experiment this way would be more definitive as it does not rely on changing hTERT itself. Authors do already have examples that support their claims.

      We thank the reviewer for suggesting this additional experiment (which the reviewer notes is not essential), which would indeed provide valuable insights into the relationship between telomere length, heterochromatin factor recruitment, and hTERT levels. While we recognize the potential of this approach, we are currently unable to carry out this experiment due to resource constraints. However, we believe that the existing data presented in the manuscript already support our conclusions effectively.

    1. eLife Assessment

      This study provides valuable insights into the anti-senescence effects of enalapril, identifying pSmad1/5/9 signaling and associated antioxidant pathways as key mediators of its physiological benefits in aged mice. The authors present solid experimental evidence across both in vitro and in vivo systems, demonstrating improved organ function and reduced senescence markers following treatment. Overall, the work supports the repurposing potential of enalapril in aging research and expands understanding of its molecular targets.